
Module 06: Case Studies

Theory without production context is incomplete. This module takes the architectural patterns from previous modules and applies them to six real-world system design problems. Each case study is the kind of question you will face in a system design interview at a top tech company, and the kind of system you will need to design and defend in a real engineering role.

What You Will Learn

Case Study Map

| # | System | Scale / Constraints | Key Challenges |
|---|---|---|---|
| 01 | Recommendation | Billions of items, millions of users | Two-stage architecture, cold start, freshness |
| 02 | Search Ranking | Millions of queries/day | Semantic retrieval, learning to rank (LTR), A/B testing |
| 03 | Fraud Detection | Under 100 ms, 0.001% fraud rate | Class imbalance, delayed labels, concept drift |
| 04 | Content Moderation | 500 hours of video/minute | Multi-modal, human + AI, adversarial |
| 05 | Ad Click Prediction | 8.5B impressions/day | Online learning, calibration, exploration |
| 06 | LLM Products | Trillions of tokens/month | Cost, latency, hallucination, observability |
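The daily and per-minute volumes in the table are easier to design against once they are converted into per-second rates. The snippet below is a back-of-envelope sketch: it computes averages only, and the helper names are illustrative rather than part of any case study.

```python
# Back-of-envelope conversion of table volumes into average per-second rates.
# Averages only: real traffic is bursty, so peaks sit above these numbers.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second_from_daily(count_per_day: float) -> float:
    """Average events per second for a given daily volume."""
    return count_per_day / SECONDS_PER_DAY


def video_seconds_per_second(hours_per_minute: float) -> float:
    """Seconds of uploaded video arriving per wall-clock second."""
    return hours_per_minute * 3600 / 60


ad_qps = per_second_from_daily(8.5e9)       # Ad Click Prediction row
video_rate = video_seconds_per_second(500)  # Content Moderation row

print(f"Ad impressions: ~{ad_qps:,.0f} per second on average")
print(f"Video ingest: {video_rate:,.0f} seconds of video per second")
```

Under these assumptions the ads row works out to roughly 98,000 impressions per second on average, and the moderation row to 30,000 seconds (about 8.3 hours) of video arriving every second. Peak traffic exceeds the average, so capacity planning adds headroom on top of these figures.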

Key Patterns Across All Case Studies

  • Two-stage architectures: fast candidate retrieval followed by slower, more expensive ranking; used in recommendation, search, fraud, and ads
  • Real-time plus batch hybrid: embeddings precomputed and refreshed offline, real-time features computed online at request time; present in every case study
  • Human-in-the-loop: ML handles the decision volume, while humans handle edge cases, appeals, and label generation; prominent in moderation and fraud
  • Feedback loops: user behavior drives model training, which in turn shapes user behavior; must be managed in recommendation, ads, and search
  • Extreme class imbalance: positive rates such as 0.001% for clicks, 0.01% for fraud, and 0.001% for policy violations make sampling and loss-weighting strategies critical
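A minimal sketch of the two-stage pattern, assuming user and item embeddings already exist (here they are random vectors). Stage 1 scores the whole corpus with a cheap dot product and keeps the top k; brute force stands in for the approximate nearest-neighbor index a production system would typically use. Stage 2 applies a richer scorer only to those candidates. The toy ranker and the `item_quality` feature are hypothetical placeholders for a real ranking model.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_ITEMS, DIM = 10_000, 32
item_emb = rng.normal(size=(NUM_ITEMS, DIM)).astype(np.float32)
# Hypothetical per-item feature that only the expensive ranker uses.
item_quality = rng.uniform(size=NUM_ITEMS).astype(np.float32)


def retrieve(user_vec: np.ndarray, k: int) -> np.ndarray:
    """Stage 1: cheap dot-product scoring over the whole corpus, keep top-k.
    Brute force here stands in for an approximate nearest-neighbor index."""
    scores = item_emb @ user_vec
    return np.argpartition(-scores, k)[:k]  # unordered top-k indices, O(N)


def rank(user_vec: np.ndarray, candidates: np.ndarray, n: int) -> np.ndarray:
    """Stage 2: a (hypothetical) richer scorer applied only to the candidates."""
    affinity = item_emb[candidates] @ user_vec
    # Toy "expensive" model: squash affinity, then blend with an item feature.
    score = 1.0 / (1.0 + np.exp(-affinity)) + 0.5 * item_quality[candidates]
    order = np.argsort(-score)[:n]
    return candidates[order]


user = rng.normal(size=DIM).astype(np.float32)
cands = retrieve(user, k=500)
final = rank(user, cands, n=10)
print(final)
```

The design point is cost asymmetry: stage 1 touches all 10,000 items with a single matrix-vector product, while the more expensive model runs on only 500 of them.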
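One common sampling strategy for extreme imbalance is to keep every positive, downsample negatives at a rate w, and then map the model's predicted probability q back to the original distribution with p = q / (q + (1 - q) / w). The sketch below assumes labels in {0, 1} and uses synthetic data at a 0.01% positive rate; the function names are illustrative. Loss weighting (up-weighting positives in the training loss) is the main alternative and is not shown here.

```python
import random


def downsample_negatives(examples, keep_rate, seed=0):
    """Keep every positive and roughly a `keep_rate` fraction of negatives.
    `examples` is a list of (features, label) pairs with label in {0, 1}."""
    rnd = random.Random(seed)
    return [(x, y) for x, y in examples if y == 1 or rnd.random() < keep_rate]


def recalibrate(q, keep_rate):
    """Map a probability q learned on downsampled data back to the original
    distribution: p = q / (q + (1 - q) / keep_rate)."""
    return q / (q + (1.0 - q) / keep_rate)


# Synthetic data: 100 positives in 1,000,000 examples (0.01% positive rate).
N, POS = 1_000_000, 100
data = [(None, 1)] * POS + [(None, 0)] * (N - POS)

KEEP = 0.01  # keep 1% of negatives
sample = downsample_negatives(data, KEEP)

# Positive rate in the training sample, i.e. what a calibrated model trained on it predicts on average.
q = sum(y for _, y in sample) / len(sample)
print(f"positive rate in training sample: {q:.3%}")
print(f"recalibrated to original data:    {recalibrate(q, KEEP):.4%}")
```

The downsampled set shows the model positives at roughly 1% instead of 0.01%, which gives it far more positive signal per training example; the recalibration step matters whenever downstream logic consumes probabilities directly, as with calibrated click predictions in ads.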

How to Use This Module

Each case study is structured as a complete system design walkthrough, from requirements through the full architecture. Read each one as if you were in an interview: start by identifying the functional and non-functional requirements, then work through the architecture layer by layer. The Interview Q&A at the end of each lesson is calibrated to the kinds of questions asked at Meta, Google, Stripe, Airbnb, and similar companies.

For each system, you should be able to:

  1. State the problem in one sentence and the key constraints (latency, scale, accuracy)
  2. Draw the two- or three-stage architecture from memory
  3. Explain the key modeling choice and why alternatives would fail
  4. Describe the serving path in terms of its latency budget and how that budget is split across stages
  5. Explain how the system handles its characteristic failure mode (cold start, concept drift, adversarial inputs)
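Describing a serving path in terms of its latency budget is easier to defend with an explicit allocation. A minimal sketch, assuming a strictly sequential serving path and the 100 ms limit from the fraud detection row; the stage names and per-stage numbers are illustrative assumptions, not measurements from any real system.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    budget_ms: float  # time allotted to this stage on the serving path


def check_budget(stages, total_ms):
    """Return (fits, slack_ms) for a purely sequential serving path."""
    used = sum(s.budget_ms for s in stages)
    return used <= total_ms, total_ms - used


# Illustrative allocation for a fraud-style path with a 100 ms end-to-end limit.
# Stage names and numbers are assumptions, not measurements.
fraud_path = [
    Stage("network + request parsing", 10),
    Stage("online feature lookup", 25),
    Stage("model inference", 30),
    Stage("rules + decision logic", 10),
    Stage("response + logging", 5),
]

fits, slack = check_budget(fraud_path, total_ms=100)
print(f"fits={fits}, slack={slack} ms")  # fits=True, slack=20 ms
```

Summing stage budgets only holds for sequential stages; stages that fan out in parallel contribute their slowest branch, not the sum. Keeping some slack (20 ms in this allocation) leaves room for tail latency.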
© 2026 EngineersOfAI. All rights reserved.