Module 06: Case Studies
Theory without production context is incomplete. This module takes the architectural patterns from previous modules and applies them to six real-world system design problems. Each case study is the kind of question you will face in a system design interview at a top tech company - and the kind of system you will need to design and defend in a real engineering role.
What You Will Learn
Case Study Map
| # | System | Scale | Key Challenges |
|---|---|---|---|
| 01 | Recommendation | Billions of items, millions of users | Two-stage architecture, cold start, freshness |
| 02 | Search Ranking | Millions of queries/day | Semantic retrieval, learning to rank (LTR), A/B testing |
| 03 | Fraud Detection | Under 100 ms decision latency, 0.001% fraud rate | Class imbalance, delayed labels, concept drift |
| 04 | Content Moderation | 500 hours of video/minute | Multi-modal, human + AI, adversarial |
| 05 | Ad Click Prediction | 8.5B impressions/day | Online learning, calibration, exploration |
| 06 | LLM Products | Trillions of tokens/month | Cost, latency, hallucination, observability |
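To make the scale column concrete, it can help to convert the daily and per-minute figures into per-second rates. The sketch below does this for two rows of the table. The peak-to-average multiplier is an assumption added purely for illustration; it does not come from the case studies.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Ad click prediction: 8.5B impressions/day (figure from the table above).
impressions_per_day = 8.5e9
avg_impressions_per_sec = impressions_per_day / SECONDS_PER_DAY

# Content moderation: 500 hours of video uploaded per minute (figure from the table).
upload_hours_per_minute = 500
# Seconds of video arriving per wall-clock second.
video_seconds_per_wall_second = upload_hours_per_minute * 3600 / 60

# ASSUMPTION: peak traffic is a multiple of the daily average.
# The 3x value is a placeholder, not a number from the case studies.
PEAK_TO_AVERAGE = 3.0
peak_impressions_per_sec = avg_impressions_per_sec * PEAK_TO_AVERAGE

print(f"avg impressions/sec:  {avg_impressions_per_sec:,.0f}")
print(f"peak impressions/sec: {peak_impressions_per_sec:,.0f} (assumed {PEAK_TO_AVERAGE}x peak)")
print(f"video seconds per wall-clock second: {video_seconds_per_wall_second:,.0f}")
```

On average, 8.5B impressions/day is roughly 98,000 scoring requests per second, and 500 hours of video per minute means about 30,000 seconds of video arrive every second. Numbers like these are what drive the sharding, caching, and model-size decisions in each case study.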
Key Patterns Across All Case Studies
- Two-stage architectures: fast candidate retrieval followed by slow, expensive ranking - appears in recommendation, search, fraud, and ads
- Real-time plus batch hybrid: precomputed embeddings updated offline, real-time features computed online - appears in every case study
- Human-in-the-loop: ML scales the decision volume; humans handle edge cases, appeals, and label generation - prominent in moderation and fraud
- Feedback loops: user behavior drives model training drives user behavior - must be managed in recommendation, ads, and search
- Extreme class imbalance: 0.001% click rates, 0.01% fraud rates, 0.001% policy violations - sampling and loss-weighting strategies are critical
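The last pattern in the list above, sampling and loss weighting under extreme class imbalance, can be sketched in a few lines. This is a minimal illustration on synthetic data, not the method used in any particular case study. The helper names (`pos_weight`, `downsample_negatives`, `recalibrate`) and the 1% keep rate are hypothetical. The recalibration step uses the standard odds correction for negative downsampling.

```python
import random

def pos_weight(n_pos: int, n_neg: int) -> float:
    """Loss weight for positives so both classes contribute equally in total."""
    return n_neg / n_pos

def downsample_negatives(rows, keep_rate, rng):
    """Keep every positive; keep each negative with probability keep_rate."""
    return [r for r in rows if r[1] == 1 or rng.random() < keep_rate]

def recalibrate(p_sampled: float, keep_rate: float) -> float:
    """Map a probability learned on downsampled data back to the original base rate.

    Downsampling negatives by keep_rate inflates the odds by 1 / keep_rate,
    so the correction scales the odds back down by keep_rate.
    """
    return p_sampled / (p_sampled + (1.0 - p_sampled) / keep_rate)

# Synthetic data (assumption): 10 positives in 100,000 rows, a 0.01% positive rate.
n_pos, n_neg = 10, 99_990
rows = [(i, 1) for i in range(n_pos)] + [(i, 0) for i in range(n_neg)]

rng = random.Random(0)   # fixed seed so the sketch is reproducible
KEEP_RATE = 0.01         # hypothetical: keep 1% of negatives
sampled = downsample_negatives(rows, KEEP_RATE, rng)

sampled_pos = sum(label for _, label in sampled)
sampled_rate = sampled_pos / len(sampled)
recovered_rate = recalibrate(sampled_rate, KEEP_RATE)

print(f"positive weight without sampling: {pos_weight(n_pos, n_neg):,.0f}")
print(f"rows after downsampling: {len(sampled):,}")
print(f"positive rate in sample: {sampled_rate:.4%}")
print(f"recalibrated rate:       {recovered_rate:.4%}")
```

The two strategies trade off differently. Loss weighting keeps all the data but can make training unstable when the weight is in the thousands. Downsampling shrinks the training set, but a model trained on it overestimates probabilities until they are recalibrated. That matters most where calibrated outputs are required, as in ad click prediction.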
How to Use This Module
Each case study is structured as a complete system design walkthrough, from requirements through full architecture. Read each case study as if you are in an interview: start by identifying the functional and non-functional requirements, then work through the architecture layer by layer. The Interview Q&A at the end of each lesson is calibrated to the questions actually asked at Meta, Google, Stripe, Airbnb, and similar companies.
For each system, you should be able to:
- State the problem in one sentence and the key constraints (latency, scale, accuracy)
- Draw the two-stage or three-stage architecture from memory
- Explain the key modeling choice and why alternatives would fail
- Describe the serving path in terms of latency budget
- Explain how the system handles the characteristic failure mode (cold start, concept drift, adversarial inputs)
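One way to practice the latency-budget item in the checklist above is to write the serving path as a list of sequential stages and check it against the end-to-end target. The sketch below uses the 100 ms target from the fraud detection row. The stage names and per-stage numbers are hypothetical placeholders, not measurements from any case study.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    name: str
    p99_ms: float  # ASSUMED p99 latency for this stage, in milliseconds

def check_budget(stages, budget_ms):
    """Return (total_ms, headroom_ms) for stages that run one after another.

    Summing per-stage p99s is a rough, typically pessimistic estimate of the
    end-to-end p99, which is usually what you want when defending a design.
    """
    total = sum(s.p99_ms for s in stages)
    return total, budget_ms - total

# Hypothetical serving path; every number here is a placeholder.
serving_path = [
    Stage("candidate retrieval", 20.0),
    Stage("feature fetch", 15.0),
    Stage("ranking model", 40.0),
    Stage("business rules + response", 10.0),
]

BUDGET_MS = 100.0  # matches the "under 100 ms" constraint in the fraud row
total, headroom = check_budget(serving_path, BUDGET_MS)

for s in serving_path:
    print(f"{s.name:<28} {s.p99_ms:5.1f} ms  ({s.p99_ms / BUDGET_MS:.0%} of budget)")
print(f"{'total':<28} {total:5.1f} ms  (headroom {headroom:.1f} ms)")
```

A breakdown like this shows which stage to attack first. If the ranking model takes 40% of the budget, options such as a smaller model, fewer candidates, or cached features become the natural follow-up discussion.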
