ML System Design - The Most Differentiating Round
Reading time: ~15 min | Interview relevance: Critical | Roles: MLE, AI Eng, MLOps (Senior+)
The Real Interview Moment
You're 10 minutes into a system design round. The interviewer opened with: "Design a recommendation system for an e-commerce marketplace." You started drawing boxes - "data pipeline," "model," "serving layer." Now the interviewer interrupts: "You've been describing components. I want to hear about trade-offs. Why this model over that one? What happens when a new user has no history? How do you measure success?"
This is the system design round. It's not about drawing the "correct" architecture diagram - it's about demonstrating that you can reason through ambiguity, make justified trade-offs, and think about ML systems as living, evolving products. This section gives you a framework, a rubric, and 13 complete design problems to practice with.
What You Will Master
- A repeatable framework for any ML system design question
- The exact rubric interviewers use to score your answer
- 13 complete design problems covering the full range of ML systems
- How AI/LLM system design differs from traditional ML system design
- Time management strategies for the 45-minute round
Section Roadmap
| Page | What It Covers | Read If |
|---|---|---|
| Design Framework | The 6-step RPFMSE framework (Requirements, Problem formulation, Features, Model, Serving, Evaluation) in detail | Everyone - this is your foundation |
| Evaluation Rubric | How interviewers score each component | Everyone - know what gets you "Strong Hire" |
| Recommendation System | Collaborative filtering, content-based, hybrid, cold start | MLE, AI Eng |
| Search Ranking | Query understanding, retrieval, ranking, personalization | MLE, AI Eng |
| Fraud Detection | Real-time scoring, class imbalance, adversarial evolution | MLE |
| News Feed Ranking | Multi-objective optimization, real-time features, diversity | MLE |
| Ad Click Prediction | Feature stores, real-time bidding, calibration at scale | MLE |
| Content Moderation | Multi-modal classification, human-in-the-loop, appeals | MLE, AI Eng |
| Autonomous Driving | Perception, prediction, planning, safety | MLE (specialized) |
| AI Chatbot | RAG, guardrails, conversation management, evaluation | AI Engineer |
| Visual Search | Embedding models, ANN indexing, cross-modal search | MLE |
| Anomaly Detection | Unsupervised methods, streaming, alerting | MLE, MLOps |
| Machine Translation | Encoder-decoder, quality estimation, low-resource | MLE |
| Speech Recognition | Acoustic models, language models, streaming ASR | MLE (specialized) |
| A/B Testing Platform | Experiment platform, statistical rigor, automation | MLOps, DS |
Priority Order for Practice
Quick Reference: The Framework in 60 Seconds
"For any ML system design question, I follow a 6-step framework: (1) Requirements - clarify functional and non-functional constraints. (2) Problem formulation - translate the business goal into an ML objective with metrics. (3) Features and data - identify data sources, engineer features, handle labels. (4) Model - start with a baseline, iterate toward complexity with justification. (5) Serving - real-time vs. batch, latency optimization, failure handling. (6) Evaluation - offline metrics, online A/B testing, monitoring for drift, and a plan for iteration. The key is to cover all six steps in 45 minutes, spending roughly 5-8 minutes on each, rather than going deep on modeling and ignoring serving and evaluation."
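The time split in the summary above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the framework itself: the helper name `time_budget` and the per-step weights are hypothetical, and equal weights are assumed by default. With six steps and 45 minutes, an equal split gives 7.5 minutes per step, which sits inside the 5-8 minute range quoted above.

```python
# Hypothetical helper: split a 45-minute round across the six framework steps.
STEPS = [
    "Requirements",
    "Problem formulation",
    "Features and data",
    "Model",
    "Serving",
    "Evaluation",
]


def time_budget(total_minutes=45.0, weights=None):
    """Return (step, start_minute, end_minute) tuples covering the whole round.

    `weights` is an assumption for illustration: relative emphasis per step.
    Equal weights are used when none are given.
    """
    if weights is None:
        weights = [1.0] * len(STEPS)
    if len(weights) != len(STEPS):
        raise ValueError("need exactly one weight per step")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")

    total_weight = sum(weights)
    schedule = []
    start = 0.0
    for step, weight in zip(STEPS, weights):
        # Each step gets a share of the round proportional to its weight.
        minutes = total_minutes * weight / total_weight
        schedule.append((step, start, start + minutes))
        start += minutes
    return schedule


if __name__ == "__main__":
    for step, start, end in time_budget():
        print(f"{start:5.1f}-{end:5.1f} min  {step}")
```

Passing weights such as `[1, 1, 1, 1.5, 1, 1]` shifts extra time to modeling while keeping the total at 45 minutes. Whether that is a good trade depends on where the interviewer wants depth.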
Spaced Repetition Checkpoints
- Day 0: Read the Framework and Rubric pages. Memorize the 6 steps.
- Day 3: Design a Recommendation System in 45 minutes. Compare against the model answer.
- Day 7: Design Fraud Detection. Focus on real-time serving and class imbalance.
- Day 14: Do a mock system design with a friend. Get feedback on structure and trade-off discussion.
- Day 21: Design 2 more problems from the list. By now, the framework should feel natural.
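To turn the checkpoints above into calendar dates, a small script is enough. This is an illustrative sketch only: the helper name `checkpoint_dates` is hypothetical, and the task strings are shortened from the list above.

```python
from datetime import date, timedelta

# Day offsets and tasks taken from the checkpoint list above (wording shortened).
CHECKPOINTS = [
    (0, "Read the Framework and Rubric pages; memorize the 6 steps"),
    (3, "Design a Recommendation System in 45 minutes"),
    (7, "Design Fraud Detection"),
    (14, "Mock system design with a friend"),
    (21, "Design 2 more problems from the list"),
]


def checkpoint_dates(start):
    """Map each checkpoint's day offset to a calendar date (hypothetical helper)."""
    return [(start + timedelta(days=offset), task) for offset, task in CHECKPOINTS]


if __name__ == "__main__":
    # Use today as Day 0 and print one line per checkpoint.
    for day, task in checkpoint_dates(date.today()):
        print(f"{day.isoformat()}  {task}")
```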
What's Next
Start with The Design Framework - it's the foundation for every problem in this section.
