ML System Design Round - The Differentiator
Reading time: ~22 min | Interview relevance: Critical | Roles: MLE, AI Eng, MLOps (Senior+)
The Real Interview Moment
"Design a real-time fraud detection system for a payment platform processing 10,000 transactions per second."
You have 45 minutes and a whiteboard. The interviewer isn't looking for the "right" architecture - there isn't one. They're evaluating how you think: Do you start with requirements? Do you consider trade-offs? Do you think about what happens when the model is wrong? Do you plan for iteration?
The system design round is the most differentiated round in AI interviews. It's where strong candidates pull ahead and where the gap between "knows ML" and "can build ML systems" becomes visible.
What You Will Master
- The ML System Design framework (RPFMSE: Requirements, Problem, Features, Model, Serving, Evaluation)
- How to manage 45 minutes effectively
- What interviewers score and what separates "Hire" from "Strong Hire"
- The 10 most common ML system design problems
- How AI system design differs from traditional ML system design
Part 1 - The Framework
RPFMSE: A 6-Step Framework
Step-by-Step
Step 1: Requirements (5 min)
Ask clarifying questions before designing anything.
Functional: What should the system do? What inputs/outputs? What user experience?
Non-functional: What scale? What latency? What's the cost budget? What's the accuracy requirement?
BAD: Start drawing architecture immediately.
GOOD: "Before I design, let me understand the constraints. What's the transaction volume? What's the acceptable latency for a fraud decision? What's our tolerance for false positives vs. false negatives? Is there labeled fraud data available?"
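The answers to those clarifying questions can be turned into back-of-envelope numbers right away. A minimal sketch, using the 10,000 transactions per second from the prompt; the 1 KB event size and the 100 ms latency budget are assumed values for illustration, not figures from the problem statement.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def capacity_estimate(tps: int, bytes_per_event: int, latency_budget_ms: float) -> dict:
    """Rough sizing from the numbers gathered during requirements."""
    events_per_day = tps * SECONDS_PER_DAY
    # Little's law: requests in flight = arrival rate * time in system.
    # Using the full latency budget gives an upper bound on concurrency.
    concurrent_requests = tps * (latency_budget_ms / 1000.0)
    return {
        "events_per_day": events_per_day,
        "raw_storage_gb_per_day": events_per_day * bytes_per_event / 1e9,
        "concurrent_requests": concurrent_requests,
    }


# 1 KB per event and a 100 ms budget are assumptions for this example.
estimate = capacity_estimate(tps=10_000, bytes_per_event=1_000, latency_budget_ms=100)
```

Stating figures like these out loud (hundreds of millions of events per day, roughly a thousand requests in flight) gives the later serving discussion something concrete to design against.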
Step 2: Problem Formulation (5 min)
Translate the business problem into an ML problem.
- What's the ML objective? (classification, ranking, regression, generation)
- What's the prediction target?
- What metric maps to business success?
- Is this a real-time or batch problem?
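One way to make "what metric maps to business success" concrete for fraud detection is to price the two error types and pick the decision threshold that minimizes total cost. The sketch below assumes a fixed cost per missed fraud and per false alarm; the cost values, scores, and labels are made up for illustration.

```python
def expected_cost(scores, labels, threshold, fn_cost, fp_cost):
    """Total cost of errors when flagging every score >= threshold."""
    cost = 0.0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if label == 1 and not flagged:
            cost += fn_cost  # missed fraud
        elif label == 0 and flagged:
            cost += fp_cost  # legitimate transaction blocked or reviewed
    return cost


def best_threshold(scores, labels, fn_cost, fp_cost, candidates):
    """Candidate threshold with the lowest total error cost."""
    return min(
        candidates,
        key=lambda t: expected_cost(scores, labels, t, fn_cost, fp_cost),
    )


# Hypothetical validation scores and labels (1 = fraud).
scores = [0.05, 0.20, 0.40, 0.60, 0.90]
labels = [0, 0, 1, 0, 1]
threshold = best_threshold(
    scores, labels, fn_cost=100.0, fp_cost=5.0, candidates=[0.1, 0.3, 0.5, 0.7]
)
```

When missed fraud is priced at 20 times a false alarm, the cheapest threshold here is the one that catches both fraud cases at the price of one false positive. Changing the cost ratio changes the answer, which is the trade-off worth raising with the interviewer.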
Step 3: Features & Data (8 min)
- What data sources are available?
- What features can you engineer?
- How do you handle training data (labels, sampling, splits)?
- Feature freshness: real-time vs. batch features
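To illustrate the real-time side of feature freshness, here is a minimal sketch of a sliding-window "velocity" feature: the number of transactions per card in the last N seconds. The class name and the in-memory storage are assumptions for the example; a production system would more likely keep this state in a stream processor or a low-latency store. It also assumes timestamps arrive in order for each key.

```python
from collections import defaultdict, deque


class VelocityFeature:
    """Count of events per key within a trailing time window."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps, oldest first

    def record(self, key: str, timestamp: float) -> None:
        self._events[key].append(timestamp)

    def count(self, key: str, now: float) -> int:
        window = self._events.get(key)
        if not window:
            return 0
        cutoff = now - self.window_seconds
        # Evict timestamps that have fallen out of the window.
        while window and window[0] <= cutoff:
            window.popleft()
        return len(window)


velocity = VelocityFeature(window_seconds=60)
for ts in (0, 30, 50):
    velocity.record("card_123", ts)
recent = velocity.count("card_123", now=70)  # the event at t=0 has expired
```

A batch feature such as "average spend over the last 90 days" can be precomputed nightly, while a counter like this has to be updated on every event. That difference is what drives the serving design.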
Step 4: Model Architecture (8 min)
- Start with a simple baseline (logistic regression, rules)
- Propose a more complex model with justification
- Discuss training approach (batch, online, transfer learning)
- Address scale: distributed training if needed
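As a sketch of what "start with a simple baseline" can mean, the block below trains a logistic regression with plain batch gradient descent and no external libraries. The learning rate, epoch count, and toy data are assumptions chosen for the example; in practice a library implementation would be the usual choice.

```python
import math


def sigmoid(z: float) -> float:
    # Branch on sign to avoid overflow in exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, row) -> float:
    return sigmoid(sum(w * v for w, v in zip(weights, row)) + bias)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on mean log loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row, label in zip(X, y):
            error = predict_proba(weights, bias, row) - label
            for j, value in enumerate(row):
                grad_w[j] += error * value
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy, linearly separable data: one feature, boundary between 1 and 2.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
weights, bias = train_logistic_regression(X, y)
predictions = [int(predict_proba(weights, bias, row) >= 0.5) for row in X]
```

A baseline like this gives a reference number, so any more complex model proposed afterwards has to justify itself by a measured improvement.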
Step 5: Serving & Infrastructure (8 min)
- Real-time vs. batch inference
- Latency optimization (caching, model compression, batching)
- A/B testing framework for model deployment
- Fallback behavior when the model fails
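Fallback behavior can be sketched as a thin wrapper around the model call: if the model raises or exceeds its latency budget, a cheap rule answers instead. Everything here is hypothetical (the `FraudScorer` class, the 5,000 amount rule, the 50 ms budget); it shows the pattern, not a production design.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def rule_based_score(txn: dict) -> float:
    """Hypothetical rule used when the model is unavailable."""
    return 0.9 if txn["amount"] > 5_000 else 0.1


class FraudScorer:
    def __init__(self, model_fn, timeout_s: float = 0.05):
        self._model_fn = model_fn
        self._timeout_s = timeout_s
        self._pool = ThreadPoolExecutor(max_workers=4)

    def score(self, txn: dict):
        """Return (score, source) so monitoring can track the fallback rate."""
        future = self._pool.submit(self._model_fn, txn)
        try:
            return future.result(timeout=self._timeout_s), "model"
        except FutureTimeout:
            # The call keeps running on its worker thread; we stop waiting.
            return rule_based_score(txn), "fallback_timeout"
        except Exception:
            return rule_based_score(txn), "fallback_error"


def broken_model(txn):
    raise RuntimeError("model server unavailable")


def slow_model(txn):
    time.sleep(0.2)  # exceeds the 50 ms budget
    return 0.5


error_result = FraudScorer(broken_model).score({"amount": 9_000})
timeout_result = FraudScorer(slow_model, timeout_s=0.05).score({"amount": 20})
healthy_result = FraudScorer(lambda txn: 0.42, timeout_s=1.0).score({"amount": 20})
```

Returning the source alongside the score is a deliberate choice: a rising share of fallback decisions is itself a signal worth alerting on.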
Step 6: Evaluation & Iteration (8 min)
- Offline metrics (precision, recall, AUC)
- Online metrics (business KPIs, user engagement)
- A/B testing methodology
- Monitoring: data drift, model performance, alerting
- How you'd iterate: what would V2 look like?
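For the data-drift part of monitoring, one commonly used signal is the Population Stability Index (PSI) between a feature's training distribution and its live distribution. The sketch below is one possible implementation; the bin edges, the sample values, and the small floor applied to empty bins are assumptions for the example.

```python
import math


def population_stability_index(expected, actual, bin_edges, eps=1e-6):
    """PSI between two samples, using shared bin edges sorted ascending."""

    def bin_fractions(values):
        counts = [0] * (len(bin_edges) + 1)
        for v in values:
            # Bin index = number of edges at or below the value.
            counts[sum(v >= edge for edge in bin_edges)] += 1
        # Floor empty bins at eps so the log below stays finite.
        return [max(c / len(values), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


edges = [0.25, 0.5, 0.75]
training = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
live_shifted = [0.6, 0.7, 0.8, 0.9, 0.8, 0.9, 0.7, 0.95]

psi_same = population_stability_index(training, training, edges)
psi_shifted = population_stability_index(training, live_shifted, edges)
```

A frequently quoted rule of thumb treats PSI above roughly 0.25 as a significant shift, but the alerting threshold should be tuned per feature rather than taken as given.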
From the interviewer's side of the table: the candidates who get "Strong Hire" in system design are the ones who naturally talk about failure modes, monitoring, and iteration without being prompted. If I have to ask "What happens when the model is wrong?", that's a yellow flag. The best candidates address it preemptively: "Here's how I'd detect model degradation, here's my rollback strategy, and here's what V2 would focus on."
Part 2 - The 10 Most Common Problems
| Problem | Key Challenges | Primary Role |
|---|---|---|
| Recommendation System | Cold start, real-time personalization, exploration vs. exploitation | MLE |
| Fraud Detection | Class imbalance, real-time latency, adversarial evolution | MLE |
| Search Ranking | Multi-stage ranking, relevance vs. freshness, query understanding | MLE |
| Ad Click Prediction | Scale (billions of events), calibration, feature engineering at scale | MLE |
| Content Moderation | Multi-modal (text + image), edge cases, false positive sensitivity | MLE / AI Eng |
| Customer Support Chatbot | RAG, tool use, guardrails, escalation logic | AI Engineer |
| Enterprise Search | Multi-source retrieval, access control, relevance tuning | AI Engineer |
| AI Code Review Assistant | Context understanding, false positive rate, developer trust | AI Engineer |
| ML Platform / Feature Store | Training-serving consistency, freshness, scale | MLOps |
| Model Monitoring System | Drift detection, alerting, automated retraining | MLOps |
Part 3 - ML vs. AI System Design
Traditional ML System Design (MLE)
Focus on: training pipeline, feature engineering, model selection, offline evaluation, serving, monitoring.
AI/LLM System Design (AI Engineer)
Focus on: retrieval (RAG), LLM orchestration, prompt design, guardrails, tool use, evaluation, cost management.
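To show how the AI-side concerns fit together, here is a deliberately small retrieve-then-generate sketch with one guardrail: if retrieval finds nothing relevant, the request is escalated instead of letting the model guess. The keyword-overlap retriever and the `llm` callable are stand-ins invented for this example; a real system would use embedding search and an actual model client.

```python
def retrieve(query: str, documents: list, top_k: int = 2) -> list:
    """Rank documents by word overlap with the query (stand-in for vector search)."""
    query_terms = set(query.lower().split())
    scored = []
    for doc in documents:
        overlap = len(query_terms & set(doc.lower().split()))
        if overlap > 0:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Capping top_k also bounds prompt size, a crude form of cost control.
    return [doc for _, doc in scored[:top_k]]


def answer(query: str, documents: list, llm) -> dict:
    context = retrieve(query, documents)
    if not context:
        # Guardrail: no supporting context, so hand off rather than generate.
        return {"action": "escalate", "text": "Routing to a human agent."}
    prompt = (
        "Answer using only this context:\n"
        + "\n".join(context)
        + "\nQuestion: "
        + query
    )
    return {"action": "answer", "text": llm(prompt)}


def stub_llm(prompt: str) -> str:
    """Hypothetical model call; returns a placeholder instead of real text."""
    return f"(stub reply to a {len(prompt)}-character prompt)"


docs = [
    "Refunds are issued within 5 business days.",
    "Password resets require email verification.",
]
grounded = answer("how do refunds work", docs, stub_llm)
unsupported = answer("quantum entanglement", docs, stub_llm)
```

Compared with the traditional ML list above, the design effort moves from training and feature pipelines to retrieval quality, prompt construction, and deciding when not to answer.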
Part 4 - Scoring Rubric
| Criterion | No Hire | Lean Hire | Strong Hire |
|---|---|---|---|
| Requirements | Skips requirements | Asks basic questions | Uncovers non-obvious constraints |
| Problem formulation | Wrong objective | Correct but generic | Precise, considers business context |
| Features | Only raw features | Good feature ideas | Creative features + freshness/serving considerations |
| Model | Jumps to complex model | Baseline + one iteration | Baseline → iterate, with clear justification |
| Serving | Ignores infra | Basic serving discussion | Latency optimization, fallbacks, scaling |
| Evaluation | No evaluation plan | Offline metrics only | Offline + online + monitoring + iteration |
| Communication | Unstructured, hard to follow | Organized, clear | Structured, concise, proactively addresses concerns |
Practice Problems
Problem 1: Design a News Feed Ranking System
Hint 1 - Direction
Think about this as a multi-objective ranking problem: relevance, freshness, diversity, engagement prediction. Multi-stage ranking (candidate generation → scoring → re-ranking) is standard.
Full Answer (Abbreviated)
Requirements: 500M users, 10K candidate posts per user, rank top 50 for display. Latency: <200ms. Metrics: engagement (clicks, time spent) + diversity + freshness.
Problem: Multi-stage ranking pipeline. Stage 1: candidate generation (retrieve 10K from 1M+ posts). Stage 2: scoring model (rank 10K → 500). Stage 3: re-ranking (500 → top 50 for display, applying business rules and diversity injection).
Features: User features (interests, past engagement, demographics), post features (topic, author, freshness, engagement rate), cross features (user-post affinity, social connection to author).
Model: Candidate generation: dual-tower model (user embedding + post embedding, approximate nearest neighbors). Scoring: gradient-boosted tree or deep ranking model. Re-ranking: rule-based diversity/freshness injection.
Serving: Pre-compute user embeddings, update post embeddings hourly. Real-time scoring on request. Cache frequent user feeds with TTL.
Evaluation: Offline: NDCG, diversity metrics. Online: session time, daily return rate, content diversity consumed. A/B test every major model change.
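The three-stage funnel in this answer can be sketched end to end in a few functions. This is a toy version under stated assumptions: brute-force dot products stand in for approximate nearest neighbor search, a hand-written formula stands in for the scoring model, and the per-topic cap is one possible diversity rule. All post data is invented.

```python
def dot(a, b) -> float:
    return sum(x * y for x, y in zip(a, b))


def generate_candidates(user_emb, posts, k):
    """Stage 1: nearest posts by embedding similarity (brute force here)."""
    ranked = sorted(posts, key=lambda p: dot(user_emb, p["embedding"]), reverse=True)
    return ranked[:k]


def score(user_emb, post) -> float:
    """Stage 2 stand-in: affinity minus a small freshness penalty."""
    return dot(user_emb, post["embedding"]) - 0.01 * post["age_hours"]


def rerank_with_diversity(ranked, top_n, max_per_topic):
    """Stage 3: keep score order but cap how many posts share a topic."""
    feed, per_topic = [], {}
    for post in ranked:
        seen = per_topic.get(post["topic"], 0)
        if seen < max_per_topic:
            feed.append(post)
            per_topic[post["topic"]] = seen + 1
        if len(feed) == top_n:
            break
    return feed


def rank_feed(user_emb, posts, k_candidates, top_n, max_per_topic):
    candidates = generate_candidates(user_emb, posts, k_candidates)
    scored = sorted(candidates, key=lambda p: score(user_emb, p), reverse=True)
    return rerank_with_diversity(scored, top_n, max_per_topic)


posts = [
    {"id": "a", "embedding": [0.9, 0.1], "topic": "sports", "age_hours": 1},
    {"id": "b", "embedding": [0.8, 0.2], "topic": "sports", "age_hours": 2},
    {"id": "c", "embedding": [0.7, 0.3], "topic": "sports", "age_hours": 1},
    {"id": "d", "embedding": [0.6, 0.4], "topic": "tech", "age_hours": 5},
    {"id": "e", "embedding": [0.1, 0.9], "topic": "food", "age_hours": 1},
]
feed = rank_feed([1.0, 0.0], posts, k_candidates=4, top_n=3, max_per_topic=2)
feed_ids = [p["id"] for p in feed]
```

The third sports post scores higher than the tech post but is dropped by the topic cap, which is the kind of relevance-versus-diversity trade-off the re-ranking stage exists to make.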
Interview Cheat Sheet
| Phase | What to Say | Time |
|---|---|---|
| Start | "Let me start by understanding the requirements and constraints" | 0-5 min |
| Problem | "I'd frame this as a [classification/ranking/...] problem with [metric] as the north star" | 5-10 min |
| Features | "For features, I'd consider these categories: [user, item, context, cross]" | 10-18 min |
| Model | "I'd start with [simple baseline] and iterate toward [complex model] if needed" | 18-26 min |
| Serving | "For serving, the key constraints are [latency/scale/cost]" | 26-34 min |
| Evaluation | "To evaluate, I'd combine offline metrics with online A/B testing and continuous monitoring" | 34-42 min |
| Q&A | "What aspects would you like me to go deeper on?" | 42-45 min |
Spaced Repetition Checkpoints
- Day 0: Memorize the RPFMSE framework. Practice drawing it from memory.
- Day 3: Design a recommendation system end-to-end in 45 minutes. Time yourself.
- Day 7: Design a fraud detection system. Focus on real-time serving and class imbalance.
- Day 14: Do a mock system design round with a friend. Get feedback on structure and depth.
- Day 21: Design an AI/LLM system (chatbot or search). Practice the AI-specific focus areas from Part 3: retrieval, orchestration, guardrails, and cost management.
What's Next
- ML System Design - For full system design problems
- Paper Discussion Round - For research-focused roles
- Behavioral Round - The soft skills round
