ML System Design Framework - Your 45-Minute Playbook

Reading time: ~20 min | Interview relevance: Critical | Roles: MLE, AI Eng, MLOps

The Framework: RPFMSE

Every ML system design answer should follow these 6 steps. This structure ensures you cover everything interviewers evaluate - and prevents the most common failure mode (spending 30 minutes on the model and 0 minutes on serving and evaluation).

RPFMSE Framework - 6 steps: Requirements, Problem Formulation, Features, Model, Serving, Evaluation

Step 1: Requirements (5 min)

Goal: Clarify the problem before designing anything.

Questions to Ask

Functional requirements:

  • What is the core user experience? What does the user see/do?
  • What inputs does the system receive? What outputs does it produce?
  • What scale are we designing for? (users, requests/sec, data volume)

Non-functional requirements:

  • What's the latency budget? (real-time: <100ms, near-real-time: <1s, batch: hours)
  • What's the accuracy/quality bar? (Do we need 99.9% precision, or is 80% recall fine?)
  • What's the cost budget? (GPU serving at scale is expensive)
  • Is there existing data? Labels? Infrastructure?

BAD: Skip requirements and start drawing boxes.

GOOD: "Before I design, let me make sure I understand the constraints. We're building a recommendation system for an e-commerce platform with 10M users, 1M products, and we need to serve recommendations in under 200ms. Are there specific business objectives - increasing revenue, engagement, or both? And do we have historical user-item interaction data to start with?"
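
The numbers in the GOOD example are enough for a quick sizing pass, which helps you sanity-check the latency and cost questions. The sketch below is only a back-of-envelope estimate: the user and product counts come from the example, while the daily-active fraction, requests per user, peak multiplier, and embedding width are all assumptions you would confirm with the interviewer.

```python
# Back-of-envelope sizing for the example (10M users, 1M products).
# Every constant marked ASSUMPTION is illustrative, not given in the prompt.
USERS = 10_000_000
PRODUCTS = 1_000_000

DAILY_ACTIVE_FRACTION = 0.2    # ASSUMPTION: 20% of users are active per day
REQUESTS_PER_ACTIVE_USER = 10  # ASSUMPTION: recommendation calls per active user per day
PEAK_TO_AVERAGE = 3            # ASSUMPTION: peak traffic is 3x the daily average
EMBEDDING_DIM = 128            # ASSUMPTION: width of each item embedding
BYTES_PER_FLOAT32 = 4
SECONDS_PER_DAY = 86_400


def average_qps(users, active_fraction, requests_per_user):
    """Average requests per second implied by daily traffic."""
    return users * active_fraction * requests_per_user / SECONDS_PER_DAY


def embedding_table_mb(num_items, dim, bytes_per_value=BYTES_PER_FLOAT32):
    """Memory for a dense item-embedding table, in megabytes."""
    return num_items * dim * bytes_per_value / 1_000_000


avg_qps = average_qps(USERS, DAILY_ACTIVE_FRACTION, REQUESTS_PER_ACTIVE_USER)
peak_qps = avg_qps * PEAK_TO_AVERAGE
item_table_mb = embedding_table_mb(PRODUCTS, EMBEDDING_DIM)

print(f"avg ~{avg_qps:.0f} QPS, peak ~{peak_qps:.0f} QPS, "
      f"item embeddings ~{item_table_mb:.0f} MB")
```

Under these assumptions the item embeddings fit comfortably in memory on a single machine, so the harder constraint is likely to be per-request scoring cost inside the 200ms budget.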

Interviewer's Perspective

The requirements phase is where I assess seniority. Junior candidates skip it entirely. Mid-level candidates ask basic questions. Senior candidates ask the questions that change the design - like "Is this real-time or batch?" or "Do we optimize for revenue or engagement?" These questions show you've built real systems and know what matters.

Step 2: Problem Formulation (5 min)

Goal: Translate the business problem into an ML problem.

The Translation

Business Goal | ML Objective | Metric
"Increase purchases" | Predict P(purchase | user, item) | Conversion rate, revenue per session
"Reduce fraud" | Binary classification: fraud vs. legit | Precision @ low FPR, dollar amount saved
"Show relevant search results" | Learning-to-rank: order results by relevance | NDCG, MRR
"Filter harmful content" | Multi-label classification: toxicity categories | Recall (catch harmful) + Precision (don't over-block)
"Answer customer questions" | RAG + generation: retrieve context, generate answer | Answer accuracy, user satisfaction, resolution rate

Key Decisions at This Stage

  1. What type of ML problem? Classification, regression, ranking, generation, retrieval?
  2. What's the prediction target? P(click), P(fraud), relevance score, generated text?
  3. What's the north star metric? One primary metric + 2-3 guardrail metrics.
  4. Offline vs. online? Can we serve batch predictions, or do we need real-time inference?
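
If you pick a ranking formulation, it helps to be able to say exactly what NDCG and MRR compute. The following is a minimal sketch, assuming graded relevance labels for NDCG and binary hit flags for MRR. It uses the linear-gain form of DCG; some teams use the exponential gain 2^rel - 1 instead, so treat the exact variant as an assumption.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded relevances."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG of the given order divided by the DCG of the ideal (sorted) order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def mean_reciprocal_rank(ranked_hits_per_query):
    """Average of 1/rank of the first relevant result; 0 for queries with no hit."""
    total = 0.0
    for hits in ranked_hits_per_query:
        first = next((i for i, hit in enumerate(hits) if hit), None)
        total += 1.0 / (first + 1) if first is not None else 0.0
    return total / len(ranked_hits_per_query)


# Toy example: relevance grades in the order the system returned them.
ndcg_example = ndcg_at_k([1, 3, 2], k=3)
mrr_example = mean_reciprocal_rank([[False, True], [True], [False, False]])
```

NDCG rewards putting the most relevant items first across the whole list, while MRR only cares about where the first relevant item lands, which is why MRR suits "one right answer" search and NDCG suits feeds.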

Step 3: Features & Data (8 min)

Goal: Identify data sources, engineer features, and handle labels.

Feature Categories

For most ML systems, features fall into these categories:

Category | Examples | Freshness
User features | Demographics, history, preferences, engagement patterns | Updated hourly-daily
Item features | Category, price, description embeddings, popularity | Updated on change
Context features | Time of day, device, location, session behavior | Real-time
Cross features | User-item affinity, user-category preference, co-occurrence | Computed batch or real-time

Data Considerations

  • Label availability: Do we have ground truth? How is it collected? What's the label delay?
  • Class imbalance: What's the positive rate? (fraud: 0.1%, clicks: 3%, purchases: 1%)
  • Data freshness: How often does the data distribution change?
  • Data quality: Missing values, duplicates, noise, adversarial data
  • Training data construction: How do you avoid leakage and guarantee point-in-time correctness?

Common Trap

Many candidates list features without thinking about serving. "Average purchase amount over the last 30 days" is easy to compute in batch SQL but requires a streaming aggregation pipeline when it has to be served in real time. Always ask: "Can I compute this feature at serving time within my latency budget?"
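
The same 30-day average is also a good way to show point-in-time correctness when building training data. Here is a minimal sketch, assuming each user's purchases are available as (timestamp, amount) pairs; the function name and the toy dates are hypothetical. The key detail is that the window's upper bound is exclusive, so the event being predicted never leaks into its own feature.

```python
from datetime import datetime, timedelta


def avg_purchase_amount_as_of(purchases, as_of, window_days=30):
    """Mean purchase amount in [as_of - window, as_of), using only past events.

    `purchases` is a list of (timestamp, amount) pairs for one user.
    """
    start = as_of - timedelta(days=window_days)
    amounts = [amount for ts, amount in purchases if start <= ts < as_of]
    return sum(amounts) / len(amounts) if amounts else 0.0


# Toy purchase history for one user (hypothetical values).
purchases = [
    (datetime(2024, 1, 1), 20.0),
    (datetime(2024, 1, 20), 40.0),
    (datetime(2024, 2, 5), 100.0),  # occurs after the first prediction time
]

feature_jan_25 = avg_purchase_amount_as_of(purchases, datetime(2024, 1, 25))
feature_feb_10 = avg_purchase_amount_as_of(purchases, datetime(2024, 2, 10))
```

A training row dated January 25 only sees the first two purchases. Computing the feature with a plain "last 30 days from today" query instead would silently include the February purchase and inflate offline metrics.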

Step 4: Model (8 min)

Goal: Start simple, iterate toward complexity with justification.

The Progression

Model Progression - Baseline (Rules/LR) → Gradient Boosting → Deep Learning → Hybrid/Ensemble

Always start with a baseline. This shows engineering judgment and gives the interviewer confidence you won't over-engineer.

Problem Type | Baseline | Strong Model | Why Start Simple
Classification | Logistic Regression | XGBoost → Neural Network | Interpretable, fast, sets benchmark
Ranking | Pointwise LR | Pairwise (LambdaMART) → Listwise (Deep Ranking) | Understand feature importance first
Retrieval | TF-IDF + BM25 | Two-tower embedding model | Fast, no training needed
Generation | Template-based | LLM with RAG | Reliable, deterministic

What to Cover

  • Architecture: What model and why (for this specific problem)?
  • Training: How do you train? Data splits, hyperparameter tuning, training infrastructure.
  • Offline evaluation: What metrics, on what holdout set?
  • Trade-offs: Why this model over alternatives? What did you sacrifice?

Step 5: Serving (8 min)

Goal: Get the model into production reliably.

Key Decisions

Decision | Options | Trade-offs
Real-time vs. batch | Real-time: per-request predictions. Batch: pre-compute, cache. | Latency vs. freshness vs. cost
Model format | PyTorch, ONNX, TensorRT | Flexibility vs. inference speed
Infrastructure | GPU vs. CPU | Cost vs. latency
Scaling | Horizontal (more replicas) vs. vertical (bigger machines) | Cost vs. simplicity
Caching | Cache predictions for common inputs | Reduces cost, but stale results
Fallback | What happens when model is down? | Rules-based fallback, cached results, or graceful degradation
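
The caching and fallback decisions interact, so it can help to sketch them together. The class below is one possible shape, not a reference implementation: `PredictionService`, its parameters, and the 300-second TTL are hypothetical names and values chosen for illustration. It serves a fresh cached prediction when available, otherwise calls the model, and if the model fails it degrades to a stale cached value or a rules-based default.

```python
import time


class PredictionService:
    """Serve from cache when fresh, else call the model, else fall back."""

    def __init__(self, model_fn, fallback_fn, ttl_seconds=300.0, clock=time.monotonic):
        self.model_fn = model_fn
        self.fallback_fn = fallback_fn
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._cache = {}  # key -> (expires_at, prediction)

    def predict(self, key):
        entry = self._cache.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1], "cache"
        try:
            prediction = self.model_fn(key)
        except Exception:
            # Model is down or erroring: prefer a stale cached value,
            # otherwise use the rules-based fallback.
            if entry is not None:
                return entry[1], "stale_cache"
            return self.fallback_fn(key), "fallback"
        self._cache[key] = (self.clock() + self.ttl_seconds, prediction)
        return prediction, "model"


def broken_model(key):
    raise RuntimeError("model unavailable")


service = PredictionService(
    broken_model,
    fallback_fn=lambda key: ["bestseller-1", "bestseller-2"],
)
result, source = service.predict("user-42")
```

Returning the source alongside the prediction makes it easy to monitor how often traffic is being served by the fallback path, which is itself a useful health signal.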

Multi-Stage Serving (Common for Ranking)

Multi-Stage Serving - Candidate Generation → Scoring → Re-ranking → User (top 20)
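
The funnel can be sketched in a few functions. This is a toy illustration under stated assumptions: popularity-based retrieval stands in for candidate generation, a hand-written affinity table stands in for the scoring model, and a per-category cap stands in for re-ranking. All function names, the `USER_AFFINITY` table, and the catalog are hypothetical.

```python
from collections import Counter

# Hypothetical per-user category affinities standing in for a learned model.
USER_AFFINITY = {"u1": {"books": 2.0}}


def retrieve_popular(user_id, catalog, n):
    """Stage 1 stand-in: cheap, high-recall candidate generation."""
    return sorted(catalog, key=lambda item: item["popularity"], reverse=True)[:n]


def score_item(user_id, item):
    """Stage 2 stand-in: a heavier model would score each candidate here."""
    affinity = USER_AFFINITY.get(user_id, {}).get(item["category"], 1.0)
    return item["popularity"] * affinity


def cap_per_category(items, max_per_category=2):
    """Stage 3: a simple diversity rule applied in score order."""
    seen, kept = Counter(), []
    for item in items:
        if seen[item["category"]] < max_per_category:
            kept.append(item)
            seen[item["category"]] += 1
    return kept


def recommend(user_id, catalog, n_candidates=500, top_k=20):
    """Candidate generation -> scoring -> re-ranking -> top_k for the user."""
    candidates = retrieve_popular(user_id, catalog, n_candidates)
    scored = sorted(candidates, key=lambda item: score_item(user_id, item), reverse=True)
    return cap_per_category(scored)[:top_k]


catalog = [
    {"id": "a", "category": "books", "popularity": 5.0},
    {"id": "b", "category": "books", "popularity": 4.0},
    {"id": "c", "category": "books", "popularity": 2.5},
    {"id": "d", "category": "toys", "popularity": 6.0},
    {"id": "e", "category": "toys", "popularity": 1.0},
    {"id": "f", "category": "games", "popularity": 0.5},
]
top = [item["id"] for item in recommend("u1", catalog, n_candidates=5, top_k=3)]
```

The design choice worth stating out loud is that each stage sees fewer items than the one before it, so the expensive model only runs on a small candidate set and the whole request can stay inside the latency budget.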

Step 6: Evaluation & Iteration (8 min)

Goal: Measure, monitor, and improve.

Offline Evaluation

  • Holdout test set with proper temporal split (no future data leakage)
  • Metrics matched to business goals (see Step 2)
  • Error analysis: where does the model fail? What patterns emerge?
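
A temporal split is simple to express, and writing it down makes the "no future data" rule checkable. The sketch below assumes each row carries a timestamp under a hypothetical `ts` key; the cutoff date and toy rows are illustrative only.

```python
from datetime import date


def temporal_split(rows, cutoff, timestamp_key="ts"):
    """Train on rows strictly before `cutoff`; test on rows at or after it."""
    train = [r for r in rows if r[timestamp_key] < cutoff]
    test = [r for r in rows if r[timestamp_key] >= cutoff]
    return train, test


# Toy rows (hypothetical): one labeled event per day, January 1-30, 2024.
rows = [{"ts": date(2024, 1, day), "label": day % 2} for day in range(1, 31)]
train, test = temporal_split(rows, cutoff=date(2024, 1, 25))

# Sanity check: nothing in the training set is newer than the test set.
assert max(r["ts"] for r in train) < min(r["ts"] for r in test)
```

A random shuffle split would mix later events into training and typically overstate offline performance relative to what the model sees in production.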

Online Evaluation

  • A/B testing: Treatment (new model) vs. Control (current model), measure business KPIs
  • Interleaving: For ranking systems, interleave results from both models in the same list
  • Canary deployment: Roll out to 5-10% of traffic, monitor for regressions
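
For an A/B test on a conversion-style KPI, one common way to check whether the difference is more than noise is a two-proportion z-test. The sketch below is a minimal version using the normal approximation; the traffic and conversion counts are made up for illustration, and real experiments also need a pre-registered sample size and guardrail metrics.

```python
import math


def two_proportion_z_test(conv_control, n_control, conv_treatment, n_treatment):
    """Return (z, two-sided p-value) for a difference in conversion rates."""
    p_control = conv_control / n_control
    p_treatment = conv_treatment / n_treatment
    pooled = (conv_control + conv_treatment) / (n_control + n_treatment)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    z = (p_treatment - p_control) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation
    return z, p_value


# Hypothetical counts: 2.0% vs. 2.2% conversion on 50,000 sessions each.
z, p_value = two_proportion_z_test(1000, 50_000, 1100, 50_000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value only says the lift is unlikely to be chance; whether a lift of that size is worth the added serving cost is still a product decision.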

Monitoring

  • Input drift: Feature distributions changing from training data
  • Output drift: Prediction distribution shifting
  • Performance drift: Online metrics degrading
  • Alerting: Automated alerts with thresholds + human review
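
Input drift is often tracked with the Population Stability Index (PSI), which compares the binned distribution of a feature in training data against recent serving data. The following is a minimal sketch, assuming non-empty samples and fixed, sorted bin edges chosen from the training data; the `eps` floor for empty bins and the toy values are assumptions. A frequently quoted rule of thumb treats PSI above roughly 0.2 as a significant shift, but the right alert threshold depends on the feature.

```python
import math


def population_stability_index(expected, actual, bin_edges, eps=1e-6):
    """PSI = sum over bins of (a - e) * ln(a / e), with e and a as bin fractions."""

    def bin_fractions(values):
        counts = [0] * (len(bin_edges) + 1)
        for v in values:
            # Index of the bin = number of edges at or below the value.
            counts[sum(v >= edge for edge in bin_edges)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e_frac = bin_fractions(expected)
    a_frac = bin_fractions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_frac, a_frac))


# Toy feature values (hypothetical): serving data has shifted upward.
training_values = list(range(1, 11))
serving_values = list(range(6, 16))
edges = [3, 6, 9]

psi_same = population_stability_index(training_values, training_values, edges)
psi_shifted = population_stability_index(training_values, serving_values, edges)
```

Running a check like this per feature on a schedule, and alerting when the score crosses the chosen threshold, is one way to turn "input drift" from a bullet point into an automated monitor.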

Iteration Plan

  • What would V2 look like? What's the next biggest improvement?
  • What data would you need? What experiments would you run?
  • What would you change about the architecture?

Interviewer's Perspective

Ending with an iteration plan is the strongest possible close. It tells me: "This person doesn't think they're done - they're already thinking about how to make it better." That's exactly the mindset I want on my team.

Time Management Cheat Sheet

Phase | Time | What to Say When Transitioning
Requirements | 0:00-5:00 | "Now that I understand the constraints, let me formulate this as an ML problem."
Problem Formulation | 5:00-10:00 | "With the objective defined, let me think about the features and data pipeline."
Features & Data | 10:00-18:00 | "Given these features, here's my model approach."
Model | 18:00-26:00 | "Now let me discuss how we'd serve this in production."
Serving | 26:00-34:00 | "Finally, let me cover evaluation and monitoring."
Evaluation | 34:00-42:00 | "Here's what I'd focus on for V2."
Q&A | 42:00-45:00 | "Happy to go deeper on any component."

Spaced Repetition Checkpoints

  • Day 0: Memorize the 6 steps (RPFMSE). Draw the framework from memory.
  • Day 3: Apply the framework to a Recommendation System. Time yourself.
  • Day 7: Apply to a completely different problem (Fraud Detection). Verify you hit all 6 steps.
  • Day 14: Do a mock interview. Have your partner score you on each step.
  • Day 21: The framework should be automatic. Focus on depth within each step.

© 2026 EngineersOfAI. All rights reserved.