MLE Problem List
Reading time: ~40 min | Interview relevance: Critical | Roles: Machine Learning Engineer, Applied ML Engineer, ML Platform Engineer
You are applying for a Machine Learning Engineer position at a top tech company. The recruiter tells you there will be four rounds: coding, ML depth, system design, and behavioral. You have four weeks. What exactly should you practice?
This list of 45 problems is your answer. It is organized by interview round type, calibrated to the specific questions MLE candidates face, and sequenced to build skills in the right order. Every problem here has been reported in MLE interviews at major tech companies within the past two years.
MLE Interview Structure
Before diving into problems, understand what MLE interviews look like:
| Round | Duration | What They Test | Weight |
|---|---|---|---|
| Coding | 45-60 min | DSA + ML implementation | 25-30% |
| ML Depth | 45-60 min | Algorithm knowledge, training, evaluation | 25-30% |
| System Design | 45-60 min | End-to-end ML systems | 25-30% |
| Behavioral | 30-45 min | Collaboration, impact, conflict resolution | 10-20% |
:::tip The MLE Sweet Spot
MLEs sit at the intersection of software engineering and machine learning. Interviewers expect you to code as well as a SWE AND reason about ML as well as a data scientist. Neither skill alone is sufficient.
:::
Round 1: Coding Problems (15 Problems)
MLE coding rounds combine standard DSA with ML-flavored algorithmic problems. You need both.
Core DSA for MLEs
| # | Problem | Difficulty | Time | Key Pattern | Why MLEs Need It | Company Tags |
|---|---|---|---|---|---|---|
| 1 | Merge K Sorted Lists | Hard | 30 min | Min-heap merge | Merging sorted prediction streams; distributed training aggregation | Google, Meta, Amazon |
| 2 | LRU Cache | Medium | 25 min | Hash map + doubly-linked list | Feature caching, model result caching in serving systems | FAANG, All |
| 3 | Serialize and Deserialize Binary Tree | Hard | 30 min | BFS/DFS serialization | Model serialization; decision tree persistence | Google, Meta, Microsoft |
| 4 | Word Search II (Trie) | Hard | 35 min | Trie + backtracking | Text search, autocomplete, vocabulary building | Google, Amazon |
| 5 | Meeting Rooms II | Medium | 20 min | Sweep line / min-heap | Resource scheduling; GPU allocation; training job scheduling | Google, Meta, Uber |
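
As a warm-up for problem #2, here is a minimal LRU cache sketch. It leans on Python's `collections.OrderedDict`, which already pairs a hash map with an ordered linked structure; in an interview you may be asked to build the doubly-linked list by hand instead. The class name `LRUCache` and its `get`/`put` methods are illustrative choices, not a required interface.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache with O(1) get and put."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data = OrderedDict()  # oldest entry first, newest entry last

    def get(self, key, default=None):
        if key not in self._data:
            return default
        # Reading a key makes it the most recently used.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            # Evict the least recently used entry (front of the order).
            self._data.popitem(last=False)
```

In a serving system the same structure could sit in front of a feature lookup or a model call, keyed by entity ID or request hash.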
ML-Flavored Coding
| # | Problem | Difficulty | Time | Key Pattern | Why MLEs Need It | Company Tags |
|---|---|---|---|---|---|---|
| 6 | Implement Mini-Batch Gradient Descent | Medium | 25 min | Data batching + gradient accumulation | Core training loop; tests understanding of full-batch vs. mini-batch vs. stochastic updates | FAANG, AI Labs |
| 7 | Implement K-Nearest Neighbors | Easy | 20 min | Distance computation + top-K selection | Baseline algorithm; tests NumPy fluency | Google, Meta, Startups |
| 8 | Implement TF-IDF from Scratch | Medium | 25 min | Term frequency + inverse document frequency | Text feature engineering; information retrieval basics | Google, Amazon, AI Labs |
| 9 | Implement AUC-ROC Computation | Medium | 20 min | Sorting + threshold sweep | Model evaluation; understanding ranking metrics | Meta, Uber, Airbnb |
| 10 | Implement Stratified K-Fold Cross-Validation | Medium | 20 min | Balanced splitting | Evaluation strategy for imbalanced datasets | Google, Meta, Big Tech |
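
For problem #9, one possible sketch computes AUC-ROC from ranks rather than by sweeping thresholds explicitly. The rank-sum (Mann-Whitney U) form gives the same value as the area under the ROC curve, and tied scores are handled by averaging ranks. The function name `roc_auc` is an illustrative choice, and labels are assumed to be 0/1.

```python
import numpy as np


def roc_auc(y_true, y_score):
    """AUC-ROC via the rank-sum formulation, with average ranks for ties."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = int(y_true.sum())
    n_neg = y_true.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")

    # np.unique sorts the scores; `inverse` maps each sample to its tie group.
    _, inverse, counts = np.unique(y_score, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)                 # 1-based rank of the last item per group
    avg_rank = last_rank - (counts - 1) / 2.0     # mean rank within each tie group
    ranks = avg_rank[inverse]

    # U statistic for the positive class, normalized by the number of pos/neg pairs.
    u = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))
```

The sort inside `np.unique` dominates, so the cost is O(n log n). Be ready to explain the single-class edge case, since it comes up with small or heavily imbalanced validation folds.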
Data Processing Coding
| # | Problem | Difficulty | Time | Key Pattern | Why MLEs Need It | Company Tags |
|---|---|---|---|---|---|---|
| 11 | Implement a Streaming Mean and Variance Calculator | Medium | 20 min | Welford's algorithm | Online feature normalization; streaming statistics | Google, Uber, Databricks |
| 12 | Implement Reservoir Sampling | Medium | 20 min | Probabilistic sampling | Sampling from data streams; training data selection | Google, Meta, Big Tech |
| 13 | Parse and Aggregate Large Log Files | Easy | 15 min | Hash map aggregation | Data pipeline debugging; metric computation | All |
| 14 | Implement Leaky Bucket Rate Limiter | Medium | 20 min | Queue-based rate limiting | API rate limiting for model serving endpoints | Uber, Airbnb, Stripe |
| 15 | Compute Intersection of Two Large Sorted Arrays | Easy | 15 min | Two-pointer merge | Feature set intersection; dataset alignment | Google, Meta |
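
Problems #11 and #12 both process a stream in a single pass with constant memory. The sketch below shows one way to write each: Welford's update for running mean and variance, and Algorithm R for reservoir sampling. The names `RunningStats` and `reservoir_sample` are illustrative, and returning NaN for the variance of fewer than two points is a choice made here, not a convention you must follow.

```python
import random


class RunningStats:
    """Welford's algorithm: numerically stable streaming mean and variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the old mean (delta) and the new mean (x - self.mean).
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance (n - 1 denominator); NaN with fewer than 2 points."""
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")


def reservoir_sample(stream, k, rng=None):
    """Algorithm R: uniform sample of k items from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Item i replaces a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

If the stream has fewer than `k` items, `reservoir_sample` simply returns all of them.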
:::warning Coding Round Red Flags for MLEs
- Cannot implement basic ML algorithms without sklearn
- Writes Python loops where NumPy vectorization is expected
- Cannot discuss time complexity of ML operations (e.g., KNN is O(n*d) per query)
- Ignores edge cases like empty datasets, single-class data, or NaN values
:::
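
To make these red flags concrete, here is a hedged sketch of KNN classification (problem #7) that avoids Python loops over training points and checks a few edge cases. The function name `knn_predict` is illustrative, labels are assumed to be non-negative integers, and rejecting NaN inputs outright is one policy among several.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=3):
    """Brute-force KNN classification with vectorized distance computation."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)
    n = X_train.shape[0]
    if n == 0:
        raise ValueError("training set is empty")
    if k < 1:
        raise ValueError("k must be at least 1")
    if np.isnan(X_train).any() or np.isnan(X_query).any():
        raise ValueError("NaN values must be imputed or dropped first")
    k = min(k, n)  # cannot use more neighbors than training points

    # Squared Euclidean distances for all query/train pairs at once:
    # ||q - x||^2 = ||q||^2 + ||x||^2 - 2 q.x, costing O(n * d) per query.
    d2 = (
        (X_query ** 2).sum(axis=1)[:, None]
        + (X_train ** 2).sum(axis=1)[None, :]
        - 2.0 * X_query @ X_train.T
    )

    # argpartition selects the k smallest per row in O(n) without a full sort.
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    votes = y_train[nearest]  # shape (num_queries, k)

    # Majority vote per query; bincount requires non-negative integer labels.
    return np.array([np.bincount(row).argmax() for row in votes])
```

The only remaining Python loop is over queries for the vote, which is usually acceptable; the expensive part, the distance matrix, is a single matrix product.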
Round 2: ML Depth Problems (15 Problems)
These problems test your deep understanding of ML algorithms, training procedures, and evaluation strategies.
Algorithm Understanding
| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 16 | Explain and Implement Gradient Boosted Trees | Hard | 35 min | Sequential ensemble with residual fitting | GBTs dominate tabular ML; must understand deeply | Google, Meta, Amazon |
| 17 | Implement Backpropagation for a 2-Layer Neural Network | Hard | 40 min | Chain rule, gradient computation | Tests true understanding of neural networks vs. framework usage | FAANG, AI Labs |
| 18 | Implement Word2Vec (Skip-gram with Negative Sampling) | Hard | 40 min | Embedding learning, contrastive loss | A foundation for many embedding-based systems | Google, Meta, Airbnb |
| 19 | Explain Random Forest vs. Gradient Boosting: When to Use Each | Medium | 20 min | Ensemble comparison | Practical model selection for tabular data | All |
| 20 | Implement Attention Mechanism from Scratch | Hard | 35 min | Scaled dot-product attention | Foundation of Transformer architecture | AI Labs, Google, Meta |
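
For problem #20, a minimal NumPy sketch of scaled dot-product attention might look like the following. The boolean `mask` convention (True means "may attend") and the large negative fill value are assumptions made for this example; frameworks differ on both.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V, mask=None):
    """Q: (..., n_q, d_k), K: (..., n_k, d_k), V: (..., n_k, d_v)."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)   # (..., n_q, n_k)
    if mask is not None:
        # Positions where mask is False get a very negative score,
        # so their softmax weight is effectively zero.
        scores = np.where(mask, scores, -1e9)
    weights = softmax(scores, axis=-1)                # each row sums to 1
    return weights @ V, weights                       # output: (..., n_q, d_v)
```

Interviewers often follow up by asking why the scores are divided by the square root of `d_k`: without it, dot products grow with dimension and push the softmax into regions with very small gradients.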
Training & Optimization
| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 21 | Design a Training Pipeline for a Model with 100M Examples | Medium | 30 min | Data loading, batching, distributed training | Scale is the MLE differentiator | Google, Meta, Amazon |
| 22 | Debug a Model That Is Not Converging | Medium | 25 min | Systematic debugging: data, model, optimization | Real-world debugging is 50% of MLE work | FAANG, All |
| 23 | Explain and Implement Learning Rate Scheduling | Medium | 20 min | Step decay, cosine annealing, warm-up | Training stability and convergence speed | Google, Meta, AI Labs |
| 24 | Handle Missing Data: Compare Imputation Strategies | Medium | 20 min | Mean, median, KNN, MICE, model-based | Data quality directly impacts model quality | All |
| 25 | Implement Early Stopping with Patience | Easy | 15 min | Validation loss monitoring | Prevents overfitting; practical training technique | All |
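
Problems #23 and #25 are short enough to sketch together. The example below shows one plausible form of linear warm-up followed by cosine annealing, plus an early-stopping helper with patience. The names `lr_at_step` and `EarlyStopping` and all default values are illustrative assumptions.

```python
import math


def lr_at_step(step, base_lr, warmup_steps, total_steps, min_lr=0.0):
    """Linear warm-up to base_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Ramp up linearly; reaches base_lr on the last warm-up step.
        return base_lr * (step + 1) / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a real training loop you would also checkpoint the weights whenever `best` improves, so that stopping restores the best model rather than the last one.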
Evaluation & Production
| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 26 | Design an Offline Evaluation Framework for a Ranking Model | Medium | 30 min | NDCG, MAP, MRR computation | Offline evaluation gates production deployment | Google, Meta, LinkedIn |
| 27 | Detect and Mitigate Data Leakage | Medium | 25 min | Temporal leakage, feature leakage | #1 cause of models that work offline but fail online | All |
| 28 | Compare Online vs. Offline Metrics Discrepancy | Hard | 30 min | Distribution shift, delayed feedback | The classic MLE headache; tests production experience | Meta, Google, Uber |
| 29 | Design a Model Retraining Strategy | Medium | 25 min | Trigger-based vs. scheduled, data freshness | Models decay; retraining strategy is critical | FAANG, Big Tech |
| 30 | Implement Calibration Analysis for a Binary Classifier | Medium | 25 min | Reliability diagram, Platt scaling, isotonic regression | Calibrated probabilities are essential for decision-making | Meta, Google, Stripe |
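
For problem #30, the data behind a reliability diagram can be sketched as below: bin the predicted probabilities, compare mean prediction with observed positive rate in each bin, and summarize the gap as expected calibration error (ECE). The function name `reliability_bins`, equal-width binning, and ten bins are assumptions for this example; Platt scaling and isotonic regression are not shown.

```python
import numpy as np


def reliability_bins(y_true, y_prob, n_bins=10):
    """Return per-bin (mean_prob, positive_rate, count) rows and the ECE."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    if y_prob.size == 0:
        raise ValueError("no predictions given")
    if (y_prob < 0).any() or (y_prob > 1).any():
        raise ValueError("probabilities must lie in [0, 1]")

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Digitizing against interior edges gives bin indices 0 .. n_bins - 1,
    # with a probability of exactly 1.0 landing in the last bin.
    bin_idx = np.digitize(y_prob, edges[1:-1])

    rows, ece = [], 0.0
    for b in range(n_bins):
        in_bin = bin_idx == b
        if not in_bin.any():
            continue  # empty bins contribute nothing
        mean_prob = y_prob[in_bin].mean()
        pos_rate = y_true[in_bin].mean()
        rows.append((float(mean_prob), float(pos_rate), int(in_bin.sum())))
        # Weight each bin's gap by the fraction of samples it holds.
        ece += in_bin.mean() * abs(pos_rate - mean_prob)
    return rows, float(ece)
```

Plotting `mean_prob` against `pos_rate` for each row gives the reliability diagram; a perfectly calibrated model lies on the diagonal.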
:::tip The "Why" Behind ML Depth Questions
Interviewers are not testing whether you can recite textbook definitions. They want to hear: "In my experience, I chose X over Y because..." Signal practical experience, not academic knowledge.
:::
Round 3: System Design Problems (15 Problems)
MLE system design focuses on end-to-end ML systems with both offline (training) and online (serving) components.
Recommendation & Ranking Systems
| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 31 | Design a Product Recommendation System for E-Commerce | Medium | 40 min | Collaborative filtering + content-based + hybrid | The canonical MLE system design problem | Amazon, Meta, Pinterest |
| 32 | Design a Search Ranking System | Hard | 45 min | Query understanding + retrieval + learning-to-rank | Search is the backbone of many products | Google, Amazon, Airbnb |
| 33 | Design a Video Recommendation System | Medium | 40 min | Multi-stage ranking + user session modeling | Video platforms have unique challenges (watch time, engagement) | YouTube/Google, Netflix, TikTok |
| 34 | Design an Ad Click Prediction System | Hard | 45 min | Feature engineering + real-time prediction + calibration | Ads generate the majority of revenue at companies such as Google and Meta | Google, Meta, Amazon |
| 35 | Design a People You May Know System | Medium | 35 min | Graph-based features + ML ranking | Social graph + ML combination | LinkedIn, Meta, Twitter |
Classification & Detection Systems
| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 36 | Design a Spam Detection System | Medium | 35 min | Text classification + feedback loops + adversarial robustness | Content safety is universal | Google, Meta, Microsoft |
| 37 | Design a Credit Risk Scoring System | Medium | 35 min | Feature engineering + explainability + fairness | Regulated ML with fairness constraints | Stripe, Square, Goldman |
| 38 | Design an Anomaly Detection System for Cloud Infrastructure | Medium | 35 min | Time-series models + alerting + root cause analysis | Infrastructure monitoring with ML | Google, Amazon, Datadog |
| 39 | Design a Document Classification Pipeline | Medium | 30 min | NLP preprocessing + embeddings + fine-tuning | Document understanding is a core ML task | Google, Amazon, AI Labs |
| 40 | Design an Image Classification System at Scale | Medium | 35 min | CNN training + transfer learning + serving optimization | Computer vision MLE fundamentals | Google, Apple, Tesla |
ML Infrastructure Systems
| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 41 | Design a Model Experimentation Platform | Hard | 45 min | Experiment tracking, reproducibility, collaboration | The backbone of ML development | FAANG, Unicorns |
| 42 | Design a Feature Pipeline with Online and Offline Serving | Hard | 45 min | Feature computation, consistency, freshness | Feature engineering at scale | Uber, Airbnb, Databricks |
| 43 | Design a Model A/B Testing Framework | Medium | 35 min | Traffic splitting, metric computation, statistical tests | Every model change needs rigorous testing | FAANG, Big Tech |
| 44 | Design a Distributed Training System | Hard | 45 min | Data parallelism, model parallelism, gradient aggregation | Training at scale is a core MLE responsibility | Google, Meta, AI Labs |
| 45 | Design a Multi-Model Serving Architecture | Medium | 35 min | Model registry, canary deployment, fallback | Serving multiple models in production | FAANG, Unicorns |
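
The traffic-splitting piece of problem #43 is often the first thing interviewers probe. One common approach, sketched here under assumptions, is deterministic hashing: the same user always lands in the same variant for a given experiment, and different experiments split independently. The function name `assign_variant`, the key format, and the bucket count are all hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment, treatment_share=0.5, buckets=10_000):
    """Deterministically map a user to 'treatment' or 'control'."""
    if not 0.0 <= treatment_share <= 1.0:
        raise ValueError("treatment_share must be in [0, 1]")
    # Salting with the experiment name decorrelates assignments across experiments.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes as an integer and reduce to a bucket index.
    bucket = int.from_bytes(digest[:8], "big") % buckets
    return "treatment" if bucket < treatment_share * buckets else "control"
```

Because assignment needs no stored state, any serving replica can compute it, and ramping from 5% to 50% only moves users from control into treatment, never the other way.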
:::note System Design Deep Dive Areas
For each system design problem, be prepared to deep dive into:
- Data pipeline: How data flows from raw to features
- Model architecture: Why this model for this problem
- Training infrastructure: How training scales
- Serving architecture: Latency, throughput, reliability
- Monitoring: What can go wrong and how to detect it
:::
4-Week MLE Study Plan
| Week | Focus | Problems | Daily Load |
|---|---|---|---|
| Week 1 | Coding fundamentals | #1-15 (Coding round) | 2-3 problems/day |
| Week 2 | ML depth | #16-30 (ML depth round) | 2 problems/day + review coding |
| Week 3 | System design | #31-45 (System design round) | 1-2 designs/day + review ML |
| Week 4 | Integration + mock | Mix of all rounds | 1 problem + 1 mock interview/day |
Week 1: Coding Deep Dive
Day 1: #1, #2 (heap merge, LRU cache)
Day 2: #3, #4 (serialization, trie)
Day 3: #5, #6 (scheduling, mini-batch GD)
Day 4: #7, #8 (KNN, TF-IDF)
Day 5: #9, #10 (AUC-ROC, stratified CV)
Day 6: #11, #12, #13 (streaming stats, reservoir sampling, logs)
Day 7: #14, #15 + review weak problems
Week 2: ML Depth Focus
Day 1: #16, #17 (GBT, backprop)
Day 2: #18, #19 (Word2Vec, RF vs GBT)
Day 3: #20, #21 (attention, large-scale training)
Day 4: #22, #23 (debugging, LR scheduling)
Day 5: #24, #25 (missing data, early stopping)
Day 6: #26, #27 (offline eval, data leakage)
Day 7: #28, #29, #30 (online/offline gap, retraining, calibration)
Week 3: System Design Sprint
Day 1: #31 (product recommendations)
Day 2: #32 (search ranking)
Day 3: #33 (video recommendations)
Day 4: #34, #35 (ad prediction, people you may know)
Day 5: #36, #37 (spam detection, credit scoring)
Day 6: #38, #39, #40 (anomaly detection, doc classification, image classification)
Day 7: #41, #42 (experimentation platform, feature pipeline)
Week 4: Polish and Mock
Day 1: #43, #44, #45 (A/B testing, distributed training, multi-model serving)
Day 2-3: Re-solve all Yellow/Red problems
Day 4-5: Full mock interviews (1 coding + 1 ML + 1 design)
Day 6-7: Final review of weak areas
Problem Deep Dives
Problem 17: Implement Backpropagation for a 2-Layer Neural Network
Why this problem matters: If you cannot implement backprop from scratch, interviewers will question whether you truly understand the models you build. This is the single most important MLE implementation problem.
Setup:
Network: Input(d) -> Hidden(h) -> Output(1)
Activation: ReLU for hidden, Sigmoid for output
Loss: Binary Cross-Entropy
Forward Pass:
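z1 = X @ W1 + b1 # (n, h); X is (n, d), np is NumPy
a1 = np.maximum(0, z1) # ReLU, element-wise; (n, h)
z2 = a1 @ W2 + b2 # (n, 1)
a2 = sigmoid(z2) # Output probability; (n, 1)
loss = -np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2)) # y has shape (n, 1)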
Backward Pass:
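dz2 = a2 - y # (n, 1); sigmoid + BCE gradient w.r.t. z2, before the 1/n averaging
dW2 = (1/n) * a1.T @ dz2 # (h, 1)
db2 = (1/n) * np.sum(dz2, axis=0) # (1,); sum over the batch axis only
da1 = dz2 @ W2.T # (n, h)
dz1 = da1 * (z1 > 0) # ReLU derivative
dW1 = (1/n) * X.T @ dz1 # (d, h)
db1 = (1/n) * np.sum(dz1, axis=0) # (h,); sum over the batch axis only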
Key Points Interviewers Check:
- Correct application of chain rule
- ReLU derivative is 0 where input <= 0
- Matrix dimension alignment
- Numerical stability in sigmoid and log
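
Putting the forward and backward passes together, a runnable sketch might look like the following. The helper names (`init_params`, `forward`, `bce_loss`, `backward`, `train`), the initialization scale, and the clipping constants are illustrative assumptions; the gradient formulas follow the setup above, with `y` shaped `(n, 1)`.

```python
import numpy as np


def sigmoid(z):
    # Clip the input so exp() cannot overflow for large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


def init_params(d, h, seed=0):
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, np.sqrt(2.0 / d), size=(d, h))  # He-style scale for ReLU
    b1 = np.zeros(h)
    W2 = rng.normal(0.0, np.sqrt(1.0 / h), size=(h, 1))
    b2 = np.zeros(1)
    return [W1, b1, W2, b2]


def forward(params, X):
    W1, b1, W2, b2 = params
    z1 = X @ W1 + b1            # (n, h)
    a1 = np.maximum(0, z1)      # ReLU
    z2 = a1 @ W2 + b2           # (n, 1)
    a2 = sigmoid(z2)            # predicted probability
    return z1, a1, a2


def bce_loss(a2, y, eps=1e-12):
    # Clip probabilities away from 0 and 1 so log() stays finite.
    a2 = np.clip(a2, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2)))


def backward(params, X, y, cache):
    W1, b1, W2, b2 = params
    z1, a1, a2 = cache
    n = X.shape[0]
    dz2 = a2 - y                      # (n, 1)
    dW2 = a1.T @ dz2 / n              # (h, 1)
    db2 = dz2.sum(axis=0) / n         # (1,)
    da1 = dz2 @ W2.T                  # (n, h)
    dz1 = da1 * (z1 > 0)              # ReLU passes gradient only where z1 > 0
    dW1 = X.T @ dz1 / n               # (d, h)
    db1 = dz1.sum(axis=0) / n         # (h,)
    return [dW1, db1, dW2, db2]


def train(X, y, h=8, lr=0.1, steps=200, seed=0):
    """Full-batch gradient descent; returns final params and the loss history."""
    params = init_params(X.shape[1], h, seed)
    losses = []
    for _ in range(steps):
        cache = forward(params, X)
        losses.append(bce_loss(cache[2], y))
        grads = backward(params, X, y, cache)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params, losses
```

A good habit to mention in the interview is a finite-difference gradient check: perturb one weight by a small epsilon in each direction, recompute the loss, and compare the slope with the analytic gradient from `backward`.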
Problem 32: Design a Search Ranking System
Why this problem matters: Search ranking combines information retrieval, ML, and systems engineering -- the full MLE skill set.
Architecture:
Query -> Query Understanding -> Retrieval -> Ranking -> Reranking -> Results
1. Query Understanding
- Spell correction, query expansion, intent classification
- Embedding-based query representation
2. Retrieval (get ~1000 candidates)
- Inverted index (BM25)
- Embedding-based retrieval (ANN)
- Combine with reciprocal rank fusion
3. Ranking (score ~1000 -> top 100)
- Features: query-document relevance, freshness, authority, user history
- Model: LambdaMART or neural ranker
- Training data: clicks, dwell time (with position bias correction)
4. Reranking (top 100 -> final order)
- Diversity injection
- Personalization
- Business rules (ads, promoted content)
5. Evaluation
- Offline: NDCG@10, MAP, MRR
- Online: CTR, query success rate, session length
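
Two pieces of this architecture are small enough to sketch in code: reciprocal rank fusion for combining the BM25 and ANN candidate lists, and NDCG@k for offline evaluation. The function names are illustrative, `k=60` is a commonly used RRF constant rather than a tuned value, and the exponential gain `2^rel - 1` is one of two common DCG variants.

```python
import math


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); missing docs contribute 0.
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order.
    return sorted(scores, key=scores.get, reverse=True)


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top k results, in ranked order."""
    return sum(
        (2 ** rel - 1) / math.log2(i + 2)
        for i, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

For example, `reciprocal_rank_fusion([bm25_ids, ann_ids])` would merge two candidate lists (both names are hypothetical) without needing their raw scores to be on a comparable scale, which is the main reason to prefer it over score averaging.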
MLE-Specific Patterns to Master
| Pattern | Where It Appears | Problems |
|---|---|---|
| Batched computation | Training, inference, feature computation | #6, #21, #44 |
| Online/offline consistency | Feature stores, model serving | #28, #42 |
| Multi-stage pipeline | Recommendation, search, ads | #31, #32, #33, #34 |
| Calibration | Any system using probabilities for decisions | #30, #34 |
| Feedback loops | Spam, recommendation, content moderation | #36 |
| Distribution shift | Production model degradation | #28, #29 |
| Embedding-based retrieval | Search, recommendations, similarity | #18, #32, #33 |
| Experiment design | Any model deployment | #43 |
Difficulty Distribution
| Difficulty | Problems | Count |
|---|---|---|
| Easy | #7, #13, #15, #25 | 4 |
| Medium | #2, #5, #6, #8, #9, #10, #11, #12, #14, #19, #21, #22, #23, #24, #26, #27, #29, #30, #31, #33, #35, #36, #37, #38, #39, #40, #43, #45 | 28 |
| Hard | #1, #3, #4, #16, #17, #18, #20, #28, #32, #34, #41, #42, #44 | 13 |
Next Steps
After completing the MLE problem list, consider:
- AI Engineer Problems if your role includes LLM/GenAI work
- Hard Tier if targeting Staff+ MLE roles
- Google-Style Problems or Meta-Style Problems to calibrate for specific companies
- Section 15: Role-Specific Prep for the full MLE preparation path
