MLE Problem List

Reading time: ~40 min | Interview relevance: Critical | Roles: Machine Learning Engineer, Applied ML Engineer, ML Platform Engineer

You are applying for a Machine Learning Engineer position at a top tech company. The recruiter tells you there will be four rounds: coding, ML depth, system design, and behavioral. You have four weeks. What exactly should you practice?

This list of 45 problems is your answer. It is organized by interview round type, calibrated to the specific questions MLE candidates face, and sequenced to build skills in the right order. Every problem here has been reported in MLE interviews at major tech companies within the past two years.

MLE Interview Structure

Before diving into problems, understand what MLE interviews look like:

Round	Duration	What They Test	Weight
Coding	45-60 min	DSA + ML implementation	25-30%
ML Depth	45-60 min	Algorithm knowledge, training, evaluation	25-30%
System Design	45-60 min	End-to-end ML systems	25-30%
Behavioral	30-45 min	Collaboration, impact, conflict resolution	10-20%

:::tip The MLE Sweet Spot
MLEs sit at the intersection of software engineering and machine learning. Interviewers expect you to code as well as a SWE AND reason about ML as well as a data scientist. Neither skill alone is sufficient.
:::

Round 1: Coding Problems (15 Problems)

MLE coding rounds combine standard DSA with ML-flavored algorithmic problems. You need both.

Core DSA for MLEs

#	Problem	Difficulty	Time	Key Pattern	Why MLEs Need It	Company Tags
1	Merge K Sorted Lists	Hard	30 min	Min-heap merge	Merging sorted prediction streams; distributed training aggregation	Google, Meta, Amazon
2	LRU Cache	Medium	25 min	Hash map + doubly-linked list	Feature caching, model result caching in serving systems	FAANG, All
3	Serialize and Deserialize Binary Tree	Hard	30 min	BFS/DFS serialization	Model serialization; decision tree persistence	Google, Meta, Microsoft
4	Word Search II (Trie)	Hard	35 min	Trie + backtracking	Text search, autocomplete, vocabulary building	Google, Amazon
5	Meeting Rooms II	Medium	20 min	Sweep line / min-heap	Resource scheduling; GPU allocation; training job scheduling	Google, Meta, Uber

ML-Flavored Coding

#	Problem	Difficulty	Time	Key Pattern	Why MLEs Need It	Company Tags
6	Implement Mini-Batch Gradient Descent	Medium	25 min	Data batching + gradient accumulation	Core training loop; tests understanding of full-batch vs. mini-batch vs. stochastic updates	FAANG, AI Labs
7	Implement K-Nearest Neighbors	Easy	20 min	Distance computation + top-K selection	Baseline algorithm; tests NumPy fluency	Google, Meta, Startups
8	Implement TF-IDF from Scratch	Medium	25 min	Term frequency + inverse document frequency	Text feature engineering; information retrieval basics	Google, Amazon, AI Labs
9	Implement AUC-ROC Computation	Medium	20 min	Sorting + threshold sweep	Model evaluation; understanding ranking metrics	Meta, Uber, Airbnb
10	Implement Stratified K-Fold Cross-Validation	Medium	20 min	Balanced splitting	Evaluation strategy for imbalanced datasets	Google, Meta, Big Tech
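For problem #9, one possible from-scratch sketch is shown here. It does not sweep thresholds explicitly; instead it uses the equivalent rank-sum (Mann-Whitney U) identity, which also relies on sorting. The function name `roc_auc` and the convention that labels are 0/1 are assumptions made for illustration.

```python
def roc_auc(y_true, scores):
    """AUC via the rank-sum identity, using average ranks for tied scores."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        # Edge case: AUC is undefined for single-class data.
        raise ValueError("AUC is undefined when only one class is present")

    # Sort indices by score, then assign 1-based ranks, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic of the positives, normalized by the number of (pos, neg) pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

The result equals the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half. The sort dominates the cost, so the whole computation is O(n log n).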

Data Processing Coding

#	Problem	Difficulty	Time	Key Pattern	Why MLEs Need It	Company Tags
11	Implement a Streaming Mean and Variance Calculator	Medium	20 min	Welford's algorithm	Online feature normalization; streaming statistics	Google, Uber, Databricks
12	Implement Reservoir Sampling	Medium	20 min	Probabilistic sampling	Sampling from data streams; training data selection	Google, Meta, Big Tech
13	Parse and Aggregate Large Log Files	Easy	15 min	Hash map aggregation	Data pipeline debugging; metric computation	All
14	Implement Leaky Bucket Rate Limiter	Medium	20 min	Queue-based rate limiting	API rate limiting for model serving endpoints	Uber, Airbnb, Stripe
15	Compute Intersection of Two Large Sorted Arrays	Easy	15 min	Two-pointer merge	Feature set intersection; dataset alignment	Google, Meta
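Problems #11 and #12 are both single-pass, constant-memory stream algorithms. A minimal sketch of each follows; the names `RunningStats` and `reservoir_sample` are hypothetical, and the reservoir version is the basic "Algorithm R" variant rather than an optimized one.

```python
import random


class RunningStats:
    """Welford's algorithm: mean and variance in one pass with O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # uses the already-updated mean

    @property
    def variance(self):
        """Sample variance (n - 1 denominator); 0.0 with fewer than two values."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item  # keep the new item with probability k / (i + 1)
    return reservoir
```

Welford's update avoids the catastrophic cancellation that the naive "sum of squares minus square of sum" formula can suffer on large values, which is usually the point interviewers probe.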

:::warning Coding Round Red Flags for MLEs

Cannot implement basic ML algorithms without sklearn
Writes Python loops where NumPy vectorization is expected
Cannot discuss time complexity of ML operations (e.g., brute-force KNN is O(n*d) per query)
Ignores edge cases like empty datasets, single-class data, or NaN values
:::
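To make the vectorization and complexity points concrete, here is a rough sketch of brute-force KNN classification (problem #7) with no Python loop over training points. The function name `knn_predict`, the use of Euclidean distance, and the requirement that labels are non-negative integers are assumptions for this example.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=3):
    """Brute-force KNN classifier; O(n*d) distance work per query row."""
    if not 1 <= k <= len(X_train):
        raise ValueError("k must be between 1 and the number of training points")

    # Squared Euclidean distances via ||q - t||^2 = ||q||^2 - 2 q.t + ||t||^2.
    q_sq = np.sum(X_query**2, axis=1, keepdims=True)  # (m, 1)
    t_sq = np.sum(X_train**2, axis=1)                 # (n,)
    d2 = q_sq - 2.0 * X_query @ X_train.T + t_sq      # (m, n)

    # argpartition selects the k smallest per row without a full sort.
    nn_idx = np.argpartition(d2, k - 1, axis=1)[:, :k]  # (m, k)
    nn_labels = y_train[nn_idx]                         # (m, k)

    # Majority vote per query; ties resolve to the smallest label.
    n_classes = int(y_train.max()) + 1
    votes = np.apply_along_axis(np.bincount, 1, nn_labels, minlength=n_classes)
    return votes.argmax(axis=1)
```

Using `np.argpartition` instead of `np.argsort` keeps top-k selection linear in n per query, which is worth mentioning when asked about complexity.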

Round 2: ML Depth Problems (15 Problems)

These problems test your deep understanding of ML algorithms, training procedures, and evaluation strategies.

Algorithm Understanding

#	Problem	Difficulty	Time	Key Pattern	Why It Matters	Company Tags
16	Explain and Implement Gradient Boosted Trees	Hard	35 min	Sequential ensemble with residual fitting	GBTs dominate tabular ML; must understand deeply	Google, Meta, Amazon
17	Implement Backpropagation for a 2-Layer Neural Network	Hard	40 min	Chain rule, gradient computation	Tests true understanding of neural networks vs. framework usage	FAANG, AI Labs
18	Implement Word2Vec (Skip-gram with Negative Sampling)	Hard	40 min	Embedding learning, contrastive loss	Foundation of all embedding-based systems	Google, Meta, Airbnb
19	Explain Random Forest vs. Gradient Boosting: When to Use Each	Medium	20 min	Ensemble comparison	Practical model selection for tabular data	All
20	Implement Attention Mechanism from Scratch	Hard	35 min	Scaled dot-product attention	Foundation of Transformer architecture	AI Labs, Google, Meta

Training & Optimization

#	Problem	Difficulty	Time	Key Pattern	Why It Matters	Company Tags
21	Design a Training Pipeline for a Model with 100M Examples	Medium	30 min	Data loading, batching, distributed training	Scale is the MLE differentiator	Google, Meta, Amazon
22	Debug a Model That Is Not Converging	Medium	25 min	Systematic debugging: data, model, optimization	Debugging takes up a large share of real-world MLE work	FAANG, All
23	Explain and Implement Learning Rate Scheduling	Medium	20 min	Step decay, cosine annealing, warm-up	Training stability and convergence speed	Google, Meta, AI Labs
24	Handle Missing Data: Compare Imputation Strategies	Medium	20 min	Mean, median, KNN, MICE, model-based	Data quality directly impacts model quality	All
25	Implement Early Stopping with Patience	Easy	15 min	Validation loss monitoring	Prevents overfitting; practical training technique	All
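Problems #23 and #25 are small enough to write out in full. The sketch below shows one common combination, linear warm-up followed by cosine annealing, plus a patience-based early stopper. The names `lr_at_step` and `EarlyStopping` and all default values are illustrative assumptions, not taken from any particular framework.

```python
import math


def lr_at_step(step, base_lr, warmup_steps, total_steps, min_lr=0.0):
    """Linear warm-up to base_lr, then cosine annealing down to min_lr."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # hold at min_lr past total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # smallest decrease that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `lr_at_step` would be called once per optimizer step and `EarlyStopping.step` once per validation pass. In practice the stopper is usually paired with checkpointing of the best model, which is omitted here.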

Evaluation & Production

#	Problem	Difficulty	Time	Key Pattern	Why It Matters	Company Tags
26	Design an Offline Evaluation Framework for a Ranking Model	Medium	30 min	NDCG, MAP, MRR computation	Offline evaluation gates production deployment	Google, Meta, LinkedIn
27	Detect and Mitigate Data Leakage	Medium	25 min	Temporal leakage, feature leakage	A leading cause of models that work offline but fail online	All
28	Compare Online vs. Offline Metrics Discrepancy	Hard	30 min	Distribution shift, delayed feedback	The classic MLE headache; tests production experience	Meta, Google, Uber
29	Design a Model Retraining Strategy	Medium	25 min	Trigger-based vs. scheduled, data freshness	Models decay; retraining strategy is critical	FAANG, Big Tech
30	Implement Calibration Analysis for a Binary Classifier	Medium	25 min	Reliability diagram, Platt scaling, isotonic regression	Calibrated probabilities are essential for decision-making	Meta, Google, Stripe
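For problem #30, the data behind a reliability diagram can be computed in a few lines. This sketch uses equal-width bins and reports expected calibration error (ECE) on the positive-class probability. The function names and the choice of 10 bins are assumptions for illustration; Platt scaling and isotonic regression, which would then correct the miscalibration, are not shown.

```python
def reliability_bins(y_true, y_prob, n_bins=10):
    """Return (mean_predicted, observed_rate, count) for each non-empty bin."""
    sums = [[0.0, 0.0, 0] for _ in range(n_bins)]  # prob sum, label sum, count
    for y, p in zip(y_true, y_prob):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        sums[b][0] += p
        sums[b][1] += y
        sums[b][2] += 1
    return [(ps / c, ys / c, c) for ps, ys, c in sums if c > 0]


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Count-weighted average gap between predicted and observed positive rate."""
    bins = reliability_bins(y_true, y_prob, n_bins)
    n = sum(c for _, _, c in bins)
    return sum(c / n * abs(pred - obs) for pred, obs, c in bins)
```

Plotting `mean_predicted` against `observed_rate` gives the reliability diagram: a well-calibrated model sits close to the diagonal, and ECE summarizes the distance from it in one number.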

:::tip The "Why" Behind ML Depth Questions
Interviewers are not testing whether you can recite textbook definitions. They want to hear: "In my experience, I chose X over Y because..." Signal practical experience, not academic knowledge.
:::

Round 3: System Design Problems (15 Problems)

MLE system design focuses on end-to-end ML systems with both offline (training) and online (serving) components.

Recommendation & Ranking Systems

#	Problem	Difficulty	Time	Key Pattern	Why It Matters	Company Tags
31	Design a Product Recommendation System for E-Commerce	Medium	40 min	Collaborative filtering + content-based + hybrid	The canonical MLE system design problem	Amazon, Meta, Pinterest
32	Design a Search Ranking System	Hard	45 min	Query understanding + retrieval + learning-to-rank	Search is the backbone of many products	Google, Amazon, Airbnb
33	Design a Video Recommendation System	Medium	40 min	Multi-stage ranking + user session modeling	Video platforms have unique challenges (watch time, engagement)	YouTube/Google, Netflix, TikTok
34	Design an Ad Click Prediction System	Hard	45 min	Feature engineering + real-time prediction + calibration	Ads generate the majority of revenue at several large tech companies (e.g., Google, Meta)	Google, Meta, Amazon
35	Design a People You May Know System	Medium	35 min	Graph-based features + ML ranking	Social graph + ML combination	LinkedIn, Meta, Twitter

Classification & Detection Systems

#	Problem	Difficulty	Time	Key Pattern	Why It Matters	Company Tags
36	Design a Spam Detection System	Medium	35 min	Text classification + feedback loops + adversarial robustness	Content safety is universal	Google, Meta, Microsoft
37	Design a Credit Risk Scoring System	Medium	35 min	Feature engineering + explainability + fairness	Regulated ML with fairness constraints	Stripe, Square, Goldman
38	Design an Anomaly Detection System for Cloud Infrastructure	Medium	35 min	Time-series models + alerting + root cause analysis	Infrastructure monitoring with ML	Google, Amazon, Datadog
39	Design a Document Classification Pipeline	Medium	30 min	NLP preprocessing + embeddings + fine-tuning	Document understanding is a core ML task	Google, Amazon, AI Labs
40	Design an Image Classification System at Scale	Medium	35 min	CNN training + transfer learning + serving optimization	Computer vision MLE fundamentals	Google, Apple, Tesla

ML Infrastructure Systems

#	Problem	Difficulty	Time	Key Pattern	Why It Matters	Company Tags
41	Design a Model Experimentation Platform	Hard	45 min	Experiment tracking, reproducibility, collaboration	The backbone of ML development	FAANG, Unicorns
42	Design a Feature Pipeline with Online and Offline Serving	Hard	45 min	Feature computation, consistency, freshness	Feature engineering at scale	Uber, Airbnb, Databricks
43	Design a Model A/B Testing Framework	Medium	35 min	Traffic splitting, metric computation, statistical tests	Every model change needs rigorous testing	FAANG, Big Tech
44	Design a Distributed Training System	Hard	45 min	Data parallelism, model parallelism, gradient aggregation	Training at scale is a core MLE responsibility	Google, Meta, AI Labs
45	Design a Multi-Model Serving Architecture	Medium	35 min	Model registry, canary deployment, fallback	Serving multiple models in production	FAANG, Unicorns

:::note System Design Deep Dive Areas
For each system design problem, be prepared to deep dive into:

Data pipeline: How data flows from raw to features
Model architecture: Why this model for this problem
Training infrastructure: How training scales
Serving architecture: Latency, throughput, reliability
Monitoring: What can go wrong and how to detect it
:::

4-Week MLE Study Plan

Week	Focus	Problems	Daily Load
Week 1	Coding fundamentals	#1-15 (Coding round)	2-3 problems/day
Week 2	ML depth	#16-30 (ML depth round)	2 problems/day + review coding
Week 3	System design	#31-42 (System design round)	1-2 designs/day + review ML
Week 4	Integration + mock	#43-45, then a mix of all rounds	1 problem + 1 mock interview/day

Week 1: Coding Deep Dive

Day 1: #1, #2 (heap merge, LRU cache)
Day 2: #3, #4 (serialization, trie)
Day 3: #5, #6 (scheduling, mini-batch GD)
Day 4: #7, #8 (KNN, TF-IDF)
Day 5: #9, #10 (AUC-ROC, stratified CV)
Day 6: #11, #12, #13 (streaming stats, reservoir sampling, logs)
Day 7: #14, #15 + review weak problems

Week 2: ML Depth Focus

Day 1: #16, #17 (GBT, backprop)
Day 2: #18, #19 (Word2Vec, RF vs GBT)
Day 3: #20, #21 (attention, large-scale training)
Day 4: #22, #23 (debugging, LR scheduling)
Day 5: #24, #25 (missing data, early stopping)
Day 6: #26, #27 (offline eval, data leakage)
Day 7: #28, #29, #30 (online/offline gap, retraining, calibration)

Week 3: System Design Sprint

Day 1: #31 (product recommendations)
Day 2: #32 (search ranking)
Day 3: #33 (video recommendations)
Day 4: #34, #35 (ad prediction, people you may know)
Day 5: #36, #37 (spam detection, credit scoring)
Day 6: #38, #39, #40 (anomaly detection, doc classification, image classification)
Day 7: #41, #42 (experimentation platform, feature pipeline)

Week 4: Polish and Mock

Day 1: #43, #44, #45 (A/B testing, distributed training, multi-model serving)
Day 2-3: Re-solve every problem you rated Yellow or Red (shaky or unsolved) on the first pass
Day 4-5: Full mock interviews (1 coding + 1 ML + 1 design)
Day 6-7: Final review of weak areas

Problem Deep Dives

Problem 17: Implement Backpropagation for a 2-Layer Neural Network

Why this problem matters: If you cannot implement backprop from scratch, interviewers will question whether you truly understand the models you build. This is the single most important MLE implementation problem.

Setup:

Network: Input(d) -> Hidden(h) -> Output(1)
Activation: ReLU for hidden, Sigmoid for output
Loss: Binary Cross-Entropy

Forward Pass:

z1 = X @ W1 + b1                  # (n, h)
a1 = maximum(0, z1)               # element-wise ReLU, (n, h)
z2 = a1 @ W2 + b2                 # (n, 1)
a2 = sigmoid(z2)                  # output probability, (n, 1)
loss = -mean(y*log(a2) + (1-y)*log(1-a2))

Backward Pass:

dz2 = a2 - y                      # (n, 1); sigmoid and BCE gradients combine to this
dW2 = (1/n) * a1.T @ dz2          # (h, 1)
db2 = (1/n) * sum(dz2, axis=0)    # (1,); sum over the batch dimension
da1 = dz2 @ W2.T                  # (n, h)
dz1 = da1 * (z1 > 0)              # ReLU derivative: 1 where z1 > 0, else 0
dW1 = (1/n) * X.T @ dz1           # (d, h)
db1 = (1/n) * sum(dz1, axis=0)    # (h,); sum over the batch dimension

Key Points Interviewers Check:

Correct application of chain rule
ReLU derivative is 0 where input <= 0
Matrix dimension alignment
Numerical stability in sigmoid and log
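The points above can be checked mechanically. The following is a minimal NumPy sketch of the same network, assuming `y` has shape (n, 1) and biases are 1-D arrays. It computes the loss directly from the logits for numerical stability and includes a finite-difference gradient check. The function names `forward_backward` and `numerical_grad` are hypothetical.

```python
import numpy as np


def forward_backward(X, y, W1, b1, W2, b2):
    """Return (loss, grads) for Input(d) -> ReLU Hidden(h) -> Sigmoid Output(1)."""
    n = X.shape[0]
    z1 = X @ W1 + b1                 # (n, h)
    a1 = np.maximum(0.0, z1)         # ReLU
    z2 = a1 @ W2 + b2                # (n, 1) logits

    # Stable BCE from logits: log(1 + exp(z)) - y*z, no explicit log(sigmoid).
    loss = np.mean(np.logaddexp(0.0, z2) - y * z2)
    a2 = 0.5 * (1.0 + np.tanh(0.5 * z2))  # sigmoid written to avoid overflow

    dz2 = (a2 - y) / n               # (n, 1), already averaged over the batch
    dz1 = (dz2 @ W2.T) * (z1 > 0)    # (n, h); ReLU gradient is 0 where z1 <= 0
    grads = {
        "W1": X.T @ dz1,             # (d, h)
        "b1": dz1.sum(axis=0),       # (h,)
        "W2": a1.T @ dz2,            # (h, 1)
        "b2": dz2.sum(axis=0),       # (1,)
    }
    return loss, grads


def numerical_grad(X, y, params, name, eps=1e-5):
    """Central-difference gradient of the loss with respect to params[name]."""
    grad = np.zeros_like(params[name])
    for idx in np.ndindex(*params[name].shape):
        orig = params[name][idx]
        params[name][idx] = orig + eps
        loss_plus, _ = forward_backward(X, y, **params)
        params[name][idx] = orig - eps
        loss_minus, _ = forward_backward(X, y, **params)
        params[name][idx] = orig     # restore the parameter
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad
```

Comparing `grads[name]` against `numerical_grad(X, y, params, name)` for each parameter on a small random batch is a quick way to catch chain-rule or shape mistakes, and mentioning such a gradient check in the interview tends to be well received.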

Problem 32: Design a Search Ranking System

Why this problem matters: Search ranking combines information retrieval, ML, and systems engineering -- the full MLE skill set.

Architecture:

Query -> Query Understanding -> Retrieval -> Ranking -> Reranking -> Results

1. Query Understanding
   - Spell correction, query expansion, intent classification
   - Embedding-based query representation

2. Retrieval (get ~1000 candidates)
   - Inverted index (BM25)
   - Embedding-based retrieval (ANN)
   - Combine with reciprocal rank fusion

3. Ranking (score ~1000 -> top 100)
   - Features: query-document relevance, freshness, authority, user history
   - Model: LambdaMART or neural ranker
   - Training data: clicks, dwell time (with position bias correction)

4. Reranking (top 100 -> final order)
   - Diversity injection
   - Personalization
   - Business rules (ads, promoted content)

5. Evaluation
   - Offline: NDCG@10, MAP, MRR
   - Online: CTR, query success rate, session length
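Two pieces of the outline above are compact enough to sketch directly: reciprocal rank fusion for combining the BM25 and ANN candidate lists, and NDCG@k for offline evaluation. The function names, the constant `k=60` (a commonly used default), and the exponential gain `2^rel - 1` are assumptions for this example; linear gain is an equally valid NDCG convention.

```python
import math


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum((2**rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

For example, fusing `["a", "b", "c"]` from BM25 with `["b", "d", "a"]` from ANN ranks `b` first, because it appears near the top of both lists. Rank fusion needs no score normalization across retrievers, which is the main reason to prefer it over summing raw BM25 and similarity scores.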

MLE-Specific Patterns to Master

Pattern	Where It Appears	Problems
Batched computation	Training, inference, feature computation	#6, #21, #44
Online/offline consistency	Feature stores, model serving	#28, #42
Multi-stage pipeline	Recommendation, search, ads	#31, #32, #33, #34
Calibration	Any system using probabilities for decisions	#30, #34
Feedback loops	Spam, recommendation, content moderation	#36
Distribution shift	Production model degradation	#28, #29
Embedding-based retrieval	Search, recommendations, similarity	#18, #32, #33
Experiment design	Any model deployment	#43

Difficulty Distribution

Difficulty	Problems	Count
Easy	#7, #13, #15, #25	4
Medium	#2, #5, #6, #8, #9, #10, #11, #12, #14, #19, #21, #22, #23, #24, #26, #27, #29, #30, #31, #33, #35, #36, #37, #38, #39, #40, #43, #45	28
Hard	#1, #3, #4, #16, #17, #18, #20, #28, #32, #34, #41, #42, #44	13

Next Steps

After completing the MLE problem list, consider:

AI Engineer Problems if your role includes LLM/GenAI work
Hard Tier if targeting Staff+ MLE roles
Google-Style Problems or Meta-Style Problems to calibrate for specific companies
Section 15: Role-Specific Prep for the full MLE preparation path
