
MLE Problem List

Reading time: ~40 min | Interview relevance: Critical | Roles: Machine Learning Engineer, Applied ML Engineer, ML Platform Engineer

You are applying for a Machine Learning Engineer position at a top tech company. The recruiter tells you there will be four rounds: coding, ML depth, system design, and behavioral. You have four weeks. What exactly should you practice?

This list of 45 problems is your answer. It is organized by interview round type, calibrated to the specific questions MLE candidates face, and sequenced to build skills in the right order. Every problem here has been reported in MLE interviews at major tech companies within the past two years.

MLE Interview Structure

Before diving into problems, understand what MLE interviews look like:

| Round | Duration | What They Test | Weight |
|---|---|---|---|
| Coding | 45-60 min | DSA + ML implementation | 25-30% |
| ML Depth | 45-60 min | Algorithm knowledge, training, evaluation | 25-30% |
| System Design | 45-60 min | End-to-end ML systems | 25-30% |
| Behavioral | 30-45 min | Collaboration, impact, conflict resolution | 10-20% |

:::tip The MLE Sweet Spot MLEs sit at the intersection of software engineering and machine learning. Interviewers expect you to code as well as a SWE AND reason about ML as well as a data scientist. Neither skill alone is sufficient. :::

Round 1: Coding Problems (15 Problems)

MLE coding rounds combine standard DSA with ML-flavored algorithmic problems. You need both.

Core DSA for MLEs

| # | Problem | Difficulty | Time | Key Pattern | Why MLEs Need It | Company Tags |
|---|---|---|---|---|---|---|
| 1 | Merge K Sorted Lists | Hard | 30 min | Min-heap merge | Merging sorted prediction streams; distributed training aggregation | Google, Meta, Amazon |
| 2 | LRU Cache | Medium | 25 min | Hash map + doubly linked list | Feature caching, model result caching in serving systems | FAANG, All |
| 3 | Serialize and Deserialize Binary Tree | Hard | 30 min | BFS/DFS serialization | Model serialization; decision tree persistence | Google, Meta, Microsoft |
| 4 | Word Search II (Trie) | Hard | 35 min | Trie + backtracking | Text search, autocomplete, vocabulary building | Google, Amazon |
| 5 | Meeting Rooms II | Medium | 20 min | Sweep line / min-heap | Resource scheduling; GPU allocation; training job scheduling | Google, Meta, Uber |
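Problem #1's min-heap pattern fits in a few lines. This is a minimal sketch that assumes the inputs are plain Python lists rather than linked-list nodes (as in the LeetCode variant); `merge_k_sorted` is a hypothetical name:

```python
import heapq
from typing import List


def merge_k_sorted(lists: List[List[int]]) -> List[int]:
    """Merge k sorted lists in O(N log k) time, where N is the total element count."""
    # Seed the heap with the head of each non-empty list: (value, list index, element index).
    # The list index breaks ties, and each list has at most one entry in the heap at a time.
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    merged = []
    while heap:
        value, i, j = heapq.heappop(heap)
        merged.append(value)
        # Replace the popped element with the next one from the same list, if any.
        if j + 1 < len(lists[i]):
            heapq.heappush(heap, (lists[i][j + 1], i, j + 1))
    return merged
```

For example, `merge_k_sorted([[1, 4, 5], [1, 3, 4], [2, 6]])` returns `[1, 1, 2, 3, 4, 4, 5, 6]`. The standard library's `heapq.merge` does the same lazily, but interviewers usually want to see the heap logic written out.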

ML-Flavored Coding

| # | Problem | Difficulty | Time | Key Pattern | Why MLEs Need It | Company Tags |
|---|---|---|---|---|---|---|
| 6 | Implement Mini-Batch Gradient Descent | Medium | 25 min | Shuffling + batching + per-batch gradient updates | Core training loop; tests understanding of batch vs. mini-batch vs. stochastic GD | FAANG, AI Labs |
| 7 | Implement K-Nearest Neighbors | Easy | 20 min | Distance computation + top-K selection | Baseline algorithm; tests NumPy fluency | Google, Meta, Startups |
| 8 | Implement TF-IDF from Scratch | Medium | 25 min | Term frequency + inverse document frequency | Text feature engineering; information retrieval basics | Google, Amazon, AI Labs |
| 9 | Implement AUC-ROC Computation | Medium | 20 min | Sorting + threshold sweep | Model evaluation; understanding ranking metrics | Meta, Uber, Airbnb |
| 10 | Implement Stratified K-Fold Cross-Validation | Medium | 20 min | Balanced splitting | Evaluation strategy for imbalanced datasets | Google, Meta, Big Tech |
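For Problem #9, the "sorting + threshold sweep" pattern can be sketched in NumPy as below. This is a minimal illustration, not a reference implementation: `roc_auc` is a hypothetical name, and tied scores are treated as a single threshold step, which is one common convention.

```python
import numpy as np


def roc_auc(y_true, scores):
    """ROC AUC computed by sorting scores and sweeping the decision threshold.

    Tied scores are processed as one threshold step, so a tie contributes a
    diagonal segment to the curve (equivalent to counting it as half-correct).
    """
    y_true = np.asarray(y_true, dtype=float)
    scores = np.asarray(scores, dtype=float)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")

    # Sort by score, highest first; each distinct score is one candidate threshold.
    order = np.argsort(-scores, kind="mergesort")
    scores, y_true = scores[order], y_true[order]
    # Last index of each group of tied scores, plus the final index.
    distinct = np.where(np.diff(scores))[0]
    threshold_idx = np.r_[distinct, len(scores) - 1]

    # True/false positives when predicting positive for every score >= threshold.
    tps = np.cumsum(y_true)[threshold_idx]
    fps = (threshold_idx + 1) - tps
    tpr = np.r_[0.0, tps / n_pos]
    fpr = np.r_[0.0, fps / n_neg]
    # Trapezoidal rule over the ROC curve.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
```

For instance, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns `0.75`. Raising on single-class input addresses one of the edge cases interviewers probe.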

Data Processing Coding

| # | Problem | Difficulty | Time | Key Pattern | Why MLEs Need It | Company Tags |
|---|---|---|---|---|---|---|
| 11 | Implement a Streaming Mean and Variance Calculator | Medium | 20 min | Welford's algorithm | Online feature normalization; streaming statistics | Google, Uber, Databricks |
| 12 | Implement Reservoir Sampling | Medium | 20 min | Probabilistic sampling | Sampling from data streams; training data selection | Google, Meta, Big Tech |
| 13 | Parse and Aggregate Large Log Files | Easy | 15 min | Hash map aggregation | Data pipeline debugging; metric computation | All |
| 14 | Implement Leaky Bucket Rate Limiter | Medium | 20 min | Queue-based rate limiting | API rate limiting for model serving endpoints | Uber, Airbnb, Stripe |
| 15 | Compute Intersection of Two Large Sorted Arrays | Easy | 15 min | Two-pointer merge | Feature set intersection; dataset alignment | Google, Meta |
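For Problem #11, Welford's update keeps a running mean and a running sum of squared deviations, which avoids the cancellation error of the naive `E[x^2] - E[x]^2` formula. A minimal sketch; the class and method names are illustrative:

```python
import math


class RunningStats:
    """Streaming mean and variance with Welford's algorithm (one pass, O(1) memory)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Multiplying by the deviation from the *updated* mean keeps the update stable.
        self.m2 += delta * (x - self.mean)

    def variance(self, sample: bool = True) -> float:
        """Sample variance (n - 1 denominator) by default; population variance otherwise."""
        denom = self.n - 1 if sample else self.n
        return self.m2 / denom if denom > 0 else float("nan")

    def std(self, sample: bool = True) -> float:
        return math.sqrt(self.variance(sample))
```

Feeding in `[2, 4, 4, 4, 5, 5, 7, 9]` gives a mean of 5, a population variance of 4, and a sample variance of 32/7. Returning NaN for too few observations is a design choice; raising an error is equally defensible.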

:::warning Coding Round Red Flags for MLEs

  • Cannot implement basic ML algorithms without sklearn
  • Writes Python loops where NumPy vectorization is expected
  • Cannot discuss time complexity of ML operations (e.g., brute-force KNN is O(n*d) per query for n training points in d dimensions)
  • Ignores edge cases like empty datasets, single-class data, or NaN values :::
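The first three red flags above (no sklearn, vectorization, and complexity) can be addressed together with a brute-force KNN classifier written directly in NumPy. This is a sketch under stated assumptions: `knn_predict` is a hypothetical name, labels are assumed to be non-negative integers, and vote ties go to the smallest label.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=3):
    """Brute-force KNN classifier with vectorized distance computation.

    Distances cost O(n * d) per query; top-k selection adds O(n) per query.
    Assumes y_train holds non-negative integer class labels.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)
    if len(X_train) == 0:
        raise ValueError("empty training set")
    k = min(k, len(X_train))

    # Squared Euclidean distances via ||q||^2 - 2 q.x + ||x||^2, shape (m, n).
    d2 = (
        np.sum(X_query ** 2, axis=1)[:, None]
        - 2 * X_query @ X_train.T
        + np.sum(X_train ** 2, axis=1)[None, :]
    )
    # argpartition puts the k smallest distances first (unordered) without a full sort.
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    neighbor_labels = y_train[nearest]  # (m, k)
    # Majority vote per query; bincount + argmax breaks ties toward the smallest label.
    return np.array([np.bincount(row).argmax() for row in neighbor_labels])
```

Only the final vote loops in Python, once per query rather than once per training point. Using `argpartition` instead of `argsort` is the kind of complexity detail interviewers listen for.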

Round 2: ML Depth Problems (15 Problems)

These problems test your deep understanding of ML algorithms, training procedures, and evaluation strategies.

Algorithm Understanding

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 16 | Explain and Implement Gradient Boosted Trees | Hard | 35 min | Sequential ensemble with residual fitting | GBTs dominate tabular ML; must understand deeply | Google, Meta, Amazon |
| 17 | Implement Backpropagation for a 2-Layer Neural Network | Hard | 40 min | Chain rule, gradient computation | Tests true understanding of neural networks vs. framework usage | FAANG, AI Labs |
| 18 | Implement Word2Vec (Skip-gram with Negative Sampling) | Hard | 40 min | Embedding learning, contrastive loss | A foundation of modern embedding-based systems | Google, Meta, Airbnb |
| 19 | Explain Random Forest vs. Gradient Boosting: When to Use Each | Medium | 20 min | Ensemble comparison | Practical model selection for tabular data | All |
| 20 | Implement Attention Mechanism from Scratch | Hard | 35 min | Scaled dot-product attention | Foundation of the Transformer architecture | AI Labs, Google, Meta |
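For Problem #20, scaled dot-product attention computes softmax(Q K^T / sqrt(d_k)) V. The NumPy sketch below is illustrative, not a framework API. The mask convention (boolean, `True` = may attend) is an assumption, and the function returns the attention weights alongside the output so they can be inspected.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V, mask=None):
    """softmax(Q K^T / sqrt(d_k)) V.

    Q: (..., n_q, d_k), K: (..., n_k, d_k), V: (..., n_k, d_v).
    mask: optional boolean array broadcastable to (..., n_q, n_k); False = blocked.
    Returns (output of shape (..., n_q, d_v), weights of shape (..., n_q, n_k)).
    """
    d_k = Q.shape[-1]
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        # A large negative score gives blocked positions ~0 weight after softmax.
        scores = np.where(mask, scores, -1e9)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights
```

A causal (decoder-style) mask is `np.tril(np.ones((n, n), dtype=bool))`. With it, the first query can attend only to the first key, so its output equals `V[0]`. Dividing by sqrt(d_k) keeps the logits' variance from growing with dimension, which would otherwise saturate the softmax.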

Training & Optimization

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 21 | Design a Training Pipeline for a Model with 100M Examples | Medium | 30 min | Data loading, batching, distributed training | Scale is the MLE differentiator | Google, Meta, Amazon |
| 22 | Debug a Model That Is Not Converging | Medium | 25 min | Systematic debugging: data, model, optimization | Debugging is a large share of day-to-day MLE work | FAANG, All |
| 23 | Explain and Implement Learning Rate Scheduling | Medium | 20 min | Step decay, cosine annealing, warm-up | Training stability and convergence speed | Google, Meta, AI Labs |
| 24 | Handle Missing Data: Compare Imputation Strategies | Medium | 20 min | Mean, median, KNN, MICE, model-based | Data quality directly impacts model quality | All |
| 25 | Implement Early Stopping with Patience | Easy | 15 min | Validation loss monitoring | Prevents overfitting; practical training technique | All |
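For Problem #25, early stopping with patience is a small state machine around the validation loss. A minimal sketch; the class name and the `min_delta` threshold are illustrative choices, not a specific framework's API:

```python
import math


class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs.

    An epoch counts as an improvement only if it beats the best loss so far by
    more than `min_delta`. A NaN loss never counts as an improvement.
    """

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = math.inf
        self.best_epoch = -1
        self.epochs_without_improvement = 0

    def step(self, val_loss: float, epoch: int) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
            # In a real loop, checkpoint the weights here so the best model can be restored.
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience
```

With `patience=3` and validation losses `[1.0, 0.8, 0.7, 0.72, 0.71, 0.75]`, training stops after epoch 5, and the best epoch is 2. Restoring the epoch-2 checkpoint, rather than keeping the final weights, is the detail interviewers usually look for.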

Evaluation & Production

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 26 | Design an Offline Evaluation Framework for a Ranking Model | Medium | 30 min | NDCG, MAP, MRR computation | Offline evaluation gates production deployment | Google, Meta, LinkedIn |
| 27 | Detect and Mitigate Data Leakage | Medium | 25 min | Temporal leakage, feature leakage | A leading cause of models that work offline but fail online | All |
| 28 | Diagnose Discrepancies Between Online and Offline Metrics | Hard | 30 min | Distribution shift, delayed feedback | The classic MLE headache; tests production experience | Meta, Google, Uber |
| 29 | Design a Model Retraining Strategy | Medium | 25 min | Trigger-based vs. scheduled, data freshness | Models decay; retraining strategy is critical | FAANG, Big Tech |
| 30 | Implement Calibration Analysis for a Binary Classifier | Medium | 25 min | Reliability diagram, Platt scaling, isotonic regression | Calibrated probabilities are essential for decision-making | Meta, Google, Stripe |
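For Problem #26, the core of an offline ranking evaluator is a few metric functions. The sketch below rests on two assumptions worth stating in an interview. First, it uses the exponential gain (2^rel - 1), one common NDCG variant; linear gain (rel) is the other. Second, it computes the ideal ranking from the same retrieved list, which is a simplification; a full evaluator would use every judged document for the query.

```python
import numpy as np


def dcg_at_k(relevances, k):
    """DCG@k with exponential gain (2^rel - 1) and log2(rank + 1) discount."""
    rel = np.asarray(relevances, dtype=float)[:k]
    ranks = np.arange(1, len(rel) + 1)
    return float(np.sum((2 ** rel - 1) / np.log2(ranks + 1)))


def ndcg_at_k(relevances_in_ranked_order, k):
    """NDCG@k: DCG of the model's ranking divided by DCG of the ideal ordering.

    Simplification: the ideal ordering is built from this list only.
    """
    ideal = sorted(relevances_in_ranked_order, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(relevances_in_ranked_order, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def mrr(first_relevant_ranks):
    """Mean reciprocal rank; each entry is the 1-based rank of the first relevant item, or None."""
    return float(np.mean([1.0 / r if r else 0.0 for r in first_relevant_ranks]))
```

Returning 0 for a query with no relevant documents is one convention. Some evaluators drop such queries instead, so state which one you use.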

:::tip The "Why" Behind ML Depth Questions Interviewers are not testing whether you can recite textbook definitions. They want to hear: "In my experience, I chose X over Y because..." Signal practical experience, not academic knowledge. :::

Round 3: System Design Problems (15 Problems)

MLE system design focuses on end-to-end ML systems with both offline (training) and online (serving) components.

Recommendation & Ranking Systems

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 31 | Design a Product Recommendation System for E-Commerce | Medium | 40 min | Collaborative filtering + content-based + hybrid | The canonical MLE system design problem | Amazon, Meta, Pinterest |
| 32 | Design a Search Ranking System | Hard | 45 min | Query understanding + retrieval + learning-to-rank | Search is the backbone of many products | Google, Amazon, Airbnb |
| 33 | Design a Video Recommendation System | Medium | 40 min | Multi-stage ranking + user session modeling | Video platforms have unique challenges (watch time, engagement) | YouTube/Google, Netflix, TikTok |
| 34 | Design an Ad Click Prediction System | Hard | 45 min | Feature engineering + real-time prediction + calibration | Ads are the primary revenue source at companies like Google and Meta | Google, Meta, Amazon |
| 35 | Design a People You May Know System | Medium | 35 min | Graph-based features + ML ranking | Social graph + ML combination | LinkedIn, Meta, Twitter |

Classification & Detection Systems

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 36 | Design a Spam Detection System | Medium | 35 min | Text classification + feedback loops + adversarial robustness | Content safety is universal | Google, Meta, Microsoft |
| 37 | Design a Credit Risk Scoring System | Medium | 35 min | Feature engineering + explainability + fairness | Regulated ML with fairness constraints | Stripe, Square, Goldman Sachs |
| 38 | Design an Anomaly Detection System for Cloud Infrastructure | Medium | 35 min | Time-series models + alerting + root cause analysis | Infrastructure monitoring with ML | Google, Amazon, Datadog |
| 39 | Design a Document Classification Pipeline | Medium | 30 min | NLP preprocessing + embeddings + fine-tuning | Document understanding is a core ML task | Google, Amazon, AI Labs |
| 40 | Design an Image Classification System at Scale | Medium | 35 min | CNN training + transfer learning + serving optimization | Computer vision MLE fundamentals | Google, Apple, Tesla |

ML Infrastructure Systems

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 41 | Design a Model Experimentation Platform | Hard | 45 min | Experiment tracking, reproducibility, collaboration | The backbone of ML development | FAANG, Unicorns |
| 42 | Design a Feature Pipeline with Online and Offline Serving | Hard | 45 min | Feature computation, consistency, freshness | Feature engineering at scale | Uber, Airbnb, Databricks |
| 43 | Design a Model A/B Testing Framework | Medium | 35 min | Traffic splitting, metric computation, statistical tests | Every model change needs rigorous testing | FAANG, Big Tech |
| 44 | Design a Distributed Training System | Hard | 45 min | Data parallelism, model parallelism, gradient aggregation | Training at scale is a core MLE responsibility | Google, Meta, AI Labs |
| 45 | Design a Multi-Model Serving Architecture | Medium | 35 min | Model registry, canary deployment, fallback | Serving multiple models in production | FAANG, Unicorns |
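For the "statistical tests" piece of Problem #43, a common starting point for a CTR or conversion metric is the two-sided, two-proportion z-test. This minimal sketch assumes independent units (e.g., users, not correlated impressions) and samples large enough for the normal approximation. It deliberately ignores multiple comparisons and repeated peeking, which a real framework must handle.

```python
import math


def two_proportion_ztest(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a difference in rates (e.g., CTR) between arms A and B.

    Uses the pooled proportion under H0: p_a == p_b. Returns (z, p_value).
    """
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value for a standard normal Z: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, 1,000 vs. 1,100 clicks on 10,000 users per arm gives z of about 2.31 and p of about 0.02. In a design interview, the follow-up questions are usually about choosing the randomization unit and sizing the experiment before launch.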

:::note System Design Deep Dive Areas For each system design problem, be prepared to deep dive into:

  • Data pipeline: How data flows from raw to features
  • Model architecture: Why this model for this problem
  • Training infrastructure: How training scales
  • Serving architecture: Latency, throughput, reliability
  • Monitoring: What can go wrong and how to detect it :::

4-Week MLE Study Plan

| Week | Focus | Problems | Daily Load |
|---|---|---|---|
| Week 1 | Coding fundamentals | #1-15 (Coding round) | 2-3 problems/day |
| Week 2 | ML depth | #16-30 (ML depth round) | 2 problems/day + review coding |
| Week 3 | System design | #31-42 (System design round) | 1-3 designs/day + review ML |
| Week 4 | Integration + mock | #43-45, then a mix of all rounds | 1 problem + 1 mock interview/day |

Week 1: Coding Deep Dive

Day 1: #1, #2 (heap merge, LRU cache)
Day 2: #3, #4 (serialization, trie)
Day 3: #5, #6 (scheduling, mini-batch GD)
Day 4: #7, #8 (KNN, TF-IDF)
Day 5: #9, #10 (AUC-ROC, stratified CV)
Day 6: #11, #12, #13 (streaming stats, reservoir sampling, logs)
Day 7: #14, #15 + review weak problems

Week 2: ML Depth Focus

Day 1: #16, #17 (GBT, backprop)
Day 2: #18, #19 (Word2Vec, RF vs GBT)
Day 3: #20, #21 (attention, large-scale training)
Day 4: #22, #23 (debugging, LR scheduling)
Day 5: #24, #25 (missing data, early stopping)
Day 6: #26, #27 (offline eval, data leakage)
Day 7: #28, #29, #30 (online/offline gap, retraining, calibration)

Week 3: System Design Sprint

Day 1: #31 (product recommendations)
Day 2: #32 (search ranking)
Day 3: #33 (video recommendations)
Day 4: #34, #35 (ad prediction, people you may know)
Day 5: #36, #37 (spam detection, credit scoring)
Day 6: #38, #39, #40 (anomaly detection, doc classification, image classification)
Day 7: #41, #42 (experimentation platform, feature pipeline)

Week 4: Polish and Mock

Day 1: #43, #44, #45 (A/B testing, distributed training, multi-model serving)
Day 2-3: Re-solve all Yellow/Red problems
Day 4-5: Full mock interviews (1 coding + 1 ML + 1 design)
Day 6-7: Final review of weak areas

Problem Deep Dives

Problem 17: Implement Backpropagation for a 2-Layer Neural Network

Why this problem matters: If you cannot implement backprop from scratch, interviewers will question whether you truly understand the models you build. It is one of the most important MLE implementation problems.

Setup:

Network: Input(d) -> Hidden(h) -> Output(1)
Activation: ReLU for hidden, Sigmoid for output
Loss: Binary Cross-Entropy

Forward Pass:

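import numpy as np
# Shapes: X (n, d), y (n, 1), W1 (d, h), b1 (h,), W2 (h, 1), b2 (1,)
z1 = X @ W1 + b1                  # (n, h)
a1 = np.maximum(0, z1)            # ReLU
z2 = a1 @ W2 + b2                 # (n, 1)
a2 = 1 / (1 + np.exp(-z2))        # sigmoid -> output probability
loss = -np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2))  # binary cross-entropy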

Backward Pass:

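n = X.shape[0]                        # batch size
dz2 = a2 - y                          # (n, 1) dL/dz2 per example (sigmoid + BCE)
dW2 = (1 / n) * a1.T @ dz2            # (h, 1)
db2 = (1 / n) * np.sum(dz2, axis=0)   # (1,)
da1 = dz2 @ W2.T                      # (n, h)
dz1 = da1 * (z1 > 0)                  # ReLU derivative: 1 where z1 > 0, else 0
dW1 = (1 / n) * X.T @ dz1             # (d, h)
db1 = (1 / n) * np.sum(dz1, axis=0)   # (h,)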
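A standard way to confirm these backward-pass equations is a gradient check: compare each analytical gradient with central finite differences of the loss. The runnable sketch below does this. The helper names (`forward`, `backward`, `numerical_grad`), the small random network, and the step size are illustrative assumptions.

```python
import numpy as np


def forward(X, y, params):
    z1 = X @ params["W1"] + params["b1"]
    a1 = np.maximum(0, z1)                       # ReLU
    z2 = a1 @ params["W2"] + params["b2"]
    a2 = 1 / (1 + np.exp(-z2))                   # sigmoid
    loss = -np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2))
    return loss, (z1, a1, a2)


def backward(X, y, params, cache):
    z1, a1, a2 = cache
    n = X.shape[0]
    dz2 = a2 - y
    dz1 = (dz2 @ params["W2"].T) * (z1 > 0)
    return {
        "W2": a1.T @ dz2 / n,
        "b2": dz2.sum(axis=0) / n,
        "W1": X.T @ dz1 / n,
        "b1": dz1.sum(axis=0) / n,
    }


def numerical_grad(X, y, params, name, step=1e-5):
    """Central finite differences, one parameter entry at a time."""
    grad = np.zeros_like(params[name])
    for idx in np.ndindex(params[name].shape):
        original = params[name][idx]
        params[name][idx] = original + step
        loss_plus, _ = forward(X, y, params)
        params[name][idx] = original - step
        loss_minus, _ = forward(X, y, params)
        params[name][idx] = original             # restore the parameter
        grad[idx] = (loss_plus - loss_minus) / (2 * step)
    return grad


rng = np.random.default_rng(0)
n, d, hidden = 8, 4, 5
X = rng.normal(size=(n, d))
y = rng.integers(0, 2, size=(n, 1)).astype(float)
params = {
    "W1": rng.normal(scale=0.5, size=(d, hidden)),
    "b1": np.zeros(hidden),
    "W2": rng.normal(scale=0.5, size=(hidden, 1)),
    "b2": np.zeros(1),
}
loss, cache = forward(X, y, params)
grads = backward(X, y, params, cache)
max_abs_err = {}
for name in params:
    num = numerical_grad(X, y, params, name)
    max_abs_err[name] = float(np.max(np.abs(num - grads[name])))
    print(f"{name}: max |numerical - analytical| = {max_abs_err[name]:.2e}")
```

With float64 arithmetic, the discrepancies should be many orders of magnitude smaller than the gradients themselves. An error close to the gradient's own magnitude usually points to a missing `1/n`, a transposed matrix, or a wrong ReLU mask.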

Key Points Interviewers Check:

  • Correct application of chain rule
  • ReLU derivative is 0 where input <= 0
  • Matrix dimension alignment
  • Numerical stability in sigmoid and log
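For the numerical-stability point, a common remedy is to compute the loss directly from the logit z2 rather than from the probability a2. The sketch below uses the identity -[y log s(z) + (1 - y) log(1 - s(z))] = max(z, 0) - z*y + log(1 + exp(-|z|)), where s is the sigmoid. The function names are illustrative.

```python
import numpy as np


def bce_with_logits(z, y):
    """Mean binary cross-entropy from logits z (pre-sigmoid), numerically stable.

    max(z, 0) - z*y + log1p(exp(-|z|)) never exponentiates a large positive
    number and never evaluates log(0).
    """
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))))


def stable_sigmoid(z):
    """Sigmoid that avoids overflow in exp() for large |z|."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))  # always in (0, 1], so it cannot overflow
    return np.where(z >= 0, 1 / (1 + e), e / (1 + e))
```

The gradient of this loss with respect to z is still sigmoid(z) - y, so `dz2 = a2 - y` in the backward pass is unchanged. Only the loss value becomes safe for extreme logits.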

Problem 32: Design a Search Ranking System

Why this problem matters: Search ranking combines information retrieval, ML, and systems engineering, so it exercises the full MLE skill set.

Architecture:

Query -> Query Understanding -> Retrieval -> Ranking -> Reranking -> Results

1. Query Understanding
- Spell correction, query expansion, intent classification
- Embedding-based query representation

2. Retrieval (get ~1000 candidates)
- Inverted index (BM25)
- Embedding-based retrieval (ANN)
- Combine with reciprocal rank fusion

3. Ranking (score ~1000 -> top 100)
- Features: query-document relevance, freshness, authority, user history
- Model: LambdaMART or neural ranker
- Training data: clicks, dwell time (with position bias correction)

4. Reranking (top 100 -> final order)
- Diversity injection
- Personalization
- Business rules (ads, promoted content)

5. Evaluation
- Offline: NDCG@10, MAP, MRR
- Online: CTR, query success rate, session length
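The retrieval step merges BM25 and ANN candidates with reciprocal rank fusion (RRF), which can be sketched in a few lines. In this sketch, each document scores the sum of 1 / (k + rank) over the lists it appears in. The `k = 60` default is the constant commonly used with RRF; treat it as a tunable assumption. The function name and the string document IDs are illustrative.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(ranked_lists: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked candidate lists (e.g., BM25 and ANN results) with RRF.

    Ranks are 1-based; a document absent from a list gets no contribution from it.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for deterministic output.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))
```

Because RRF uses only ranks, it needs no calibration between BM25 scores and embedding similarities, which live on different scales. That is the usual argument for it over a weighted score sum at the retrieval stage.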

MLE-Specific Patterns to Master

| Pattern | Where It Appears | Problems |
|---|---|---|
| Batched computation | Training, inference, feature computation | #6, #21, #44 |
| Online/offline consistency | Feature stores, model serving | #28, #42 |
| Multi-stage pipeline | Recommendation, search, ads | #31, #32, #33, #34 |
| Calibration | Any system using probabilities for decisions | #30, #34 |
| Feedback loops | Spam, recommendation, content moderation | #36 |
| Distribution shift | Production model degradation | #28, #29 |
| Embedding-based retrieval | Search, recommendations, similarity | #18, #33 |
| Experiment design | Any model deployment | #43 |

Difficulty Distribution

| Difficulty | Problems | Count |
|---|---|---|
| Easy | #7, #13, #15, #25 | 4 |
| Medium | #2, #5, #6, #8, #9, #10, #11, #12, #14, #19, #21, #22, #23, #24, #26, #27, #29, #30, #31, #33, #35, #36, #37, #38, #39, #40, #43, #45 | 28 |
| Hard | #1, #3, #4, #16, #17, #18, #20, #28, #32, #34, #41, #42, #44 | 13 |

Next Steps

After completing the MLE problem list, consider:

© 2026 EngineersOfAI. All rights reserved.