
The Core 50

Reading time: ~35 min | Interview relevance: Critical | Roles: All AI/ML roles

A senior MLE at Google once told me: "I have conducted over 200 interviews, and I can tell within the first 10 minutes whether a candidate has done structured preparation or just ground through LeetCode at random." The difference is not raw problem count -- it is pattern coverage. The Core 50 is designed to give you maximum pattern coverage with minimum problem count.

These 50 problems are the absolute foundation. They span four categories -- Data Structures & Algorithms (20), ML Concepts (10), System Design (10), and NumPy/Pandas/SQL (10) -- and together they cover the patterns that appear in over 80% of AI/ML interview questions across all companies and levels.

How to Use This List

:::tip The Rule of Three For each problem: (1) solve it yourself, (2) study the optimal solution, (3) re-solve from scratch without looking. If you cannot do step 3, you have not learned the problem -- you have only recognized the solution. :::

Estimated Timeline

| Pace | Daily Problems | Total Time |
|---|---|---|
| Intensive | 3-4 per day | ~2 weeks |
| Moderate | 2 per day | ~4 weeks |
| Relaxed | 1 per day | ~7 weeks |

Difficulty Distribution

| Difficulty | Count | Percentage |
|---|---|---|
| Easy | 11 | 22% |
| Medium | 31 | 62% |
| Hard | 8 | 16% |

Part 1: Data Structures & Algorithms (20 Problems)

These 20 problems cover every major DSA pattern that appears in AI/ML coding interviews. Master these and you will recognize the pattern in nearly any coding question thrown at you.

Arrays & Hashing (4 Problems)

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 1 | Two Sum | Easy | 10 min | Hash map lookup | Foundation of complement-based problems; O(n) vs O(n^2) thinking | FAANG, All |
| 2 | Best Time to Buy and Sell Stock | Easy | 10 min | Running minimum | Teaches single-pass optimization; common in time-series feature engineering | FAANG, Big Tech |
| 3 | Group Anagrams | Medium | 20 min | Sorted key hashing | String canonicalization; pattern appears in feature grouping and deduplication | Google, Meta, Amazon |
| 4 | Product of Array Except Self | Medium | 20 min | Prefix/suffix arrays | No-division constraint forces creative thinking; prefix sums appear in cumulative metrics | Google, Apple, Microsoft |

:::note Why Arrays & Hashing First? Over 60% of coding interview questions involve arrays or hash maps in some form. These four problems establish the mental habit of asking "can I solve this in O(n) instead of O(n^2)?", which is the single most important optimization insight. :::

Pattern Summary:

  • Hash map for O(1) lookup -- When you need to find complements, pairs, or groups
  • Prefix/suffix decomposition -- When you need cumulative information from both directions
  • Single-pass with running state -- When you can maintain a summary while scanning
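Here is a minimal Python sketch of the prefix/suffix pattern applied to problem #4 (Product of Array Except Self). It assumes a plain list of integers as input. The first pass stores prefix products in the output list. The second pass multiplies in suffix products. No division is needed, and extra space beyond the output is O(1).

```python
def product_except_self(nums: list[int]) -> list[int]:
    """Return out[i] = product of every nums[j] with j != i, without division."""
    n = len(nums)
    out = [1] * n
    # Forward pass: out[i] holds the product of everything left of i (prefix).
    prefix = 1
    for i in range(n):
        out[i] = prefix
        prefix *= nums[i]
    # Backward pass: multiply in the product of everything right of i (suffix).
    suffix = 1
    for i in range(n - 1, -1, -1):
        out[i] *= suffix
        suffix *= nums[i]
    return out


print(product_except_self([1, 2, 3, 4]))  # [24, 12, 8, 6]
```

Zeros need no special casing. For example, `[0, 1, 2]` yields `[2, 0, 0]`, because the zero simply propagates through the prefix and suffix products.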

Sliding Window & Two Pointers (3 Problems)

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 5 | Container With Most Water | Medium | 20 min | Two pointers inward | Greedy narrowing; appears in optimization problems | Google, Meta, Amazon |
| 6 | Longest Substring Without Repeating Characters | Medium | 25 min | Sliding window with set | Variable-width window; relevant to text processing and tokenization | FAANG, AI Labs |
| 7 | Minimum Window Substring | Hard | 35 min | Sliding window with map | Advanced window management; frequency counting under constraints | Google, Meta, Uber |

Pattern Summary:

  • Two pointers -- Shrink search space by moving endpoints based on a condition
  • Sliding window -- Maintain a dynamic range that expands and contracts to meet constraints
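Below is a sketch of the variable-width sliding window from problem #6 (Longest Substring Without Repeating Characters). It uses a set, as in the table above. The right edge always expands. The left edge contracts only while the incoming character would create a duplicate, so each index enters and leaves the window at most once (O(n) overall).

```python
def length_of_longest_substring(s: str) -> int:
    window: set[str] = set()   # characters currently inside s[left:right + 1]
    left = 0
    best = 0
    for right, ch in enumerate(s):
        # Contract from the left until ch is no longer a duplicate.
        while ch in window:
            window.remove(s[left])
            left += 1
        window.add(ch)         # expand the window to include s[right]
        best = max(best, right - left + 1)
    return best


print(length_of_longest_substring("abcabcbb"))  # 3 ("abc")
```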

Linked Lists (2 Problems)

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 8 | Reverse Linked List | Easy | 10 min | Pointer reversal | Fundamental pointer manipulation; tests careful state management | FAANG, All |
| 9 | Merge Two Sorted Lists | Easy | 15 min | Merge pattern | Foundation for merge sort; relevant to combining sorted data streams | FAANG, All |

Pattern Summary:

  • In-place pointer manipulation -- Three-pointer technique (prev, curr, next)
  • Merge with sentinel -- Use a dummy head to simplify edge cases
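Both linked-list patterns are sketched in Python below. The `ListNode` class and the `from_list`/`to_list` helpers are hypothetical scaffolding for this example, not part of any library. `reverse_list` is the three-pointer technique. `merge_two_lists` uses a dummy head so that the first node needs no special case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ListNode:
    val: int
    next: Optional["ListNode"] = None


def reverse_list(head: Optional[ListNode]) -> Optional[ListNode]:
    prev, curr = None, head
    while curr:
        nxt = curr.next      # save the rest of the list before overwriting the link
        curr.next = prev     # point this node backwards
        prev, curr = curr, nxt
    return prev              # the old tail is the new head


def merge_two_lists(a: Optional[ListNode], b: Optional[ListNode]) -> Optional[ListNode]:
    dummy = tail = ListNode(0)   # sentinel: the first real node needs no special case
    while a and b:
        if a.val <= b.val:
            tail.next, a = a, a.next
        else:
            tail.next, b = b, b.next
        tail = tail.next
    tail.next = a or b           # append whichever list still has nodes
    return dummy.next


# Hypothetical helpers for building and reading lists in this example.
def from_list(values: list[int]) -> Optional[ListNode]:
    dummy = tail = ListNode(0)
    for v in values:
        tail.next = ListNode(v)
        tail = tail.next
    return dummy.next


def to_list(head: Optional[ListNode]) -> list[int]:
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out


print(to_list(reverse_list(from_list([1, 2, 3]))))                       # [3, 2, 1]
print(to_list(merge_two_lists(from_list([1, 3, 5]), from_list([2, 4]))))  # [1, 2, 3, 4, 5]
```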

Trees & Graphs (4 Problems)

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 10 | Maximum Depth of Binary Tree | Easy | 10 min | Recursive DFS | Foundation of tree recursion; base case thinking | FAANG, All |
| 11 | Validate Binary Search Tree | Medium | 20 min | DFS with min/max bounds (or in-order traversal) | Range constraint propagation; appears in decision tree validation | Google, Meta, Microsoft |
| 12 | Number of Islands | Medium | 20 min | BFS/DFS flood fill | Connected component finding; relevant to image segmentation and clustering | FAANG, Big Tech |
| 13 | Course Schedule | Medium | 25 min | Topological sort (cycle detection) | Dependency resolution; critical for ML pipeline DAGs and training schedules | Google, Meta, Airbnb |

:::warning Trees and Graphs Are Non-Negotiable ML pipelines are DAGs. Feature dependency graphs need topological sorting. Decision trees need traversal. If you skip tree/graph problems, you are leaving a major gap in your preparation. :::

Pattern Summary:

  • Recursive DFS -- Solve for a node assuming subtrees are solved
  • BFS level-order -- Process nodes level by level
  • Topological sort -- Order nodes respecting directed dependencies
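The following sketches problem #13 (Course Schedule) with Kahn's algorithm. This is one common way to do a topological sort; DFS with three-color marking is the usual alternative. The code assumes the LeetCode input convention, where a pair `[a, b]` means course `b` must be taken before course `a`.

```python
from collections import deque


def can_finish(num_courses: int, prerequisites: list[list[int]]) -> bool:
    """Kahn's algorithm: True if all courses can be ordered (no cycle)."""
    graph: list[list[int]] = [[] for _ in range(num_courses)]
    indegree = [0] * num_courses
    for course, prereq in prerequisites:   # edge prereq -> course
        graph[prereq].append(course)
        indegree[course] += 1

    # Start from courses with no unmet prerequisites.
    queue = deque(c for c in range(num_courses) if indegree[c] == 0)
    taken = 0
    while queue:
        node = queue.popleft()
        taken += 1
        for nxt in graph[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:          # all prerequisites of nxt are satisfied
                queue.append(nxt)
    # Courses never dequeued either sit on a cycle or depend on one.
    return taken == num_courses


print(can_finish(2, [[1, 0]]))          # True
print(can_finish(2, [[1, 0], [0, 1]]))  # False (cycle)
```

Appending each dequeued `node` to a list would also give a valid execution order, which is the same idea used to schedule steps in an ML pipeline DAG.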

Binary Search (2 Problems)

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 14 | Binary Search | Easy | 10 min | Standard binary search | The foundation for the whole pattern; get the off-by-one boundaries right | FAANG, All |
| 15 | Search in Rotated Sorted Array | Medium | 25 min | Modified binary search | Handling invariant breaks; appears in threshold search for ML models | Google, Meta, Amazon |

Pattern Summary:

  • Binary search on answer -- When the search space is monotonic, binary search on the result
  • Invariant identification -- Which half is sorted? Base decisions on the invariant
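This sketch shows the invariant-based decision from problem #15. It assumes distinct values, as in the standard version of the problem. At every step at least one half of `[lo, hi]` is sorted. The code checks whether the target falls inside that sorted half and discards the other half if it does not.

```python
def search_rotated(nums: list[int], target: int) -> int:
    """Index of target in a rotated sorted array of distinct values, or -1."""
    lo, hi = 0, len(nums) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if nums[mid] == target:
            return mid
        if nums[lo] <= nums[mid]:              # invariant: left half [lo..mid] is sorted
            if nums[lo] <= target < nums[mid]:
                hi = mid - 1
            else:
                lo = mid + 1
        else:                                  # otherwise right half [mid..hi] is sorted
            if nums[mid] < target <= nums[hi]:
                lo = mid + 1
            else:
                hi = mid - 1
    return -1


print(search_rotated([4, 5, 6, 7, 0, 1, 2], 0))  # 4
print(search_rotated([4, 5, 6, 7, 0, 1, 2], 3))  # -1
```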

Dynamic Programming (3 Problems)

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 16 | Climbing Stairs | Easy | 10 min | Basic DP (Fibonacci variant) | Introduces memoization and bottom-up thinking | FAANG, All |
| 17 | Longest Common Subsequence | Medium | 25 min | 2D DP with string comparison | Sequence alignment; directly relevant to edit distance in NLP | Google, Microsoft, AI Labs |
| 18 | Word Break | Medium | 30 min | DP with dictionary lookup | String segmentation; relevant to tokenization and text parsing | Google, Meta, Amazon |

Pattern Summary:

  • 1D DP -- State depends on previous 1-2 states
  • 2D DP -- State depends on two dimensions (e.g., two strings)
  • DP with auxiliary structure -- Combine DP with hash set or trie for efficient lookup
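Two short sketches cover the 2D and auxiliary-structure patterns:

- Problem #17 (Longest Common Subsequence) is filled in as a full table.
- Problem #18 (Word Break) is a 1D table paired with a hash set for O(1) dictionary lookups.

Both return only the optimal value, not the reconstructed subsequence or segmentation.

```python
def longest_common_subsequence(a: str, b: str) -> int:
    # dp[i][j] = LCS length of a[:i] and b[:j]; row 0 and column 0 are empty prefixes.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1              # extend the diagonal match
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])  # drop one character
    return dp[len(a)][len(b)]


def word_break(s: str, words: list[str]) -> bool:
    vocab = set(words)                   # auxiliary structure: O(1) membership checks
    ok = [True] + [False] * len(s)       # ok[i]: s[:i] can be segmented
    for i in range(1, len(s) + 1):
        ok[i] = any(ok[j] and s[j:i] in vocab for j in range(i))
    return ok[len(s)]


print(longest_common_subsequence("abcde", "ace"))  # 3
print(word_break("leetcode", ["leet", "code"]))    # True
```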

Heaps & Stacks (2 Problems)

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 19 | Top K Frequent Elements | Medium | 20 min | Min-heap / bucket sort | Top-K pattern appears constantly in ML (top-K predictions, features) | FAANG, Big Tech |
| 20 | Valid Parentheses | Easy | 10 min | Stack for matching | Parsing and validation; relevant to expression evaluation and syntax checking | FAANG, All |

Pattern Summary:

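  • Heap for Top-K -- Maintain a min-heap of size K; when it grows past K, pop the smallest, so the heap always holds the K largest items seen so far (this also works on streams)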
  • Stack for nesting -- Track opening structures and match with closing ones
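A sketch of both patterns follows. `top_k_frequent` keeps a min-heap capped at `k` entries, so it runs in O(n log k) and would also work on a stream of counts. `is_valid_parentheses` assumes the input contains only bracket characters, as in the standard version of problem #20.

```python
import heapq
from collections import Counter


def top_k_frequent(nums: list[int], k: int) -> list[int]:
    counts = Counter(nums)
    heap: list[tuple[int, int]] = []      # min-heap of (count, value), size <= k
    for value, count in counts.items():
        heapq.heappush(heap, (count, value))
        if len(heap) > k:
            heapq.heappop(heap)           # evict the least frequent candidate
    return [value for _, value in sorted(heap, reverse=True)]


def is_valid_parentheses(s: str) -> bool:
    pairs = {")": "(", "]": "[", "}": "{"}
    stack: list[str] = []
    for ch in s:
        if ch in pairs:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != pairs[ch]:
                return False
        else:
            stack.append(ch)              # assumes every other char is an opener
    return not stack                      # leftover openers are unmatched


print(top_k_frequent([1, 1, 1, 2, 2, 3], 2))  # [1, 2]
print(is_valid_parentheses("([]{})"))         # True
```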

Part 2: ML Concepts (10 Problems)

These problems test your ability to implement ML algorithms, reason about models, and make practical ML decisions. They appear in ML-specific coding rounds and discussion rounds.

ML Implementation (4 Problems)

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 21 | Implement Linear Regression from Scratch | Medium | 30 min | Gradient descent, loss computation | Foundation of optimization; tests NumPy fluency | FAANG, AI Labs |
| 22 | Implement K-Means Clustering | Medium | 30 min | Iterative assignment + update | Core unsupervised algorithm; tests convergence reasoning | Google, Meta, Startups |
| 23 | Implement Logistic Regression with Gradient Descent | Medium | 30 min | Sigmoid, cross-entropy, gradient update | Classification foundation; connects to neural network basics | FAANG, Big Tech |
| 24 | Implement a Decision Tree (ID3/CART) | Hard | 40 min | Recursive splitting, information gain | Tree-based model understanding; leads to ensemble methods | Google, Microsoft, AI Labs |

:::tip Implementation Problems Are About Understanding, Not Memorization Interviewers do not expect you to have memorized sklearn source code. They want to see that you understand what happens inside the black box. Focus on: (1) the loss function, (2) the optimization step, (3) convergence criteria, and (4) edge cases. :::

What interviewers look for:

  • Correct mathematical formulation (loss function, gradient)
  • Clean NumPy vectorization (no raw Python loops over data points)
  • Handling edge cases (empty clusters, zero variance features, convergence checks)
  • Ability to discuss tradeoffs (learning rate selection, initialization strategies)
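Here is a minimal NumPy sketch of problem #22 (K-Means) that hits the points listed above. It makes three assumptions worth stating in an interview:

- Centroids are initialized from random data points. k-means++ is a common improvement.
- An empty cluster keeps its previous centroid. This is one of several reasonable strategies.
- Convergence means the centroids moved less than `tol`.

Distances for all points are computed at once via broadcasting. The only Python loop inside an iteration is over the `k` clusters.

```python
import numpy as np


def kmeans(X, k, n_iters=100, tol=1e-6, seed=0):
    """Return (centroids, labels) for data X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    # Initialize with k distinct data points (k-means++ is a common upgrade).
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Squared distances for all points at once: (n, 1, d) - (1, k, d) -> (n, k).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):                       # loop over clusters, not data points
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
            # else: empty cluster keeps its previous centroid
        shift = np.abs(new_centroids - centroids).max()
        centroids = new_centroids
        if shift < tol:                          # centroids stopped moving
            break
    return centroids, labels


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centroids, labels = kmeans(X, k=2)
print(labels)
```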

ML Reasoning (3 Problems)

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 25 | Bias-Variance Tradeoff Analysis | Medium | 25 min | Decompose error into bias + variance + noise | Fundamental model selection framework; every ML discussion touches this | FAANG, All |
| 26 | Design a Cross-Validation Strategy | Medium | 20 min | K-fold, stratified, time-series split | Evaluation methodology; incorrect CV is the #1 source of leakage | Google, Meta, Big Tech |
| 27 | Feature Selection: Filter vs Wrapper vs Embedded | Medium | 25 min | Compare selection approaches | Practical ML decision-making; impacts model performance and training speed | Big Tech, Startups |

What interviewers look for:

  • Clear articulation of when to use each approach
  • Understanding of failure modes (data leakage, distribution shift)
  • Practical experience signals (mentioning specific tools, real scenarios)
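For the time-series case in problem #26, the key property is that validation data always comes after training data. The helper below is a hypothetical sketch of an expanding-window (forward-chaining) split. It assumes rows are already sorted by time. scikit-learn's `TimeSeriesSplit` implements a similar scheme.

```python
def expanding_window_splits(n_samples: int, n_folds: int, min_train: int):
    """Yield (train_idx, val_idx) where every validation index follows every training index."""
    fold_size = (n_samples - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough samples for the requested folds")
    for f in range(n_folds):
        train_end = min_train + f * fold_size
        train_idx = list(range(train_end))                        # only the past
        val_idx = list(range(train_end, train_end + fold_size))   # the next block in time
        yield train_idx, val_idx


for train_idx, val_idx in expanding_window_splits(n_samples=10, n_folds=3, min_train=4):
    print(train_idx, val_idx)
```

A shuffled K-fold split on the same data would let the model train on the future and validate on the past. That is exactly the leakage this design avoids.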

ML Applied (3 Problems)

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 28 | Handle Class Imbalance in a Fraud Detection System | Medium | 25 min | Resampling, loss weighting, threshold tuning | Nearly every real ML problem has imbalance | FAANG, Fintech, Big Tech |
| 29 | Debug a Model with High Training Accuracy but Low Test Accuracy | Easy | 20 min | Overfitting diagnosis | Systematic debugging is a core ML skill | FAANG, All |
| 30 | Choose Metrics for a Recommendation System | Medium | 20 min | Precision@K, NDCG, coverage, diversity | Metric selection drives model development decisions | Meta, Netflix, Spotify, Amazon |

What interviewers look for:

  • Systematic thinking (not jumping to the first idea)
  • Awareness of real-world constraints (latency, cost, fairness)
  • Ability to connect metrics to business outcomes
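For problem #30, two of the listed ranking metrics are easy to pin down in code. This sketch makes two assumptions. Relevance is binary: an item is either relevant or not. Precision@K is divided by `k` even when fewer than `k` items are recommended. Graded relevance and other denominators are also common.

```python
import math


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k recommendations that are relevant."""
    return sum(item in relevant for item in ranked[:k]) / k


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance NDCG@k: DCG of this ranking divided by DCG of an ideal ranking."""
    # Position p (0-based) is discounted by 1 / log2(p + 2).
    dcg = sum(1 / math.log2(p + 2) for p, item in enumerate(ranked[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1 / math.log2(p + 2) for p in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
print(precision_at_k(ranked, relevant, 3), ndcg_at_k(ranked, relevant, 3))
```

NDCG rewards placing relevant items early, while Precision@K ignores order within the top K. This difference is the usual reason to report both.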

Part 3: System Design (10 Problems)

These problems test your ability to design end-to-end ML systems. They are the highest-signal round for senior roles.

ML System Design (5 Problems)

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 31 | Design a News Feed Ranking System | Medium | 40 min | Multi-stage ranking (retrieval + ranking + reranking) | The canonical recommendation system problem | Meta, Twitter, LinkedIn |
| 32 | Design a Real-Time Fraud Detection System | Medium | 40 min | Streaming features + low-latency inference | Tests real-time ML architecture knowledge | Stripe, PayPal, Amazon |
| 33 | Design an Image Search System | Medium | 35 min | Embedding generation + ANN index + serving | Combines deep learning with systems thinking | Google, Pinterest, Airbnb |
| 34 | Design an ML Model Monitoring System | Hard | 45 min | Drift detection + alerting + automated retraining | Tests MLOps maturity; increasingly asked at all levels | FAANG, Unicorns |
| 35 | Design a Content Moderation System | Hard | 45 min | Multi-modal classification + human-in-the-loop | Trust and safety is a top priority at every platform | Meta, Google, TikTok |

:::note System Design Scoring Most companies evaluate system design on four axes: (1) Requirements gathering, (2) High-level architecture, (3) Deep dive into critical components, (4) Tradeoffs and scaling. Make sure your practice covers all four. :::

Design Framework:

  1. Clarify -- What is the product? Who are the users? What are the constraints (latency, throughput, cost)?
  2. Data -- What data is available? How is it collected? What are the labels?
  3. Model -- What model architecture? What features? What loss function?
  4. Serving -- How is the model deployed? What is the inference path?
  5. Monitoring -- How do you detect issues? What metrics do you track?
  6. Iteration -- How do you improve? A/B testing, online learning, retraining?

Infrastructure Design (5 Problems)

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 36 | Design a Feature Store | Hard | 45 min | Online/offline feature serving, consistency | Central infrastructure for ML platforms | Uber, Airbnb, Stripe, Databricks |
| 37 | Design an A/B Testing Platform | Medium | 35 min | Experiment assignment, statistical analysis, guardrails | Every ML improvement needs experimentation | FAANG, Big Tech |
| 38 | Design a Model Training Pipeline | Medium | 35 min | Data ingestion, training, validation, deployment | End-to-end ML lifecycle management | Google, Meta, Amazon |
| 39 | Design a Low-Latency Model Serving System | Hard | 40 min | Model optimization, caching, load balancing | Serving is where ML meets production reality | FAANG, AI Labs |
| 40 | Design a Data Labeling Pipeline | Medium | 30 min | Active learning, quality control, human-in-the-loop | Data quality determines model quality | Scale AI, Google, Amazon |

Part 4: NumPy, Pandas & SQL (10 Problems)

These problems test your data manipulation fluency -- the practical skill that separates AI/ML engineers from pure software engineers.

NumPy (3 Problems)

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 41 | Implement Matrix Multiplication (no np.dot) | Easy | 15 min | Nested loops vs broadcasting | Tests understanding of vectorization fundamentals | Google, Meta, AI Labs |
| 42 | Compute Cosine Similarity Matrix for N Vectors | Medium | 20 min | Broadcasting + normalization | Core operation in embeddings, retrieval, and recommendations | FAANG, AI Labs |
| 43 | Implement Softmax with Numerical Stability | Medium | 15 min | Log-sum-exp trick | Appears in every neural network; numerical stability is critical | FAANG, AI Labs |

:::warning NumPy Fluency Is Table Stakes If you cannot write vectorized NumPy code fluently, interviewers will question your ability to implement anything in ML. Practice until broadcasting and axis operations feel natural. :::

Key NumPy Patterns:

  • Broadcasting -- Extend dimensions to enable element-wise operations on different shapes
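  • Axis operations -- np.sum(axis=0) collapses the rows (one result per column) and np.sum(axis=1) collapses the columns (one result per row); know which is which instantly
  • Numerical stability -- Always subtract the max before exp; use log-sum-exp when you need log-probabilities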
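The NumPy sketch below covers problems #41-43 and exercises the three patterns above. Each function does the following:

- `matmul_broadcast` avoids `np.dot` by building an `(n, d, m)` intermediate. This is clear, but it uses O(n*d*m) memory.
- `softmax` subtracts the per-row max before exponentiating.
- `log_softmax` applies log-sum-exp.
- `cosine_similarity_matrix` normalizes rows once and uses a single matrix product. A small `eps`, an assumption of this sketch, guards against zero vectors.

```python
import numpy as np


def matmul_broadcast(A, B):
    # (n, d, 1) * (1, d, m) -> (n, d, m), then sum over the shared d axis -> (n, m).
    return (A[:, :, None] * B[None, :, :]).sum(axis=1)


def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)   # largest logit becomes 0, so exp cannot overflow
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def log_softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)
    # log-sum-exp: log(sum(exp(x))) computed on the shifted, overflow-safe values.
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def cosine_similarity_matrix(X, eps=1e-12):
    norms = np.linalg.norm(X, axis=1, keepdims=True)   # (n, 1) row norms
    X_unit = X / np.maximum(norms, eps)                # broadcast divide; guard zero vectors
    return X_unit @ X_unit.T                           # (n, n) pairwise cosines


print(softmax(np.array([1000.0, 1001.0])))   # naive exp(1000) would overflow to inf
print(cosine_similarity_matrix(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])))
```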

Pandas (3 Problems)

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 44 | Compute Rolling 7-Day Average Revenue per User Cohort | Medium | 20 min | GroupBy + rolling window | Time-series feature engineering; business analytics | Meta, Airbnb, Uber |
| 45 | Detect and Handle Missing Values in a Mixed-Type Dataset | Easy | 15 min | isnull, fillna, interpolation strategies | Data cleaning is 80% of real ML work | All |
| 46 | Pivot User Event Logs into Feature Vectors | Medium | 25 min | Pivot table + aggregation | Feature engineering from raw event data | Big Tech, Startups |

Key Pandas Patterns:

  • Split-apply-combine (groupby) -- The fundamental pandas workflow
  • Merge/Join -- Left, inner, outer joins on DataFrames
  • Window functions -- Rolling, expanding, and exponential weighted operations
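This is a pandas sketch of problem #44 on hypothetical toy data. It assumes one row per (date, cohort) and uses a time-based `"7D"` window. That window averages the rows present in the trailing 7 days. If days with no revenue should count as zero, reindex each cohort to a full daily calendar first.

```python
import pandas as pd

# Hypothetical toy data: one row per (date, cohort) with that day's revenue per user.
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-10",
                            "2024-01-01", "2024-01-05"]),
    "cohort": ["A", "A", "A", "B", "B"],
    "revenue_per_user": [10.0, 20.0, 30.0, 5.0, 15.0],
})

rolling_7d = (
    df.sort_values("date")
      .set_index("date")                       # time-based windows need a DatetimeIndex
      .groupby("cohort")["revenue_per_user"]   # split by cohort
      .rolling("7D")                           # window covers (t - 7 days, t]
      .mean()                                  # result is indexed by (cohort, date)
)
print(rolling_7d)
```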

SQL (4 Problems)

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 47 | Find the Second Highest Salary in Each Department | Medium | 15 min | Window functions (DENSE_RANK vs RANK vs ROW_NUMBER; the choice decides how ties are handled) | Window functions appear in 70%+ of SQL interview questions | FAANG, All |
| 48 | Calculate Month-over-Month Revenue Growth Rate | Medium | 20 min | LAG/LEAD + date arithmetic | Time-series analysis in SQL; business metric computation | Meta, Google, Airbnb |
| 49 | Find Users Who Were Active on 3+ Consecutive Days | Hard | 25 min | Self-join or window function gap analysis | Tests advanced SQL reasoning; user engagement analysis | Meta, Uber, LinkedIn |
| 50 | Compute Retention Rates by Cohort | Hard | 30 min | Cohort join + conditional aggregation | The canonical product analytics query | Meta, Airbnb, Spotify |

Key SQL Patterns:

  • Window functions -- ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, running totals
  • Self-joins -- Comparing rows within the same table
  • CTEs -- Common Table Expressions for readable, modular queries
  • Date manipulation -- DATE_TRUNC, DATE_DIFF, interval arithmetic
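Below is a runnable sketch of problems #47 and #49. It uses Python's built-in `sqlite3` module and hypothetical toy tables. Window functions require SQLite 3.25 or newer.

The consecutive-days query uses the gaps-and-islands trick. Within one user, consecutive dates minus their row number give the same value, so each streak becomes one group. `julianday` is SQLite-specific; other engines spell the date arithmetic differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
INSERT INTO employees VALUES
  ('ana', 'eng', 200), ('bo', 'eng', 200), ('cy', 'eng', 150), ('di', 'eng', 120),
  ('ed', 'ops', 90), ('fay', 'ops', 80);
CREATE TABLE logins (user_id INTEGER, day TEXT);
INSERT INTO logins VALUES
  (1, '2024-01-01'), (1, '2024-01-02'), (1, '2024-01-02'), (1, '2024-01-03'),
  (2, '2024-01-01'), (2, '2024-01-03'), (2, '2024-01-04'),
  (3, '2024-01-05'), (3, '2024-01-06'), (3, '2024-01-07'), (3, '2024-01-08');
""")

# #47: DENSE_RANK keeps ties together, so both 200s in 'eng' rank 1 and 150 ranks 2.
SECOND_HIGHEST = """
SELECT DISTINCT dept, salary
FROM (
    SELECT dept, salary,
           DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk
    FROM employees
)
WHERE rnk = 2
ORDER BY dept
"""

# #49: gaps and islands. Deduplicate days, then date minus row number is constant per streak.
CONSECUTIVE_3 = """
WITH days AS (
    SELECT DISTINCT user_id, day FROM logins
),
islands AS (
    SELECT user_id,
           julianday(day) - ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY day) AS island_id
    FROM days
)
SELECT DISTINCT user_id
FROM islands
GROUP BY user_id, island_id
HAVING COUNT(*) >= 3
ORDER BY user_id
"""

print(conn.execute(SECOND_HIGHEST).fetchall())  # [('eng', 150), ('ops', 80)]
print(conn.execute(CONSECUTIVE_3).fetchall())   # [(1,), (3,)]
```

With `ROW_NUMBER` instead of `DENSE_RANK`, the second 200 in 'eng' would be reported as the "second highest". That tie-handling difference is what problem #47 is probing.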

Study Plans

2-Week Intensive Plan

| Day | Problems | Category | Focus |
|---|---|---|---|
| 1 | #1-4 | Arrays & Hashing | Hash map patterns |
| 2 | #5-7 | Sliding Window | Window management |
| 3 | #8-9, #14-15 | Linked Lists + Binary Search | Pointer manipulation, search |
| 4 | #10-13 | Trees & Graphs | DFS, BFS, topological sort |
| 5 | #16-18 | Dynamic Programming | Memoization, 2D DP |
| 6 | #19-20 | Heaps & Stacks | Top-K, parsing |
| 7 | Review Day | All DSA | Re-solve every problem you could not solve cleanly from scratch |
| 8 | #21-24 | ML Implementation | Code ML algorithms |
| 9 | #25-27 | ML Reasoning | Conceptual discussions |
| 10 | #28-30 | ML Applied | Practical ML decisions |
| 11 | #31-35 | ML System Design | End-to-end design |
| 12 | #36-40 | Infrastructure Design | Platform-level thinking |
| 13 | #41-46 | NumPy & Pandas | Data manipulation |
| 14 | #47-50 + Review | SQL + Final Review | Query writing, gap analysis |

4-Week Moderate Plan

| Week | Daily Load | Focus |
|---|---|---|
| Week 1 | 2 problems/day | DSA problems (#1-14) with spaced review |
| Week 2 | 2 problems/day | DSA (#15-20) + ML Implementation (#21-24) + ML Reasoning (#25-27) |
| Week 3 | 2 problems/day | ML Applied (#28-30) + System Design (#31-40) |
| Week 4 | 2 problems/day | NumPy/Pandas/SQL (#41-50) + comprehensive review |

Progress Tracker

DSA Problems (20)

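| # | Problem | Status | Date | Time | Notes |
|---|---|---|---|---|---|
| 1 | Two Sum | [ ] | | | |
| 2 | Best Time to Buy and Sell Stock | [ ] | | | |
| 3 | Group Anagrams | [ ] | | | |
| 4 | Product of Array Except Self | [ ] | | | |
| 5 | Container With Most Water | [ ] | | | |
| 6 | Longest Substring Without Repeating | [ ] | | | |
| 7 | Minimum Window Substring | [ ] | | | |
| 8 | Reverse Linked List | [ ] | | | |
| 9 | Merge Two Sorted Lists | [ ] | | | |
| 10 | Maximum Depth of Binary Tree | [ ] | | | |
| 11 | Validate Binary Search Tree | [ ] | | | |
| 12 | Number of Islands | [ ] | | | |
| 13 | Course Schedule | [ ] | | | |
| 14 | Binary Search | [ ] | | | |
| 15 | Search in Rotated Sorted Array | [ ] | | | |
| 16 | Climbing Stairs | [ ] | | | |
| 17 | Longest Common Subsequence | [ ] | | | |
| 18 | Word Break | [ ] | | | |
| 19 | Top K Frequent Elements | [ ] | | | |
| 20 | Valid Parentheses | [ ] | | | |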

ML Problems (10)

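| # | Problem | Status | Date | Time | Notes |
|---|---|---|---|---|---|
| 21 | Linear Regression from Scratch | [ ] | | | |
| 22 | K-Means Clustering | [ ] | | | |
| 23 | Logistic Regression with GD | [ ] | | | |
| 24 | Decision Tree (ID3/CART) | [ ] | | | |
| 25 | Bias-Variance Tradeoff | [ ] | | | |
| 26 | Cross-Validation Strategy | [ ] | | | |
| 27 | Feature Selection Methods | [ ] | | | |
| 28 | Handle Class Imbalance | [ ] | | | |
| 29 | Debug Overfitting Model | [ ] | | | |
| 30 | Choose Recommendation Metrics | [ ] | | | |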

System Design Problems (10)

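| # | Problem | Status | Date | Time | Notes |
|---|---|---|---|---|---|
| 31 | News Feed Ranking | [ ] | | | |
| 32 | Fraud Detection System | [ ] | | | |
| 33 | Image Search System | [ ] | | | |
| 34 | ML Model Monitoring | [ ] | | | |
| 35 | Content Moderation | [ ] | | | |
| 36 | Feature Store | [ ] | | | |
| 37 | A/B Testing Platform | [ ] | | | |
| 38 | Model Training Pipeline | [ ] | | | |
| 39 | Low-Latency Model Serving | [ ] | | | |
| 40 | Data Labeling Pipeline | [ ] | | | |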

NumPy/Pandas/SQL Problems (10)

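| # | Problem | Status | Date | Time | Notes |
|---|---|---|---|---|---|
| 41 | Matrix Multiplication | [ ] | | | |
| 42 | Cosine Similarity Matrix | [ ] | | | |
| 43 | Softmax (Numerically Stable) | [ ] | | | |
| 44 | Rolling Average per Cohort | [ ] | | | |
| 45 | Handle Missing Values | [ ] | | | |
| 46 | Pivot Event Logs to Features | [ ] | | | |
| 47 | Second Highest Salary per Dept | [ ] | | | |
| 48 | Month-over-Month Growth | [ ] | | | |
| 49 | 3+ Consecutive Active Days | [ ] | | | |
| 50 | Cohort Retention Rates | [ ] | | | |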

Detailed Problem Guides

Problem 1: Two Sum

Category: Arrays & Hashing | Difficulty: Easy | Time: 10 min

Problem: Given an array of integers and a target, return indices of two numbers that sum to the target.

Brute Force: O(n^2) -- check every pair.

Optimal Approach:

1. Create an empty hash map
2. For each number at index i:
   a. Compute complement = target - nums[i]
   b. If complement exists in hash map, return [map[complement], i]
   c. Otherwise, store nums[i] -> i in hash map

Time: O(n), Space: O(n)
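The steps above translate directly into a Python sketch. Following the usual problem statement, it returns as soon as it finds one valid pair. Returning an empty list when no pair exists is a choice of this sketch.

```python
def two_sum(nums: list[int], target: int) -> list[int]:
    seen: dict[int, int] = {}          # value -> index of an earlier element
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:         # check BEFORE inserting, so num never pairs with itself
            return [seen[complement], i]
        seen[num] = i
    return []                          # no valid pair


print(two_sum([2, 7, 11, 15], 9))  # [0, 1]
print(two_sum([3, 3], 6))          # [0, 1]
```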

Key Insight: Trading space for time via hash map is the single most important optimization technique in coding interviews.

Follow-up Questions:

  • What if the array is sorted? (Two pointers -- O(1) space)
  • What if there are multiple valid pairs? (Return all pairs)
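  • What about duplicate values? (Look up the complement before inserting nums[i]; an element then never pairs with itself, while [3, 3] with target 6 still returns [0, 1])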

Problem 21: Linear Regression from Scratch

Category: ML Implementation | Difficulty: Medium | Time: 30 min

Problem: Implement linear regression with gradient descent using only NumPy.

Key Components:

1. Initialize weights w and bias b to zeros (or small random values)
2. Forward pass: y_pred = X @ w + b
3. Loss: MSE = (1/n) * sum((y_pred - y)^2)
4. Gradients:
   dw = (2/n) * X.T @ (y_pred - y)
   db = (2/n) * sum(y_pred - y)
5. Update: w -= lr * dw, b -= lr * db
6. Repeat for N iterations or until convergence
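Here is a vectorized NumPy sketch of the six steps. The learning rate, iteration cap, and stopping tolerance are illustrative assumptions, and the loop assumes roughly standardized features. The only Python loop is over iterations, never over data points.

```python
import numpy as np


def fit_linear_regression(X, y, lr=0.1, n_iters=5000, tol=1e-10):
    n, d = X.shape
    w = np.zeros(d)                      # step 1: zero initialization
    b = 0.0
    prev_loss = np.inf
    for _ in range(n_iters):
        y_pred = X @ w + b               # step 2: forward pass, shape (n,)
        err = y_pred - y
        loss = np.mean(err ** 2)         # step 3: MSE
        if prev_loss - loss < tol:       # step 6: stop once the loss stops decreasing meaningfully
            break
        prev_loss = loss
        dw = (2 / n) * (X.T @ err)       # step 4: dMSE/dw, shape (d,)
        db = (2 / n) * err.sum()         #         dMSE/db
        w -= lr * dw                     # step 5: gradient descent update
        b -= lr * db
    return w, b


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))            # synthetic, already roughly standardized
y = X @ np.array([2.0, -3.0]) + 1.0      # noiseless targets for a clean check
w, b = fit_linear_regression(X, y)
print(w, b)                              # close to [2, -3] and 1
```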

What interviewers look for:

  • Vectorized NumPy (no Python for-loops over data points)
  • Correct gradient derivation
  • Convergence check (loss decreasing)
  • Discussion of learning rate selection

Follow-up Questions:

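  • How would you add L2 regularization? (Add lambda * ||w||^2 to the loss, which adds 2 * lambda * w to dw, or lambda * w if the penalty is written as (lambda/2) * ||w||^2; the bias is usually not regularized)
  • What happens with features at different scales? (Need standardization)
  • When would gradient descent fail? (A learning rate that is too large diverges, and one that is too small stalls; unscaled features slow convergence. MSE for linear regression is convex, so local minima only become a concern for non-convex models)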

Problem 31: News Feed Ranking System

Category: ML System Design | Difficulty: Medium | Time: 40 min

Problem: Design the ranking system that determines what posts appear in a social media news feed.

High-Level Architecture:

1. Candidate Generation (retrieve ~1000 candidates)
   - Friends' posts, group posts, followed pages
   - Collaborative filtering for content discovery

2. Feature Engineering
   - User features: age, interests, engagement history
   - Post features: type, age, author engagement rate
   - Cross features: user-author affinity, topic relevance

3. Ranking Model
   - Multi-task learning: predict P(like), P(comment), P(share), P(hide)
   - Combine into a single score with business-weighted formula

4. Post-Processing
   - Diversity injection (avoid all posts from same author)
   - Freshness boost
   - Policy filtering (content moderation)

5. Serving
   - Pre-compute embeddings, real-time feature assembly
   - Cache ranked lists with TTL

6. Evaluation
   - Online: engagement metrics, time spent, user retention
   - Offline: NDCG, AUC, calibration
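To make the "business-weighted formula" and diversity injection concrete, here is a toy sketch of stages 3 and 4. The task weights, the `Candidate` structure, and the `max_per_author` cap are hypothetical values chosen for illustration. They are not numbers from any production system.

```python
from dataclasses import dataclass

# Hypothetical business weights; a real team would tune these against online metrics.
WEIGHTS = {"like": 1.0, "comment": 4.0, "share": 8.0, "hide": -20.0}


@dataclass
class Candidate:
    post_id: str
    author_id: str
    preds: dict[str, float]   # multi-task model outputs, e.g. {"like": 0.12, "hide": 0.01}


def score(c: Candidate) -> float:
    # Weighted sum of predicted probabilities; negative weight penalizes likely hides.
    return sum(WEIGHTS[task] * p for task, p in c.preds.items())


def rank_feed(cands: list[Candidate], max_per_author: int = 2) -> list[str]:
    ranked = sorted(cands, key=score, reverse=True)
    per_author: dict[str, int] = {}
    feed = []
    for c in ranked:
        # Diversity rule: skip an author once they already fill max_per_author slots.
        if per_author.get(c.author_id, 0) < max_per_author:
            feed.append(c.post_id)
            per_author[c.author_id] = per_author.get(c.author_id, 0) + 1
    return feed


cands = [
    Candidate("p1", "a", {"like": 0.5}),
    Candidate("p2", "a", {"share": 0.1}),
    Candidate("p3", "a", {"comment": 0.1}),
    Candidate("p4", "b", {"like": 0.3, "hide": 0.01}),
]
print(rank_feed(cands))  # ['p2', 'p1', 'p4']
```

Here `p3` outscores `p4` but is dropped by the diversity rule. This is exactly the tradeoff between raw score and feed variety that interviewers like to probe.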

Common Mistakes on Core 50 Problems

:::danger Mistakes That Cost Offers

  1. Stopping at the nested-loop solution for Two Sum -- Instantly signals you have not prepared
  2. Implementing ML algorithms with Python loops -- NumPy vectorization is expected
  3. System design without clarifying requirements -- Jumping to solutions is a red flag
  4. SQL without window functions -- If you only know GROUP BY, your SQL is incomplete
  5. Not discussing time/space complexity -- Always state it, even if not asked :::

Pattern Cheat Sheet

| Pattern | Problems That Use It | Recognition Signal |
|---|---|---|
| Hash map complement | #1, #3, #6 | "Find pair/group with property X" |
| Sliding window | #6, #7 | "Substring/subarray with constraint" |
| Two pointers | #5, #8, #9 | "Sorted array" or "shrink search space" |
| BFS/DFS | #10, #11, #12 | "Tree/graph traversal" or "connected components" |
| Topological sort | #13 | "Dependencies" or "ordering with prerequisites" |
| Binary search | #14, #15 | "Sorted" or "monotonic" or "minimize maximum" |
| Dynamic programming | #16, #17, #18 | "Optimal" or "count ways" or "min/max" |
| Heap / priority queue | #19 | "Top K" or "K-th largest/smallest" |
| Stack | #20 | "Matching" or "nesting" or "nearest greater" |
| Gradient descent | #21, #23 | "Implement from scratch" or "optimize" |
| Window functions (SQL) | #47, #48, #49, #50 | "Per group ranking" or "consecutive" or "running total" |

Next Steps

Once you have completed the Core 50, move to your role-specific problem list:

Or, if you want to continue building breadth by difficulty, try the Easy Tier next.

© 2026 EngineersOfAI. All rights reserved.