The Core 50
Reading time: ~35 min | Interview relevance: Critical | Roles: All AI/ML roles
A senior MLE at Google once told me: "I have conducted over 200 interviews, and I can tell within the first 10 minutes whether a candidate has done structured preparation or just randomly ground LeetCode." The difference is not raw problem count -- it is pattern coverage. The Core 50 is designed to give you maximum pattern coverage with minimum problem count.
These 50 problems are the absolute foundation. They span four categories -- Data Structures & Algorithms (20), ML Concepts (10), System Design (10), and NumPy/Pandas/SQL (10) -- and together they cover the patterns that appear in over 80% of AI/ML interview questions across all companies and levels.
How to Use This List
:::tip The Rule of Three
For each problem: (1) solve it yourself, (2) study the optimal solution, (3) re-solve from scratch without looking. If you cannot do step 3, you have not learned the problem -- you have only memorized it.
:::
Estimated Timeline
| Pace | Daily Problems | Total Time |
|---|---|---|
| Intensive | 3-4 per day | ~2 weeks |
| Moderate | 2 per day | ~4 weeks |
| Relaxed | 1 per day | ~7 weeks |
Difficulty Distribution
| Difficulty | Count | Percentage |
|---|---|---|
| Easy | 11 | 22% |
| Medium | 31 | 62% |
| Hard | 8 | 16% |
Part 1: Data Structures & Algorithms (20 Problems)
These 20 problems cover the core DSA patterns that appear most often in AI/ML coding interviews. Master these and you will recognize the pattern behind most coding questions you are asked.
Arrays & Hashing (4 Problems)
| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 1 | Two Sum | Easy | 10 min | Hash map lookup | Foundation of complement-based problems; O(n) vs O(n^2) thinking | FAANG, All |
| 2 | Best Time to Buy and Sell Stock | Easy | 10 min | Running minimum | Teaches single-pass optimization; common in time-series feature engineering | FAANG, Big Tech |
| 3 | Group Anagrams | Medium | 20 min | Sorted key hashing | String canonicalization; pattern appears in feature grouping and deduplication | Google, Meta, Amazon |
| 4 | Product of Array Except Self | Medium | 20 min | Prefix/suffix arrays | No-division constraint forces creative thinking; prefix sums appear in cumulative metrics | Google, Apple, Microsoft |
:::note Why Arrays & Hashing First?
Over 60% of coding interview questions involve arrays or hash maps in some form. These four problems establish the mental framework for "can I solve this in O(n) instead of O(n^2)?" which is the single most important optimization insight.
:::
Pattern Summary:
- Hash map for O(1) lookup -- When you need to find complements, pairs, or groups
- Prefix/suffix decomposition -- When you need cumulative information from both directions
- Single-pass with running state -- When you can maintain a summary while scanning
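As an illustration of the prefix/suffix decomposition pattern, here is a minimal sketch for Product of Array Except Self (#4). The function name `product_except_self` is chosen for this example; the idea is one left-to-right pass followed by one right-to-left pass, with no division.

```python
from typing import List


def product_except_self(nums: List[int]) -> List[int]:
    """Return out where out[i] is the product of every element except nums[i]."""
    n = len(nums)
    out = [1] * n

    # Left-to-right pass: out[i] becomes the product of nums[0..i-1].
    prefix = 1
    for i in range(n):
        out[i] = prefix
        prefix *= nums[i]

    # Right-to-left pass: multiply in the product of nums[i+1..n-1].
    suffix = 1
    for i in range(n - 1, -1, -1):
        out[i] *= suffix
        suffix *= nums[i]

    return out
```

This runs in O(n) time and uses O(1) extra space beyond the output list, and it handles zeros without special cases.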
Sliding Window & Two Pointers (3 Problems)
| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 5 | Container With Most Water | Medium | 20 min | Two pointers inward | Greedy narrowing; appears in optimization problems | Google, Meta, Amazon |
| 6 | Longest Substring Without Repeating Characters | Medium | 25 min | Sliding window with set | Variable-width window; relevant to text processing and tokenization | FAANG, AI Labs |
| 7 | Minimum Window Substring | Hard | 35 min | Sliding window with map | Advanced window management; frequency counting under constraints | Google, Meta, Uber |
Pattern Summary:
- Two pointers -- Shrink search space by moving endpoints based on a condition
- Sliding window -- Maintain a dynamic range that expands and contracts to meet constraints
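A minimal sketch of the variable-width sliding window, applied to Longest Substring Without Repeating Characters (#6). The function name is an assumption for this example; the window grows on the right and shrinks from the left whenever a duplicate would enter.

```python
def longest_unique_substring(s: str) -> int:
    """Length of the longest substring of s with no repeated character."""
    window = set()   # characters currently inside s[left:right + 1]
    left = 0
    best = 0
    for right, ch in enumerate(s):
        # Shrink from the left until ch is no longer in the window.
        while ch in window:
            window.remove(s[left])
            left += 1
        window.add(ch)
        best = max(best, right - left + 1)
    return best
```

Each character is added and removed at most once, so the total work is O(n) even though there is a nested loop.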
Linked Lists (2 Problems)
| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 8 | Reverse Linked List | Easy | 10 min | Pointer reversal | Fundamental pointer manipulation; tests careful state management | FAANG, All |
| 9 | Merge Two Sorted Lists | Easy | 15 min | Merge pattern | Foundation for merge sort; relevant to combining sorted data streams | FAANG, All |
Pattern Summary:
- In-place pointer manipulation -- Three-pointer technique (prev, curr, next)
- Merge with sentinel -- Use a dummy head to simplify edge cases
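Both patterns can be sketched in a few lines. The `ListNode` class and the helper names below are assumptions made for this example, not a required interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ListNode:
    val: int
    next: Optional["ListNode"] = None


def reverse_list(head: Optional[ListNode]) -> Optional[ListNode]:
    """Reverse a singly linked list in place and return the new head."""
    prev = None
    curr = head
    while curr is not None:
        nxt = curr.next    # remember the rest of the list
        curr.next = prev   # flip this node's pointer
        prev = curr
        curr = nxt
    return prev


def merge_sorted(a: Optional[ListNode], b: Optional[ListNode]) -> Optional[ListNode]:
    """Merge two sorted lists by relinking nodes behind a dummy head."""
    dummy = ListNode(0)    # sentinel: avoids special-casing the first node
    tail = dummy
    while a is not None and b is not None:
        if a.val <= b.val:
            tail.next = a
            a = a.next
        else:
            tail.next = b
            b = b.next
        tail = tail.next
    tail.next = a if a is not None else b   # attach whatever remains
    return dummy.next


def from_values(values: List[int]) -> Optional[ListNode]:
    """Build a linked list from a Python list (helper for experiments)."""
    dummy = ListNode(0)
    tail = dummy
    for v in values:
        tail.next = ListNode(v)
        tail = tail.next
    return dummy.next


def to_values(head: Optional[ListNode]) -> List[int]:
    """Collect node values into a Python list (helper for experiments)."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out
```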
Trees & Graphs (4 Problems)
| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 10 | Maximum Depth of Binary Tree | Easy | 10 min | Recursive DFS | Foundation of tree recursion; base case thinking | FAANG, All |
| 11 | Validate Binary Search Tree | Medium | 20 min | DFS with min/max bounds (or in-order traversal) | Range constraint propagation; appears in decision tree validation | Google, Meta, Microsoft |
| 12 | Number of Islands | Medium | 20 min | BFS/DFS flood fill | Connected component finding; relevant to image segmentation and clustering | FAANG, Big Tech |
| 13 | Course Schedule | Medium | 25 min | Topological sort (cycle detection) | Dependency resolution; critical for ML pipeline DAGs and training schedules | Google, Meta, Airbnb |
:::warning Trees and Graphs Are Non-Negotiable
ML pipelines are DAGs. Feature dependency graphs need topological sorting. Decision trees need traversal. If you skip tree/graph problems, you are leaving a major gap in your preparation.
:::
Pattern Summary:
- Recursive DFS -- Solve for a node assuming subtrees are solved
- BFS level-order -- Process nodes level by level
- Topological sort -- Order nodes respecting directed dependencies
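One way to sketch topological sorting for Course Schedule (#13) is Kahn's algorithm, which is BFS over nodes whose prerequisites are already satisfied. The function name and the `[course, prerequisite]` pair convention are assumptions for this example.

```python
from collections import deque
from typing import List


def can_finish(num_courses: int, prerequisites: List[List[int]]) -> bool:
    """Return True if every course can be taken, i.e. the graph has no cycle."""
    graph = [[] for _ in range(num_courses)]
    indegree = [0] * num_courses
    for course, prereq in prerequisites:
        graph[prereq].append(course)   # edge: prereq -> course
        indegree[course] += 1

    # Start with every course that has no prerequisites.
    queue = deque(i for i in range(num_courses) if indegree[i] == 0)
    taken = 0
    while queue:
        node = queue.popleft()
        taken += 1
        for nxt in graph[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # Nodes on a cycle never reach indegree 0, so they are never counted.
    return taken == num_courses
```

Recording the order in which nodes leave the queue gives a valid execution order, which is the same idea a pipeline scheduler needs for a DAG of tasks.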
Binary Search (2 Problems)
| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 14 | Binary Search | Easy | 10 min | Standard binary search | The baseline template; get the loop bounds right to avoid off-by-one errors | FAANG, All |
| 15 | Search in Rotated Sorted Array | Medium | 25 min | Modified binary search | Handling invariant breaks; appears in threshold search for ML models | Google, Meta, Amazon |
Pattern Summary:
- Binary search on answer -- When the search space is monotonic, binary search on the result
- Invariant identification -- Which half is sorted? Base decisions on the invariant
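The "which half is sorted?" invariant can be sketched for Search in Rotated Sorted Array (#15) as follows. This sketch assumes all values are distinct; the function name is chosen for the example.

```python
from typing import List


def search_rotated(nums: List[int], target: int) -> int:
    """Return the index of target in a rotated sorted array, or -1 if absent."""
    lo, hi = 0, len(nums) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if nums[mid] == target:
            return mid
        if nums[lo] <= nums[mid]:
            # Left half nums[lo..mid] is sorted.
            if nums[lo] <= target < nums[mid]:
                hi = mid - 1
            else:
                lo = mid + 1
        else:
            # Right half nums[mid..hi] is sorted.
            if nums[mid] < target <= nums[hi]:
                lo = mid + 1
            else:
                hi = mid - 1
    return -1
```

At every step at least one half is sorted, so a plain range check on that half decides which side to discard.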
Dynamic Programming (3 Problems)
| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 16 | Climbing Stairs | Easy | 10 min | Basic DP (Fibonacci variant) | Introduces memoization and bottom-up thinking | FAANG, All |
| 17 | Longest Common Subsequence | Medium | 25 min | 2D DP with string comparison | Sequence alignment; directly relevant to edit distance in NLP | Google, Microsoft, AI Labs |
| 18 | Word Break | Medium | 30 min | DP with dictionary lookup | String segmentation; relevant to tokenization and text parsing | Google, Meta, Amazon |
Pattern Summary:
- 1D DP -- State depends on previous 1-2 states
- 2D DP -- State depends on two dimensions (e.g., two strings)
- DP with auxiliary structure -- Combine DP with hash set or trie for efficient lookup
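Two of these patterns in a minimal sketch: a 2D table for Longest Common Subsequence (#17) and a 1D table combined with a hash set for Word Break (#18). Function names are assumptions for this example.

```python
from typing import List


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of a[:i] and b[:j]; row and column 0 are empty prefixes.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]


def word_break(s: str, words: List[str]) -> bool:
    """Return True if s can be split into a sequence of dictionary words."""
    word_set = set(words)           # O(1) average-case membership checks
    # dp[end] is True when s[:end] can be fully segmented.
    dp = [False] * (len(s) + 1)
    dp[0] = True                    # the empty prefix is trivially segmentable
    for end in range(1, len(s) + 1):
        for start in range(end):
            if dp[start] and s[start:end] in word_set:
                dp[end] = True
                break
    return dp[len(s)]
```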
Heaps & Stacks (2 Problems)
| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 19 | Top K Frequent Elements | Medium | 20 min | Min-heap / bucket sort | Top-K pattern appears constantly in ML (top-K predictions, features) | FAANG, Big Tech |
| 20 | Valid Parentheses | Easy | 10 min | Stack for matching | Parsing and validation; relevant to expression evaluation and syntax checking | FAANG, All |
Pattern Summary:
- Heap for Top-K -- Maintain a heap of size K for streaming top-K
- Stack for nesting -- Track opening structures and match with closing ones
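A short sketch of both patterns, using the standard library's `heapq` and `collections.Counter`. The function names are assumptions for this example, and the heap version assumes the values are mutually comparable so that ties in frequency can be ordered.

```python
import heapq
from collections import Counter
from typing import List


def top_k_frequent(nums: List[int], k: int) -> List[int]:
    """Return the k most frequent values, most frequent first."""
    counts = Counter(nums)
    heap = []   # min-heap of (frequency, value), never larger than k
    for value, freq in counts.items():
        heapq.heappush(heap, (freq, value))
        if len(heap) > k:
            heapq.heappop(heap)   # evict the least frequent entry
    return [value for freq, value in sorted(heap, reverse=True)]


def is_valid_parentheses(s: str) -> bool:
    """Return True if every bracket in s is closed in the correct order."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in s:
        if ch in pairs:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != pairs[ch]:
                return False
        else:
            stack.append(ch)
    return not stack   # leftover openers mean the string is unbalanced
```

Keeping the heap at size K gives O(n log K) time, which matters when the number of distinct values is much larger than K.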
Part 2: ML Concepts (10 Problems)
These problems test your ability to implement ML algorithms, reason about models, and make practical ML decisions. They appear in ML-specific coding rounds and discussion rounds.
ML Implementation (4 Problems)
| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 21 | Implement Linear Regression from Scratch | Medium | 30 min | Gradient descent, loss computation | Foundation of optimization; tests NumPy fluency | FAANG, AI Labs |
| 22 | Implement K-Means Clustering | Medium | 30 min | Iterative assignment + update | Core unsupervised algorithm; tests convergence reasoning | Google, Meta, Startups |
| 23 | Implement Logistic Regression with Gradient Descent | Medium | 30 min | Sigmoid, cross-entropy, gradient update | Classification foundation; connects to neural network basics | FAANG, Big Tech |
| 24 | Implement a Decision Tree (ID3/CART) | Hard | 40 min | Recursive splitting, information gain | Tree-based model understanding; leads to ensemble methods | Google, Microsoft, AI Labs |
:::tip Implementation Problems Are About Understanding, Not Memorization
Interviewers do not expect you to have memorized sklearn source code. They want to see that you understand what happens inside the black box. Focus on: (1) the loss function, (2) the optimization step, (3) convergence criteria, and (4) edge cases.
:::
What interviewers look for:
- Correct mathematical formulation (loss function, gradient)
- Clean NumPy vectorization (no raw Python loops over data points)
- Handling edge cases (empty clusters, zero variance features, convergence checks)
- Ability to discuss tradeoffs (learning rate selection, initialization strategies)
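To make these criteria concrete, here is a minimal K-Means sketch (#22) in NumPy. It is one possible implementation under stated assumptions: random initialization from the data points, squared Euclidean distance, and keeping the previous centroid when a cluster becomes empty. The parameter names (`n_iters`, `tol`, `seed`) are chosen for this example.

```python
import numpy as np


def assign_labels(X: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Index of the nearest centroid for each row of X."""
    # Broadcasting (n, 1, d) - (1, k, d) gives an (n, k) matrix of squared distances.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(X: np.ndarray, k: int, n_iters: int = 100, tol: float = 1e-6, seed: int = 0):
    """Return (centroids, labels) after Lloyd-style iterations."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    for _ in range(n_iters):
        labels = assign_labels(X, centroids)
        new_centroids = centroids.copy()
        for j in range(k):              # loop over clusters, not data points
            members = X[labels == j]
            if len(members) > 0:        # empty cluster: keep the old centroid
                new_centroids[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:                 # convergence: centroids stopped moving
            break

    return centroids, assign_labels(X, centroids)
```

Points worth raising in discussion: the result depends on initialization (k-means++ is a common improvement), and other empty-cluster strategies exist, such as re-seeding from the farthest point.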
ML Reasoning (3 Problems)
| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 25 | Bias-Variance Tradeoff Analysis | Medium | 25 min | Decompose error into bias + variance + noise | Fundamental model selection framework; every ML discussion touches this | FAANG, All |
| 26 | Design a Cross-Validation Strategy | Medium | 20 min | K-fold, stratified, time-series split | Evaluation methodology; incorrect CV is the #1 source of leakage | Google, Meta, Big Tech |
| 27 | Feature Selection: Filter vs Wrapper vs Embedded | Medium | 25 min | Compare selection approaches | Practical ML decision-making; impacts model performance and training speed | Big Tech, Startups |
What interviewers look for:
- Clear articulation of when to use each approach
- Understanding of failure modes (data leakage, distribution shift)
- Practical experience signals (mentioning specific tools, real scenarios)
ML Applied (3 Problems)
| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 28 | Handle Class Imbalance in a Fraud Detection System | Medium | 25 min | Resampling, loss weighting, threshold tuning | Nearly every real ML problem has imbalance | FAANG, Fintech, Big Tech |
| 29 | Debug a Model with High Training Accuracy but Low Test Accuracy | Easy | 20 min | Overfitting diagnosis | Systematic debugging is a core ML skill | FAANG, All |
| 30 | Choose Metrics for a Recommendation System | Medium | 20 min | Precision@K, NDCG, coverage, diversity | Metric selection drives model development decisions | Meta, Netflix, Spotify, Amazon |
What interviewers look for:
- Systematic thinking (not jumping to the first idea)
- Awareness of real-world constraints (latency, cost, fairness)
- Ability to connect metrics to business outcomes
Part 3: System Design (10 Problems)
These problems test your ability to design end-to-end ML systems. System design is typically the highest-signal round for senior roles.
ML System Design (5 Problems)
| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 31 | Design a News Feed Ranking System | Medium | 40 min | Multi-stage ranking (retrieval + ranking + reranking) | The canonical recommendation system problem | Meta, Twitter, LinkedIn |
| 32 | Design a Real-Time Fraud Detection System | Medium | 40 min | Streaming features + low-latency inference | Tests real-time ML architecture knowledge | Stripe, PayPal, Amazon |
| 33 | Design an Image Search System | Medium | 35 min | Embedding generation + ANN index + serving | Combines deep learning with systems thinking | Google, Pinterest, Airbnb |
| 34 | Design an ML Model Monitoring System | Hard | 45 min | Drift detection + alerting + automated retraining | Tests MLOps maturity; increasingly asked at all levels | FAANG, Unicorns |
| 35 | Design a Content Moderation System | Hard | 45 min | Multi-modal classification + human-in-the-loop | Trust and safety is a top priority at every platform | Meta, Google, TikTok |
:::note System Design Scoring
Most companies evaluate system design on four axes: (1) Requirements gathering, (2) High-level architecture, (3) Deep dive into critical components, (4) Tradeoffs and scaling. Make sure your practice covers all four.
:::
Design Framework:
- Clarify -- What is the product? Who are the users? What are the constraints (latency, throughput, cost)?
- Data -- What data is available? How is it collected? What are the labels?
- Model -- What model architecture? What features? What loss function?
- Serving -- How is the model deployed? What is the inference path?
- Monitoring -- How do you detect issues? What metrics do you track?
- Iteration -- How do you improve? A/B testing, online learning, retraining?
Infrastructure Design (5 Problems)
| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 36 | Design a Feature Store | Hard | 45 min | Online/offline feature serving, consistency | Central infrastructure for ML platforms | Uber, Airbnb, Stripe, Databricks |
| 37 | Design an A/B Testing Platform | Medium | 35 min | Experiment assignment, statistical analysis, guardrails | Every ML improvement needs experimentation | FAANG, Big Tech |
| 38 | Design a Model Training Pipeline | Medium | 35 min | Data ingestion, training, validation, deployment | End-to-end ML lifecycle management | Google, Meta, Amazon |
| 39 | Design a Low-Latency Model Serving System | Hard | 40 min | Model optimization, caching, load balancing | Serving is where ML meets production reality | FAANG, AI Labs |
| 40 | Design a Data Labeling Pipeline | Medium | 30 min | Active learning, quality control, human-in-the-loop | Data quality determines model quality | Scale AI, Google, Amazon |
Part 4: NumPy, Pandas & SQL (10 Problems)
These problems test your data manipulation fluency -- the practical skill that separates AI/ML engineers from pure software engineers.
NumPy (3 Problems)
| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 41 | Implement Matrix Multiplication (no np.dot) | Easy | 15 min | Nested loops vs broadcasting | Tests understanding of vectorization fundamentals | Google, Meta, AI Labs |
| 42 | Compute Cosine Similarity Matrix for N Vectors | Medium | 20 min | Broadcasting + normalization | Core operation in embeddings, retrieval, and recommendations | FAANG, AI Labs |
| 43 | Implement Softmax with Numerical Stability | Medium | 15 min | Log-sum-exp trick | Appears in every neural network; numerical stability is critical | FAANG, AI Labs |
:::warning NumPy Fluency Is Table Stakes
If you cannot write vectorized NumPy code fluently, interviewers will question your ability to implement anything in ML. Practice until broadcasting and axis operations feel natural.
:::
Key NumPy Patterns:
- Broadcasting -- Extend dimensions to enable element-wise operations on different shapes
- Axis operations -- `np.sum(axis=0)` vs `np.sum(axis=1)` -- know which is which instantly
- Numerical stability -- Always subtract the max before exp, use log-sum-exp
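The three patterns show up together in a stable softmax (#43) and a cosine similarity matrix (#42). The sketch below is illustrative; the function names and the `eps` guard against zero-length vectors are assumptions for this example.

```python
import numpy as np


def softmax(logits, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    logits = np.asarray(logits, dtype=float)
    # Subtracting the max leaves the result unchanged but keeps exp() from overflowing.
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def cosine_similarity_matrix(X, eps: float = 1e-12) -> np.ndarray:
    """Pairwise cosine similarity between the rows of X, shape (n, n)."""
    X = np.asarray(X, dtype=float)
    # keepdims=True keeps norms at shape (n, 1) so it broadcasts across columns.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.maximum(norms, eps)   # eps avoids division by zero
    return unit @ unit.T
```

Here `axis=1` reduces across the columns of each row, and `keepdims=True` is what makes the subsequent division broadcast row by row.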
Pandas (3 Problems)
| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 44 | Compute Rolling 7-Day Average Revenue per User Cohort | Medium | 20 min | GroupBy + rolling window | Time-series feature engineering; business analytics | Meta, Airbnb, Uber |
| 45 | Detect and Handle Missing Values in a Mixed-Type Dataset | Easy | 15 min | isnull, fillna, interpolation strategies | Data cleaning is 80% of real ML work | All |
| 46 | Pivot User Event Logs into Feature Vectors | Medium | 25 min | Pivot table + aggregation | Feature engineering from raw event data | Big Tech, Startups |
Key Pandas Patterns:
- GroupBy-Apply-Combine -- The fundamental pandas workflow
- Merge/Join -- Left, inner, outer joins on DataFrames
- Window functions -- Rolling, expanding, and exponential weighted operations
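A minimal sketch of GroupBy combined with a rolling window, in the spirit of problem #44. The column names (`cohort`, `date`, `revenue`) are hypothetical, and the sketch assumes exactly one row per cohort per day with no gaps, so a row-based window of 7 corresponds to 7 days.

```python
import pandas as pd


def rolling_mean_per_cohort(df: pd.DataFrame, window: int = 7) -> pd.DataFrame:
    """Add a per-cohort rolling mean of revenue over the last `window` rows."""
    ordered = df.sort_values(["cohort", "date"])
    rolled = ordered.groupby("cohort")["revenue"].transform(
        # min_periods=1 returns a value even before a full window is available.
        lambda s: s.rolling(window=window, min_periods=1).mean()
    )
    return ordered.assign(rolling_mean=rolled)
```

If the data has missing days, a time-based window on a datetime index would be the safer choice, since a row-based window would silently span more than 7 calendar days.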
SQL (4 Problems)
| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 47 | Find the Second Highest Salary in Each Department | Medium | 15 min | Window functions (DENSE_RANK, RANK, ROW_NUMBER) | Window functions appear in 70%+ of SQL interview questions | FAANG, All |
| 48 | Calculate Month-over-Month Revenue Growth Rate | Medium | 20 min | LAG/LEAD + date arithmetic | Time-series analysis in SQL; business metric computation | Meta, Google, Airbnb |
| 49 | Find Users Who Were Active on 3+ Consecutive Days | Hard | 25 min | Self-join or window function gap analysis | Tests advanced SQL reasoning; user engagement analysis | Meta, Uber, LinkedIn |
| 50 | Compute Retention Rates by Cohort | Hard | 30 min | Cohort join + conditional aggregation | The canonical product analytics query | Meta, Airbnb, Spotify |
Key SQL Patterns:
- Window functions -- ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, running totals
- Self-joins -- Comparing rows within the same table
- CTEs -- Common Table Expressions for readable, modular queries
- Date manipulation -- DATE_TRUNC, DATE_DIFF, interval arithmetic
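The CTE and window function patterns can be tried locally with Python's built-in `sqlite3` module. The sketch below answers problem #47 on a hypothetical `employees` table; it assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added. `DENSE_RANK` is used so that ties for the top salary do not hide the second-highest value.

```python
import sqlite3
from typing import List, Tuple

SECOND_HIGHEST_QUERY = """
WITH ranked AS (
    SELECT
        department,
        name,
        salary,
        DENSE_RANK() OVER (
            PARTITION BY department
            ORDER BY salary DESC
        ) AS salary_rank
    FROM employees
)
SELECT department, name, salary
FROM ranked
WHERE salary_rank = 2
ORDER BY department, name;
"""


def second_highest_salaries(rows: List[Tuple[str, str, int]]) -> List[tuple]:
    """Load (name, department, salary) rows into memory and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)"
        )
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
        return conn.execute(SECOND_HIGHEST_QUERY).fetchall()
    finally:
        conn.close()
```

Departments with only one distinct salary return no row here; whether that should instead be a NULL row is a good clarifying question to ask the interviewer.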
Study Plans
2-Week Intensive Plan
| Day | Problems | Category | Focus |
|---|---|---|---|
| 1 | #1-4 | Arrays & Hashing | Hash map patterns |
| 2 | #5-7 | Sliding Window | Window management |
| 3 | #8-9, #14-15 | Linked Lists + Binary Search | Pointer manipulation, search |
| 4 | #10-13 | Trees & Graphs | DFS, BFS, topological sort |
| 5 | #16-18 | Dynamic Programming | Memoization, 2D DP |
| 6 | #19-20 | Heaps & Stacks | Top-K, parsing |
| 7 | Review Day | All DSA | Re-solve Yellow/Red problems |
| 8 | #21-24 | ML Implementation | Code ML algorithms |
| 9 | #25-27 | ML Reasoning | Conceptual discussions |
| 10 | #28-30 | ML Applied | Practical ML decisions |
| 11 | #31-35 | ML System Design | End-to-end design |
| 12 | #36-40 | Infrastructure Design | Platform-level thinking |
| 13 | #41-46 | NumPy & Pandas | Data manipulation |
| 14 | #47-50 + Review | SQL + Final Review | Query writing, gap analysis |
4-Week Moderate Plan
| Week | Daily Load | Focus |
|---|---|---|
| Week 1 | 2 problems/day | DSA problems (#1-14) with spaced review |
| Week 2 | 2 problems/day | DSA (#15-20) + ML Implementation (#21-24) + ML Reasoning (#25-27) |
| Week 3 | 2 problems/day | ML Applied (#28-30) + System Design (#31-40) |
| Week 4 | 2 problems/day | NumPy/Pandas/SQL (#41-50) + comprehensive review |
Progress Tracker
DSA Problems (20)
| # | Problem | Status | Date | Time | Notes |
|---|---|---|---|---|---|
| 1 | Two Sum | [ ] | | | |
| 2 | Best Time to Buy and Sell Stock | [ ] | | | |
| 3 | Group Anagrams | [ ] | | | |
| 4 | Product of Array Except Self | [ ] | | | |
| 5 | Container With Most Water | [ ] | | | |
| 6 | Longest Substring Without Repeating | [ ] | | | |
| 7 | Minimum Window Substring | [ ] | | | |
| 8 | Reverse Linked List | [ ] | | | |
| 9 | Merge Two Sorted Lists | [ ] | | | |
| 10 | Maximum Depth of Binary Tree | [ ] | | | |
| 11 | Validate Binary Search Tree | [ ] | | | |
| 12 | Number of Islands | [ ] | | | |
| 13 | Course Schedule | [ ] | | | |
| 14 | Binary Search | [ ] | | | |
| 15 | Search in Rotated Sorted Array | [ ] | | | |
| 16 | Climbing Stairs | [ ] | | | |
| 17 | Longest Common Subsequence | [ ] | | | |
| 18 | Word Break | [ ] | | | |
| 19 | Top K Frequent Elements | [ ] | | | |
| 20 | Valid Parentheses | [ ] | | | |
ML Problems (10)
| # | Problem | Status | Date | Time | Notes |
|---|---|---|---|---|---|
| 21 | Linear Regression from Scratch | [ ] | | | |
| 22 | K-Means Clustering | [ ] | | | |
| 23 | Logistic Regression with GD | [ ] | | | |
| 24 | Decision Tree (ID3/CART) | [ ] | | | |
| 25 | Bias-Variance Tradeoff | [ ] | | | |
| 26 | Cross-Validation Strategy | [ ] | | | |
| 27 | Feature Selection Methods | [ ] | | | |
| 28 | Handle Class Imbalance | [ ] | | | |
| 29 | Debug Overfitting Model | [ ] | | | |
| 30 | Choose Recommendation Metrics | [ ] | | | |
System Design Problems (10)
| # | Problem | Status | Date | Time | Notes |
|---|---|---|---|---|---|
| 31 | News Feed Ranking | [ ] | | | |
| 32 | Fraud Detection System | [ ] | | | |
| 33 | Image Search System | [ ] | | | |
| 34 | ML Model Monitoring | [ ] | | | |
| 35 | Content Moderation | [ ] | | | |
| 36 | Feature Store | [ ] | | | |
| 37 | A/B Testing Platform | [ ] | | | |
| 38 | Model Training Pipeline | [ ] | | | |
| 39 | Low-Latency Model Serving | [ ] | | | |
| 40 | Data Labeling Pipeline | [ ] | | | |
NumPy/Pandas/SQL Problems (10)
| # | Problem | Status | Date | Time | Notes |
|---|---|---|---|---|---|
| 41 | Matrix Multiplication | [ ] | | | |
| 42 | Cosine Similarity Matrix | [ ] | | | |
| 43 | Softmax (Numerically Stable) | [ ] | | | |
| 44 | Rolling Average per Cohort | [ ] | | | |
| 45 | Handle Missing Values | [ ] | | | |
| 46 | Pivot Event Logs to Features | [ ] | | | |
| 47 | Second Highest Salary per Dept | [ ] | | | |
| 48 | Month-over-Month Growth | [ ] | | | |
| 49 | 3+ Consecutive Active Days | [ ] | | | |
| 50 | Cohort Retention Rates | [ ] | | | |
Detailed Problem Guides
Problem 1: Two Sum
Category: Arrays & Hashing | Difficulty: Easy | Time: 10 min
Problem: Given an array of integers and a target, return indices of two numbers that sum to the target.
Brute Force: O(n^2) -- check every pair.
Optimal Approach:
1. Create an empty hash map
2. For each number at index i:
a. Compute complement = target - nums[i]
b. If complement exists in hash map, return [map[complement], i]
c. Otherwise, store nums[i] -> i in hash map
3. Time: O(n), Space: O(n)
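The steps above translate almost line for line into Python. This is a minimal sketch; the function name and the choice to return an empty list when no pair exists are assumptions for this example.

```python
from typing import List


def two_sum(nums: List[int], target: int) -> List[int]:
    """Return indices [i, j] with nums[i] + nums[j] == target, or [] if none."""
    seen = {}   # value -> index of an earlier occurrence
    for i, value in enumerate(nums):
        complement = target - value
        if complement in seen:
            return [seen[complement], i]
        # Store after the check so an element is never paired with itself.
        seen[value] = i
    return []
```

Checking before inserting is also what makes duplicate values such as `[3, 3]` with target 6 work correctly.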
Key Insight: Trading space for time via hash map is the single most important optimization technique in coding interviews.
Follow-up Questions:
- What if the array is sorted? (Two pointers -- O(1) space)
- What if there are multiple valid pairs? (Return all pairs)
- What about duplicate values? (Handle carefully during insertion)
Problem 21: Linear Regression from Scratch
Category: ML Implementation | Difficulty: Medium | Time: 30 min
Problem: Implement linear regression with gradient descent using only NumPy.
Key Components:
1. Initialize weights w and bias b to zeros (or small random values)
2. Forward pass: y_pred = X @ w + b
3. Loss: MSE = (1/n) * sum((y_pred - y)^2)
4. Gradients:
dw = (2/n) * X.T @ (y_pred - y)
db = (2/n) * sum(y_pred - y)
5. Update: w -= lr * dw, b -= lr * db
6. Repeat for N iterations or until convergence
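The components above can be sketched in NumPy as follows. The function name, default hyperparameters, and the stopping rule (stop when the loss changes by less than `tol`) are assumptions for this example, not the only reasonable choices.

```python
import numpy as np


def fit_linear_regression(X, y, lr: float = 0.1, n_iters: int = 2000, tol: float = 1e-10):
    """Fit y ~ X @ w + b by batch gradient descent on mean squared error."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    prev_loss = np.inf

    for _ in range(n_iters):
        residual = X @ w + b - y              # y_pred - y, shape (n,)
        loss = np.mean(residual ** 2)         # MSE
        dw = (2.0 / n) * (X.T @ residual)     # gradient with respect to w
        db = (2.0 / n) * residual.sum()       # gradient with respect to b
        w -= lr * dw
        b -= lr * db
        if abs(prev_loss - loss) < tol:       # loss has stopped improving
            break
        prev_loss = loss

    return w, b
```

With features on very different scales, the same learning rate can be too large for one weight and too small for another, which is why standardizing features first usually helps.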
What interviewers look for:
- Vectorized NumPy (no Python for-loops over data points)
- Correct gradient derivation
- Convergence check (loss decreasing)
- Discussion of learning rate selection
Follow-up Questions:
- How would you add L2 regularization? (For a penalty of lambda * ||w||^2, add 2 * lambda * w to dw; the bias is usually left unregularized)
- What happens with features at different scales? (Need standardization)
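- When would gradient descent fail? (A learning rate that is too large makes the loss diverge; poorly scaled features make convergence very slow. MSE for linear regression is convex, so local minima only become a concern with non-convex losses in other models.)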
Problem 31: News Feed Ranking System
Category: ML System Design | Difficulty: Medium | Time: 40 min
Problem: Design the ranking system that determines what posts appear in a social media news feed.
High-Level Architecture:
1. Candidate Generation (retrieve ~1000 candidates)
- Friends' posts, group posts, followed pages
- Collaborative filtering for content discovery
2. Feature Engineering
- User features: age, interests, engagement history
- Post features: type, age, author engagement rate
- Cross features: user-author affinity, topic relevance
3. Ranking Model
- Multi-task learning: predict P(like), P(comment), P(share), P(hide)
- Combine into a single score with business-weighted formula
4. Post-Processing
- Diversity injection (avoid all posts from same author)
- Freshness boost
- Policy filtering (content moderation)
5. Serving
- Pre-compute embeddings, real-time feature assembly
- Cache ranked lists with TTL
6. Evaluation
- Online: engagement metrics, time spent, user retention
- Offline: NDCG, AUC, calibration
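To make steps 3 and 4 concrete, here is a small sketch of combining multi-task predictions into one score and then applying a simple diversity rule. Everything in it is hypothetical: the `Candidate` fields, the weight values, and the per-author cap are illustrative placeholders, not values from any real system.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    post_id: str
    author: str
    p_like: float
    p_comment: float
    p_share: float
    p_hide: float


# Hypothetical business weights: hides are penalized heavily.
WEIGHTS = {"like": 1.0, "comment": 4.0, "share": 8.0, "hide": -20.0}


def score(c: Candidate) -> float:
    """Weighted sum of the predicted engagement probabilities."""
    return (
        WEIGHTS["like"] * c.p_like
        + WEIGHTS["comment"] * c.p_comment
        + WEIGHTS["share"] * c.p_share
        + WEIGHTS["hide"] * c.p_hide
    )


def rank_feed(candidates: List[Candidate], max_per_author: int = 2) -> List[Candidate]:
    """Sort by score, then drop posts beyond max_per_author for any one author."""
    ranked = sorted(candidates, key=score, reverse=True)
    feed: List[Candidate] = []
    per_author: Dict[str, int] = {}
    for c in ranked:
        if per_author.get(c.author, 0) >= max_per_author:
            continue                      # diversity rule: cap posts per author
        feed.append(c)
        per_author[c.author] = per_author.get(c.author, 0) + 1
    return feed
```

In an interview, the interesting discussion is how the weights are chosen (for example, tuned through A/B tests against retention) rather than the arithmetic itself.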
Common Mistakes on Core 50 Problems
:::danger Mistakes That Cost Offers
- Solving Two Sum with nested loops -- Instantly signals you have not prepared
- Implementing ML algorithms with Python loops -- NumPy vectorization is expected
- System design without clarifying requirements -- Jumping to solutions is a red flag
- SQL without window functions -- If you only know GROUP BY, your SQL is incomplete
- Not discussing time/space complexity -- Always state it, even if not asked
:::
Pattern Cheat Sheet
| Pattern | Problems That Use It | Recognition Signal |
|---|---|---|
| Hash map / set lookup | #1, #3, #6 | "Find pair/group with property X" |
| Sliding window | #6, #7 | "Substring/subarray with constraint" |
| Two pointers | #5, #8, #9 | "Sorted array", "shrink search space", or "walk/relink a list in place" |
| BFS/DFS | #10, #11, #12 | "Tree/graph traversal" or "connected components" |
| Topological sort | #13 | "Dependencies" or "ordering with prerequisites" |
| Binary search | #14, #15 | "Sorted" or "monotonic" or "minimize maximum" |
| Dynamic programming | #16, #17, #18 | "Optimal" or "count ways" or "min/max" |
| Heap / priority queue | #19 | "Top K" or "K-th largest/smallest" |
| Stack | #20 | "Matching" or "nesting" or "nearest greater" |
| Gradient descent | #21, #23 | "Implement from scratch" or "optimize" |
| Window functions (SQL) | #47, #48, #49, #50 | "Per group ranking" or "consecutive" or "running total" |
Next Steps
Once you have completed the Core 50, move to your role-specific problem list:
- MLE Problems -- Machine Learning Engineer roles
- AI Engineer Problems -- AI/LLM Engineer roles
- Data Scientist Problems -- Data Scientist roles
- MLOps Problems -- MLOps / ML Platform roles
- Research Engineer Problems -- Research Engineer roles
- Data Engineer Problems -- Data Engineer roles
Or, if you want to continue building breadth by difficulty, try the Easy Tier next.
