Easy Tier Problems

Reading time: ~40 min | Interview relevance: High | Roles: All AI/ML Roles (especially entry-level, career switchers, and warm-up for experienced engineers)

You open your laptop to start interview prep and immediately jump to a Hard-tier dynamic programming problem. Twenty minutes later, you are staring at a blank screen, confidence shattered. This is how most people fail at interview preparation before they even begin.

Easy-tier problems exist for a reason. They build the muscle memory for patterns that appear repeatedly in harder problems. They confirm that your fundamentals are solid. And they give you the psychological momentum to tackle medium and hard problems with clarity instead of panic.

This list of 35 problems spans coding, ML implementation, data processing and SQL, system design discussion, and probability. Each problem is scoped to 5-20 minutes. If any problem takes you longer than 30 minutes, that signals a gap worth investigating before moving up.

How to Use This List

Goal	Approach
Total beginner	Work through all 35 problems sequentially over 2 weeks
Warm-up before harder prep	Complete in 3-5 days, spending no more than 20 min per problem
Confidence building	Cherry-pick problems from your weakest category
Interview day warm-up	Solve 2-3 problems the morning of your interview

:::tip The 20-Minute Rule
For easy problems, set a strict 20-minute timer. If you cannot solve it in 20 minutes, read the solution, understand the pattern, then re-solve from scratch the next day. Speed matters at this level.
:::

Category 1: Core Data Structures & Algorithms (10 Problems)

These are the bread-and-butter DSA problems. You should be able to solve each one without hesitation.

#	Problem	Time	Key Pattern	Category	What It Tests	Company Tags
1	Two Sum	10 min	Hash map lookup	Arrays	Hash map for O(n) lookups; the most famous interview problem	FAANG, All
2	Valid Parentheses	10 min	Stack matching	Stacks	LIFO structure for balanced matching; expression parsing	All
3	Merge Two Sorted Lists	10 min	Two-pointer merge	Linked Lists	Merge operation foundational to merge sort and stream merging	All
4	Maximum Depth of Binary Tree	10 min	DFS recursion	Trees	Basic tree traversal and recursive thinking	All
5	Best Time to Buy and Sell Stock	10 min	Running minimum	Arrays	Track running min/max; single-pass optimization	FAANG, All
6	Invert Binary Tree	10 min	Tree recursion	Trees	Recursive tree transformation; understanding pointer swaps	All
7	Contains Duplicate	5 min	Hash set	Arrays	Set operations for uniqueness; O(n) vs O(n log n) tradeoff	All
8	Linked List Cycle Detection	10 min	Floyd's slow/fast pointer	Linked Lists	Two-pointer technique; cycle detection appears in graph problems	All
9	Binary Search	10 min	Divide and conquer	Search	The most fundamental O(log n) algorithm; boundary conditions matter	All
10	Implement a Min Stack	15 min	Auxiliary stack	Stacks	Maintaining O(1) access to additional properties	All

:::note Why These 10 Problems Matter
Every pattern in this section reappears in medium and hard problems:

Hash map -> LRU Cache, Group Anagrams, Two Sum variants
Two pointers -> 3Sum, Container With Most Water, merge operations
DFS -> Path Sum, Serialize Tree, connected components
Stack -> Largest Rectangle in Histogram, Calculator
Binary Search -> Search in Rotated Array, median finding

Master the easy version first. The hard version is just the easy version with more constraints.
:::

Category 2: ML Implementation Basics (8 Problems)

These problems test whether you can implement fundamental ML operations without relying on library calls.

#	Problem	Time	Key Pattern	Category	What It Tests	Company Tags
11	Implement K-Nearest Neighbors	20 min	Distance computation + sorting	Classification	NumPy fluency, distance metrics, top-K selection	Google, Meta, Startups
12	Implement Linear Regression (Closed-Form)	15 min	Matrix operations	Regression	Normal equation, NumPy matrix math, inverse computation	All
13	Compute Precision, Recall, and F1 Score	10 min	Confusion matrix math	Evaluation	Metric computation from confusion matrix; when to use which metric	All
14	Implement Sigmoid and Softmax Functions	10 min	Numerical stability	Activation Functions	Overflow prevention with max subtraction; probability normalization	All
15	Implement Min-Max and Z-Score Normalization	10 min	Feature scaling	Preprocessing	Data normalization; handling edge cases (zero variance)	All
16	Implement One-Hot Encoding from Scratch	10 min	Categorical encoding	Preprocessing	Mapping categories to binary vectors; handling unknown categories	All
17	Implement Train-Test Split with Shuffling	10 min	Random sampling	Evaluation	Random seed management, stratification awareness	All
18	Compute Euclidean, Manhattan, and Cosine Distance	15 min	Vector operations	Similarity	Distance metric selection; when cosine beats Euclidean	All

:::warning Common Easy ML Mistakes

Sigmoid overflow: 1 / (1 + exp(-x)) overflows for large negative x; clip the input, or branch on the sign of x so that exp is only ever called on non-positive values
Softmax overflow: Subtract the max value before exponentiating: exp(x - max(x))
Division by zero: Z-score normalization with zero variance; precision, recall, or F1 when a denominator is zero (for example, no predicted positives)
Shuffling errors: An unshuffled split of sorted data gives unrepresentative train and test sets, while shuffling time-ordered data leaks future information into training
:::
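The division-by-zero case for metrics can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function name `precision_recall_f1` and the convention of returning 0.0 for an undefined metric are assumptions chosen for this example (other conventions, such as raising an error, are equally defensible).

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall, and F1; returns 0.0 where a denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard each division: no predicted positives -> precision undefined,
    # no actual positives -> recall undefined.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0

    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


# A model that never predicts the positive class would crash a naive implementation.
p, r, f1 = precision_recall_f1([1, 0, 1], [0, 0, 0])
```

In an interview, stating which convention you picked for the undefined case matters more than the convention itself.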

Category 3: Data Processing & SQL (10 Problems)

Practical data manipulation problems that every AI/ML role requires.

Python Data Processing

#	Problem	Time	Key Pattern	Category	What It Tests	Company Tags
19	Parse and Aggregate Large Log Files	15 min	Hash map aggregation	Data Processing	File I/O, string parsing, aggregation	All
20	Compute Intersection of Two Large Sorted Arrays	10 min	Two-pointer merge	Arrays	Efficient set operations; memory-conscious processing	Google, Meta
21	Read CSV and Compute Column Statistics	10 min	Streaming computation	Data Processing	Pandas fluency; handling missing values, type coercion	All
22	Implement a Word Frequency Counter	10 min	Hash map counting	Text Processing	Tokenization, normalization, counting	All
23	Build a Simple Data Validation Function	15 min	Schema checking	Data Quality	Type checks, range validation, null detection	All

SQL Fundamentals

#	Problem	Time	Key Pattern	Category	What It Tests	Company Tags
24	Find Employees Earning More Than Their Manager	10 min	Self-join	SQL	Basic join logic with table self-referencing	All
25	Find Duplicate Emails in a Table	5 min	GROUP BY + HAVING	SQL	Aggregation fundamentals; identifying duplicates	All
26	Second Highest Salary	10 min	Subquery or window function	SQL	Ranking with edge cases (ties, NULLs)	FAANG, All
27	Combine Two Tables (Left Join)	5 min	LEFT JOIN	SQL	Understanding NULL-preserving joins	All
28	Customers Who Never Ordered	10 min	LEFT JOIN + IS NULL / NOT EXISTS	SQL	Anti-join pattern; filtering for absence	All

:::tip SQL Practice Strategy
These SQL problems are intentionally simple. They test whether you can translate English requirements into correct SQL without syntax errors. If you can solve all five in under 30 minutes total, your SQL fundamentals are solid. If not, spend a day on SQL basics before attempting the Data Engineer or Data Scientist problem lists.
:::
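One way to practice these queries without a database server is Python's built-in `sqlite3` module. The sketch below works through the anti-join pattern from problem #28 in both of its common forms. The table names, column names, and sample rows are hypothetical, made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")

# Form 1: LEFT JOIN keeps every customer; unmatched rows have NULL order columns.
left_join_rows = conn.execute("""
    SELECT c.name
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    WHERE o.id IS NULL
    ORDER BY c.name
""").fetchall()

# Form 2: NOT EXISTS states the "absence" requirement directly.
not_exists_rows = conn.execute("""
    SELECT c.name
    FROM customers AS c
    WHERE NOT EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.id)
    ORDER BY c.name
""").fetchall()

conn.close()
```

Both queries return only the customer with no orders. Being able to write either form and explain why they are equivalent is a reasonable target at this tier.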

Category 4: System Design Discussion (4 Problems)

These are not full system design problems. They are discussion-level questions where you explain concepts clearly and reason about tradeoffs. Each should take 10-15 minutes of verbal explanation.

#	Problem	Time	Key Pattern	Category	What It Tests	Company Tags
29	Explain the Bias-Variance Tradeoff	10 min	Conceptual	ML Theory	Can you explain underfitting vs. overfitting with examples?	All
30	Compare REST vs. gRPC for Model Serving	10 min	API design	System Design	Tradeoff analysis: latency, compatibility, streaming support	All
31	Explain Why Batch Normalization Helps Training	10 min	Conceptual	Deep Learning	Internal covariate shift, gradient flow, regularization effect	All
32	Describe How You Would Monitor a Model in Production	15 min	MLOps	System Design	Data drift, prediction drift, latency, error rates	All

Category 5: Probability & Statistics Basics (3 Problems)

Many ML interviews include at least one probability question. These are the easy versions.

#	Problem	Time	Key Pattern	Category	What It Tests	Company Tags
33	Compute the Expected Value of a Dice Game	10 min	Expected value	Probability	Basic expectation calculation; linearity of expectation	All
34	Explain P-Value to a Non-Technical Person	10 min	Statistical inference	Statistics	Communication skills; simplifying complex concepts	All
35	Bayes' Theorem: Disease Testing Problem	15 min	Conditional probability	Probability	Base rate neglect, prior/posterior reasoning	FAANG, All

:::danger Do Not Skip Easy Problems
Three reasons engineers who skip easy problems regret it:

Speed matters. In a 45-minute interview, solving an easy warm-up in 5 minutes leaves 40 minutes for the hard follow-up. Solving it in 15 minutes leaves only 30.
Edge cases hide in easy problems. Two Sum with duplicate values. Binary search on an empty array. Train-test split when n=1. Easy problems teach defensive coding.
Interviewers calibrate on easy problems. If you struggle with an easy problem, the interviewer downgrades the rest of the interview. If you crush it quickly with clean code, they give you harder follow-ups (which is what you want).
:::

1-Week Easy Tier Study Plan

For candidates who need a quick warm-up before harder preparation:

Day	Focus	Problems	Time
Day 1	DSA fundamentals	#1-5 (hash map, stack, merge, DFS, running min)	60 min
Day 2	More DSA + basics	#6-10 (tree recursion, set, cycle, binary search, min stack)	60 min
Day 3	ML implementation	#11-14 (KNN, linear regression, metrics, activations)	60 min
Day 4	ML + data processing	#15-19 (normalization, encoding, split, distances, log parsing)	60 min
Day 5	Data processing + SQL	#20-25 (arrays, CSV, word count, validation, SQL basics)	60 min
Day 6	SQL + concepts	#26-32 (SQL, system design discussions)	60 min
Day 7	Probability + review	#33-35 + re-solve any problems that took >20 min	60 min

2-Week Easy Tier Study Plan (Thorough)

For career switchers or candidates returning after a long break:

Day	Focus	Problems	Notes
Day 1	Arrays & Hashing	#1, #5, #7	Focus on hash map pattern
Day 2	Stacks & Lists	#2, #3, #8, #10	LIFO and pointer-based structures
Day 3	Trees & Search	#4, #6, #9	Recursion and binary search
Day 4	Review DSA	Re-solve #1-10	Target <10 min per problem
Day 5	ML Basics 1	#11, #12	KNN, linear regression
Day 6	ML Basics 2	#13, #14, #15	Metrics, activations, normalization
Day 7	ML Basics 3	#16, #17, #18	Encoding, splitting, distances
Day 8	Data Processing	#19, #20, #21, #22	Python file and data handling
Day 9	Data Quality	#23 + review ML problems	Validation and edge cases
Day 10	SQL Day 1	#24, #25, #26	Joins, GROUP BY, ranking
Day 11	SQL Day 2	#27, #28	LEFT JOIN, anti-join
Day 12	Concepts	#29, #30, #31, #32	Discussion practice (speak out loud)
Day 13	Probability	#33, #34, #35	Expected value, Bayes
Day 14	Full review	Re-solve all problems that felt shaky	Speed run the entire list

Problem Deep Dives

Problem 1: Two Sum

Why this problem matters: Two Sum is one of the most commonly asked interview problems. It tests whether you reach for the O(n) hash map solution or settle for the O(n^2) brute force. More importantly, it tests how you handle edge cases.

Brute Force:

def two_sum(nums, target):
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]
    return []

Optimal (Hash Map):

def two_sum(nums, target):
    seen = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
    return []

Edge Cases to Handle:

Array with duplicate values: [3, 3], target = 6
Negative numbers: [-1, 2, -3, 4], target = 1
Single element: should return empty
No solution exists

Follow-Up Questions Interviewers Ask:

What if the array is sorted? (Two-pointer approach, O(1) space)
What if you need all pairs? (Modify to collect all results)
What if there are multiple valid answers? (Return any one, or modify contract)
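The sorted-array follow-up can be sketched as below. This is one possible answer, assuming the input is sorted in non-decreasing order and that returning the first pair found is acceptable. The name `two_sum_sorted` is illustrative.

```python
def two_sum_sorted(nums, target):
    """Return indices [i, j] with nums[i] + nums[j] == target, or [] if none.

    Assumes nums is sorted in non-decreasing order. Uses O(1) extra space.
    """
    lo, hi = 0, len(nums) - 1
    while lo < hi:
        total = nums[lo] + nums[hi]
        if total == target:
            return [lo, hi]
        if total < target:
            lo += 1   # need a larger sum: advance the left pointer
        else:
            hi -= 1   # need a smaller sum: retreat the right pointer
    return []


result = two_sum_sorted([1, 2, 4, 7, 11], 9)
```

The trade-off against the hash map version is space: no dictionary is needed, but the approach only works because sortedness lets each comparison discard one end of the range.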

Problem 11: Implement K-Nearest Neighbors

Why this problem matters: KNN is one of the simplest ML algorithms, but implementing it correctly tests NumPy fluency, understanding of distance metrics, and handling of edge cases.

Implementation:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    # Compute distances (Euclidean)
    distances = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))

    # Get k nearest indices
    k_nearest_idx = np.argsort(distances)[:k]

    # Get labels of k nearest
    k_nearest_labels = y_train[k_nearest_idx]

    # Majority vote
    most_common = Counter(k_nearest_labels).most_common(1)
    return most_common[0][0]

Key Points Interviewers Check:

Vectorized distance computation (not a for loop)
Handling ties in voting (what if 2 classes have equal votes?)
Choice of k (odd k avoids ties for binary classification)
Distance metric choice (Euclidean vs. Manhattan vs. Cosine)
Time complexity: O(n * d) for distance computation + O(n log n) for sorting
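Two of the points above, the sorting cost and tie handling, can be addressed in a variant like the following. This is a sketch under stated assumptions, not the canonical implementation: it assumes `X_train` is a non-empty 2D NumPy array, `y_train` is a 1D NumPy array, and `k >= 1`. The tie-break policy (prefer the tied class with the smallest total distance) is one arbitrary choice among several.

```python
import numpy as np

def knn_predict_partition(X_train, y_train, x_query, k=5):
    k = min(k, len(X_train))
    # Squared Euclidean distance preserves neighbor order, so sqrt can be skipped.
    sq_dists = np.sum((X_train - x_query) ** 2, axis=1)
    # argpartition puts the k smallest distances first (unordered), O(n) on average
    # instead of the O(n log n) of a full argsort.
    nearest = np.argpartition(sq_dists, k - 1)[:k]

    near_labels = y_train[nearest]
    near_dists = sq_dists[nearest]
    labels, counts = np.unique(near_labels, return_counts=True)
    tied = labels[counts == counts.max()]
    if len(tied) == 1:
        return tied[0]
    # Assumed tie-break policy: the tied class whose neighbors are closest in total.
    totals = [near_dists[near_labels == label].sum() for label in tied]
    return tied[int(np.argmin(totals))]


X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y = np.array([0, 0, 1, 1])
clear_vote = knn_predict_partition(X, y, np.array([0.2, 0.1]), k=3)
tie_vote = knn_predict_partition(X, y, np.array([3.0, 2.5]), k=2)
```

In the second call the two nearest neighbors belong to different classes, so the prediction falls to the tie-break rule and picks the class of the closer point.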

Problem 35: Bayes' Theorem Disease Testing

Why this problem matters: This problem exposes base rate neglect, one of the most common reasoning errors. It tests whether you can apply Bayes' theorem correctly under pressure.

Problem Statement: A disease affects 1 in 1000 people. A test has 99% sensitivity (true positive rate) and 99% specificity (true negative rate). If a person tests positive, what is the probability they actually have the disease?

Solution:

P(Disease) = 0.001 (prevalence)
P(Positive | Disease) = 0.99 (sensitivity)
P(Positive | No Disease) = 0.01 (false positive rate)

P(Positive) = P(Pos|D) * P(D) + P(Pos|~D) * P(~D)
            = 0.99 * 0.001 + 0.01 * 0.999
            = 0.00099 + 0.00999
            = 0.01098

P(Disease | Positive) = P(Pos|D) * P(D) / P(Positive)
                       = 0.00099 / 0.01098
                       = 0.0901 (about 9%)

Key Insight: Despite the test being 99% accurate, a positive result only means a 9% chance of having the disease. The low base rate dominates. This is why interviewers love this problem -- most candidates intuitively guess 99%.
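The same calculation as a small function makes it easy to see how strongly the answer depends on the base rate. The function name and the second prevalence value (10%) are illustrative choices for this sketch, not part of the original problem.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(Disease | Positive) via Bayes' theorem."""
    true_pos = sensitivity * prevalence                 # P(Pos | D) * P(D)
    false_pos = (1 - specificity) * (1 - prevalence)    # P(Pos | ~D) * P(~D)
    return true_pos / (true_pos + false_pos)


rare = posterior_given_positive(0.001, 0.99, 0.99)    # the problem as stated
common = posterior_given_positive(0.10, 0.99, 0.99)   # hypothetical 10% prevalence
```

With the same test, moving the prevalence from 0.1% to 10% lifts the posterior from roughly 9% to roughly 92%, which is the base-rate effect in one line.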

Patterns You Must Internalize

Pattern	Easy Problem	Medium/Hard Extension
Hash map for O(1) lookup	Two Sum (#1)	LRU Cache, Group Anagrams
Stack for matching	Valid Parentheses (#2)	Largest Rectangle, Calculator
Two-pointer merge	Merge Sorted Lists (#3)	Merge K Sorted Lists, Intersection
DFS recursion	Max Depth (#4)	Path Sum, Serialize Tree
Running min/max	Buy/Sell Stock (#5)	Trapping Rain Water, Sliding Window Max
Floyd's cycle detection	Cycle Detection (#8)	Find Duplicate Number, Linked List Cycle II
Binary search	Binary Search (#9)	Search Rotated Array, Median of Two Arrays
Vectorized computation	KNN (#11)	Batch operations, feature engineering
Conditional probability	Bayes' Theorem (#35)	Naive Bayes classifier, A/B test analysis

Confidence Checkpoints

Use these checkpoints to assess readiness for harder tiers:

Checkpoint	Target	Ready for
All 10 DSA problems in <15 min each	2.5 hours total	Medium Tier DSA
All 8 ML problems with correct edge cases	2 hours total	Medium Tier ML
All 5 SQL problems first try, no syntax errors	45 min total	Data Engineer/Scientist SQL rounds
Can explain all 4 system design concepts clearly	45 min total	Medium Tier System Design
All 3 probability problems correct	30 min total	Medium Tier probability
Full list completed	<8 hours total	Medium Tier (all categories)

Additional Problem Deep Dives

Problem 5: Best Time to Buy and Sell Stock

Why this problem matters: This is the simplest example of the "running minimum" pattern. It appears trivial, but the pattern extends to harder problems like Trapping Rain Water and Best Time to Buy and Sell Stock III (at most two transactions).

Brute Force (O(n^2)):

def max_profit_brute(prices):
    max_profit = 0
    for i in range(len(prices)):
        for j in range(i + 1, len(prices)):
            max_profit = max(max_profit, prices[j] - prices[i])
    return max_profit

Optimal (O(n)):

def max_profit(prices):
    min_price = float('inf')
    max_profit = 0
    for price in prices:
        min_price = min(min_price, price)
        max_profit = max(max_profit, price - min_price)
    return max_profit

The Insight: At each position, the best profit we can make by selling today is today's price - minimum price seen so far. We track the running minimum and the running best profit in a single pass.

Edge Cases:

Prices always decreasing: profit = 0 (never buy)
Single price: profit = 0
All same price: profit = 0
Two prices: simple comparison

Follow-Up Extensions:

Variant	Key Change	Difficulty
Buy and Sell Stock II (unlimited transactions)	Greedy: sum all upward moves	Easy
Buy and Sell Stock III (at most 2 transactions)	Track 2 buy/sell states	Hard
Buy and Sell Stock IV (at most K transactions)	DP with K states	Hard
Buy and Sell Stock with Cooldown	DP with 3 states (hold, sold, rest)	Medium
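As an illustration of the first row, the unlimited-transactions variant reduces to summing every day-over-day increase. This is a minimal sketch with an illustrative function name. It assumes you may buy and sell on the same day and hold at most one share, as in the standard formulation of that problem.

```python
def max_profit_unlimited(prices):
    """Greedy: any rise from one day to the next can be captured as profit."""
    return sum(
        max(0, today - yesterday)
        for yesterday, today in zip(prices, prices[1:])
    )


profit = max_profit_unlimited([7, 1, 5, 3, 6, 4])
```

The greedy works because a single long climb (say 1 to 5) yields the same profit as the sum of its daily steps, so nothing is lost by taking every upward move separately.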

Problem 9: Binary Search

Why this problem matters: Binary search is the most fundamental O(log n) algorithm, but getting the boundary conditions right is notoriously tricky. More interview bugs come from binary search off-by-one errors than from any other algorithm.

Standard Implementation:

def binary_search(nums, target):
    lo, hi = 0, len(nums) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2  # Overflow-safe midpoint (matters in Java/C++; Python ints do not overflow)
        if nums[mid] == target:
            return mid
        elif nums[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

Common Variants:

Variant	Change	When to Use
Find leftmost occurrence	When `nums[mid] == target`, record `mid` and set `hi = mid - 1`	Sorted array with duplicates
Find rightmost occurrence	When `nums[mid] == target`, record `mid` and set `lo = mid + 1`	Sorted array with duplicates
Find insertion point	Treat `nums[mid] >= target` as go-left; return `lo` when loop ends	bisect_left equivalent
Search rotated array	Compare `nums[mid]` with `nums[lo]` to determine sorted half	LeetCode 33
Find peak element	Compare `nums[mid]` with `nums[mid + 1]`	LeetCode 162
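The leftmost-occurrence variant is the one most often written incorrectly, so here is a sketch of it. The function name `find_leftmost` and the choice to return -1 for a missing target are assumptions that mirror the standard implementation above.

```python
def find_leftmost(nums, target):
    """Index of the first occurrence of target in sorted nums, or -1 if absent."""
    lo, hi = 0, len(nums) - 1
    answer = -1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if nums[mid] == target:
            answer = mid     # record this match, then keep looking to the left
            hi = mid - 1
        elif nums[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return answer


first = find_leftmost([1, 2, 2, 2, 3], 2)
```

The key detail is recording the match before shrinking the range. Without the `answer` variable the loop would discard the last match it saw.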

The Three Binary Search Templates:

# Template 1: Standard (find exact match)
while lo <= hi:
    mid = lo + (hi - lo) // 2
    if condition: return mid
    elif ...: lo = mid + 1
    else: hi = mid - 1

# Template 2: Find boundary (leftmost True)
while lo < hi:
    mid = lo + (hi - lo) // 2
    if condition(mid): hi = mid
    else: lo = mid + 1
return lo

# Template 3: Find boundary (rightmost True)
while lo < hi:
    mid = lo + (hi - lo + 1) // 2  # Round up to avoid infinite loop
    if condition(mid): lo = mid
    else: hi = mid - 1
return lo

Key Points:

lo + (hi - lo) // 2 prevents integer overflow (matters in languages like Java/C++)
Template 2 and 3 converge when lo == hi, so return lo gives the boundary
Template 3 rounds up (+ 1 before // 2) to prevent infinite loop when lo = mid
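A concrete instance of Template 3 is the integer square root: find the largest `m` such that `m * m <= n`. The condition is true for small `m` and false for large `m`, so the target is the rightmost true value. The function name and the restriction to non-negative integers are assumptions for this sketch.

```python
def integer_sqrt(n):
    """Largest integer m with m * m <= n, for integer n >= 0."""
    lo, hi = 0, n
    while lo < hi:
        mid = lo + (hi - lo + 1) // 2   # round up so that lo = mid always makes progress
        if mid * mid <= n:
            lo = mid        # mid satisfies the condition: the answer is mid or to its right
        else:
            hi = mid - 1    # mid fails the condition: the answer is strictly to its left
    return lo


root = integer_sqrt(15)
```

If the midpoint were rounded down here, the case `lo = 3, hi = 4` would compute `mid = 3`, set `lo = 3`, and loop forever.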

Problem 14: Implement Sigmoid and Softmax Functions

Why this problem matters: These are the two most common activation/output functions in ML. Getting them numerically stable is critical and something many candidates get wrong.

Sigmoid (Naive vs. Stable):

import numpy as np

# Naive (overflows for large negative x)
def sigmoid_naive(x):
    return 1 / (1 + np.exp(-x))  # x = -1000 -> exp(1000) overflows to inf

# Stable
def sigmoid(x):
    x = np.asarray(x, dtype=float)
    # exp(-|x|) is always in (0, 1], so it can never overflow
    z = np.exp(-np.abs(x))
    # For x >= 0: 1 / (1 + exp(-x))
    # For x < 0:  exp(x) / (1 + exp(x))
    # np.where evaluates both branches, which is safe here because z is bounded
    return np.where(x >= 0, 1 / (1 + z), z / (1 + z))

Softmax (Naive vs. Stable):

# Naive (BROKEN for large values)
def softmax_naive(x):
    return np.exp(x) / np.sum(np.exp(x))  # exp(1000) = overflow

# Stable (subtract max before exp)
def softmax(x):
    x_shifted = x - np.max(x)  # Shift so max is 0
    exp_x = np.exp(x_shifted)
    return exp_x / np.sum(exp_x)

# For 2D input (batch of vectors)
def softmax_batch(x):
    x_shifted = x - np.max(x, axis=1, keepdims=True)
    exp_x = np.exp(x_shifted)
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)

Why the Max Subtraction Trick Works:

softmax(x)_i = exp(x_i) / sum_j exp(x_j)
             = (exp(-c) * exp(x_i)) / (exp(-c) * sum_j exp(x_j))
             = exp(x_i - c) / sum_j exp(x_j - c)  for any constant c

Choosing c = max(x) ensures no exponent exceeds 0, preventing overflow. The mathematical result is identical.
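The shift invariance can be checked numerically. The sketch below restates the stable softmax under a different name (`softmax_stable`) so the block is self-contained. The sample inputs are arbitrary.

```python
import numpy as np

def softmax_stable(x):
    shifted = x - np.max(x)      # largest exponent becomes exactly 0
    exp_x = np.exp(shifted)
    return exp_x / np.sum(exp_x)


small = np.array([1.0, 2.0, 3.0])
large = small + 1000.0           # same pairwise differences, shifted by a constant

probs_small = softmax_stable(small)
probs_large = softmax_stable(large)
```

Both calls produce the same probabilities, even though `np.exp(1003.0)` on its own would overflow to infinity.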

Key Points Interviewers Check:

Numerical stability (the max subtraction trick)
Correct axis handling for batched inputs
Knowledge of temperature scaling: softmax(x / T) for controlling sharpness
When to use sigmoid vs. softmax (binary vs. multi-class; can use sigmoid for multi-label)
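Temperature scaling is a small extension of the stable softmax. The following is a sketch for a 1D vector of logits. The function name, the default of `temperature=1.0`, and the choice to raise on non-positive temperatures are assumptions for this example.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled = scaled - np.max(scaled)      # same max-subtraction trick, applied after scaling
    exp_scaled = np.exp(scaled)
    return exp_scaled / np.sum(exp_scaled)


logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, temperature=0.5)   # T < 1: more peaked
base = softmax_with_temperature(logits, temperature=1.0)
flat = softmax_with_temperature(logits, temperature=5.0)    # T > 1: closer to uniform
```

Lower temperatures concentrate probability on the largest logit, while higher temperatures spread it out. The ranking of the classes never changes.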

Problem 18: Compute Euclidean, Manhattan, and Cosine Distance

Why this problem matters: Distance metrics are the foundation of KNN, clustering, retrieval systems, and embedding similarity. Choosing the wrong metric can make your model useless.

Implementations:

import numpy as np

def euclidean_distance(a, b):
    """L2 distance: sqrt(sum((a_i - b_i)^2))"""
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan_distance(a, b):
    """L1 distance: sum(|a_i - b_i|)"""
    return np.sum(np.abs(a - b))

def cosine_similarity(a, b):
    """Cosine: dot(a, b) / (||a|| * ||b||)"""
    dot = np.dot(a, b)
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Handle zero vector
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """1 - cosine_similarity"""
    return 1 - cosine_similarity(a, b)

When to Use Which:

Metric	Best For	Key Property	Example Use Case
Euclidean	Dense, normalized features	Sensitive to magnitude	Image features, physical coordinates
Manhattan	Sparse features, grid-based	More robust to outliers	City block distance, sparse text features
Cosine	Text embeddings, directions	Invariant to magnitude	Document similarity, word embeddings

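Common Pitfall: Using Euclidean distance on unnormalized features. If feature A ranges [0, 1] and feature B ranges [0, 10000], Euclidean distance is dominated by feature B. Normalize features first (min-max or z-score). Cosine similarity removes sensitivity to overall vector magnitude, but not to per-feature scale, so it is not a substitute for normalization.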
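The effect of scale on nearest-neighbor results can be demonstrated with a tiny dataset. The data values and helper names below are made up for illustration. The z-score helper also guards the zero-variance case mentioned earlier by treating a zero standard deviation as 1.

```python
import numpy as np

def zscore(X):
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)   # constant column: avoid dividing by zero
    return (X - mean) / std

def nearest_to_first(X):
    """Row index (>= 1) of the point closest to row 0 by Euclidean distance."""
    dists = np.linalg.norm(X[1:] - X[0], axis=1)
    return int(np.argmin(dists)) + 1


# Columns: feature A in [0, 1], feature B in the thousands.
X = np.array([
    [0.0, 5000.0],   # row 0: the query point
    [1.0, 5100.0],   # row 1: opposite extreme in A, slightly different in B
    [0.0, 5300.0],   # row 2: identical in A, somewhat different in B
    [1.0, 9000.0],   # row 3: far in both
])

raw_neighbor = nearest_to_first(X)
scaled_neighbor = nearest_to_first(zscore(X))
```

On the raw data the neighbor is chosen almost entirely by feature B. After z-scoring, the difference in feature A counts as much as a comparable difference in B, and the nearest neighbor changes.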

Self-Assessment Worksheet

Rate yourself on each category before starting. This helps prioritize your study time.

Category	Problem Count	Confidence (1-5)	Priority	Action
DSA (hash map, stack, two pointers)	10	___	___	___
ML Implementation (KNN, regression, metrics)	8	___	___	___
Data Processing (log parsing, CSV, validation)	5	___	___	___
SQL (joins, GROUP BY, ranking)	5	___	___	___
System Design Discussion	4	___	___	___
Probability & Statistics	3	___	___	___

Priority Scoring:

Confidence 1-2 AND high interview relevance for your role = High Priority
Confidence 3 = Medium Priority (review, do not deep-dive)
Confidence 4-5 = Low Priority (quick review on day of)

Speed Drill Templates

Once you have solved each problem at least once, use these speed drills to build interview-day readiness.

15-Minute DSA Sprint

Pick 3 random problems from #1-10. Solve each in exactly 5 minutes. If you cannot finish in 5 minutes, move to the next one. After the 15 minutes, review what slowed you down.

20-Minute ML Sprint

Pick 2 random problems from #11-18. Solve each in exactly 10 minutes. Focus on getting the core algorithm correct. Edge cases and optimization can wait.

10-Minute SQL Sprint

Pick 2 random problems from #24-28. Solve each in exactly 5 minutes. If you have syntax errors, that is the gap to focus on.

Full Easy-Tier Speed Run

Try to complete all 35 problems in under 5 hours. This is your final readiness check before moving to Medium Tier. If any problem takes more than 15 minutes, mark it for review.

Target	Time	Status
First attempt	8-10 hours	Learning
Second attempt	5-7 hours	Progressing
Third attempt	<5 hours	Ready for Medium Tier

Common Misconceptions

Misconception	Reality
"Easy problems are beneath me"	Easy problems are the building blocks of every hard problem
"I should spend most time on hard problems"	60-70% of interview questions are medium; medium requires easy mastery
"Speed doesn't matter for easy problems"	Speed on easy problems determines how much time you have for the hard follow-up
"I can skip easy SQL since I use SQL daily"	Interview SQL has specific patterns (window functions, self-joins) that daily queries may not exercise
"I don't need to implement ML basics"	Implementation questions test understanding beyond API calls; they appear at every level
"Probability questions are rare"	Bayes' theorem and expected value appear in ~30% of DS/MLE interviews

Next Steps

After completing the Easy Tier:

Medium Tier is the natural next step -- it covers the bulk of interview problems
Core 50 for a focused cross-category curriculum
Role-specific lists for targeted preparation:
- MLE Problems for Machine Learning Engineer roles
- AI Engineer Problems for AI/GenAI Engineer roles
- Data Scientist Problems for Data Scientist roles
- Data Engineer Problems for Data Engineer roles
