Skip to main content

Easy Tier Problems

Reading time: ~40 min | Interview relevance: High | Roles: All AI/ML Roles (especially entry-level, career switchers, and warm-up for experienced engineers)

You open your laptop to start interview prep and immediately jump to a Hard-tier dynamic programming problem. Twenty minutes later, you are staring at a blank screen, confidence shattered. This is how most people fail at interview preparation before they even begin.

Easy-tier problems exist for a reason. They build the muscle memory for patterns that appear repeatedly in harder problems. They confirm that your fundamentals are solid. And they give you the psychological momentum to tackle medium and hard problems with clarity instead of panic.

This list of 35 problems spans coding, ML implementation, data processing, and system design discussion. Every problem can be completed in 15-25 minutes. If any problem takes you longer than 30 minutes, it signals a gap worth investigating before moving up.

How to Use This List

GoalApproach
Total beginnerWork through all 35 problems sequentially over 2 weeks
Warm-up before harder prepComplete in 3-5 days, spending no more than 20 min per problem
Confidence buildingCherry-pick problems from your weakest category
Interview day warm-upSolve 2-3 problems the morning of your interview

:::tip The 20-Minute Rule For easy problems, set a strict 20-minute timer. If you cannot solve it in 20 minutes, read the solution, understand the pattern, then re-solve from scratch the next day. Speed matters at this level. :::

Category 1: Core Data Structures & Algorithms (10 Problems)

These are the bread-and-butter DSA problems. You should be able to solve each one without hesitation.

#ProblemTimeKey PatternCategoryWhat It TestsCompany Tags
1Two Sum10 minHash map lookupArraysHash map for O(n) lookups; the most famous interview problemFAANG, All
2Valid Parentheses10 minStack matchingStacksLIFO structure for balanced matching; expression parsingAll
3Merge Two Sorted Lists10 minTwo-pointer mergeLinked ListsMerge operation foundational to merge sort and stream mergingAll
4Maximum Depth of Binary Tree10 minDFS recursionTreesBasic tree traversal and recursive thinkingAll
5Best Time to Buy and Sell Stock10 minRunning minimumArraysTrack running min/max; single-pass optimizationFAANG, All
6Invert Binary Tree10 minTree recursionTreesRecursive tree transformation; understanding pointer swapsAll
7Contains Duplicate5 minHash setArraysSet operations for uniqueness; O(n) vs O(n log n) tradeoffAll
8Linked List Cycle Detection10 minFloyd's slow/fast pointerLinked ListsTwo-pointer technique; cycle detection appears in graph problemsAll
9Binary Search10 minDivide and conquerSearchThe most fundamental O(log n) algorithm; boundary conditions matterAll
10Implement a Min Stack15 minAuxiliary stackStacksMaintaining O(1) access to additional propertiesAll

:::note Why These 10 Problems Matter Every pattern in this section reappears in medium and hard problems:

  • Hash map -> LRU Cache, Group Anagrams, Two Sum variants
  • Two pointers -> 3Sum, Container With Most Water, merge operations
  • DFS -> Path Sum, Serialize Tree, connected components
  • Stack -> Valid Parentheses -> Largest Rectangle in Histogram
  • Binary Search -> Search in Rotated Array, median finding

Master the easy version first. The hard version is just the easy version with more constraints. :::

Category 2: ML Implementation Basics (8 Problems)

These problems test whether you can implement fundamental ML operations without relying on library calls.

#ProblemTimeKey PatternCategoryWhat It TestsCompany Tags
11Implement K-Nearest Neighbors20 minDistance computation + sortingClassificationNumPy fluency, distance metrics, top-K selectionGoogle, Meta, Startups
12Implement Linear Regression (Closed-Form)15 minMatrix operationsRegressionNormal equation, NumPy matrix math, inverse computationAll
13Compute Precision, Recall, and F1 Score10 minConfusion matrix mathEvaluationMetric computation from confusion matrix; when to use which metricAll
14Implement Sigmoid and Softmax Functions10 minNumerical stabilityActivation FunctionsOverflow prevention with max subtraction; probability normalizationAll
15Implement Min-Max and Z-Score Normalization10 minFeature scalingPreprocessingData normalization; handling edge cases (zero variance)All
16Implement One-Hot Encoding from Scratch10 minCategorical encodingPreprocessingMapping categories to binary vectors; handling unknown categoriesAll
17Implement Train-Test Split with Shuffling10 minRandom samplingEvaluationRandom seed management, stratification awarenessAll
18Compute Euclidean, Manhattan, and Cosine Distance15 minVector operationsSimilarityDistance metric selection; when cosine beats EuclideanAll

:::warning Common Easy ML Mistakes

  • Sigmoid overflow: Always implement as 1 / (1 + exp(-x)) with clipping, or use the max(0, x) trick for numerical stability
  • Softmax overflow: Subtract the max value before exponentiating: exp(x - max(x))
  • Division by zero: Z-score normalization with zero variance; F1 with zero precision or recall
  • Forgetting to shuffle: Train-test split on time-ordered data leaks future information :::

Category 3: Data Processing & SQL (10 Problems)

Practical data manipulation problems that every AI/ML role requires.

Python Data Processing

#ProblemTimeKey PatternCategoryWhat It TestsCompany Tags
19Parse and Aggregate Large Log Files15 minHash map aggregationData ProcessingFile I/O, string parsing, aggregationAll
20Compute Intersection of Two Large Sorted Arrays10 minTwo-pointer mergeArraysEfficient set operations; memory-conscious processingGoogle, Meta
21Read CSV and Compute Column Statistics10 minStreaming computationData ProcessingPandas fluency; handling missing values, type coercionAll
22Implement a Word Frequency Counter10 minHash map countingText ProcessingTokenization, normalization, countingAll
23Build a Simple Data Validation Function15 minSchema checkingData QualityType checks, range validation, null detectionAll

SQL Fundamentals

#ProblemTimeKey PatternCategoryWhat It TestsCompany Tags
24Find Employees Earning More Than Their Manager10 minSelf-joinSQLBasic join logic with table self-referencingAll
25Find Duplicate Emails in a Table5 minGROUP BY + HAVINGSQLAggregation fundamentals; identifying duplicatesAll
26Second Highest Salary10 minSubquery or window functionSQLRanking with edge cases (ties, NULLs)FAANG, All
27Combine Two Tables (Left Join)5 minLEFT JOINSQLUnderstanding NULL-preserving joinsAll
28Customers Who Never Ordered10 minLEFT JOIN + IS NULL / NOT EXISTSSQLAnti-join pattern; filtering for absenceAll

:::tip SQL Practice Strategy These SQL problems are intentionally simple. They test whether you can translate English requirements into correct SQL without syntax errors. If you can solve all five in under 30 minutes total, your SQL fundamentals are solid. If not, spend a day on SQL basics before attempting the Data Engineer or Data Scientist problem lists. :::

Category 4: System Design Discussion (4 Problems)

These are not full system design problems. They are discussion-level questions where you explain concepts clearly and reason about tradeoffs. Each should take 10-15 minutes of verbal explanation.

#ProblemTimeKey PatternCategoryWhat It TestsCompany Tags
29Explain the Bias-Variance Tradeoff10 minConceptualML TheoryCan you explain underfitting vs. overfitting with examples?All
30Compare REST vs. gRPC for Model Serving10 minAPI designSystem DesignTradeoff analysis: latency, compatibility, streaming supportAll
31Explain Why Batch Normalization Helps Training10 minConceptualDeep LearningInternal covariate shift, gradient flow, regularization effectAll
32Describe How You Would Monitor a Model in Production15 minMLOpsSystem DesignData drift, prediction drift, latency, error ratesAll

Category 5: Probability & Statistics Basics (3 Problems)

Many ML interviews include at least one probability question. These are the easy versions.

#ProblemTimeKey PatternCategoryWhat It TestsCompany Tags
33Compute the Expected Value of a Dice Game10 minExpected valueProbabilityBasic expectation calculation; linearity of expectationAll
34Explain P-Value to a Non-Technical Person10 minStatistical inferenceStatisticsCommunication skills; simplifying complex conceptsAll
35Bayes' Theorem: Disease Testing Problem15 minConditional probabilityProbabilityBase rate neglect, prior/posterior reasoningFAANG, All

:::danger Do Not Skip Easy Problems Three reasons engineers skip easy problems and regret it:

  1. Speed matters. In a 45-minute interview, solving an easy warm-up in 5 minutes leaves 40 minutes for the hard follow-up. Solving it in 15 minutes leaves only 30.

  2. Edge cases hide in easy problems. Two Sum with duplicate values. Binary search on an empty array. Train-test split when n=1. Easy problems teach defensive coding.

  3. Interviewers calibrate on easy problems. If you struggle with an easy problem, the interviewer downgrades the rest of the interview. If you crush it quickly with clean code, they give you harder follow-ups (which is what you want). :::

1-Week Easy Tier Study Plan

For candidates who need a quick warm-up before harder preparation:

DayFocusProblemsTime
Day 1DSA fundamentals#1-5 (hash map, stack, merge, DFS, running min)60 min
Day 2More DSA + basics#6-10 (tree recursion, set, cycle, binary search, min stack)60 min
Day 3ML implementation#11-14 (KNN, linear regression, metrics, activations)60 min
Day 4ML + data processing#15-19 (normalization, encoding, split, distances, log parsing)60 min
Day 5Data processing + SQL#20-25 (arrays, CSV, word count, validation, SQL basics)60 min
Day 6SQL + concepts#26-32 (SQL, system design discussions)60 min
Day 7Probability + review#33-35 + re-solve any problems that took >20 min60 min

2-Week Easy Tier Study Plan (Thorough)

For career switchers or candidates returning after a long break:

DayFocusProblemsNotes
Day 1Arrays & Hashing#1, #5, #7Focus on hash map pattern
Day 2Stacks & Lists#2, #3, #8, #10LIFO and pointer-based structures
Day 3Trees & Search#4, #6, #9Recursion and binary search
Day 4Review DSARe-solve #1-10Target <10 min per problem
Day 5ML Basics 1#11, #12KNN, linear regression
Day 6ML Basics 2#13, #14, #15Metrics, activations, normalization
Day 7ML Basics 3#16, #17, #18Encoding, splitting, distances
Day 8Data Processing#19, #20, #21, #22Python file and data handling
Day 9Data Quality#23 + review ML problemsValidation and edge cases
Day 10SQL Day 1#24, #25, #26Joins, GROUP BY, ranking
Day 11SQL Day 2#27, #28LEFT JOIN, anti-join
Day 12Concepts#29, #30, #31, #32Discussion practice (speak out loud)
Day 13Probability#33, #34, #35Expected value, Bayes
Day 14Full reviewRe-solve all problems that felt shakySpeed run the entire list

Problem Deep Dives

Problem 1: Two Sum

Why this problem matters: Two Sum is the most commonly asked interview problem across all companies. It tests whether you reach for the O(n) hash map solution or the O(n^2) brute force. More importantly, it tests how you handle edge cases.

Brute Force:

def two_sum(nums, target):
for i in range(len(nums)):
for j in range(i + 1, len(nums)):
if nums[i] + nums[j] == target:
return [i, j]
return []

Optimal (Hash Map):

def two_sum(nums, target):
seen = {}
for i, num in enumerate(nums):
complement = target - num
if complement in seen:
return [seen[complement], i]
seen[num] = i
return []

Edge Cases to Handle:

  • Array with duplicate values: [3, 3], target = 6
  • Negative numbers: [-1, 2, -3, 4], target = 1
  • Single element: should return empty
  • No solution exists

Follow-Up Questions Interviewers Ask:

  • What if the array is sorted? (Two-pointer approach, O(1) space)
  • What if you need all pairs? (Modify to collect all results)
  • What if there are multiple valid answers? (Return any one, or modify contract)

Problem 11: Implement K-Nearest Neighbors

Why this problem matters: KNN is the simplest ML algorithm, but implementing it correctly tests NumPy fluency, understanding of distance metrics, and handling of edge cases.

Implementation:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
# Compute distances (Euclidean)
distances = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))

# Get k nearest indices
k_nearest_idx = np.argsort(distances)[:k]

# Get labels of k nearest
k_nearest_labels = y_train[k_nearest_idx]

# Majority vote
most_common = Counter(k_nearest_labels).most_common(1)
return most_common[0][0]

Key Points Interviewers Check:

  • Vectorized distance computation (not a for loop)
  • Handling ties in voting (what if 2 classes have equal votes?)
  • Choice of k (odd k avoids ties for binary classification)
  • Distance metric choice (Euclidean vs. Manhattan vs. Cosine)
  • Time complexity: O(n * d) for distance computation + O(n log n) for sorting

Problem 35: Bayes' Theorem Disease Testing

Why this problem matters: This problem exposes base rate neglect, one of the most common reasoning errors. It tests whether you can apply Bayes' theorem correctly under pressure.

Problem Statement: A disease affects 1 in 1000 people. A test has 99% sensitivity (true positive rate) and 99% specificity (true negative rate). If a person tests positive, what is the probability they actually have the disease?

Solution:

P(Disease) = 0.001 (prevalence)
P(Positive | Disease) = 0.99 (sensitivity)
P(Positive | No Disease) = 0.01 (false positive rate)

P(Positive) = P(Pos|D) * P(D) + P(Pos|~D) * P(~D)
= 0.99 * 0.001 + 0.01 * 0.999
= 0.00099 + 0.00999
= 0.01098

P(Disease | Positive) = P(Pos|D) * P(D) / P(Positive)
= 0.00099 / 0.01098
= 0.0901 (about 9%)

Key Insight: Despite the test being 99% accurate, a positive result only means a 9% chance of having the disease. The low base rate dominates. This is why interviewers love this problem -- most candidates intuitively guess 99%.

Patterns You Must Internalize

PatternEasy ProblemMedium/Hard Extension
Hash map for O(1) lookupTwo Sum (#1)LRU Cache, Group Anagrams
Stack for matchingValid Parentheses (#2)Largest Rectangle, Calculator
Two-pointer mergeMerge Sorted Lists (#3)Merge K Sorted Lists, Intersection
DFS recursionMax Depth (#4)Path Sum, Serialize Tree
Running min/maxBuy/Sell Stock (#5)Trapping Rain Water, Sliding Window Max
Floyd's cycle detectionCycle Detection (#8)Find Duplicate Number, Linked List Intersection
Binary searchBinary Search (#9)Search Rotated Array, Median of Two Arrays
Vectorized computationKNN (#11)Batch operations, feature engineering
Conditional probabilityBayes' Theorem (#35)Naive Bayes classifier, A/B test analysis

Confidence Checkpoints

Use these checkpoints to assess readiness for harder tiers:

CheckpointTargetReady for
All 10 DSA problems in <15 min each2.5 hours totalMedium Tier DSA
All 8 ML problems with correct edge cases2 hours totalMedium Tier ML
All 5 SQL problems first try, no syntax errors45 min totalData Engineer/Scientist SQL rounds
Can explain all 4 system design concepts clearly45 min totalMedium Tier System Design
All 3 probability problems correct30 min totalMedium Tier probability
Full list completed<8 hours totalMedium Tier (all categories)

Additional Problem Deep Dives

Problem 5: Best Time to Buy and Sell Stock

Why this problem matters: This is the simplest example of the "running minimum" pattern. It appears trivial, but the pattern extends to harder problems like Trapping Rain Water and Best Time to Buy and Sell Stock III (with multiple transactions).

Brute Force (O(n^2)):

def max_profit_brute(prices):
max_profit = 0
for i in range(len(prices)):
for j in range(i + 1, len(prices)):
max_profit = max(max_profit, prices[j] - prices[i])
return max_profit

Optimal (O(n)):

def max_profit(prices):
min_price = float('inf')
max_profit = 0
for price in prices:
min_price = min(min_price, price)
max_profit = max(max_profit, price - min_price)
return max_profit

The Insight: At each position, the best profit we can make by selling today is today's price - minimum price seen so far. We track the running minimum and the running best profit in a single pass.

Edge Cases:

  • Prices always decreasing: profit = 0 (never buy)
  • Single price: profit = 0
  • All same price: profit = 0
  • Two prices: simple comparison

Follow-Up Extensions:

VariantKey ChangeDifficulty
Buy and Sell Stock II (unlimited transactions)Greedy: sum all upward movesEasy
Buy and Sell Stock III (at most 2 transactions)Track 2 buy/sell statesHard
Buy and Sell Stock IV (at most K transactions)DP with K statesHard
Buy and Sell Stock with CooldownDP with 3 states (hold, sold, rest)Medium

Why this problem matters: Binary search is the most fundamental O(log n) algorithm, but getting the boundary conditions right is notoriously tricky. More interview bugs come from binary search off-by-one errors than from any other algorithm.

Standard Implementation:

def binary_search(nums, target):
lo, hi = 0, len(nums) - 1
while lo <= hi:
mid = lo + (hi - lo) // 2 # Avoid integer overflow
if nums[mid] == target:
return mid
elif nums[mid] < target:
lo = mid + 1
else:
hi = mid - 1
return -1

Common Variants:

VariantChangeWhen to Use
Find leftmost occurrenceWhen nums[mid] == target, set hi = mid - 1Sorted array with duplicates
Find rightmost occurrenceWhen nums[mid] == target, set lo = mid + 1Sorted array with duplicates
Find insertion pointReturn lo when loop endsbisect_left equivalent
Search rotated arrayCompare mid with lo to determine sorted halfLeetCode 33
Find peak elementCompare mid with mid+1LeetCode 162

The Three Binary Search Templates:

# Template 1: Standard (find exact match)
while lo <= hi:
mid = lo + (hi - lo) // 2
if condition: return mid
elif ...: lo = mid + 1
else: hi = mid - 1

# Template 2: Find boundary (leftmost True)
while lo < hi:
mid = lo + (hi - lo) // 2
if condition(mid): hi = mid
else: lo = mid + 1
return lo

# Template 3: Find boundary (rightmost True)
while lo < hi:
mid = lo + (hi - lo + 1) // 2 # Round up to avoid infinite loop
if condition(mid): lo = mid
else: hi = mid - 1
return lo

Key Points:

  • lo + (hi - lo) // 2 prevents integer overflow (matters in languages like Java/C++)
  • Template 2 and 3 converge when lo == hi, so return lo gives the boundary
  • Template 3 rounds up (+ 1 before // 2) to prevent infinite loop when lo = mid

Problem 14: Implement Sigmoid and Softmax Functions

Why this problem matters: These are the two most common activation/output functions in ML. Getting them numerically stable is critical and something many candidates get wrong.

Sigmoid (Naive vs. Stable):

import numpy as np

# Naive (BROKEN for large negative x)
def sigmoid_naive(x):
return 1 / (1 + np.exp(-x)) # exp(1000) = overflow

# Stable
def sigmoid(x):
# For x >= 0: 1 / (1 + exp(-x))
# For x < 0: exp(x) / (1 + exp(x)) (avoids exp of large positive)
return np.where(
x >= 0,
1 / (1 + np.exp(-x)),
np.exp(x) / (1 + np.exp(x))
)

Softmax (Naive vs. Stable):

# Naive (BROKEN for large values)
def softmax_naive(x):
return np.exp(x) / np.sum(np.exp(x)) # exp(1000) = overflow

# Stable (subtract max before exp)
def softmax(x):
x_shifted = x - np.max(x) # Shift so max is 0
exp_x = np.exp(x_shifted)
return exp_x / np.sum(exp_x)

# For 2D input (batch of vectors)
def softmax_batch(x):
x_shifted = x - np.max(x, axis=1, keepdims=True)
exp_x = np.exp(x_shifted)
return exp_x / np.sum(exp_x, axis=1, keepdims=True)

Why the Max Subtraction Trick Works:

softmax(x) = exp(x_i) / sum(exp(x_j))
= exp(x_i - c) / sum(exp(x_j - c)) for any constant c

Choosing c = max(x) ensures no exponent exceeds 0, preventing overflow. The mathematical result is identical.

Key Points Interviewers Check:

  • Numerical stability (the max subtraction trick)
  • Correct axis handling for batched inputs
  • Knowledge of temperature scaling: softmax(x / T) for controlling sharpness
  • When to use sigmoid vs. softmax (binary vs. multi-class; can use sigmoid for multi-label)

Problem 18: Compute Euclidean, Manhattan, and Cosine Distance

Why this problem matters: Distance metrics are the foundation of KNN, clustering, retrieval systems, and embedding similarity. Choosing the wrong metric can make your model useless.

Implementations:

import numpy as np

def euclidean_distance(a, b):
"""L2 distance: sqrt(sum((a_i - b_i)^2))"""
return np.sqrt(np.sum((a - b) ** 2))

def manhattan_distance(a, b):
"""L1 distance: sum(|a_i - b_i|)"""
return np.sum(np.abs(a - b))

def cosine_similarity(a, b):
"""Cosine: dot(a, b) / (||a|| * ||b||)"""
dot = np.dot(a, b)
norm_a = np.linalg.norm(a)
norm_b = np.linalg.norm(b)
if norm_a == 0 or norm_b == 0:
return 0.0 # Handle zero vector
return dot / (norm_a * norm_b)

def cosine_distance(a, b):
"""1 - cosine_similarity"""
return 1 - cosine_similarity(a, b)

When to Use Which:

MetricBest ForKey PropertyExample Use Case
EuclideanDense, normalized featuresSensitive to magnitudeImage features, physical coordinates
ManhattanSparse features, grid-basedMore robust to outliersCity block distance, sparse text features
CosineText embeddings, directionsInvariant to magnitudeDocument similarity, word embeddings

Common Pitfall: Using Euclidean distance on unnormalized features. If feature A ranges [0, 1] and feature B ranges [0, 10000], Euclidean distance is dominated by feature B. Always normalize first, or use cosine similarity.

Self-Assessment Worksheet

Rate yourself on each category before starting. This helps prioritize your study time.

CategoryProblem CountConfidence (1-5)PriorityAction
DSA (hash map, stack, two pointers)10_________
ML Implementation (KNN, regression, metrics)8_________
Data Processing (log parsing, CSV, validation)5_________
SQL (joins, GROUP BY, ranking)5_________
System Design Discussion4_________
Probability & Statistics3_________

Priority Scoring:

  • Confidence 1-2 AND high interview relevance for your role = High Priority
  • Confidence 3 = Medium Priority (review, do not deep-dive)
  • Confidence 4-5 = Low Priority (quick review on day of)

Speed Drill Templates

Once you have solved each problem at least once, use these speed drills to build interview-day readiness.

15-Minute DSA Sprint

Pick 3 random problems from #1-10. Solve each in exactly 5 minutes. If you cannot finish in 5 minutes, move to the next one. After the 15 minutes, review what slowed you down.

20-Minute ML Sprint

Pick 2 random problems from #11-18. Solve each in exactly 10 minutes. Focus on getting the core algorithm correct. Edge cases and optimization can wait.

10-Minute SQL Sprint

Pick 2 random problems from #24-28. Solve each in exactly 5 minutes. If you have syntax errors, that is the gap to focus on.

Full Easy-Tier Speed Run

Try to complete all 35 problems in under 5 hours. This is your final readiness check before moving to Medium Tier. If any problem takes more than 15 minutes, mark it for review.

TargetTimeStatus
First attempt8-10 hoursLearning
Second attempt5-7 hoursProgressing
Third attempt<5 hoursReady for Medium Tier

Common Misconceptions

MisconceptionReality
"Easy problems are beneath me"Easy problems are the building blocks of every hard problem
"I should spend most time on hard problems"60-70% of interview questions are medium; medium requires easy mastery
"Speed doesn't matter for easy problems"Speed on easy problems determines how much time you have for the hard follow-up
"I can skip easy SQL since I use SQL daily"Interview SQL has specific patterns (window functions, self-joins) that daily queries may not exercise
"I don't need to implement ML basics"Implementation questions test understanding beyond API calls; they appear at every level
"Probability questions are rare"Bayes' theorem and expected value appear in ~30% of DS/MLE interviews

Next Steps

After completing the Easy Tier:

© 2026 EngineersOfAI. All rights reserved.