Skip to main content

Hard Tier Problems

Reading time: ~60 min | Interview relevance: High (FAANG/Staff+) | Roles: Senior MLE, Staff+ Engineer, Research Engineer, ML Architect

You have solved the medium problems. You are comfortable with the standard patterns. You walk into a Google L5+ or Meta E6 interview loop and the interviewer pulls out a problem that makes you realize medium was just the warm-up.

Hard-tier problems are where companies separate senior from staff, and staff from principal. These problems require combining multiple patterns, handling complex constraints, reasoning about scale, and making design decisions under ambiguity. They are not asked to trick you. They are asked because the job itself requires this level of problem-solving.

This list of 35 problems is organized by category and calibrated to what top-tier companies actually ask in senior and staff-level interviews. Each problem takes 30-50 minutes for a well-prepared candidate. If you can solve 70% of these under time pressure, you are ready for any interview.

Who Needs Hard-Tier Preparation

TargetHard-Tier Coverage NeededFocus Areas
FAANG L5/E5 (Senior)40-50%Coding + System Design
FAANG L6/E6 (Staff)70-80%All categories, especially system design
AI Lab (Research Engineer)60-70%ML implementation + algorithm depth
OpenAI / Anthropic70-80%Paper implementation + ML depth + system design
Unicorn Senior/Staff50-60%System design + coding
Non-FAANG Senior30-40%Cherry-pick by company

:::warning Hard Problems Require Pattern Mastery Do NOT attempt hard problems before mastering medium patterns. Hard problems combine 2-3 medium patterns into a single problem. Without the building blocks, you will waste time and destroy confidence. Complete the Medium Tier first. :::

Category Distribution

CategoryCountPercentageAvg Time
Advanced Coding (DSA)1029%35 min
ML System Design823%48 min
Advanced ML Implementation514%42 min
Paper Implementation411%45 min
Optimization & Infrastructure411%43 min
Advanced Data & Statistics411%33 min
Total3540 min avg

Category 1: Advanced Coding (10 Problems)

These problems combine multiple data structures and algorithms, require careful edge case handling, and often have O(n) solutions hiding behind O(n^2) initial approaches.

Multi-Pattern Problems

#ProblemDifficultyTimePatterns CombinedWhat Separates Hard from MediumCompany TagsHandbook Chapter
1Merge K Sorted ListsHard30 minMin-heap + linked list mergeK-way merge generalizes 2-way merge; priority queue managementGoogle, Meta, AmazonCoding Interviews
2Serialize and Deserialize Binary TreeHard30 minBFS/DFS + string encodingCustom serialization format; handling nulls in tree structureGoogle, Meta, MicrosoftCoding Interviews
3Word Search II (Multiple Words in Grid)Hard35 minTrie + backtracking + pruningTrie optimization over naive per-word search; early terminationGoogle, AmazonCoding Interviews
4Largest Rectangle in HistogramHard35 minMonotonic stackNon-obvious stack application; extending to maximal rectangle in matrixGoogle, Meta, AmazonCoding Interviews
5Median of Two Sorted ArraysHard40 minBinary search on partitionTrue O(log(min(m,n))); binary search on answer space, not dataGoogle, AmazonCoding Interviews

Dynamic Programming & Optimization

#ProblemDifficultyTimePatterns CombinedWhat Separates Hard from MediumCompany TagsHandbook Chapter
6Edit DistanceHard30 min2D DP + string alignment2D state space; sequence alignment used in bioinformatics and NLPGoogle, MetaCoding Interviews
7Longest Increasing Subsequence (O(n log n))Hard30 minBinary search + patience sortingThe O(n log n) solution using binary search on tails arrayFAANG, AllCoding Interviews
8Regular Expression MatchingHard40 min2D DP + recursive case analysisComplex state transitions with * and .; many edge casesGoogle, MetaCoding Interviews
9Trapping Rain WaterHard30 minTwo pointers or monotonic stackMultiple valid approaches; interviewers expect discussion of allFAANG, AllCoding Interviews
10Design a Concurrent LRU CacheHard35 minLRU + concurrency primitivesThread safety, lock striping, read-write locks for model serving cacheGoogle, Meta, UberCoding Interviews

:::tip Hard Coding Problem Strategy

  1. Spend 5 minutes on approach before coding. For hard problems, the wrong approach wastes 30 minutes.
  2. Identify which medium patterns combine. Largest Rectangle = Monotonic Stack. Word Search II = Trie + Backtracking.
  3. Start with brute force, then optimize. Interviewers want to see your optimization reasoning.
  4. Handle edge cases explicitly. Hard problems always have tricky edge cases (empty input, single element, all duplicates). :::

Category 2: ML System Design (8 Problems)

Hard system design problems require reasoning about scale, multi-component architecture, and non-obvious tradeoffs. These are Staff+ level questions.

Large-Scale ML Systems

#ProblemDifficultyTimeKey ConceptWhat Makes It HardCompany TagsHandbook Chapter
11Design a Search Ranking SystemHard50 minQuery understanding + retrieval + L2RMulti-stage pipeline, learning-to-rank, position bias correctionGoogle, Amazon, AirbnbML System Design
12Design an Ad Click Prediction SystemHard50 minFeature engineering + real-time predictionReal-time features, calibration, bid optimization, revenue constraintsGoogle, Meta, AmazonML System Design
13Design a Distributed Training SystemHard45 minData/model parallelism, gradient aggregationAllReduce vs. parameter server, gradient compression, fault toleranceGoogle, Meta, AI LabsML System Design
14Design a Real-Time Fraud Detection SystemHard45 minStreaming features, low-latency scoringSub-100ms latency, evolving fraud patterns, label delayStripe, Square, PayPalML System Design

ML Infrastructure

#ProblemDifficultyTimeKey ConceptWhat Makes It HardCompany TagsHandbook Chapter
15Design a Large-Scale Knowledge Graph SystemHard50 minGraph storage, entity resolution, inferenceBillions of entities, relation extraction, query efficiencyGoogle, Amazon, MicrosoftML System Design
16Design a Model Experimentation PlatformHard45 minExperiment tracking, reproducibilityMulti-tenant, versioning, configuration management at scaleFAANG, UnicornsML System Design
17Design a Feature Store for Online and Offline ServingHard45 minFeature consistency, low-latency servingTraining-serving skew, point-in-time correctness, feature freshnessUber, Airbnb, DatabricksML System Design
18Design a Multi-Region ML Serving SystemHard45 minGlobal deployment, model syncConsistency across regions, failover, latency optimizationGoogle, Meta, AmazonML System Design

:::note Staff+ System Design Evaluation Criteria At the hard level, interviewers evaluate:

  • Architectural vision: Can you reason about the system holistically?
  • Tradeoff articulation: Can you explain WHY you chose one approach over another?
  • Failure mode analysis: What happens when each component fails?
  • Scale reasoning: How does cost/complexity grow with 10x, 100x scale?
  • Cross-team impact: How does your system affect other teams and services? :::

Category 3: Advanced ML Implementation (5 Problems)

These problems require deep understanding of ML algorithms and the ability to implement them from mathematical foundations.

#ProblemDifficultyTimeKey ConceptWhat Makes It HardCompany TagsHandbook Chapter
19Implement Backpropagation for a 2-Layer Neural NetworkHard45 minChain rule, gradient computationCorrect derivative computation for every layer; numerical stabilityFAANG, AI LabsDeep Learning
20Implement Word2Vec (Skip-gram with Negative Sampling)Hard40 minEmbedding learning, contrastive lossNegative sampling efficiency; gradient updates for large vocabulariesGoogle, Meta, AirbnbDeep Learning
21Implement Gradient Boosted Trees (Training + Prediction)Hard40 minSequential ensemble, residual fittingResidual computation, learning rate shrinkage, regularizationGoogle, Amazon, Big TechML Fundamentals
22Implement Attention Mechanism (Scaled Dot-Product + Multi-Head)Hard45 minQuery-Key-Value, softmax attentionMatrix dimensions, masking, multi-head extensionAI Labs, Google, MetaLLM Interviews
23Implement a Simple Transformer Block (Self-Attention + FFN + LayerNorm)Hard50 minMulti-head attention, residual connectionsFull forward pass with correct dimensions; residual + norm orderAI Labs, Google, MetaLLM Interviews

:::danger ML Implementation Red Flags at the Hard Level

  • Cannot derive gradients by hand (even for simple functions)
  • Does not understand why batch normalization has different behavior at train vs. inference time
  • Cannot explain the difference between attention, self-attention, and cross-attention
  • Implements algorithms without considering numerical stability (log-sum-exp, gradient clipping) :::

Category 4: Paper Implementation (4 Problems)

Research engineer interviews at AI labs often ask you to implement a key idea from a paper. The goal is not perfect reproduction but demonstrating that you can read a paper and translate math into working code.

#ProblemDifficultyTimeKey ConceptWhat Makes It HardCompany TagsHandbook Chapter
24Implement the Core Loop of PPO (Proximal Policy Optimization)Hard50 minClipped surrogate objective + advantage estimationRL optimization; the algorithm behind RLHFOpenAI, Anthropic, DeepMindPaper Discussion
25Implement LoRA (Low-Rank Adaptation) for a Linear LayerHard35 minLow-rank decomposition + frozen weightsParameter-efficient fine-tuning; rank selection tradeoffsOpenAI, Anthropic, Google, MetaLLM Interviews
26Implement the Contrastive Loss (SimCLR / CLIP-style)Hard40 minTemperature-scaled cross-entropy over similarity matrixSelf-supervised learning; hard negative mining; batch size effectsGoogle, OpenAI, MetaDeep Learning
27Implement Beam Search for Sequence DecodingHard35 minTop-K expansion, pruningMaintaining K hypotheses, score normalization, early stoppingGoogle, Meta, AI LabsLLM Interviews

:::tip Paper Implementation Strategy When implementing a paper's algorithm in an interview:

  1. State the key insight of the paper in 2 sentences
  2. Write the loss function or update rule on the whiteboard first
  3. Implement the core loop --- not the full paper, just the novel part
  4. Verify with a toy example --- does the loss decrease? Does the output make sense?
  5. Discuss ablations --- what happens if you remove the key component?

You do not need to memorize entire papers. You need to understand the one idea that makes each paper novel and implement it correctly. :::

Category 5: Optimization & Infrastructure (4 Problems)

These problems test your ability to make ML systems fast, efficient, and production-ready.

#ProblemDifficultyTimeKey ConceptWhat Makes It HardCompany TagsHandbook Chapter
28Optimize Model Serving Latency from 200ms to 50msHard45 minProfiling + batching + quantization + cachingEnd-to-end latency optimization; knowing which knob to turn firstGoogle, Meta, Amazon, UberML System Design
29Design a Distributed Training Pipeline for a 70B Parameter ModelHard50 minData + tensor + pipeline parallelismMulti-GPU and multi-node training; communication overhead managementGoogle, Meta, OpenAI, AnthropicDeep Learning
30Implement a Model Compression Pipeline (Quantization + Pruning + Distillation)Hard45 minAccuracy-latency tradeoff analysisDeployment on edge devices; which technique for which scenarioGoogle, Apple, Qualcomm, MetaML System Design
31Design a Feature Computation System with Sub-10ms LatencyHard45 minPrecomputation + caching + streaming updatesOnline feature serving at scale; consistency guaranteesUber, Airbnb, Databricks, TectonML System Design

Category 6: Advanced Data & Statistics (4 Problems)

Senior data roles and MLE roles at data-heavy companies test advanced SQL and statistical reasoning.

#ProblemDifficultyTimeKey ConceptWhat Makes It HardCompany TagsHandbook Chapter
32Design an Online Experiment with Network EffectsHard35 minCluster randomization, interferenceStandard A/B testing breaks with network effects (social, marketplace)Meta, LinkedIn, UberML Fundamentals
33Recursive CTE for Hierarchical Data ProcessingHard30 minRecursive SQL, tree traversalOrg charts, category trees with depth limit and cycle detectionGoogle, Amazon, SnowflakeCoding Interviews
34Optimize a Slow Query on a 100TB TableHard30 minPartitioning, clustering, materialized viewsUnderstanding query plans, predicate pushdown, storage layoutGoogle, Amazon, SnowflakeCoding Interviews
35Design a Multi-Armed Bandit for Dynamic PricingHard40 minThompson sampling, UCB, regret boundsExploration-exploitation with non-stationary rewardsUber, Airbnb, DoorDashML Fundamentals

Problem Deep Dives

Problem 1: Merge K Sorted Lists

Why this problem matters: It tests heap management, linked list manipulation, and the generalization from 2-way merge to K-way merge. This exact operation appears in distributed systems (merging sorted streams from K workers).

Three Approaches:

ApproachTimeSpaceWhen to Use
Brute force (concatenate + sort)O(N log N)O(N)Never in interviews --- mention as baseline only
Merge pairs iterativelyO(N * K)O(1)If K is small and constant
Min-heapO(N log K)O(K)The expected answer for interviews

Min-Heap Solution:

import heapq

def merge_k_lists(lists):
heap = []
for i, lst in enumerate(lists):
if lst:
heapq.heappush(heap, (lst.val, i, lst))

dummy = ListNode(0)
current = dummy

while heap:
val, i, node = heapq.heappop(heap)
current.next = node
current = current.next
if node.next:
heapq.heappush(heap, (node.next.val, i, node.next))

return dummy.next

Complexity:

  • Time: O(N log K) where N is total nodes, K is number of lists
  • Space: O(K) for the heap

Key Points Interviewers Check:

  • Using index i as tiebreaker in the heap (nodes may not be comparable)
  • Heap size never exceeds K
  • Generalizes to merging K sorted database result sets
  • Can you discuss the distributed version? (Each list on a different machine)

Problem 5: Median of Two Sorted Arrays

Why this problem matters: This is the hardest binary search problem you will encounter. It tests whether you truly understand binary search beyond "find an element in a sorted array." The O(log(min(m,n))) solution requires partitioning both arrays simultaneously.

Approach:

1. Binary search on the smaller array
2. Partition both arrays such that left_half has (m+n+1)//2 elements
3. Check if partition is valid: max(left) <= min(right) for both arrays
4. If valid, median is derivable from the boundary elements

Solution:

def find_median_sorted_arrays(nums1, nums2):
# Ensure nums1 is the smaller array
if len(nums1) > len(nums2):
nums1, nums2 = nums2, nums1

m, n = len(nums1), len(nums2)
lo, hi = 0, m

while lo <= hi:
i = (lo + hi) // 2 # partition in nums1
j = (m + n + 1) // 2 - i # partition in nums2

left1 = nums1[i - 1] if i > 0 else float('-inf')
right1 = nums1[i] if i < m else float('inf')
left2 = nums2[j - 1] if j > 0 else float('-inf')
right2 = nums2[j] if j < n else float('inf')

if left1 <= right2 and left2 <= right1:
if (m + n) % 2 == 0:
return (max(left1, left2) + min(right1, right2)) / 2
else:
return max(left1, left2)
elif left1 > right2:
hi = i - 1
else:
lo = i + 1

Key Points Interviewers Check:

  • Why binary search on the smaller array (reduces search space)
  • Correct handling of boundary conditions (partition at 0 or m)
  • Understanding of the partition invariant (all left elements <= all right elements)
  • Time complexity proof: O(log(min(m,n)))

Problem 9: Trapping Rain Water

Why this problem matters: This problem has three valid approaches (two pointers, monotonic stack, prefix max arrays), and interviewers expect you to discuss all three before coding one. It tests your ability to evaluate tradeoffs, not just solve the problem.

Three Approaches:

ApproachTimeSpaceKey Idea
Prefix max arraysO(n)O(n)Precompute left_max and right_max for each index
Monotonic stackO(n)O(n)Track boundaries of "pools" using a decreasing stack
Two pointersO(n)O(1)Move from both ends; the smaller side is the bottleneck

Two-Pointer Solution (Optimal):

def trap(height):
if not height:
return 0

left, right = 0, len(height) - 1
left_max, right_max = height[left], height[right]
water = 0

while left < right:
if left_max < right_max:
left += 1
left_max = max(left_max, height[left])
water += left_max - height[left]
else:
right -= 1
right_max = max(right_max, height[right])
water += right_max - height[right]

return water

Key Insight: At each step, the water level at a position is determined by the minimum of left_max and right_max. By always processing the side with the smaller max, we know the water level at that position without knowing the exact max on the other side.

Problem 11: Design a Search Ranking System

Why this problem matters: Search ranking is the most comprehensive ML system design problem. It combines information retrieval, ML, distributed systems, and product thinking into a single problem.

Architecture:

Query -> Query Understanding -> Retrieval -> Ranking -> Reranking -> Results

1. Query Understanding (10ms budget)
- Spell correction (learned edit distance model)
- Query expansion (synonyms, related terms)
- Intent classification (navigational vs. informational)

2. Retrieval (~1000 candidates, 50ms budget)
- Inverted index (BM25) for lexical matching
- Embedding retrieval (ANN with HNSW) for semantic matching
- Combine with reciprocal rank fusion

3. Ranking (1000 -> 100, 100ms budget)
- Features: query-doc relevance, freshness, authority, personalization
- Model: LambdaMART or cross-encoder neural ranker
- Training: click data with position bias correction (IPW)

4. Reranking (100 -> final, 20ms budget)
- Diversity injection (MMR)
- Business rules (ads, promoted, freshness boost)
- Personalization overlay

5. Monitoring
- Offline: NDCG@10, MAP, MRR
- Online: CTR, query success rate, time-to-click
- Degradation alert: compare online metrics to baseline

Key Tradeoffs to Discuss:

  • Precision vs. recall in retrieval stage
  • Pointwise vs. pairwise vs. listwise ranking loss
  • Position bias in click data (clicks are biased toward top positions)
  • Latency budget allocation across stages

Problem 19: Implement Backpropagation for a 2-Layer Neural Network

Why this problem matters: Backpropagation is the foundation of all deep learning. If you cannot implement it from scratch, you do not understand what your framework is doing. AI labs and senior roles at FAANG test this directly.

Code Template:

import numpy as np

class TwoLayerNet:
def __init__(self, input_dim, hidden_dim, output_dim):
# Xavier initialization
self.W1 = np.random.randn(input_dim, hidden_dim) * np.sqrt(2.0 / input_dim)
self.b1 = np.zeros(hidden_dim)
self.W2 = np.random.randn(hidden_dim, output_dim) * np.sqrt(2.0 / hidden_dim)
self.b2 = np.zeros(output_dim)

def relu(self, x):
return np.maximum(0, x)

def relu_grad(self, x):
return (x > 0).astype(float)

def softmax(self, x):
exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
return exp_x / np.sum(exp_x, axis=1, keepdims=True)

def forward(self, X):
self.z1 = X @ self.W1 + self.b1 # (N, H)
self.a1 = self.relu(self.z1) # (N, H)
self.z2 = self.a1 @ self.W2 + self.b2 # (N, C)
self.probs = self.softmax(self.z2) # (N, C)
return self.probs

def backward(self, X, y, lr=0.01):
N = X.shape[0]

# Output layer gradient
dz2 = self.probs.copy()
dz2[range(N), y] -= 1 # (N, C)
dz2 /= N

dW2 = self.a1.T @ dz2 # (H, C)
db2 = np.sum(dz2, axis=0) # (C,)

# Hidden layer gradient
da1 = dz2 @ self.W2.T # (N, H)
dz1 = da1 * self.relu_grad(self.z1) # (N, H)

dW1 = X.T @ dz1 # (D, H)
db1 = np.sum(dz1, axis=0) # (H,)

# Update weights
self.W2 -= lr * dW2
self.b2 -= lr * db2
self.W1 -= lr * dW1
self.b1 -= lr * db1

Key Points Interviewers Check:

  • Correct gradient shapes at every step (comment the shapes)
  • Numerically stable softmax (subtract max before exp)
  • Proper initialization (Xavier or He, not uniform random)
  • ReLU gradient is 0 for negative inputs (not undefined)
  • The 1/N normalization in the gradient (average over batch)

Problem 22: Implement Attention Mechanism (Scaled Dot-Product + Multi-Head)

Why this problem matters: This is the single most important implementation question for anyone working in modern AI. Every LLM, vision transformer, and multimodal model uses multi-head attention. AI labs will ask you to implement this from memory.

Code Template:

import numpy as np

def softmax(x, axis=-1):
x_max = np.max(x, axis=axis, keepdims=True)
exp_x = np.exp(x - x_max)
return exp_x / np.sum(exp_x, axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
"""
Q, K, V: (batch, heads, seq_len, d_k)
mask: (batch, 1, 1, seq_len) or (batch, 1, seq_len, seq_len)
"""
d_k = Q.shape[-1]
scores = Q @ K.transpose(0, 1, 3, 2) / np.sqrt(d_k)

if mask is not None:
scores = np.where(mask == 0, -1e9, scores)

attention_weights = softmax(scores, axis=-1)
output = attention_weights @ V
return output, attention_weights

class MultiHeadAttention:
def __init__(self, d_model, n_heads):
self.n_heads = n_heads
self.d_k = d_model // n_heads
assert d_model % n_heads == 0

self.W_q = np.random.randn(d_model, d_model) * 0.02
self.W_k = np.random.randn(d_model, d_model) * 0.02
self.W_v = np.random.randn(d_model, d_model) * 0.02
self.W_o = np.random.randn(d_model, d_model) * 0.02

def forward(self, x, mask=None):
batch, seq_len, d_model = x.shape

Q = x @ self.W_q
K = x @ self.W_k
V = x @ self.W_v

# Reshape to (batch, heads, seq, d_k)
Q = Q.reshape(batch, seq_len, self.n_heads, self.d_k).transpose(0, 2, 1, 3)
K = K.reshape(batch, seq_len, self.n_heads, self.d_k).transpose(0, 2, 1, 3)
V = V.reshape(batch, seq_len, self.n_heads, self.d_k).transpose(0, 2, 1, 3)

output, weights = scaled_dot_product_attention(Q, K, V, mask)

# Concatenate heads
output = output.transpose(0, 2, 1, 3).reshape(batch, seq_len, d_model)
return output @ self.W_o

Key Points Interviewers Check:

  • Correct scaling by sqrt(d_k) (prevents softmax saturation)
  • Proper reshape and transpose for multi-head splitting
  • Correct mask application (causal mask for autoregressive, padding mask for batched sequences)
  • Numerically stable softmax (subtract max before exp)
  • Can you explain why multi-head is better than single-head? (Different representation subspaces)

Problem 25: Implement LoRA (Low-Rank Adaptation)

Why this problem matters: LoRA is the dominant parameter-efficient fine-tuning method. Every company fine-tuning LLMs uses it. This tests whether you understand why low-rank adaptation works and can implement it correctly.

Code Template:

import numpy as np

class LoRALinear:
"""
Wraps a frozen linear layer with low-rank adaptation.
W_effective = W_frozen + (alpha/r) * B @ A
"""
def __init__(self, in_features, out_features, rank=4, alpha=1.0):
# Frozen pretrained weights
self.W = np.random.randn(out_features, in_features) * 0.02
self.W_frozen = True # Do not update during training

# LoRA matrices
self.A = np.random.randn(rank, in_features) * 0.02 # Down-projection
self.B = np.zeros((out_features, rank)) # Up-projection (init to zero)
self.alpha = alpha
self.rank = rank
self.scaling = alpha / rank

def forward(self, x):
base_output = x @ self.W.T
lora_output = (x @ self.A.T) @ self.B.T * self.scaling
return base_output + lora_output

def trainable_params(self):
return self.A.size + self.B.size

def total_params(self):
return self.W.size + self.A.size + self.B.size

def compression_ratio(self):
return self.trainable_params() / self.total_params()

Key Points Interviewers Check:

  • B is initialized to zero (LoRA starts as identity --- no change from pretrained behavior)
  • A is initialized with random values (Kaiming or small Gaussian)
  • Scaling factor alpha/r (controls the magnitude of LoRA contribution)
  • Can you explain rank selection? (Higher rank = more expressiveness, more params, risk of overfitting)
  • Can you discuss when LoRA fails? (Tasks very different from pretraining distribution may need full fine-tuning)

Problem 28: Optimize Model Serving Latency

Why this problem matters: Every production ML team faces this problem. Moving from "model works" to "model serves in 50ms" requires understanding the full serving stack.

Optimization Waterfall:

StepTechniqueExpected SpeedupEffortWhen to Use
1Profile firstN/A (diagnostic)LowAlways --- never optimize without profiling
2Batching2-5x throughputLowWhen individual requests are small
3Model quantization (INT8)2-4x latencyMediumWhen accuracy loss is <0.5%
4ONNX/TensorRT export1.5-3x latencyMediumWhen running on GPU
5Caching10-100x for cache hitsLowWhen inputs have high repetition rate
6Model distillation2-10x latencyHighWhen you have a large teacher model
7Pruning1.5-3x latencyHighStructured pruning for real speedups
8Async preprocessing1.2-2xMediumWhen preprocessing is a bottleneck

Structured Discussion:

1. Where is the latency? (Profile: preprocessing, inference, postprocessing, network)
2. What is the accuracy-latency budget? (Can we lose 0.5% accuracy for 4x speed?)
3. Quick wins first: batching, caching, ONNX export
4. Medium effort: quantization, async preprocessing
5. High effort if needed: distillation, architecture changes
6. Monitoring: track p50, p95, p99 latency separately

6-Week Hard Tier Study Plan

Hard problems require more time per problem and more review. Budget 2-3 hours per day.

WeekFocusProblemsDaily Load
Week 1Advanced coding (Part 1)#1-51 problem/day + review
Week 2Advanced coding (Part 2)#6-101 problem/day + review
Week 3ML implementation + papers#19-271 problem/day
Week 4System design#11-181 design/day
Week 5Optimization + data#28-351-2 problems/day
Week 6Integration + full mocksAll weak areasMocks only

Week 1: Advanced Coding (Multi-Pattern)

Day 1: #1 (Merge K Sorted Lists - min-heap management)
Day 2: #2 (Serialize/Deserialize Tree - encoding design)
Day 3: #3 (Word Search II - Trie + backtracking optimization)
Day 4: #4 (Largest Rectangle - monotonic stack deep dive)
Day 5: #5 (Median of Two Sorted Arrays - binary search on partition)
Day 6-7: Review and re-solve; focus on approach selection, not just coding

Week 2: Advanced Coding (DP & Optimization)

Day 1: #6 (Edit Distance - 2D DP with string alignment)
Day 2: #7 (LIS O(n log n) - patience sorting + binary search)
Day 3: #8 (Regex Matching - complex DP transitions)
Day 4: #9 (Trapping Rain Water - compare 3 approaches)
Day 5: #10 (Concurrent LRU Cache - concurrency primitives)
Day 6-7: Review; time yourself solving #1-10 under 35 min each

Week 3: ML Implementation + Paper Implementation

Day 1: #19 (Backprop - derive all gradients by hand first)
Day 2: #20 (Word2Vec - embedding learning + negative sampling)
Day 3: #21 (Gradient Boosted Trees - residual fitting)
Day 4: #22 (Attention mechanism - matrix dimensions matter)
Day 5: #23 (Transformer block - full forward pass)
Day 6: #24, #25 (PPO core loop, LoRA)
Day 7: #26, #27 (Contrastive loss, Beam search) + review

Week 4: System Design (Large-Scale ML)

Day 1: #11 (Search ranking - the ultimate ML system design)
Day 2: #12 (Ad click prediction - real-time + scale)
Day 3: #13 (Distributed training - parallelism strategies)
Day 4: #14 (Real-time fraud detection - latency constraints)
Day 5: #15 (Knowledge graph - graph + ML combination)
Day 6: #16, #17 (Experimentation platform, feature store)
Day 7: #18 (Multi-region serving) + review all designs

Week 5: Optimization + Data

Day 1: #28 (Serving latency optimization - the profiling-first approach)
Day 2: #29 (Distributed training at 70B scale)
Day 3: #30 (Model compression pipeline)
Day 4: #31 (Sub-10ms feature computation)
Day 5: #32, #33 (Network effect experiments, recursive CTE)
Day 6: #34 (Query optimization on 100TB)
Day 7: #35 (Multi-armed bandit for pricing) + review

Week 6: Integration + Mocks

Day 1: Full mock - hard coding (40 min) + ML depth (40 min)
Day 2: Full mock - system design deep dive (50 min) + behavioral (30 min)
Day 3: Full mock - paper implementation (40 min) + system design (45 min)
Day 4: Targeted review of any problems you scored below 3
Day 5: Full mock - complete senior interview loop (4 hours)
Day 6: Full mock - complete senior interview loop (4 hours)
Day 7: Light review only; rest before the interview

Hard-Tier Patterns Cheat Sheet

#PatternProblemsWhen to Apply
1K-way merge with heap#1Merging K sorted streams
2Serialization design#2Custom encoding/decoding for trees and graphs
3Trie + backtracking#3Multi-pattern search in grids or strings
4Monotonic stack#4Next greater/smaller element, histogram problems
5Binary search on answer#5Searching partition/position rather than value
62D dynamic programming#6, #8String alignment, grid optimization
7Patience sorting#7Longest increasing subsequence
8Multiple approaches#9Two pointers, stack, prefix arrays for same problem
9Concurrent data structures#10Thread-safe caches, concurrent queues
10Multi-stage ML pipeline#11, #12Retrieval -> ranking -> reranking
11Distributed training#13, #29Data/model/pipeline parallelism
12Gradient derivation#19, #20Implementing ML from math
13Attention mechanism#22, #23QKV projections, masking, multi-head
14Clipped objective#24Constrain update magnitude (PPO, TRPO)
15Low-rank decomposition#25Reduce parameter count while preserving expressiveness
16Contrastive learning#26Learn representations from positive/negative pairs
17Latency waterfall#28, #31Profile -> batch -> quantize -> cache
18Exploration-exploitation#35Dynamic optimization under uncertainty

Readiness Assessment

Use this rubric to assess your readiness for hard-tier interviews:

SkillNot ReadyGetting ThereReadyStaff-Ready
Hard codingCannot solve any in 45 minSolve 40% in 40 minSolve 70% in 35 minSolve 85%+ with clean code
ML implementationCannot derive gradientsImplement with hintsImplement from math aloneImplement + explain teaching-style
System designMiss major componentsCover all componentsDeep dive + tradeoffsNovel insights + cross-system impact
Paper implementationCannot extract key ideaGets main idea in 30 minImplements core loop correctlyCritiques methodology + suggests improvements
Statistical reasoningCannot set up the problemCorrect framework, shaky executionRigorous reasoningIdentifies subtle issues (network effects, Simpson's paradox)
CommunicationThink silently, present codeExplain after solvingExplain while solvingTeach the problem to the interviewer

:::tip The Staff+ Differentiator At the hard level, HOW you solve is as important as WHETHER you solve. Staff+ candidates:

  • Discuss tradeoffs before committing to an approach
  • Identify potential failure modes proactively
  • Consider the system-level impact of their design choices
  • Ask clarifying questions that reveal deep problem understanding
  • Communicate their reasoning clearly throughout
  • Connect the problem to real-world experience :::

Common Failure Modes

Failure ModeWhat It Looks LikeHow to Fix
OvercomplicatingUsing 3 data structures when 1 sufficesStart with brute force; optimize only as needed
Premature optimizationJumping to O(log n) without understanding the problemSolve the problem first, then optimize
Ignoring constraintsDesigning a system without discussing latency, cost, or scaleWrite constraints on the board first
Not asking questionsDiving in without clarifying ambiguitySpend 2-3 minutes asking clarifying questions
Perfect is the enemy of doneSpending 20 min on edge cases with 10 min leftGet the core solution working, then handle edges
Memorizing, not understandingCan reproduce code but cannot modify it when constraints changeFor each solution, try a variation (different constraint, different objective)

When to Skip Hard Problems

Hard problems are not universally required. Here is a decision matrix:

TargetHard DSAHard System DesignHard ML ImplHard Paper Impl
Google L3-L4OptionalNoNoNo
Google L5+YesYesYesSometimes
Meta E3-E4OptionalNoNoNo
Meta E5+YesYesYesNo
OpenAI / AnthropicSometimesYesYesYes
DeepMindSometimesYesYesYes
Startup (Series A-B)RarelyYesSometimesNo
FAANG Data ScientistRarelySometimesNoNo

Next Steps

After the Hard Tier:

© 2026 EngineersOfAI. All rights reserved.