Hard Tier Problems
Reading time: ~60 min | Interview relevance: High (FAANG/Staff+) | Roles: Senior MLE, Staff+ Engineer, Research Engineer, ML Architect
You have solved the medium problems. You are comfortable with the standard patterns. You walk into a Google L5+ or Meta E6 interview loop and the interviewer pulls out a problem that makes you realize medium was just the warm-up.
Hard-tier problems are where companies separate senior from staff, and staff from principal. These problems require combining multiple patterns, handling complex constraints, reasoning about scale, and making design decisions under ambiguity. They are not asked to trick you. They are asked because the job itself requires this level of problem-solving.
This list of 35 problems is organized by category and calibrated to what top-tier companies actually ask in senior and staff-level interviews. Each problem takes 30-50 minutes for a well-prepared candidate. If you can solve 70% of these under time pressure, you are ready for any interview.
Who Needs Hard-Tier Preparation
| Target | Hard-Tier Coverage Needed | Focus Areas |
|---|---|---|
| FAANG L5/E5 (Senior) | 40-50% | Coding + System Design |
| FAANG L6/E6 (Staff) | 70-80% | All categories, especially system design |
| AI Lab (Research Engineer) | 60-70% | ML implementation + algorithm depth |
| OpenAI / Anthropic | 70-80% | Paper implementation + ML depth + system design |
| Unicorn Senior/Staff | 50-60% | System design + coding |
| Non-FAANG Senior | 30-40% | Cherry-pick by company |
:::warning Hard Problems Require Pattern Mastery Do NOT attempt hard problems before mastering medium patterns. Hard problems combine 2-3 medium patterns into a single problem. Without the building blocks, you will waste time and destroy confidence. Complete the Medium Tier first. :::
Category Distribution
| Category | Count | Percentage | Avg Time |
|---|---|---|---|
| Advanced Coding (DSA) | 10 | 29% | 35 min |
| ML System Design | 8 | 23% | 48 min |
| Advanced ML Implementation | 5 | 14% | 42 min |
| Paper Implementation | 4 | 11% | 45 min |
| Optimization & Infrastructure | 4 | 11% | 43 min |
| Advanced Data & Statistics | 4 | 11% | 33 min |
| Total | 35 | 40 min avg |
Category 1: Advanced Coding (10 Problems)
These problems combine multiple data structures and algorithms, require careful edge case handling, and often have O(n) solutions hiding behind O(n^2) initial approaches.
Multi-Pattern Problems
| # | Problem | Difficulty | Time | Patterns Combined | What Separates Hard from Medium | Company Tags | Handbook Chapter |
|---|---|---|---|---|---|---|---|
| 1 | Merge K Sorted Lists | Hard | 30 min | Min-heap + linked list merge | K-way merge generalizes 2-way merge; priority queue management | Google, Meta, Amazon | Coding Interviews |
| 2 | Serialize and Deserialize Binary Tree | Hard | 30 min | BFS/DFS + string encoding | Custom serialization format; handling nulls in tree structure | Google, Meta, Microsoft | Coding Interviews |
| 3 | Word Search II (Multiple Words in Grid) | Hard | 35 min | Trie + backtracking + pruning | Trie optimization over naive per-word search; early termination | Google, Amazon | Coding Interviews |
| 4 | Largest Rectangle in Histogram | Hard | 35 min | Monotonic stack | Non-obvious stack application; extending to maximal rectangle in matrix | Google, Meta, Amazon | Coding Interviews |
| 5 | Median of Two Sorted Arrays | Hard | 40 min | Binary search on partition | True O(log(min(m,n))); binary search on answer space, not data | Google, Amazon | Coding Interviews |
Dynamic Programming & Optimization
| # | Problem | Difficulty | Time | Patterns Combined | What Separates Hard from Medium | Company Tags | Handbook Chapter |
|---|---|---|---|---|---|---|---|
| 6 | Edit Distance | Hard | 30 min | 2D DP + string alignment | 2D state space; sequence alignment used in bioinformatics and NLP | Google, Meta | Coding Interviews |
| 7 | Longest Increasing Subsequence (O(n log n)) | Hard | 30 min | Binary search + patience sorting | The O(n log n) solution using binary search on tails array | FAANG, All | Coding Interviews |
| 8 | Regular Expression Matching | Hard | 40 min | 2D DP + recursive case analysis | Complex state transitions with * and .; many edge cases | Google, Meta | Coding Interviews |
| 9 | Trapping Rain Water | Hard | 30 min | Two pointers or monotonic stack | Multiple valid approaches; interviewers expect discussion of all | FAANG, All | Coding Interviews |
| 10 | Design a Concurrent LRU Cache | Hard | 35 min | LRU + concurrency primitives | Thread safety, lock striping, read-write locks for model serving cache | Google, Meta, Uber | Coding Interviews |
:::tip Hard Coding Problem Strategy
- Spend 5 minutes on approach before coding. For hard problems, the wrong approach wastes 30 minutes.
- Identify which medium patterns combine. Largest Rectangle = Monotonic Stack. Word Search II = Trie + Backtracking.
- Start with brute force, then optimize. Interviewers want to see your optimization reasoning.
- Handle edge cases explicitly. Hard problems always have tricky edge cases (empty input, single element, all duplicates). :::
Category 2: ML System Design (8 Problems)
Hard system design problems require reasoning about scale, multi-component architecture, and non-obvious tradeoffs. These are Staff+ level questions.
Large-Scale ML Systems
| # | Problem | Difficulty | Time | Key Concept | What Makes It Hard | Company Tags | Handbook Chapter |
|---|---|---|---|---|---|---|---|
| 11 | Design a Search Ranking System | Hard | 50 min | Query understanding + retrieval + L2R | Multi-stage pipeline, learning-to-rank, position bias correction | Google, Amazon, Airbnb | ML System Design |
| 12 | Design an Ad Click Prediction System | Hard | 50 min | Feature engineering + real-time prediction | Real-time features, calibration, bid optimization, revenue constraints | Google, Meta, Amazon | ML System Design |
| 13 | Design a Distributed Training System | Hard | 45 min | Data/model parallelism, gradient aggregation | AllReduce vs. parameter server, gradient compression, fault tolerance | Google, Meta, AI Labs | ML System Design |
| 14 | Design a Real-Time Fraud Detection System | Hard | 45 min | Streaming features, low-latency scoring | Sub-100ms latency, evolving fraud patterns, label delay | Stripe, Square, PayPal | ML System Design |
ML Infrastructure
| # | Problem | Difficulty | Time | Key Concept | What Makes It Hard | Company Tags | Handbook Chapter |
|---|---|---|---|---|---|---|---|
| 15 | Design a Large-Scale Knowledge Graph System | Hard | 50 min | Graph storage, entity resolution, inference | Billions of entities, relation extraction, query efficiency | Google, Amazon, Microsoft | ML System Design |
| 16 | Design a Model Experimentation Platform | Hard | 45 min | Experiment tracking, reproducibility | Multi-tenant, versioning, configuration management at scale | FAANG, Unicorns | ML System Design |
| 17 | Design a Feature Store for Online and Offline Serving | Hard | 45 min | Feature consistency, low-latency serving | Training-serving skew, point-in-time correctness, feature freshness | Uber, Airbnb, Databricks | ML System Design |
| 18 | Design a Multi-Region ML Serving System | Hard | 45 min | Global deployment, model sync | Consistency across regions, failover, latency optimization | Google, Meta, Amazon | ML System Design |
:::note Staff+ System Design Evaluation Criteria At the hard level, interviewers evaluate:
- Architectural vision: Can you reason about the system holistically?
- Tradeoff articulation: Can you explain WHY you chose one approach over another?
- Failure mode analysis: What happens when each component fails?
- Scale reasoning: How does cost/complexity grow with 10x, 100x scale?
- Cross-team impact: How does your system affect other teams and services? :::
Category 3: Advanced ML Implementation (5 Problems)
These problems require deep understanding of ML algorithms and the ability to implement them from mathematical foundations.
| # | Problem | Difficulty | Time | Key Concept | What Makes It Hard | Company Tags | Handbook Chapter |
|---|---|---|---|---|---|---|---|
| 19 | Implement Backpropagation for a 2-Layer Neural Network | Hard | 45 min | Chain rule, gradient computation | Correct derivative computation for every layer; numerical stability | FAANG, AI Labs | Deep Learning |
| 20 | Implement Word2Vec (Skip-gram with Negative Sampling) | Hard | 40 min | Embedding learning, contrastive loss | Negative sampling efficiency; gradient updates for large vocabularies | Google, Meta, Airbnb | Deep Learning |
| 21 | Implement Gradient Boosted Trees (Training + Prediction) | Hard | 40 min | Sequential ensemble, residual fitting | Residual computation, learning rate shrinkage, regularization | Google, Amazon, Big Tech | ML Fundamentals |
| 22 | Implement Attention Mechanism (Scaled Dot-Product + Multi-Head) | Hard | 45 min | Query-Key-Value, softmax attention | Matrix dimensions, masking, multi-head extension | AI Labs, Google, Meta | LLM Interviews |
| 23 | Implement a Simple Transformer Block (Self-Attention + FFN + LayerNorm) | Hard | 50 min | Multi-head attention, residual connections | Full forward pass with correct dimensions; residual + norm order | AI Labs, Google, Meta | LLM Interviews |
:::danger ML Implementation Red Flags at the Hard Level
- Cannot derive gradients by hand (even for simple functions)
- Does not understand why batch normalization has different behavior at train vs. inference time
- Cannot explain the difference between attention, self-attention, and cross-attention
- Implements algorithms without considering numerical stability (log-sum-exp, gradient clipping) :::
Category 4: Paper Implementation (4 Problems)
Research engineer interviews at AI labs often ask you to implement a key idea from a paper. The goal is not perfect reproduction but demonstrating that you can read a paper and translate math into working code.
| # | Problem | Difficulty | Time | Key Concept | What Makes It Hard | Company Tags | Handbook Chapter |
|---|---|---|---|---|---|---|---|
| 24 | Implement the Core Loop of PPO (Proximal Policy Optimization) | Hard | 50 min | Clipped surrogate objective + advantage estimation | RL optimization; the algorithm behind RLHF | OpenAI, Anthropic, DeepMind | Paper Discussion |
| 25 | Implement LoRA (Low-Rank Adaptation) for a Linear Layer | Hard | 35 min | Low-rank decomposition + frozen weights | Parameter-efficient fine-tuning; rank selection tradeoffs | OpenAI, Anthropic, Google, Meta | LLM Interviews |
| 26 | Implement the Contrastive Loss (SimCLR / CLIP-style) | Hard | 40 min | Temperature-scaled cross-entropy over similarity matrix | Self-supervised learning; hard negative mining; batch size effects | Google, OpenAI, Meta | Deep Learning |
| 27 | Implement Beam Search for Sequence Decoding | Hard | 35 min | Top-K expansion, pruning | Maintaining K hypotheses, score normalization, early stopping | Google, Meta, AI Labs | LLM Interviews |
:::tip Paper Implementation Strategy When implementing a paper's algorithm in an interview:
- State the key insight of the paper in 2 sentences
- Write the loss function or update rule on the whiteboard first
- Implement the core loop --- not the full paper, just the novel part
- Verify with a toy example --- does the loss decrease? Does the output make sense?
- Discuss ablations --- what happens if you remove the key component?
You do not need to memorize entire papers. You need to understand the one idea that makes each paper novel and implement it correctly. :::
Category 5: Optimization & Infrastructure (4 Problems)
These problems test your ability to make ML systems fast, efficient, and production-ready.
| # | Problem | Difficulty | Time | Key Concept | What Makes It Hard | Company Tags | Handbook Chapter |
|---|---|---|---|---|---|---|---|
| 28 | Optimize Model Serving Latency from 200ms to 50ms | Hard | 45 min | Profiling + batching + quantization + caching | End-to-end latency optimization; knowing which knob to turn first | Google, Meta, Amazon, Uber | ML System Design |
| 29 | Design a Distributed Training Pipeline for a 70B Parameter Model | Hard | 50 min | Data + tensor + pipeline parallelism | Multi-GPU and multi-node training; communication overhead management | Google, Meta, OpenAI, Anthropic | Deep Learning |
| 30 | Implement a Model Compression Pipeline (Quantization + Pruning + Distillation) | Hard | 45 min | Accuracy-latency tradeoff analysis | Deployment on edge devices; which technique for which scenario | Google, Apple, Qualcomm, Meta | ML System Design |
| 31 | Design a Feature Computation System with Sub-10ms Latency | Hard | 45 min | Precomputation + caching + streaming updates | Online feature serving at scale; consistency guarantees | Uber, Airbnb, Databricks, Tecton | ML System Design |
Category 6: Advanced Data & Statistics (4 Problems)
Senior data roles and MLE roles at data-heavy companies test advanced SQL and statistical reasoning.
| # | Problem | Difficulty | Time | Key Concept | What Makes It Hard | Company Tags | Handbook Chapter |
|---|---|---|---|---|---|---|---|
| 32 | Design an Online Experiment with Network Effects | Hard | 35 min | Cluster randomization, interference | Standard A/B testing breaks with network effects (social, marketplace) | Meta, LinkedIn, Uber | ML Fundamentals |
| 33 | Recursive CTE for Hierarchical Data Processing | Hard | 30 min | Recursive SQL, tree traversal | Org charts, category trees with depth limit and cycle detection | Google, Amazon, Snowflake | Coding Interviews |
| 34 | Optimize a Slow Query on a 100TB Table | Hard | 30 min | Partitioning, clustering, materialized views | Understanding query plans, predicate pushdown, storage layout | Google, Amazon, Snowflake | Coding Interviews |
| 35 | Design a Multi-Armed Bandit for Dynamic Pricing | Hard | 40 min | Thompson sampling, UCB, regret bounds | Exploration-exploitation with non-stationary rewards | Uber, Airbnb, DoorDash | ML Fundamentals |
Problem Deep Dives
Problem 1: Merge K Sorted Lists
Why this problem matters: It tests heap management, linked list manipulation, and the generalization from 2-way merge to K-way merge. This exact operation appears in distributed systems (merging sorted streams from K workers).
Three Approaches:
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Brute force (concatenate + sort) | O(N log N) | O(N) | Never in interviews --- mention as baseline only |
| Merge pairs iteratively | O(N * K) | O(1) | If K is small and constant |
| Min-heap | O(N log K) | O(K) | The expected answer for interviews |
Min-Heap Solution:
import heapq
def merge_k_lists(lists):
heap = []
for i, lst in enumerate(lists):
if lst:
heapq.heappush(heap, (lst.val, i, lst))
dummy = ListNode(0)
current = dummy
while heap:
val, i, node = heapq.heappop(heap)
current.next = node
current = current.next
if node.next:
heapq.heappush(heap, (node.next.val, i, node.next))
return dummy.next
Complexity:
- Time: O(N log K) where N is total nodes, K is number of lists
- Space: O(K) for the heap
Key Points Interviewers Check:
- Using index
ias tiebreaker in the heap (nodes may not be comparable) - Heap size never exceeds K
- Generalizes to merging K sorted database result sets
- Can you discuss the distributed version? (Each list on a different machine)
Problem 5: Median of Two Sorted Arrays
Why this problem matters: This is the hardest binary search problem you will encounter. It tests whether you truly understand binary search beyond "find an element in a sorted array." The O(log(min(m,n))) solution requires partitioning both arrays simultaneously.
Approach:
1. Binary search on the smaller array
2. Partition both arrays such that left_half has (m+n+1)//2 elements
3. Check if partition is valid: max(left) <= min(right) for both arrays
4. If valid, median is derivable from the boundary elements
Solution:
def find_median_sorted_arrays(nums1, nums2):
# Ensure nums1 is the smaller array
if len(nums1) > len(nums2):
nums1, nums2 = nums2, nums1
m, n = len(nums1), len(nums2)
lo, hi = 0, m
while lo <= hi:
i = (lo + hi) // 2 # partition in nums1
j = (m + n + 1) // 2 - i # partition in nums2
left1 = nums1[i - 1] if i > 0 else float('-inf')
right1 = nums1[i] if i < m else float('inf')
left2 = nums2[j - 1] if j > 0 else float('-inf')
right2 = nums2[j] if j < n else float('inf')
if left1 <= right2 and left2 <= right1:
if (m + n) % 2 == 0:
return (max(left1, left2) + min(right1, right2)) / 2
else:
return max(left1, left2)
elif left1 > right2:
hi = i - 1
else:
lo = i + 1
Key Points Interviewers Check:
- Why binary search on the smaller array (reduces search space)
- Correct handling of boundary conditions (partition at 0 or m)
- Understanding of the partition invariant (all left elements <= all right elements)
- Time complexity proof: O(log(min(m,n)))
Problem 9: Trapping Rain Water
Why this problem matters: This problem has three valid approaches (two pointers, monotonic stack, prefix max arrays), and interviewers expect you to discuss all three before coding one. It tests your ability to evaluate tradeoffs, not just solve the problem.
Three Approaches:
| Approach | Time | Space | Key Idea |
|---|---|---|---|
| Prefix max arrays | O(n) | O(n) | Precompute left_max and right_max for each index |
| Monotonic stack | O(n) | O(n) | Track boundaries of "pools" using a decreasing stack |
| Two pointers | O(n) | O(1) | Move from both ends; the smaller side is the bottleneck |
Two-Pointer Solution (Optimal):
def trap(height):
if not height:
return 0
left, right = 0, len(height) - 1
left_max, right_max = height[left], height[right]
water = 0
while left < right:
if left_max < right_max:
left += 1
left_max = max(left_max, height[left])
water += left_max - height[left]
else:
right -= 1
right_max = max(right_max, height[right])
water += right_max - height[right]
return water
Key Insight: At each step, the water level at a position is determined by the minimum of left_max and right_max. By always processing the side with the smaller max, we know the water level at that position without knowing the exact max on the other side.
Problem 11: Design a Search Ranking System
Why this problem matters: Search ranking is the most comprehensive ML system design problem. It combines information retrieval, ML, distributed systems, and product thinking into a single problem.
Architecture:
Query -> Query Understanding -> Retrieval -> Ranking -> Reranking -> Results
1. Query Understanding (10ms budget)
- Spell correction (learned edit distance model)
- Query expansion (synonyms, related terms)
- Intent classification (navigational vs. informational)
2. Retrieval (~1000 candidates, 50ms budget)
- Inverted index (BM25) for lexical matching
- Embedding retrieval (ANN with HNSW) for semantic matching
- Combine with reciprocal rank fusion
3. Ranking (1000 -> 100, 100ms budget)
- Features: query-doc relevance, freshness, authority, personalization
- Model: LambdaMART or cross-encoder neural ranker
- Training: click data with position bias correction (IPW)
4. Reranking (100 -> final, 20ms budget)
- Diversity injection (MMR)
- Business rules (ads, promoted, freshness boost)
- Personalization overlay
5. Monitoring
- Offline: NDCG@10, MAP, MRR
- Online: CTR, query success rate, time-to-click
- Degradation alert: compare online metrics to baseline
Key Tradeoffs to Discuss:
- Precision vs. recall in retrieval stage
- Pointwise vs. pairwise vs. listwise ranking loss
- Position bias in click data (clicks are biased toward top positions)
- Latency budget allocation across stages
Problem 19: Implement Backpropagation for a 2-Layer Neural Network
Why this problem matters: Backpropagation is the foundation of all deep learning. If you cannot implement it from scratch, you do not understand what your framework is doing. AI labs and senior roles at FAANG test this directly.
Code Template:
import numpy as np
class TwoLayerNet:
def __init__(self, input_dim, hidden_dim, output_dim):
# Xavier initialization
self.W1 = np.random.randn(input_dim, hidden_dim) * np.sqrt(2.0 / input_dim)
self.b1 = np.zeros(hidden_dim)
self.W2 = np.random.randn(hidden_dim, output_dim) * np.sqrt(2.0 / hidden_dim)
self.b2 = np.zeros(output_dim)
def relu(self, x):
return np.maximum(0, x)
def relu_grad(self, x):
return (x > 0).astype(float)
def softmax(self, x):
exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
return exp_x / np.sum(exp_x, axis=1, keepdims=True)
def forward(self, X):
self.z1 = X @ self.W1 + self.b1 # (N, H)
self.a1 = self.relu(self.z1) # (N, H)
self.z2 = self.a1 @ self.W2 + self.b2 # (N, C)
self.probs = self.softmax(self.z2) # (N, C)
return self.probs
def backward(self, X, y, lr=0.01):
N = X.shape[0]
# Output layer gradient
dz2 = self.probs.copy()
dz2[range(N), y] -= 1 # (N, C)
dz2 /= N
dW2 = self.a1.T @ dz2 # (H, C)
db2 = np.sum(dz2, axis=0) # (C,)
# Hidden layer gradient
da1 = dz2 @ self.W2.T # (N, H)
dz1 = da1 * self.relu_grad(self.z1) # (N, H)
dW1 = X.T @ dz1 # (D, H)
db1 = np.sum(dz1, axis=0) # (H,)
# Update weights
self.W2 -= lr * dW2
self.b2 -= lr * db2
self.W1 -= lr * dW1
self.b1 -= lr * db1
Key Points Interviewers Check:
- Correct gradient shapes at every step (comment the shapes)
- Numerically stable softmax (subtract max before exp)
- Proper initialization (Xavier or He, not uniform random)
- ReLU gradient is 0 for negative inputs (not undefined)
- The 1/N normalization in the gradient (average over batch)
Problem 22: Implement Attention Mechanism (Scaled Dot-Product + Multi-Head)
Why this problem matters: This is the single most important implementation question for anyone working in modern AI. Every LLM, vision transformer, and multimodal model uses multi-head attention. AI labs will ask you to implement this from memory.
Code Template:
import numpy as np
def softmax(x, axis=-1):
x_max = np.max(x, axis=axis, keepdims=True)
exp_x = np.exp(x - x_max)
return exp_x / np.sum(exp_x, axis=axis, keepdims=True)
def scaled_dot_product_attention(Q, K, V, mask=None):
"""
Q, K, V: (batch, heads, seq_len, d_k)
mask: (batch, 1, 1, seq_len) or (batch, 1, seq_len, seq_len)
"""
d_k = Q.shape[-1]
scores = Q @ K.transpose(0, 1, 3, 2) / np.sqrt(d_k)
if mask is not None:
scores = np.where(mask == 0, -1e9, scores)
attention_weights = softmax(scores, axis=-1)
output = attention_weights @ V
return output, attention_weights
class MultiHeadAttention:
def __init__(self, d_model, n_heads):
self.n_heads = n_heads
self.d_k = d_model // n_heads
assert d_model % n_heads == 0
self.W_q = np.random.randn(d_model, d_model) * 0.02
self.W_k = np.random.randn(d_model, d_model) * 0.02
self.W_v = np.random.randn(d_model, d_model) * 0.02
self.W_o = np.random.randn(d_model, d_model) * 0.02
def forward(self, x, mask=None):
batch, seq_len, d_model = x.shape
Q = x @ self.W_q
K = x @ self.W_k
V = x @ self.W_v
# Reshape to (batch, heads, seq, d_k)
Q = Q.reshape(batch, seq_len, self.n_heads, self.d_k).transpose(0, 2, 1, 3)
K = K.reshape(batch, seq_len, self.n_heads, self.d_k).transpose(0, 2, 1, 3)
V = V.reshape(batch, seq_len, self.n_heads, self.d_k).transpose(0, 2, 1, 3)
output, weights = scaled_dot_product_attention(Q, K, V, mask)
# Concatenate heads
output = output.transpose(0, 2, 1, 3).reshape(batch, seq_len, d_model)
return output @ self.W_o
Key Points Interviewers Check:
- Correct scaling by sqrt(d_k) (prevents softmax saturation)
- Proper reshape and transpose for multi-head splitting
- Correct mask application (causal mask for autoregressive, padding mask for batched sequences)
- Numerically stable softmax (subtract max before exp)
- Can you explain why multi-head is better than single-head? (Different representation subspaces)
Problem 25: Implement LoRA (Low-Rank Adaptation)
Why this problem matters: LoRA is the dominant parameter-efficient fine-tuning method. Every company fine-tuning LLMs uses it. This tests whether you understand why low-rank adaptation works and can implement it correctly.
Code Template:
import numpy as np
class LoRALinear:
"""
Wraps a frozen linear layer with low-rank adaptation.
W_effective = W_frozen + (alpha/r) * B @ A
"""
def __init__(self, in_features, out_features, rank=4, alpha=1.0):
# Frozen pretrained weights
self.W = np.random.randn(out_features, in_features) * 0.02
self.W_frozen = True # Do not update during training
# LoRA matrices
self.A = np.random.randn(rank, in_features) * 0.02 # Down-projection
self.B = np.zeros((out_features, rank)) # Up-projection (init to zero)
self.alpha = alpha
self.rank = rank
self.scaling = alpha / rank
def forward(self, x):
base_output = x @ self.W.T
lora_output = (x @ self.A.T) @ self.B.T * self.scaling
return base_output + lora_output
def trainable_params(self):
return self.A.size + self.B.size
def total_params(self):
return self.W.size + self.A.size + self.B.size
def compression_ratio(self):
return self.trainable_params() / self.total_params()
Key Points Interviewers Check:
- B is initialized to zero (LoRA starts as identity --- no change from pretrained behavior)
- A is initialized with random values (Kaiming or small Gaussian)
- Scaling factor alpha/r (controls the magnitude of LoRA contribution)
- Can you explain rank selection? (Higher rank = more expressiveness, more params, risk of overfitting)
- Can you discuss when LoRA fails? (Tasks very different from pretraining distribution may need full fine-tuning)
Problem 28: Optimize Model Serving Latency
Why this problem matters: Every production ML team faces this problem. Moving from "model works" to "model serves in 50ms" requires understanding the full serving stack.
Optimization Waterfall:
| Step | Technique | Expected Speedup | Effort | When to Use |
|---|---|---|---|---|
| 1 | Profile first | N/A (diagnostic) | Low | Always --- never optimize without profiling |
| 2 | Batching | 2-5x throughput | Low | When individual requests are small |
| 3 | Model quantization (INT8) | 2-4x latency | Medium | When accuracy loss is <0.5% |
| 4 | ONNX/TensorRT export | 1.5-3x latency | Medium | When running on GPU |
| 5 | Caching | 10-100x for cache hits | Low | When inputs have high repetition rate |
| 6 | Model distillation | 2-10x latency | High | When you have a large teacher model |
| 7 | Pruning | 1.5-3x latency | High | Structured pruning for real speedups |
| 8 | Async preprocessing | 1.2-2x | Medium | When preprocessing is a bottleneck |
Structured Discussion:
1. Where is the latency? (Profile: preprocessing, inference, postprocessing, network)
2. What is the accuracy-latency budget? (Can we lose 0.5% accuracy for 4x speed?)
3. Quick wins first: batching, caching, ONNX export
4. Medium effort: quantization, async preprocessing
5. High effort if needed: distillation, architecture changes
6. Monitoring: track p50, p95, p99 latency separately
6-Week Hard Tier Study Plan
Hard problems require more time per problem and more review. Budget 2-3 hours per day.
| Week | Focus | Problems | Daily Load |
|---|---|---|---|
| Week 1 | Advanced coding (Part 1) | #1-5 | 1 problem/day + review |
| Week 2 | Advanced coding (Part 2) | #6-10 | 1 problem/day + review |
| Week 3 | ML implementation + papers | #19-27 | 1 problem/day |
| Week 4 | System design | #11-18 | 1 design/day |
| Week 5 | Optimization + data | #28-35 | 1-2 problems/day |
| Week 6 | Integration + full mocks | All weak areas | Mocks only |
Week 1: Advanced Coding (Multi-Pattern)
Day 1: #1 (Merge K Sorted Lists - min-heap management)
Day 2: #2 (Serialize/Deserialize Tree - encoding design)
Day 3: #3 (Word Search II - Trie + backtracking optimization)
Day 4: #4 (Largest Rectangle - monotonic stack deep dive)
Day 5: #5 (Median of Two Sorted Arrays - binary search on partition)
Day 6-7: Review and re-solve; focus on approach selection, not just coding
Week 2: Advanced Coding (DP & Optimization)
Day 1: #6 (Edit Distance - 2D DP with string alignment)
Day 2: #7 (LIS O(n log n) - patience sorting + binary search)
Day 3: #8 (Regex Matching - complex DP transitions)
Day 4: #9 (Trapping Rain Water - compare 3 approaches)
Day 5: #10 (Concurrent LRU Cache - concurrency primitives)
Day 6-7: Review; time yourself solving #1-10 under 35 min each
Week 3: ML Implementation + Paper Implementation
Day 1: #19 (Backprop - derive all gradients by hand first)
Day 2: #20 (Word2Vec - embedding learning + negative sampling)
Day 3: #21 (Gradient Boosted Trees - residual fitting)
Day 4: #22 (Attention mechanism - matrix dimensions matter)
Day 5: #23 (Transformer block - full forward pass)
Day 6: #24, #25 (PPO core loop, LoRA)
Day 7: #26, #27 (Contrastive loss, Beam search) + review
Week 4: System Design (Large-Scale ML)
Day 1: #11 (Search ranking - the ultimate ML system design)
Day 2: #12 (Ad click prediction - real-time + scale)
Day 3: #13 (Distributed training - parallelism strategies)
Day 4: #14 (Real-time fraud detection - latency constraints)
Day 5: #15 (Knowledge graph - graph + ML combination)
Day 6: #16, #17 (Experimentation platform, feature store)
Day 7: #18 (Multi-region serving) + review all designs
Week 5: Optimization + Data
Day 1: #28 (Serving latency optimization - the profiling-first approach)
Day 2: #29 (Distributed training at 70B scale)
Day 3: #30 (Model compression pipeline)
Day 4: #31 (Sub-10ms feature computation)
Day 5: #32, #33 (Network effect experiments, recursive CTE)
Day 6: #34 (Query optimization on 100TB)
Day 7: #35 (Multi-armed bandit for pricing) + review
Week 6: Integration + Mocks
Day 1: Full mock - hard coding (40 min) + ML depth (40 min)
Day 2: Full mock - system design deep dive (50 min) + behavioral (30 min)
Day 3: Full mock - paper implementation (40 min) + system design (45 min)
Day 4: Targeted review of any problems you scored below 3
Day 5: Full mock - complete senior interview loop (4 hours)
Day 6: Full mock - complete senior interview loop (4 hours)
Day 7: Light review only; rest before the interview
Hard-Tier Patterns Cheat Sheet
| # | Pattern | Problems | When to Apply |
|---|---|---|---|
| 1 | K-way merge with heap | #1 | Merging K sorted streams |
| 2 | Serialization design | #2 | Custom encoding/decoding for trees and graphs |
| 3 | Trie + backtracking | #3 | Multi-pattern search in grids or strings |
| 4 | Monotonic stack | #4 | Next greater/smaller element, histogram problems |
| 5 | Binary search on answer | #5 | Searching partition/position rather than value |
| 6 | 2D dynamic programming | #6, #8 | String alignment, grid optimization |
| 7 | Patience sorting | #7 | Longest increasing subsequence |
| 8 | Multiple approaches | #9 | Two pointers, stack, prefix arrays for same problem |
| 9 | Concurrent data structures | #10 | Thread-safe caches, concurrent queues |
| 10 | Multi-stage ML pipeline | #11, #12 | Retrieval -> ranking -> reranking |
| 11 | Distributed training | #13, #29 | Data/model/pipeline parallelism |
| 12 | Gradient derivation | #19, #20 | Implementing ML from math |
| 13 | Attention mechanism | #22, #23 | QKV projections, masking, multi-head |
| 14 | Clipped objective | #24 | Constrain update magnitude (PPO, TRPO) |
| 15 | Low-rank decomposition | #25 | Reduce parameter count while preserving expressiveness |
| 16 | Contrastive learning | #26 | Learn representations from positive/negative pairs |
| 17 | Latency waterfall | #28, #31 | Profile -> batch -> quantize -> cache |
| 18 | Exploration-exploitation | #35 | Dynamic optimization under uncertainty |
Readiness Assessment
Use this rubric to assess your readiness for hard-tier interviews:
| Skill | Not Ready | Getting There | Ready | Staff-Ready |
|---|---|---|---|---|
| Hard coding | Cannot solve any in 45 min | Solve 40% in 40 min | Solve 70% in 35 min | Solve 85%+ with clean code |
| ML implementation | Cannot derive gradients | Implement with hints | Implement from math alone | Implement + explain teaching-style |
| System design | Miss major components | Cover all components | Deep dive + tradeoffs | Novel insights + cross-system impact |
| Paper implementation | Cannot extract key idea | Gets main idea in 30 min | Implements core loop correctly | Critiques methodology + suggests improvements |
| Statistical reasoning | Cannot set up the problem | Correct framework, shaky execution | Rigorous reasoning | Identifies subtle issues (network effects, Simpson's paradox) |
| Communication | Think silently, present code | Explain after solving | Explain while solving | Teach the problem to the interviewer |
:::tip The Staff+ Differentiator At the hard level, HOW you solve is as important as WHETHER you solve. Staff+ candidates:
- Discuss tradeoffs before committing to an approach
- Identify potential failure modes proactively
- Consider the system-level impact of their design choices
- Ask clarifying questions that reveal deep problem understanding
- Communicate their reasoning clearly throughout
- Connect the problem to real-world experience :::
Common Failure Modes
| Failure Mode | What It Looks Like | How to Fix |
|---|---|---|
| Overcomplicating | Using 3 data structures when 1 suffices | Start with brute force; optimize only as needed |
| Premature optimization | Jumping to O(log n) without understanding the problem | Solve the problem first, then optimize |
| Ignoring constraints | Designing a system without discussing latency, cost, or scale | Write constraints on the board first |
| Not asking questions | Diving in without clarifying ambiguity | Spend 2-3 minutes asking clarifying questions |
| Perfect is the enemy of done | Spending 20 min on edge cases with 10 min left | Get the core solution working, then handle edges |
| Memorizing, not understanding | Can reproduce code but cannot modify it when constraints change | For each solution, try a variation (different constraint, different objective) |
When to Skip Hard Problems
Hard problems are not universally required. Here is a decision matrix:
| Target | Hard DSA | Hard System Design | Hard ML Impl | Hard Paper Impl |
|---|---|---|---|---|
| Google L3-L4 | Optional | No | No | No |
| Google L5+ | Yes | Yes | Yes | Sometimes |
| Meta E3-E4 | Optional | No | No | No |
| Meta E5+ | Yes | Yes | Yes | No |
| OpenAI / Anthropic | Sometimes | Yes | Yes | Yes |
| DeepMind | Sometimes | Yes | Yes | Yes |
| Startup (Series A-B) | Rarely | Yes | Sometimes | No |
| FAANG Data Scientist | Rarely | Sometimes | No | No |
Next Steps
After the Hard Tier:
- Google-Style Problems for Google-specific calibration
- Meta-Style Problems for Meta-specific calibration
- Startup-Style Problems if targeting startups
- Research Engineer Problems if targeting research roles
- Company Guides for company-specific preparation strategies
