Hard Tier Problems

Reading time: ~60 min | Interview relevance: High (FAANG/Staff+) | Roles: Senior MLE, Staff+ Engineer, Research Engineer, ML Architect

You have solved the medium problems. You are comfortable with the standard patterns. You walk into a Google L5+ or Meta E6 interview loop and the interviewer pulls out a problem that makes you realize medium was just the warm-up.

Hard-tier problems are where companies separate senior from staff, and staff from principal. These problems require combining multiple patterns, handling complex constraints, reasoning about scale, and making design decisions under ambiguity. They are not asked to trick you. They are asked because the job itself requires this level of problem-solving.

This list of 35 problems is organized by category and calibrated to what top-tier companies actually ask in senior and staff-level interviews. Each problem takes 30-50 minutes for a well-prepared candidate. If you can solve 70% of these under time pressure, you are ready for any interview.

Who Needs Hard-Tier Preparation

Target	Hard-Tier Coverage Needed	Focus Areas
FAANG L5/E5 (Senior)	40-50%	Coding + System Design
FAANG L6/E6 (Staff)	70-80%	All categories, especially system design
AI Lab (Research Engineer)	60-70%	ML implementation + algorithm depth
OpenAI / Anthropic	70-80%	Paper implementation + ML depth + system design
Unicorn Senior/Staff	50-60%	System design + coding
Non-FAANG Senior	30-40%	Cherry-pick by company

:::warning Hard Problems Require Pattern Mastery Do NOT attempt hard problems before mastering medium patterns. Hard problems combine 2-3 medium patterns into a single problem. Without the building blocks, you will waste time and destroy confidence. Complete the Medium Tier first. :::

Category Distribution

Category	Count	Percentage	Avg Time
Advanced Coding (DSA)	10	29%	35 min
ML System Design	8	23%	48 min
Advanced ML Implementation	5	14%	42 min
Paper Implementation	4	11%	45 min
Optimization & Infrastructure	4	11%	43 min
Advanced Data & Statistics	4	11%	33 min
Total	35		40 min avg

Category 1: Advanced Coding (10 Problems)

These problems combine multiple data structures and algorithms, require careful edge case handling, and often have O(n) solutions hiding behind O(n^2) initial approaches.

Multi-Pattern Problems

#	Problem	Difficulty	Time	Patterns Combined	What Separates Hard from Medium	Company Tags	Handbook Chapter
1	Merge K Sorted Lists	Hard	30 min	Min-heap + linked list merge	K-way merge generalizes 2-way merge; priority queue management	Google, Meta, Amazon	Coding Interviews
2	Serialize and Deserialize Binary Tree	Hard	30 min	BFS/DFS + string encoding	Custom serialization format; handling nulls in tree structure	Google, Meta, Microsoft	Coding Interviews
3	Word Search II (Multiple Words in Grid)	Hard	35 min	Trie + backtracking + pruning	Trie optimization over naive per-word search; early termination	Google, Amazon	Coding Interviews
4	Largest Rectangle in Histogram	Hard	35 min	Monotonic stack	Non-obvious stack application; extending to maximal rectangle in matrix	Google, Meta, Amazon	Coding Interviews
5	Median of Two Sorted Arrays	Hard	40 min	Binary search on partition	True O(log(min(m,n))); binary search on answer space, not data	Google, Amazon	Coding Interviews

Dynamic Programming & Optimization

#	Problem	Difficulty	Time	Patterns Combined	What Separates Hard from Medium	Company Tags	Handbook Chapter
6	Edit Distance	Hard	30 min	2D DP + string alignment	2D state space; sequence alignment used in bioinformatics and NLP	Google, Meta	Coding Interviews
7	Longest Increasing Subsequence (O(n log n))	Hard	30 min	Binary search + patience sorting	The O(n log n) solution using binary search on tails array	FAANG, All	Coding Interviews
8	Regular Expression Matching	Hard	40 min	2D DP + recursive case analysis	Complex state transitions with `*` and `.`; many edge cases	Google, Meta	Coding Interviews
9	Trapping Rain Water	Hard	30 min	Two pointers or monotonic stack	Multiple valid approaches; interviewers expect discussion of all	FAANG, All	Coding Interviews
10	Design a Concurrent LRU Cache	Hard	35 min	LRU + concurrency primitives	Thread safety, lock striping, read-write locks for model serving cache	Google, Meta, Uber	Coding Interviews

:::tip Hard Coding Problem Strategy

Spend 5 minutes on approach before coding. For hard problems, the wrong approach wastes 30 minutes.
Identify which medium patterns combine. Largest Rectangle = Monotonic Stack. Word Search II = Trie + Backtracking.
Start with brute force, then optimize. Interviewers want to see your optimization reasoning.
Handle edge cases explicitly. Hard problems always have tricky edge cases (empty input, single element, all duplicates). :::

Category 2: ML System Design (8 Problems)

Hard system design problems require reasoning about scale, multi-component architecture, and non-obvious tradeoffs. These are Staff+ level questions.

Large-Scale ML Systems

#	Problem	Difficulty	Time	Key Concept	What Makes It Hard	Company Tags	Handbook Chapter
11	Design a Search Ranking System	Hard	50 min	Query understanding + retrieval + L2R	Multi-stage pipeline, learning-to-rank, position bias correction	Google, Amazon, Airbnb	ML System Design
12	Design an Ad Click Prediction System	Hard	50 min	Feature engineering + real-time prediction	Real-time features, calibration, bid optimization, revenue constraints	Google, Meta, Amazon	ML System Design
13	Design a Distributed Training System	Hard	45 min	Data/model parallelism, gradient aggregation	AllReduce vs. parameter server, gradient compression, fault tolerance	Google, Meta, AI Labs	ML System Design
14	Design a Real-Time Fraud Detection System	Hard	45 min	Streaming features, low-latency scoring	Sub-100ms latency, evolving fraud patterns, label delay	Stripe, Square, PayPal	ML System Design

ML Infrastructure

#	Problem	Difficulty	Time	Key Concept	What Makes It Hard	Company Tags	Handbook Chapter
15	Design a Large-Scale Knowledge Graph System	Hard	50 min	Graph storage, entity resolution, inference	Billions of entities, relation extraction, query efficiency	Google, Amazon, Microsoft	ML System Design
16	Design a Model Experimentation Platform	Hard	45 min	Experiment tracking, reproducibility	Multi-tenant, versioning, configuration management at scale	FAANG, Unicorns	ML System Design
17	Design a Feature Store for Online and Offline Serving	Hard	45 min	Feature consistency, low-latency serving	Training-serving skew, point-in-time correctness, feature freshness	Uber, Airbnb, Databricks	ML System Design
18	Design a Multi-Region ML Serving System	Hard	45 min	Global deployment, model sync	Consistency across regions, failover, latency optimization	Google, Meta, Amazon	ML System Design

:::note Staff+ System Design Evaluation Criteria At the hard level, interviewers evaluate:

Architectural vision: Can you reason about the system holistically?
Tradeoff articulation: Can you explain WHY you chose one approach over another?
Failure mode analysis: What happens when each component fails?
Scale reasoning: How does cost/complexity grow with 10x, 100x scale?
Cross-team impact: How does your system affect other teams and services? :::

Category 3: Advanced ML Implementation (5 Problems)

These problems require deep understanding of ML algorithms and the ability to implement them from mathematical foundations.

#	Problem	Difficulty	Time	Key Concept	What Makes It Hard	Company Tags	Handbook Chapter
19	Implement Backpropagation for a 2-Layer Neural Network	Hard	45 min	Chain rule, gradient computation	Correct derivative computation for every layer; numerical stability	FAANG, AI Labs	Deep Learning
20	Implement Word2Vec (Skip-gram with Negative Sampling)	Hard	40 min	Embedding learning, contrastive loss	Negative sampling efficiency; gradient updates for large vocabularies	Google, Meta, Airbnb	Deep Learning
21	Implement Gradient Boosted Trees (Training + Prediction)	Hard	40 min	Sequential ensemble, residual fitting	Residual computation, learning rate shrinkage, regularization	Google, Amazon, Big Tech	ML Fundamentals
22	Implement Attention Mechanism (Scaled Dot-Product + Multi-Head)	Hard	45 min	Query-Key-Value, softmax attention	Matrix dimensions, masking, multi-head extension	AI Labs, Google, Meta	LLM Interviews
23	Implement a Simple Transformer Block (Self-Attention + FFN + LayerNorm)	Hard	50 min	Multi-head attention, residual connections	Full forward pass with correct dimensions; residual + norm order	AI Labs, Google, Meta	LLM Interviews

:::danger ML Implementation Red Flags at the Hard Level

Cannot derive gradients by hand (even for simple functions)
Does not understand why batch normalization has different behavior at train vs. inference time
Cannot explain the difference between attention, self-attention, and cross-attention
Implements algorithms without considering numerical stability (log-sum-exp, gradient clipping) :::

Category 4: Paper Implementation (4 Problems)

Research engineer interviews at AI labs often ask you to implement a key idea from a paper. The goal is not perfect reproduction but demonstrating that you can read a paper and translate math into working code.

#	Problem	Difficulty	Time	Key Concept	What Makes It Hard	Company Tags	Handbook Chapter
24	Implement the Core Loop of PPO (Proximal Policy Optimization)	Hard	50 min	Clipped surrogate objective + advantage estimation	RL optimization; the algorithm behind RLHF	OpenAI, Anthropic, DeepMind	Paper Discussion
25	Implement LoRA (Low-Rank Adaptation) for a Linear Layer	Hard	35 min	Low-rank decomposition + frozen weights	Parameter-efficient fine-tuning; rank selection tradeoffs	OpenAI, Anthropic, Google, Meta	LLM Interviews
26	Implement the Contrastive Loss (SimCLR / CLIP-style)	Hard	40 min	Temperature-scaled cross-entropy over similarity matrix	Self-supervised learning; hard negative mining; batch size effects	Google, OpenAI, Meta	Deep Learning
27	Implement Beam Search for Sequence Decoding	Hard	35 min	Top-K expansion, pruning	Maintaining K hypotheses, score normalization, early stopping	Google, Meta, AI Labs	LLM Interviews

:::tip Paper Implementation Strategy When implementing a paper's algorithm in an interview:

State the key insight of the paper in 2 sentences
Write the loss function or update rule on the whiteboard first
Implement the core loop --- not the full paper, just the novel part
Verify with a toy example --- does the loss decrease? Does the output make sense?
Discuss ablations --- what happens if you remove the key component?

You do not need to memorize entire papers. You need to understand the one idea that makes each paper novel and implement it correctly. :::

Category 5: Optimization & Infrastructure (4 Problems)

These problems test your ability to make ML systems fast, efficient, and production-ready.

#	Problem	Difficulty	Time	Key Concept	What Makes It Hard	Company Tags	Handbook Chapter
28	Optimize Model Serving Latency from 200ms to 50ms	Hard	45 min	Profiling + batching + quantization + caching	End-to-end latency optimization; knowing which knob to turn first	Google, Meta, Amazon, Uber	ML System Design
29	Design a Distributed Training Pipeline for a 70B Parameter Model	Hard	50 min	Data + tensor + pipeline parallelism	Multi-GPU and multi-node training; communication overhead management	Google, Meta, OpenAI, Anthropic	Deep Learning
30	Implement a Model Compression Pipeline (Quantization + Pruning + Distillation)	Hard	45 min	Accuracy-latency tradeoff analysis	Deployment on edge devices; which technique for which scenario	Google, Apple, Qualcomm, Meta	ML System Design
31	Design a Feature Computation System with Sub-10ms Latency	Hard	45 min	Precomputation + caching + streaming updates	Online feature serving at scale; consistency guarantees	Uber, Airbnb, Databricks, Tecton	ML System Design

Category 6: Advanced Data & Statistics (4 Problems)

Senior data roles and MLE roles at data-heavy companies test advanced SQL and statistical reasoning.

#	Problem	Difficulty	Time	Key Concept	What Makes It Hard	Company Tags	Handbook Chapter
32	Design an Online Experiment with Network Effects	Hard	35 min	Cluster randomization, interference	Standard A/B testing breaks with network effects (social, marketplace)	Meta, LinkedIn, Uber	ML Fundamentals
33	Recursive CTE for Hierarchical Data Processing	Hard	30 min	Recursive SQL, tree traversal	Org charts, category trees with depth limit and cycle detection	Google, Amazon, Snowflake	Coding Interviews
34	Optimize a Slow Query on a 100TB Table	Hard	30 min	Partitioning, clustering, materialized views	Understanding query plans, predicate pushdown, storage layout	Google, Amazon, Snowflake	Coding Interviews
35	Design a Multi-Armed Bandit for Dynamic Pricing	Hard	40 min	Thompson sampling, UCB, regret bounds	Exploration-exploitation with non-stationary rewards	Uber, Airbnb, DoorDash	ML Fundamentals

Problem Deep Dives

Problem 1: Merge K Sorted Lists

Why this problem matters: It tests heap management, linked list manipulation, and the generalization from 2-way merge to K-way merge. This exact operation appears in distributed systems (merging sorted streams from K workers).

Three Approaches:

Approach	Time	Space	When to Use
Brute force (concatenate + sort)	O(N log N)	O(N)	Never in interviews --- mention as baseline only
Merge pairs iteratively	O(N * K)	O(1)	If K is small and constant
Min-heap	O(N log K)	O(K)	The expected answer for interviews

Min-Heap Solution:

import heapq

def merge_k_lists(lists):
    heap = []
    for i, lst in enumerate(lists):
        if lst:
            heapq.heappush(heap, (lst.val, i, lst))

    dummy = ListNode(0)
    current = dummy

    while heap:
        val, i, node = heapq.heappop(heap)
        current.next = node
        current = current.next
        if node.next:
            heapq.heappush(heap, (node.next.val, i, node.next))

    return dummy.next

Complexity:

Time: O(N log K) where N is total nodes, K is number of lists
Space: O(K) for the heap

Key Points Interviewers Check:

Using index i as tiebreaker in the heap (nodes may not be comparable)
Heap size never exceeds K
Generalizes to merging K sorted database result sets
Can you discuss the distributed version? (Each list on a different machine)

Problem 5: Median of Two Sorted Arrays

Why this problem matters: This is the hardest binary search problem you will encounter. It tests whether you truly understand binary search beyond "find an element in a sorted array." The O(log(min(m,n))) solution requires partitioning both arrays simultaneously.

Approach:

Binary search on the smaller array
Partition both arrays such that left_half has (m+n+1)//2 elements
Check if partition is valid: max(left) <= min(right) for both arrays
If valid, median is derivable from the boundary elements

Solution:

def find_median_sorted_arrays(nums1, nums2):
    # Ensure nums1 is the smaller array
    if len(nums1) > len(nums2):
        nums1, nums2 = nums2, nums1

    m, n = len(nums1), len(nums2)
    lo, hi = 0, m

    while lo <= hi:
        i = (lo + hi) // 2          # partition in nums1
        j = (m + n + 1) // 2 - i    # partition in nums2

        left1 = nums1[i - 1] if i > 0 else float('-inf')
        right1 = nums1[i] if i < m else float('inf')
        left2 = nums2[j - 1] if j > 0 else float('-inf')
        right2 = nums2[j] if j < n else float('inf')

        if left1 <= right2 and left2 <= right1:
            if (m + n) % 2 == 0:
                return (max(left1, left2) + min(right1, right2)) / 2
            else:
                return max(left1, left2)
        elif left1 > right2:
            hi = i - 1
        else:
            lo = i + 1

Key Points Interviewers Check:

Why binary search on the smaller array (reduces search space)
Correct handling of boundary conditions (partition at 0 or m)
Understanding of the partition invariant (all left elements <= all right elements)
Time complexity proof: O(log(min(m,n)))

Problem 9: Trapping Rain Water

Why this problem matters: This problem has three valid approaches (two pointers, monotonic stack, prefix max arrays), and interviewers expect you to discuss all three before coding one. It tests your ability to evaluate tradeoffs, not just solve the problem.

Three Approaches:

Approach	Time	Space	Key Idea
Prefix max arrays	O(n)	O(n)	Precompute left_max and right_max for each index
Monotonic stack	O(n)	O(n)	Track boundaries of "pools" using a decreasing stack
Two pointers	O(n)	O(1)	Move from both ends; the smaller side is the bottleneck

Two-Pointer Solution (Optimal):

def trap(height):
    if not height:
        return 0

    left, right = 0, len(height) - 1
    left_max, right_max = height[left], height[right]
    water = 0

    while left < right:
        if left_max < right_max:
            left += 1
            left_max = max(left_max, height[left])
            water += left_max - height[left]
        else:
            right -= 1
            right_max = max(right_max, height[right])
            water += right_max - height[right]

    return water

Key Insight: At each step, the water level at a position is determined by the minimum of left_max and right_max. By always processing the side with the smaller max, we know the water level at that position without knowing the exact max on the other side.

Problem 11: Design a Search Ranking System

Why this problem matters: Search ranking is the most comprehensive ML system design problem. It combines information retrieval, ML, distributed systems, and product thinking into a single problem.

Architecture:

Query -> Query Understanding -> Retrieval -> Ranking -> Reranking -> Results

1. Query Understanding (10ms budget)
   - Spell correction (learned edit distance model)
   - Query expansion (synonyms, related terms)
   - Intent classification (navigational vs. informational)

2. Retrieval (~1000 candidates, 50ms budget)
   - Inverted index (BM25) for lexical matching
   - Embedding retrieval (ANN with HNSW) for semantic matching
   - Combine with reciprocal rank fusion

3. Ranking (1000 -> 100, 100ms budget)
   - Features: query-doc relevance, freshness, authority, personalization
   - Model: LambdaMART or cross-encoder neural ranker
   - Training: click data with position bias correction (IPW)

4. Reranking (100 -> final, 20ms budget)
   - Diversity injection (MMR)
   - Business rules (ads, promoted, freshness boost)
   - Personalization overlay

5. Monitoring
   - Offline: NDCG@10, MAP, MRR
   - Online: CTR, query success rate, time-to-click
   - Degradation alert: compare online metrics to baseline

Key Tradeoffs to Discuss:

Precision vs. recall in retrieval stage
Pointwise vs. pairwise vs. listwise ranking loss
Position bias in click data (clicks are biased toward top positions)
Latency budget allocation across stages

Problem 19: Implement Backpropagation for a 2-Layer Neural Network

Why this problem matters: Backpropagation is the foundation of all deep learning. If you cannot implement it from scratch, you do not understand what your framework is doing. AI labs and senior roles at FAANG test this directly.

Code Template:

import numpy as np

class TwoLayerNet:
    def __init__(self, input_dim, hidden_dim, output_dim):
        # Xavier initialization
        self.W1 = np.random.randn(input_dim, hidden_dim) * np.sqrt(2.0 / input_dim)
        self.b1 = np.zeros(hidden_dim)
        self.W2 = np.random.randn(hidden_dim, output_dim) * np.sqrt(2.0 / hidden_dim)
        self.b2 = np.zeros(output_dim)

    def relu(self, x):
        return np.maximum(0, x)

    def relu_grad(self, x):
        return (x > 0).astype(float)

    def softmax(self, x):
        exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
        return exp_x / np.sum(exp_x, axis=1, keepdims=True)

    def forward(self, X):
        self.z1 = X @ self.W1 + self.b1           # (N, H)
        self.a1 = self.relu(self.z1)                # (N, H)
        self.z2 = self.a1 @ self.W2 + self.b2      # (N, C)
        self.probs = self.softmax(self.z2)          # (N, C)
        return self.probs

    def backward(self, X, y, lr=0.01):
        N = X.shape[0]

        # Output layer gradient
        dz2 = self.probs.copy()
        dz2[range(N), y] -= 1                      # (N, C)
        dz2 /= N

        dW2 = self.a1.T @ dz2                      # (H, C)
        db2 = np.sum(dz2, axis=0)                   # (C,)

        # Hidden layer gradient
        da1 = dz2 @ self.W2.T                       # (N, H)
        dz1 = da1 * self.relu_grad(self.z1)         # (N, H)

        dW1 = X.T @ dz1                             # (D, H)
        db1 = np.sum(dz1, axis=0)                   # (H,)

        # Update weights
        self.W2 -= lr * dW2
        self.b2 -= lr * db2
        self.W1 -= lr * dW1
        self.b1 -= lr * db1

Key Points Interviewers Check:

Correct gradient shapes at every step (comment the shapes)
Numerically stable softmax (subtract max before exp)
Proper initialization (Xavier or He, not uniform random)
ReLU gradient is 0 for negative inputs (not undefined)
The 1/N normalization in the gradient (average over batch)

Problem 22: Implement Attention Mechanism (Scaled Dot-Product + Multi-Head)

Why this problem matters: This is the single most important implementation question for anyone working in modern AI. Every LLM, vision transformer, and multimodal model uses multi-head attention. AI labs will ask you to implement this from memory.

Code Template:

import numpy as np

def softmax(x, axis=-1):
    x_max = np.max(x, axis=axis, keepdims=True)
    exp_x = np.exp(x - x_max)
    return exp_x / np.sum(exp_x, axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    """
    Q, K, V: (batch, heads, seq_len, d_k)
    mask: (batch, 1, 1, seq_len) or (batch, 1, seq_len, seq_len)
    """
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(0, 1, 3, 2) / np.sqrt(d_k)

    if mask is not None:
        scores = np.where(mask == 0, -1e9, scores)

    attention_weights = softmax(scores, axis=-1)
    output = attention_weights @ V
    return output, attention_weights

class MultiHeadAttention:
    def __init__(self, d_model, n_heads):
        self.n_heads = n_heads
        self.d_k = d_model // n_heads
        assert d_model % n_heads == 0

        self.W_q = np.random.randn(d_model, d_model) * 0.02
        self.W_k = np.random.randn(d_model, d_model) * 0.02
        self.W_v = np.random.randn(d_model, d_model) * 0.02
        self.W_o = np.random.randn(d_model, d_model) * 0.02

    def forward(self, x, mask=None):
        batch, seq_len, d_model = x.shape

        Q = x @ self.W_q
        K = x @ self.W_k
        V = x @ self.W_v

        # Reshape to (batch, heads, seq, d_k)
        Q = Q.reshape(batch, seq_len, self.n_heads, self.d_k).transpose(0, 2, 1, 3)
        K = K.reshape(batch, seq_len, self.n_heads, self.d_k).transpose(0, 2, 1, 3)
        V = V.reshape(batch, seq_len, self.n_heads, self.d_k).transpose(0, 2, 1, 3)

        output, weights = scaled_dot_product_attention(Q, K, V, mask)

        # Concatenate heads
        output = output.transpose(0, 2, 1, 3).reshape(batch, seq_len, d_model)
        return output @ self.W_o

Key Points Interviewers Check:

Correct scaling by sqrt(d_k) (prevents softmax saturation)
Proper reshape and transpose for multi-head splitting
Correct mask application (causal mask for autoregressive, padding mask for batched sequences)
Numerically stable softmax (subtract max before exp)
Can you explain why multi-head is better than single-head? (Different representation subspaces)

Problem 25: Implement LoRA (Low-Rank Adaptation)

Why this problem matters: LoRA is the dominant parameter-efficient fine-tuning method. Every company fine-tuning LLMs uses it. This tests whether you understand why low-rank adaptation works and can implement it correctly.

Code Template:

import numpy as np

class LoRALinear:
    """
    Wraps a frozen linear layer with low-rank adaptation.
    W_effective = W_frozen + (alpha/r) * B @ A
    """
    def __init__(self, in_features, out_features, rank=4, alpha=1.0):
        # Frozen pretrained weights
        self.W = np.random.randn(out_features, in_features) * 0.02
        self.W_frozen = True  # Do not update during training

        # LoRA matrices
        self.A = np.random.randn(rank, in_features) * 0.02  # Down-projection
        self.B = np.zeros((out_features, rank))               # Up-projection (init to zero)
        self.alpha = alpha
        self.rank = rank
        self.scaling = alpha / rank

    def forward(self, x):
        base_output = x @ self.W.T
        lora_output = (x @ self.A.T) @ self.B.T * self.scaling
        return base_output + lora_output

    def trainable_params(self):
        return self.A.size + self.B.size

    def total_params(self):
        return self.W.size + self.A.size + self.B.size

    def compression_ratio(self):
        return self.trainable_params() / self.total_params()

Key Points Interviewers Check:

B is initialized to zero (LoRA starts as identity --- no change from pretrained behavior)
A is initialized with random values (Kaiming or small Gaussian)
Scaling factor alpha/r (controls the magnitude of LoRA contribution)
Can you explain rank selection? (Higher rank = more expressiveness, more params, risk of overfitting)
Can you discuss when LoRA fails? (Tasks very different from pretraining distribution may need full fine-tuning)

Problem 28: Optimize Model Serving Latency

Why this problem matters: Every production ML team faces this problem. Moving from "model works" to "model serves in 50ms" requires understanding the full serving stack.

Optimization Waterfall:

Step	Technique	Expected Speedup	Effort	When to Use
1	Profile first	N/A (diagnostic)	Low	Always --- never optimize without profiling
2	Batching	2-5x throughput	Low	When individual requests are small
3	Model quantization (INT8)	2-4x latency	Medium	When accuracy loss is <0.5%
4	ONNX/TensorRT export	1.5-3x latency	Medium	When running on GPU
5	Caching	10-100x for cache hits	Low	When inputs have high repetition rate
6	Model distillation	2-10x latency	High	When you have a large teacher model
7	Pruning	1.5-3x latency	High	Structured pruning for real speedups
8	Async preprocessing	1.2-2x	Medium	When preprocessing is a bottleneck

Structured Discussion:

Where is the latency? (Profile: preprocessing, inference, postprocessing, network)
What is the accuracy-latency budget? (Can we lose 0.5% accuracy for 4x speed?)
Quick wins first: batching, caching, ONNX export
Medium effort: quantization, async preprocessing
High effort if needed: distillation, architecture changes
Monitoring: track p50, p95, p99 latency separately

6-Week Hard Tier Study Plan

Hard problems require more time per problem and more review. Budget 2-3 hours per day.

Week	Focus	Problems	Daily Load
Week 1	Advanced coding (Part 1)	#1-5	1 problem/day + review
Week 2	Advanced coding (Part 2)	#6-10	1 problem/day + review
Week 3	ML implementation + papers	#19-27	1 problem/day
Week 4	System design	#11-18	1 design/day
Week 5	Optimization + data	#28-35	1-2 problems/day
Week 6	Integration + full mocks	All weak areas	Mocks only

Week 1: Advanced Coding (Multi-Pattern)

Day 1: #1 (Merge K Sorted Lists - min-heap management)
Day 2: #2 (Serialize/Deserialize Tree - encoding design)
Day 3: #3 (Word Search II - Trie + backtracking optimization)
Day 4: #4 (Largest Rectangle - monotonic stack deep dive)
Day 5: #5 (Median of Two Sorted Arrays - binary search on partition)
Day 6-7: Review and re-solve; focus on approach selection, not just coding

Week 2: Advanced Coding (DP & Optimization)

Day 1: #6 (Edit Distance - 2D DP with string alignment)
Day 2: #7 (LIS O(n log n) - patience sorting + binary search)
Day 3: #8 (Regex Matching - complex DP transitions)
Day 4: #9 (Trapping Rain Water - compare 3 approaches)
Day 5: #10 (Concurrent LRU Cache - concurrency primitives)
Day 6-7: Review; time yourself solving #1-10 under 35 min each

Week 3: ML Implementation + Paper Implementation

Day 1: #19 (Backprop - derive all gradients by hand first)
Day 2: #20 (Word2Vec - embedding learning + negative sampling)
Day 3: #21 (Gradient Boosted Trees - residual fitting)
Day 4: #22 (Attention mechanism - matrix dimensions matter)
Day 5: #23 (Transformer block - full forward pass)
Day 6: #24, #25 (PPO core loop, LoRA)
Day 7: #26, #27 (Contrastive loss, Beam search) + review

Week 4: System Design (Large-Scale ML)

Day 1: #11 (Search ranking - the ultimate ML system design)
Day 2: #12 (Ad click prediction - real-time + scale)
Day 3: #13 (Distributed training - parallelism strategies)
Day 4: #14 (Real-time fraud detection - latency constraints)
Day 5: #15 (Knowledge graph - graph + ML combination)
Day 6: #16, #17 (Experimentation platform, feature store)
Day 7: #18 (Multi-region serving) + review all designs

Week 5: Optimization + Data

Day 1: #28 (Serving latency optimization - the profiling-first approach)
Day 2: #29 (Distributed training at 70B scale)
Day 3: #30 (Model compression pipeline)
Day 4: #31 (Sub-10ms feature computation)
Day 5: #32, #33 (Network effect experiments, recursive CTE)
Day 6: #34 (Query optimization on 100TB)
Day 7: #35 (Multi-armed bandit for pricing) + review

Week 6: Integration + Mocks

Day 1: Full mock - hard coding (40 min) + ML depth (40 min)
Day 2: Full mock - system design deep dive (50 min) + behavioral (30 min)
Day 3: Full mock - paper implementation (40 min) + system design (45 min)
Day 4: Targeted review of any problems you scored below 3
Day 5: Full mock - complete senior interview loop (4 hours)
Day 6: Full mock - complete senior interview loop (4 hours)
Day 7: Light review only; rest before the interview

Hard-Tier Patterns Cheat Sheet

#	Pattern	Problems	When to Apply
1	K-way merge with heap	#1	Merging K sorted streams
2	Serialization design	#2	Custom encoding/decoding for trees and graphs
3	Trie + backtracking	#3	Multi-pattern search in grids or strings
4	Monotonic stack	#4	Next greater/smaller element, histogram problems
5	Binary search on answer	#5	Searching partition/position rather than value
6	2D dynamic programming	#6, #8	String alignment, grid optimization
7	Patience sorting	#7	Longest increasing subsequence
8	Multiple approaches	#9	Two pointers, stack, prefix arrays for same problem
9	Concurrent data structures	#10	Thread-safe caches, concurrent queues
10	Multi-stage ML pipeline	#11, #12	Retrieval -> ranking -> reranking
11	Distributed training	#13, #29	Data/model/pipeline parallelism
12	Gradient derivation	#19, #20	Implementing ML from math
13	Attention mechanism	#22, #23	QKV projections, masking, multi-head
14	Clipped objective	#24	Constrain update magnitude (PPO, TRPO)
15	Low-rank decomposition	#25	Reduce parameter count while preserving expressiveness
16	Contrastive learning	#26	Learn representations from positive/negative pairs
17	Latency waterfall	#28, #31	Profile -> batch -> quantize -> cache
18	Exploration-exploitation	#35	Dynamic optimization under uncertainty

Readiness Assessment

Use this rubric to assess your readiness for hard-tier interviews:

Skill	Not Ready	Getting There	Ready	Staff-Ready
Hard coding	Cannot solve any in 45 min	Solve 40% in 40 min	Solve 70% in 35 min	Solve 85%+ with clean code
ML implementation	Cannot derive gradients	Implement with hints	Implement from math alone	Implement + explain teaching-style
System design	Miss major components	Cover all components	Deep dive + tradeoffs	Novel insights + cross-system impact
Paper implementation	Cannot extract key idea	Gets main idea in 30 min	Implements core loop correctly	Critiques methodology + suggests improvements
Statistical reasoning	Cannot set up the problem	Correct framework, shaky execution	Rigorous reasoning	Identifies subtle issues (network effects, Simpson's paradox)
Communication	Think silently, present code	Explain after solving	Explain while solving	Teach the problem to the interviewer

:::tip The Staff+ Differentiator At the hard level, HOW you solve is as important as WHETHER you solve. Staff+ candidates:

Discuss tradeoffs before committing to an approach
Identify potential failure modes proactively
Consider the system-level impact of their design choices
Ask clarifying questions that reveal deep problem understanding
Communicate their reasoning clearly throughout
Connect the problem to real-world experience :::

Common Failure Modes

Failure Mode	What It Looks Like	How to Fix
Overcomplicating	Using 3 data structures when 1 suffices	Start with brute force; optimize only as needed
Premature optimization	Jumping to O(log n) without understanding the problem	Solve the problem first, then optimize
Ignoring constraints	Designing a system without discussing latency, cost, or scale	Write constraints on the board first
Not asking questions	Diving in without clarifying ambiguity	Spend 2-3 minutes asking clarifying questions
Perfect is the enemy of done	Spending 20 min on edge cases with 10 min left	Get the core solution working, then handle edges
Memorizing, not understanding	Can reproduce code but cannot modify it when constraints change	For each solution, try a variation (different constraint, different objective)

When to Skip Hard Problems

Hard problems are not universally required. Here is a decision matrix:

Target	Hard DSA	Hard System Design	Hard ML Impl	Hard Paper Impl
Google L3-L4	Optional	No	No	No
Google L5+	Yes	Yes	Yes	Sometimes
Meta E3-E4	Optional	No	No	No
Meta E5+	Yes	Yes	Yes	No
OpenAI / Anthropic	Sometimes	Yes	Yes	Yes
DeepMind	Sometimes	Yes	Yes	Yes
Startup (Series A-B)	Rarely	Yes	Sometimes	No
FAANG Data Scientist	Rarely	Sometimes	No	No

Next Steps

After the Hard Tier:

Google-Style Problems for Google-specific calibration
Meta-Style Problems for Meta-specific calibration
Startup-Style Problems if targeting startups
Research Engineer Problems if targeting research roles
Company Guides for company-specific preparation strategies

Who Needs Hard-Tier Preparation​

Category Distribution​

Category 1: Advanced Coding (10 Problems)​

Multi-Pattern Problems​

Dynamic Programming & Optimization​

Category 2: ML System Design (8 Problems)​

Large-Scale ML Systems​

ML Infrastructure​

Category 3: Advanced ML Implementation (5 Problems)​

Category 4: Paper Implementation (4 Problems)​

Category 5: Optimization & Infrastructure (4 Problems)​

Category 6: Advanced Data & Statistics (4 Problems)​

Problem Deep Dives​

Problem 1: Merge K Sorted Lists​

Problem 5: Median of Two Sorted Arrays​

Problem 9: Trapping Rain Water​

Problem 11: Design a Search Ranking System​

Problem 19: Implement Backpropagation for a 2-Layer Neural Network​

Problem 22: Implement Attention Mechanism (Scaled Dot-Product + Multi-Head)​

Problem 25: Implement LoRA (Low-Rank Adaptation)​

Problem 28: Optimize Model Serving Latency​

6-Week Hard Tier Study Plan​

Week 1: Advanced Coding (Multi-Pattern)​

Week 2: Advanced Coding (DP & Optimization)​

Week 3: ML Implementation + Paper Implementation​

Week 4: System Design (Large-Scale ML)​

Week 5: Optimization + Data​

Week 6: Integration + Mocks​

Hard-Tier Patterns Cheat Sheet​

Readiness Assessment​

Common Failure Modes​

When to Skip Hard Problems​

Next Steps​

Who Needs Hard-Tier Preparation

Category Distribution

Category 1: Advanced Coding (10 Problems)

Multi-Pattern Problems

Dynamic Programming & Optimization

Category 2: ML System Design (8 Problems)

Large-Scale ML Systems

ML Infrastructure

Category 3: Advanced ML Implementation (5 Problems)

Category 4: Paper Implementation (4 Problems)

Category 5: Optimization & Infrastructure (4 Problems)

Category 6: Advanced Data & Statistics (4 Problems)

Problem Deep Dives

Problem 1: Merge K Sorted Lists

Problem 5: Median of Two Sorted Arrays

Problem 9: Trapping Rain Water

Problem 11: Design a Search Ranking System

Problem 19: Implement Backpropagation for a 2-Layer Neural Network

Problem 22: Implement Attention Mechanism (Scaled Dot-Product + Multi-Head)

Problem 25: Implement LoRA (Low-Rank Adaptation)

Problem 28: Optimize Model Serving Latency

6-Week Hard Tier Study Plan

Week 1: Advanced Coding (Multi-Pattern)

Week 2: Advanced Coding (DP & Optimization)

Week 3: ML Implementation + Paper Implementation

Week 4: System Design (Large-Scale ML)

Week 5: Optimization + Data

Week 6: Integration + Mocks

Hard-Tier Patterns Cheat Sheet

Readiness Assessment

Common Failure Modes

When to Skip Hard Problems

Next Steps