Research Engineer Problem List
Reading time: ~40 min | Interview relevance: Critical | Roles: Research Engineer, Research Scientist, Applied Research Scientist, ML Research Engineer
A Research Engineer at a top AI lab opens your interview with: "Here is a paper we published last month. Read the abstract and Section 3. You have 45 minutes to implement the core algorithm." No LeetCode. No system design templates. Just you, a whiteboard (or laptop), and the mathematical heart of a new method. If that sounds exciting rather than terrifying, this role is for you.
Research Engineer interviews are among the most technically demanding in AI/ML. They test deep mathematical understanding, algorithm implementation from papers, and the ability to critically evaluate research. This list of 45 problems prepares you for all four dimensions: paper implementation, mathematical reasoning, algorithm coding, and research taste.
Research Engineer Interview Structure
| Round | Duration | What They Test | Weight |
|---|---|---|---|
| Algorithm Implementation | 60-90 min | Code an algorithm from a paper or description | 30-35% |
| Math & Theory | 45-60 min | Probability, linear algebra, optimization, information theory | 20-25% |
| Paper Discussion | 45-60 min | Critically analyze a paper, propose improvements | 20-25% |
| Coding | 45-60 min | Strong CS fundamentals, DSA | 15-20% |
| Research Taste | 30-45 min | What problems matter? Where is the field going? | 5-10% |
:::tip The Research Engineer Bar
Research Engineers are expected to bridge the gap between mathematical ideas and working code. You must be comfortable reading equations, understanding the intuition behind them, and implementing them efficiently. This is a rare combination.
:::
Section 1: Paper Implementation (12 Problems)
These problems simulate the most distinctive part of Research Engineer interviews: implementing algorithms from paper descriptions.
Core Algorithm Implementations
| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 1 | Implement Multi-Head Self-Attention | Hard | 40 min | Scaled dot-product attention, head splitting, concatenation | The foundation of Transformers; must be second nature | DeepMind, Google, OpenAI, Anthropic |
| 2 | Implement Byte-Pair Encoding (BPE) Tokenizer | Medium | 30 min | Iterative merge of frequent pairs | Core NLP preprocessing; appears in every LLM paper | OpenAI, Google, Meta |
| 3 | Implement Beam Search Decoding | Medium | 30 min | Breadth-limited search, log probability accumulation | Standard decoding strategy for sequence models | Google, Meta, AI Labs |
| 4 | Implement a Simple GAN Training Loop | Hard | 40 min | Alternating optimization, generator/discriminator interplay | Tests understanding of adversarial training dynamics | DeepMind, OpenAI, Meta |
| 5 | Implement Contrastive Learning (SimCLR-style) | Hard | 40 min | Data augmentation, projection head, NT-Xent loss | Self-supervised learning is a major research direction | Google, Meta, DeepMind |
| 6 | Implement REINFORCE Policy Gradient | Hard | 35 min | Log-derivative (score-function) trick, baseline subtraction, variance reduction | Foundation of policy-gradient RL, including the methods used in RLHF | DeepMind, OpenAI, Anthropic |
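For problem #2, here is a minimal sketch of the BPE training loop. It assumes a toy word-frequency dictionary and single characters as the initial symbols, and it deliberately omits byte-level handling and end-of-word markers. The helper names `get_pair_counts`, `merge_pair`, and `learn_bpe` are hypothetical, not taken from any tokenizer library.

```python
from collections import Counter

def get_pair_counts(vocab):
    # vocab maps a tuple of symbols -> word frequency
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(vocab, pair):
    # Replace every adjacent occurrence of `pair` with one concatenated symbol
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def learn_bpe(word_freqs, num_merges):
    # Start from characters; greedily merge the most frequent adjacent pair
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        vocab = merge_pair(vocab, best)
        merges.append(best)
    return merges, vocab
```

Usage: `merges, vocab = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 3)`. The learned `merges` list is the tokenizer. At encode time the merges are replayed in order on new words, which is the part interviewers often ask about next.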
Advanced Implementations
| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 7 | Implement Flash Attention (Simplified) | Hard | 45 min | Tiled computation, online softmax, memory-efficient attention | Critical optimization for large-scale Transformers | AI Labs |
| 8 | Implement LoRA (Low-Rank Adaptation) | Medium | 30 min | Low-rank decomposition of weight updates | Standard parameter-efficient fine-tuning | AI Labs, Big Tech |
| 9 | Implement Rotary Position Embeddings (RoPE) | Hard | 35 min | Rotation matrices, relative position encoding | Used in most modern LLMs | AI Labs |
| 10 | Implement a VAE (Variational Autoencoder) | Hard | 40 min | Reparameterization trick, ELBO, KL divergence | Core generative model; tests probabilistic ML depth | DeepMind, OpenAI, Meta |
| 11 | Implement Grouped-Query Attention (GQA) | Medium | 25 min | Each key-value head shared by a group of query heads, KV-cache memory reduction | Efficiency technique in modern Transformers | Google, Meta |
| 12 | Implement DDPM Noise Schedule and Forward Process | Hard | 35 min | Gaussian noise addition, variance (beta) schedule, closed-form sampling of x_t given x_0 | Diffusion models are a major research area | Google, OpenAI, Stability AI |
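For problem #9, here is a minimal NumPy sketch of rotary embeddings. It assumes the "rotate-half" pairing, where dimension i is paired with dimension i + d/2, and base 10000, the value used in the RoFormer paper. Some codebases pair adjacent dimensions instead. `apply_rope` is a hypothetical name. The property interviewers usually probe is that the rotated query-key dot product depends only on the relative offset between positions.

```python
import numpy as np

def apply_rope(x, positions, base=10000.0):
    # x: (seq_len, d) with d even; positions: (seq_len,) integer positions
    seq_len, d = x.shape
    half = d // 2
    # One frequency per rotated pair: theta_i = base^(-2i/d)
    inv_freq = base ** (-np.arange(half) * 2.0 / d)
    angles = positions[:, None] * inv_freq[None, :]  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Apply a 2D rotation to each (x1_i, x2_i) pair by its position-dependent angle
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```

Rotating q by mθ and k by nθ gives q^T R((n - m)θ) k, so shifting both positions by the same amount leaves the score unchanged. Because rotations preserve norms, RoPE also never rescales the vectors.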
:::warning Paper Implementation Tips
- Read the math first, then the code. Do not jump to implementation.
- Identify the core computation (usually 3-5 lines of math).
- Implement a naive version first, then optimize.
- Always verify dimensions with small examples.
- Comment your code with equation references.
:::
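Here are those tips applied to problem #7. The sketch first builds a naive reference, then a tiled version that walks over keys and values in blocks while keeping a running max and a running normalizer. That running rescaling is the online-softmax idea FlashAttention builds on. The tiled version is then checked against the reference on a small example. This is a single-head, CPU-only NumPy sketch under simplifying assumptions: it shows the math of tiling, not the GPU memory hierarchy that produces the real speedup. The function names and block size are hypothetical.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Reference: materializes the full (n, n) score matrix
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

def tiled_attention(Q, K, V, block=4):
    # Processes K/V in blocks; holds at most (n, block) scores at a time
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    m = np.full((n, 1), -np.inf)        # running row max
    l = np.zeros((n, 1))                # running softmax denominator
    acc = np.zeros((n, V.shape[-1]))    # running unnormalized output
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T * scale                              # (n, block)
        m_new = np.maximum(m, S.max(axis=-1, keepdims=True))
        P = np.exp(S - m_new)
        correction = np.exp(m - m_new)                    # rescale old stats to the new max
        l = l * correction + P.sum(axis=-1, keepdims=True)
        acc = acc * correction + P @ Vb
        m = m_new
    return acc / l
```

On the first block `m` is `-inf`, so `correction` is exactly 0 and the empty accumulators stay empty. No special case is needed.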
Section 2: Mathematical Reasoning (12 Problems)
These problems test your ability to reason mathematically about ML concepts -- a core requirement for research roles.
Linear Algebra & Optimization
| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 13 | Derive the Gradient of Softmax Cross-Entropy Loss | Medium | 25 min | Chain rule, Jacobian of softmax | Every neural network uses this; derivation tests understanding | All AI Labs |
| 14 | Prove That SVD Gives the Best Low-Rank Approximation | Hard | 30 min | Eckart-Young theorem, Frobenius norm minimization | Foundation of dimensionality reduction and compression | DeepMind, Google |
| 15 | Derive the Update Rules for Adam Optimizer | Medium | 25 min | Exponential moving averages, bias correction | Most common optimizer; understanding internals matters | All |
| 16 | Explain and Derive the Reparameterization Trick | Medium | 25 min | Pathwise gradient estimation, sampling from distributions | Critical for VAEs and stochastic computation graphs | DeepMind, OpenAI, Meta |
| 17 | Derive the Gradient of Attention with Respect to Queries | Hard | 30 min | Matrix calculus, softmax Jacobian, chain rule | Deep understanding of Transformer training | AI Labs |
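The derivation in problem #13 ends at a compact result. For logits z, p = softmax(z), and a one-hot target y, the gradient of L = -Σ_i y_i log p_i with respect to z is p - y. A quick way to sanity-check this kind of derivation is a central finite-difference comparison. The sketch below assumes a single example and NumPy, and the helper names are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(z, target):
    # L = -log softmax(z)[target], using log-sum-exp for stability
    return -(z[target] - z.max() - np.log(np.exp(z - z.max()).sum()))

def analytic_grad(z, target):
    # dL/dz = p - y, where y is the one-hot encoding of `target`
    g = softmax(z)
    g[target] -= 1.0
    return g

def numerical_grad(f, z, eps=1e-6):
    # Central differences, one coordinate at a time
    g = np.zeros_like(z)
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        g[i] = (f(z + dz) - f(z - dz)) / (2 * eps)
    return g
```

The same finite-difference check works for problem #17 once you have a candidate formula for the attention gradient.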
Probability & Information Theory
| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 18 | Derive the ELBO (Evidence Lower Bound) for VAEs | Hard | 30 min | Jensen's inequality, KL divergence decomposition | Foundation of variational inference | DeepMind, OpenAI, Meta |
| 19 | Prove That KL Divergence Is Non-Negative | Medium | 20 min | Jensen's inequality, convexity of -log | Basic information theory; tests mathematical rigor | All AI Labs |
| 20 | Estimate the Entropy of a Mixture of Gaussians | Medium | 25 min | No closed form; entropy bounds, Monte Carlo estimation | Mixture models appear throughout ML | DeepMind, Google |
| 21 | Derive the Bias-Variance Decomposition for MSE | Medium | 20 min | Expectation algebra, law of total expectation | Foundational ML theory | All |
| 22 | Explain Why Dropout Works as Approximate Bayesian Inference | Hard | 30 min | Monte Carlo dropout, model uncertainty | Connects practical technique to theory | DeepMind, Google |
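Problem #20 has no closed-form answer in general, and recognizing that is part of the question: the log of a sum of Gaussian densities does not simplify. Below is a minimal sketch of the Monte Carlo estimator H ≈ -(1/N) Σ log p(x_n) with x_n ~ p, for a 1D mixture. The function names and sample count are hypothetical. A useful sanity check is the single-component case, where the estimate should approach the closed form 0.5 log(2πeσ²).

```python
import numpy as np

def mixture_logpdf(x, weights, means, stds):
    # log p(x) = log sum_k w_k N(x; mu_k, sigma_k^2), via log-sum-exp
    x = np.asarray(x)[:, None]
    log_comp = (np.log(weights)
                - 0.5 * np.log(2 * np.pi * stds**2)
                - 0.5 * ((x - means) / stds) ** 2)
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True)))[:, 0]

def mc_entropy(weights, means, stds, n_samples=200_000, seed=0):
    weights, means, stds = map(np.asarray, (weights, means, stds))
    rng = np.random.default_rng(seed)
    # Sample from the mixture: pick a component, then sample from it
    comp = rng.choice(len(weights), size=n_samples, p=weights)
    x = rng.normal(means[comp], stds[comp])
    # H[p] = -E_p[log p(x)], estimated by a sample average (in nats)
    return -mixture_logpdf(x, weights, means, stds).mean()
```

For well-separated components the estimate approaches the average component entropy plus the entropy of the mixing weights. That is one of the standard bounds worth mentioning in the interview.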
Analysis & Convergence
| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 23 | Prove Convergence of SGD Under Convexity Assumptions | Hard | 35 min | Step-size schedule, bound on expected suboptimality of the averaged iterate | Optimization theory for deep learning | DeepMind, Google |
| 24 | Analyze the Computational Complexity of Transformer Self-Attention | Medium | 20 min | O(n^2*d) time, O(n^2) attention-matrix memory, alternatives | Understanding scaling is critical for LLM research | All AI Labs |
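For problem #24, the counts follow directly from the matrix shapes. Below is a worked tally for one layer and one sequence, assuming sequence length n, model width d, and h heads with d_k = d/h. Softmax, bias, and normalization terms are ignored because they are lower order.

$$
\begin{aligned}
\text{Q, K, V, O projections:}\quad & 4\, n d^2 \ \text{multiply-adds} \\
QK^\top \text{ over all heads:}\quad & h \cdot n^2 d_k = n^2 d \\
\text{weights} \times V \text{ over all heads:}\quad & h \cdot n^2 d_k = n^2 d \\
\text{Total time:}\quad & O(n^2 d + n d^2) \\
\text{Attention-matrix memory:}\quad & O(h\, n^2) \quad \text{vs. } O(n d) \text{ for the activations}
\end{aligned}
$$

The n²d term dominates once n grows well beyond d. That is why long-context work (problem #41) targets the attention matrix rather than the projections.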
Section 3: Research-Flavored Coding (10 Problems)
These problems test strong CS fundamentals with a research twist -- the kind of algorithmic thinking needed to make research ideas work in practice.
| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 25 | Implement Efficient Top-K Selection Without Full Sort | Medium | 20 min | Quickselect, partial sort | Used in beam search, top-K sampling, retrieval | All |
| 26 | Implement a Bloom Filter for Deduplication | Medium | 25 min | Probabilistic data structure, false positive analysis | Used in training data deduplication, web crawling | Google, Meta |
| 27 | Implement A* Search for Shortest Path | Medium | 30 min | Heuristic search, priority queue | Planning in agents, graph-based reasoning | DeepMind, Google |
| 28 | Implement Sparse Matrix Multiplication | Hard | 35 min | CSR/CSC format, efficient iteration | Sparse operations are critical for large-scale models | Google, DeepMind |
| 29 | Implement the Hungarian Algorithm for Optimal Assignment | Hard | 40 min | Bipartite matching, augmenting paths | Used in DETR (object detection), evaluation metrics | DeepMind, Meta |
| 30 | Implement Dijkstra's Algorithm with Decrease-Key | Medium | 25 min | Priority queue with updates | Graph reasoning, shortest path problems | All |
| 31 | Implement a KD-Tree for Nearest Neighbor Search | Hard | 35 min | Space partitioning, recursive construction | Efficient search in embedding spaces | Google, DeepMind |
| 32 | Implement Online Learning (Perceptron with Mistake Bound) | Medium | 25 min | Online update, mistake-driven learning | Foundation of online/streaming ML | DeepMind, Google |
| 33 | Implement Parallel Prefix Sum (Scan) | Medium | 25 min | Work-efficient parallel algorithm | Foundation of GPU programming, parallel reductions | AI Labs |
| 34 | Implement Consistent Hashing for Distributed Data | Medium | 25 min | Hash ring, virtual nodes | Distributed training data partitioning | Google, Meta |
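For problem #26, here is a minimal Bloom filter sketch. It assumes the k bit positions are derived from one SHA-256 digest via double hashing (h1 + i·h2), which is a common simplification. The class and parameter names are hypothetical. Sizing uses the standard formulas: m = -n ln p / (ln 2)² bits and k = (m/n) ln 2 hash functions, for n expected items and target false-positive rate p.

```python
import hashlib
import math

class BloomFilter:
    def __init__(self, expected_items, fp_rate):
        # Optimal bit count m and hash count k for the target false-positive rate
        self.m = math.ceil(-expected_items * math.log(fp_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k positions from two 64-bit slices of one digest
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a nonzero step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # False means "definitely new"; True means "probably seen before"
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

For deduplication, the asymmetry is the point. A Bloom filter never produces false negatives, so it never misses a duplicate. A false positive only means an occasional unique document is wrongly dropped, at a rate you choose up front.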
Section 4: Paper Discussion & Research Taste (11 Problems)
These problems test your ability to read, critique, and extend research -- the hallmark of a strong research engineer.
Paper Analysis
| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 35 | Critique the Experimental Setup of "Attention Is All You Need" | Medium | 25 min | Ablation design, baseline selection, evaluation metrics | Arguably the most influential modern ML paper | All AI Labs |
| 36 | Compare BERT vs. GPT Pre-Training Approaches: Strengths and Weaknesses | Medium | 20 min | Masked LM vs. autoregressive, bidirectional vs. unidirectional | Foundation of modern NLP | All |
| 37 | Explain Why Scaling Laws Matter for LLM Development | Medium | 25 min | Chinchilla scaling, compute-optimal training | Drives billion-dollar resource allocation decisions | OpenAI, Anthropic, DeepMind |
| 38 | Analyze the RLHF Pipeline: What Can Go Wrong? | Hard | 30 min | Reward hacking, distribution shift, reward model limitations | Core technique for aligning LLMs | Anthropic, OpenAI, DeepMind |
| 39 | Explain Chain-of-Thought Prompting: Why Does It Work? | Medium | 20 min | Implicit computation, reasoning traces | Emergent ability with practical implications | All |
Research Direction Questions
| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 40 | What Are the Most Important Open Problems in AI Safety? | Medium | 25 min | Alignment, interpretability, robustness | Safety research is a top priority at AI labs | Anthropic, OpenAI, DeepMind |
| 41 | Propose an Approach to Make Transformers Handle Long Contexts Efficiently | Hard | 30 min | Efficient attention, memory mechanisms, retrieval augmentation | Active area of research with direct practical impact | All AI Labs |
| 42 | How Would You Evaluate Whether an LLM Truly "Understands" Language? | Hard | 30 min | Behavioral tests, probing, mechanistic interpretability | Philosophical but practically relevant | Anthropic, DeepMind |
| 43 | Design an Experiment to Test Whether Larger Models Are More Calibrated | Medium | 25 min | Experimental design, calibration metrics, controlled comparisons | Tests ability to design rigorous experiments | All AI Labs |
| 44 | What Research Would You Pursue Given Unlimited Compute for 6 Months? | Medium | 20 min | Research vision, feasibility assessment, impact estimation | Tests research taste and ambition | All AI Labs |
| 45 | Critique a Recent Paper of Your Choice and Propose an Extension | Hard | 30 min | Critical reading, identifying limitations, creative extension | The ultimate research engineer test | All AI Labs |
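Problem #43 needs a concrete calibration metric. One common choice is expected calibration error (ECE) with equal-width confidence bins: the average gap between accuracy and mean confidence across bins, weighted by bin size. The sketch below assumes top-1 confidences and 0/1 correctness labels. The default of 15 bins is a frequently used but arbitrary choice. ECE is known to be sensitive to it, and that sensitivity is worth raising when you design the experiment.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    # confidences: top-1 predicted probability per example, in [0, 1]
    # correct: 1 if the top-1 prediction was right, else 0
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Right-closed bins (lo, hi]; a confidence of exactly 0 falls into bin 0
    bin_ids = np.clip(np.searchsorted(edges, confidences, side="left") - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the fraction of examples in the bin
    return ece
```

A controlled comparison would compute this metric for each model size on the same held-out set, ideally with bootstrap confidence intervals, so that differences are not just binning noise.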
:::note Research Taste Questions
There are no "right" answers to research taste questions. Interviewers are looking for:
- Awareness of the current research landscape
- Critical thinking about what matters and what doesn't
- Originality in proposing new directions
- Feasibility assessment -- wild ideas are fine if you acknowledge the challenges
- Depth on at least one area you care about deeply
:::
6-Week Research Engineer Study Plan
Research Engineer preparation takes longer due to the mathematical depth required.
| Week | Focus | Problems | Daily Load |
|---|---|---|---|
| Week 1 | Core implementations | #1-6 | 1 implementation/day |
| Week 2 | Advanced implementations | #7-12 | 1 implementation/day |
| Week 3 | Mathematics | #13-24 | 2 proofs/derivations per day |
| Week 4 | Research coding | #25-34 | 2 problems/day |
| Week 5 | Paper discussion | #35-45 | 2 problems/day + read papers |
| Week 6 | Integration + mock | Mixed | 1 deep problem + 1 mock/day |
Daily Practice Format for Research Engineers
:::tip Building Research Taste
Research taste is built over months, not days. Start reading papers now, even if you are months from interviews:
- Subscribe to arXiv daily digests for your subfield
- Follow AI researchers on Twitter/X for commentary
- Attend online reading groups
- Write brief summaries of papers you read
:::
Essential Math Reference
Linear Algebra Core
| Concept | Where It Appears | Must Know |
|---|---|---|
| Matrix multiplication | Attention, linear layers | Dimensions, complexity |
| Eigendecomposition | PCA, spectral methods | Eigenvalues, eigenvectors |
| SVD | Compression, LoRA | Truncated SVD, rank |
| Matrix calculus | Backpropagation | Jacobian, chain rule |
| Positive definiteness | Kernel methods, covariance | Cholesky, eigenvalue test |
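The "Cholesky, eigenvalue test" entry can be made concrete in a few lines of NumPy. A symmetric matrix is positive definite exactly when all its eigenvalues are positive. In practice, the cheaper check is whether a Cholesky factorization succeeds. The sketch assumes a symmetric input, because `np.linalg.cholesky` reads only one triangle and does not check symmetry. The helper names are hypothetical.

```python
import numpy as np

def is_pd_cholesky(A):
    # For symmetric A, Cholesky succeeds iff A is positive definite
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

def is_pd_eig(A, tol=0.0):
    # eigvalsh assumes symmetry and returns real eigenvalues in ascending order
    return bool(np.linalg.eigvalsh(A)[0] > tol)
```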
Probability & Statistics Core
| Concept | Where It Appears | Must Know |
|---|---|---|
| Bayes' theorem | Bayesian inference, posteriors | Prior, likelihood, posterior |
| KL divergence | VAEs, RLHF, distillation | Properties, computation |
| Entropy | Information theory, cross-entropy loss | Bits, nats, relationship to loss |
| Gaussian distribution | Everything | PDF, MLE, conjugate prior |
| Law of large numbers | Minibatch gradient estimates, Monte Carlo estimation | Weak vs. strong |
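For the KL divergence row, the case that appears most often (for example, the KL term in the VAE loss) is the closed form between two univariate Gaussians: KL(N(μ1, σ1²) ‖ N(μ2, σ2²)) = log(σ2/σ1) + (σ1² + (μ1 − μ2)²)/(2σ2²) − 1/2. The sketch below cross-checks that formula against a Monte Carlo estimate, which is also an application of the law of large numbers. The function names are hypothetical.

```python
import numpy as np

def kl_gaussian(mu1, sigma1, mu2, sigma2):
    # Closed form KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) in nats
    return (np.log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2) ** 2) / (2 * sigma2**2) - 0.5)

def kl_monte_carlo(mu1, sigma1, mu2, sigma2, n=200_000, seed=0):
    # KL(p || q) = E_p[log p(x) - log q(x)], estimated with samples from p
    x = np.random.default_rng(seed).normal(mu1, sigma1, size=n)
    log_p = -0.5 * np.log(2 * np.pi * sigma1**2) - 0.5 * ((x - mu1) / sigma1) ** 2
    log_q = -0.5 * np.log(2 * np.pi * sigma2**2) - 0.5 * ((x - mu2) / sigma2) ** 2
    return np.mean(log_p - log_q)
```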
Optimization Core
| Concept | Where It Appears | Must Know |
|---|---|---|
| Gradient descent | All training | Learning rate, convergence |
| Convexity | Loss landscape analysis | Convex functions, local vs. global |
| Lagrange multipliers | Constrained optimization, SVM | KKT conditions |
| Stochastic optimization | SGD, Adam | Variance, bias, convergence |
| Second-order methods | L-BFGS, natural gradient | Hessian, Fisher information |
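The Adam update behind the "Stochastic optimization" row, which problem #15 asks you to derive, fits in a few lines. Below is a sketch of one step, assuming the default hyperparameters from the Adam paper (learning rate 1e-3, β1 = 0.9, β2 = 0.999, ε = 1e-8). The function name `adam_step` is hypothetical.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # t is the 1-based step count, needed for bias correction
    m = beta1 * m + (1 - beta1) * grad        # EMA of gradients (first moment)
    v = beta2 * v + (1 - beta2) * grad**2     # EMA of squared gradients (second moment)
    m_hat = m / (1 - beta1**t)                # undo the bias toward the zero init
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

At t = 1 the bias correction gives m_hat = g and v_hat = g², so the first step moves each coordinate by about lr · sign(g), regardless of the gradient's scale. Bias correction exists to produce exactly this behavior.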
Problem Deep Dive: Implement Multi-Head Self-Attention
This is the single most important implementation problem for research roles. Here is how to approach it:
The Math
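$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V
$$

Multi-Head: for each head $i = 1, \dots, h$,

$$
\begin{aligned}
Q_i &= X W_i^Q, \qquad K_i = X W_i^K, \qquad V_i = X W_i^V \qquad \text{(project to head dimension } d_k\text{)} \\
\mathrm{head}_i &= \mathrm{Attention}(Q_i, K_i, V_i) \\
\mathrm{Output} &= \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^O
\end{aligned}
$$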
Implementation Skeleton (NumPy)
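```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable: subtract the max before exponentiating
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def multi_head_attention(X, W_Q, W_K, W_V, W_O, n_heads):
    # X: (batch, seq_len, d_model); each W_*: (d_model, d_model)
    batch, seq_len, d_model = X.shape
    assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
    d_k = d_model // n_heads
    # Project to Q, K, V. W_Q holds the per-head W_Q_i side by side,
    # so one matmul followed by a reshape projects all heads at once.
    Q = X @ W_Q  # (batch, seq_len, d_model)
    K = X @ W_K
    V = X @ W_V
    # Reshape for multi-head: (batch, n_heads, seq_len, d_k)
    Q = Q.reshape(batch, seq_len, n_heads, d_k).transpose(0, 2, 1, 3)
    K = K.reshape(batch, seq_len, n_heads, d_k).transpose(0, 2, 1, 3)
    V = V.reshape(batch, seq_len, n_heads, d_k).transpose(0, 2, 1, 3)
    # Scaled dot-product attention
    scores = Q @ K.transpose(0, 1, 3, 2) / np.sqrt(d_k)  # (batch, heads, seq, seq)
    weights = softmax(scores, axis=-1)
    context = weights @ V  # (batch, heads, seq, d_k)
    # Concatenate heads: back to (batch, seq_len, d_model)
    context = context.transpose(0, 2, 1, 3).reshape(batch, seq_len, d_model)
    # Output projection
    output = context @ W_O
    return output
```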
What Interviewers Check
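- Correct scaling by sqrt(d_k): if query and key entries are independent with zero mean and unit variance, q · k has variance d_k, so the scaling keeps logits O(1) and prevents softmax saturation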
- Correct reshape and transpose for multi-head splitting
- Correct concatenation order after attention
- Numerically stable softmax (subtract max before exp)
- Discussion of masking for decoder (causal mask)
- Complexity analysis: O(n^2 * d) time and O(n^2) memory per head for the attention weights
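Two items on that list, the numerically stable softmax and the causal mask, are easy to get subtly wrong. Below is a single-head sketch that sets masked positions to -inf before the softmax. Some codebases use a large negative constant instead. The function names are hypothetical. The checks confirm that the output at position t does not change when later tokens change, and that very large logits do not overflow.

```python
import numpy as np

def stable_softmax(x, axis=-1):
    # Subtracting the max leaves softmax unchanged but keeps exp() from overflowing
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def causal_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) for a single head
    seq_len, d_k = Q.shape
    scores = Q @ K.T / np.sqrt(d_k)
    # Position t may attend only to positions <= t: mask the strict upper triangle
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    return stable_softmax(scores, axis=-1) @ V
```

The diagonal is never masked, so every row has a finite max and the softmax never computes `-inf - (-inf)`.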
Difficulty Distribution
| Difficulty | Problems | Count |
|---|---|---|
| Easy | (none) | 0 |
| Medium | #2, #3, #8, #11, #13, #15, #16, #19, #20, #21, #24, #25, #26, #27, #30, #32, #33, #34, #35, #36, #37, #39, #40, #43, #44 | 25 |
| Hard | #1, #4, #5, #6, #7, #9, #10, #12, #14, #17, #18, #22, #23, #28, #29, #31, #38, #41, #42, #45 | 20 |
:::danger Research Engineer Problems Are Hard
Notice: there are zero Easy problems. Research Engineer interviews are among the hardest in AI/ML. If you are not comfortable with Medium-difficulty problems, build a stronger foundation with the Core 50 and Medium Tier first.
:::
Progress Tracker
| # | Problem | Status | Date | Time | Notes |
|---|---|---|---|---|---|
| 1 | Multi-Head Self-Attention | [ ] | |||
| 2 | BPE Tokenizer | [ ] | |||
| 3 | Beam Search | [ ] | |||
| 4 | GAN Training Loop | [ ] | |||
| 5 | SimCLR Contrastive Learning | [ ] | |||
| 6 | REINFORCE Policy Gradient | [ ] | |||
| 7 | Flash Attention (Simplified) | [ ] | |||
| 8 | LoRA Implementation | [ ] | |||
| 9 | Rotary Position Embeddings | [ ] | |||
| 10 | VAE with Reparameterization | [ ] | |||
| 11 | Grouped-Query Attention | [ ] | |||
| 12 | DDPM Noise Schedule | [ ] | |||
| 13 | Softmax CE Gradient | [ ] | |||
| 14 | SVD Low-Rank Proof | [ ] | |||
| 15 | Adam Optimizer Derivation | [ ] | |||
| 16 | Reparameterization Trick | [ ] | |||
| 17 | Attention Gradient | [ ] | |||
| 18 | ELBO Derivation | [ ] | |||
| 19 | KL Non-Negativity Proof | [ ] | |||
| 20 | Mixture of Gaussians Entropy | [ ] | |||
| 21 | Bias-Variance Decomposition | [ ] | |||
| 22 | Dropout as Bayesian Inference | [ ] | |||
| 23 | SGD Convergence Proof | [ ] | |||
| 24 | Transformer Complexity Analysis | [ ] | |||
| 25 | Efficient Top-K Selection | [ ] | |||
| 26 | Bloom Filter | [ ] | |||
| 27 | A* Search | [ ] | |||
| 28 | Sparse Matrix Multiplication | [ ] | |||
| 29 | Hungarian Algorithm | [ ] | |||
| 30 | Dijkstra with Decrease-Key | [ ] | |||
| 31 | KD-Tree | [ ] | |||
| 32 | Online Perceptron | [ ] | |||
| 33 | Parallel Prefix Sum | [ ] | |||
| 34 | Consistent Hashing | [ ] | |||
| 35 | Critique "Attention Is All You Need" | [ ] | |||
| 36 | BERT vs GPT Comparison | [ ] | |||
| 37 | Scaling Laws | [ ] | |||
| 38 | RLHF Pipeline Analysis | [ ] | |||
| 39 | Chain-of-Thought Analysis | [ ] | |||
| 40 | AI Safety Open Problems | [ ] | |||
| 41 | Long-Context Transformers | [ ] | |||
| 42 | LLM Understanding Evaluation | [ ] | |||
| 43 | Calibration Experiment Design | [ ] | |||
| 44 | Research Vision Question | [ ] | |||
| 45 | Paper Critique & Extension | [ ] | | | |
Next Steps
After completing the Research Engineer problem list:
- Hard Tier for more challenging algorithmic problems
- Google-Style Problems, since Google DeepMind (formed from DeepMind and Google Brain) is a top research destination
- Section 9: Paper Discussion for deeper paper analysis practice
- Section 15: Role-Specific Prep for the full Research Engineer preparation path
