
Research Engineer Problem List

Reading time: ~40 min | Interview relevance: Critical | Roles: Research Engineer, Research Scientist, Applied Research Scientist, ML Research Engineer

A Research Engineer at a top AI lab opens your interview with: "Here is a paper we published last month. Read the abstract and Section 3. You have 45 minutes to implement the core algorithm." No LeetCode. No system design templates. Just you, a whiteboard (or laptop), and the mathematical heart of a new method. If that sounds exciting rather than terrifying, this role is for you.

Research Engineer interviews are among the most technically demanding in AI/ML. They test deep mathematical understanding, the ability to implement algorithms from papers, and the ability to critically evaluate research. This list of 45 problems prepares you for all four dimensions: paper implementation, mathematical reasoning, algorithm coding, and research taste.

Research Engineer Interview Structure

| Round | Duration | What They Test | Weight |
| --- | --- | --- | --- |
| Algorithm Implementation | 60-90 min | Code an algorithm from a paper or description | 30-35% |
| Math & Theory | 45-60 min | Probability, linear algebra, optimization, information theory | 20-25% |
| Paper Discussion | 45-60 min | Critically analyze a paper, propose improvements | 20-25% |
| Coding | 45-60 min | Strong CS fundamentals, DSA | 15-20% |
| Research Taste | 30-45 min | What problems matter? Where is the field going? | 5-10% |

:::tip The Research Engineer Bar
Research Engineers are expected to bridge the gap between mathematical ideas and working code. You must be comfortable reading equations, understanding the intuition behind them, and implementing them efficiently. This is a rare combination.
:::

Section 1: Paper Implementation (12 Problems)

These problems simulate the most distinctive part of Research Engineer interviews: implementing algorithms from paper descriptions.

Core Algorithm Implementations

| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Implement Multi-Head Self-Attention | Hard | 40 min | Scaled dot-product attention, head splitting, concatenation | The foundation of Transformers; must be second nature | DeepMind, Google, OpenAI, Anthropic |
| 2 | Implement Byte-Pair Encoding (BPE) Tokenizer | Medium | 30 min | Iterative merge of frequent pairs | Core NLP preprocessing; appears in every LLM paper | OpenAI, Google, Meta |
| 3 | Implement Beam Search Decoding | Medium | 30 min | Breadth-limited search, log probability accumulation | Standard decoding strategy for sequence models | Google, Meta, AI Labs |
| 4 | Implement a Simple GAN Training Loop | Hard | 40 min | Alternating optimization, generator/discriminator interplay | Tests understanding of adversarial training dynamics | DeepMind, OpenAI, Meta |
| 5 | Implement Contrastive Learning (SimCLR-style) | Hard | 40 min | Data augmentation, projection head, NT-Xent loss | Self-supervised learning is a major research direction | Google, Meta, DeepMind |
| 6 | Implement REINFORCE Policy Gradient | Hard | 35 min | Log probability trick, baseline subtraction, variance reduction | Foundation of RLHF and RL research | DeepMind, OpenAI, Anthropic |
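As an illustration of the "iterative merge of frequent pairs" idea behind problem #2, here is a minimal sketch of BPE merge learning. It is not a production tokenizer: it assumes whitespace-split words, omits end-of-word markers, and breaks ties between equally frequent pairs by picking the lexicographically largest pair, which is an arbitrary choice made only to keep the output deterministic. The function names are hypothetical.

```python
from collections import Counter


def get_pair_counts(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    counts = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Return the ordered list of merges learned from a whitespace-split corpus."""
    # Each word starts as a tuple of single characters.
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(words)
        if not counts:
            break
        # Most frequent pair; ties broken by the pair itself (assumption).
        best = max(counts, key=lambda p: (counts[p], p))
        merges.append(best)
        words = merge_pair(words, best)
    return merges


print(learn_bpe("low low low lower lowest", num_merges=2))
```

In an interview, it is worth stating the tie-breaking rule and the pre-tokenization assumption out loud, since papers and libraries differ on both.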

Advanced Implementations

| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
| --- | --- | --- | --- | --- | --- | --- |
| 7 | Implement Flash Attention (Simplified) | Hard | 45 min | Tiled computation, memory-efficient attention | Critical optimization for large-scale Transformers | AI Labs |
| 8 | Implement LoRA (Low-Rank Adaptation) | Medium | 30 min | Low-rank decomposition of weight updates | Standard parameter-efficient fine-tuning | AI Labs, Big Tech |
| 9 | Implement Rotary Position Embeddings (RoPE) | Hard | 35 min | Rotation matrices, relative position encoding | Used in most modern LLMs | AI Labs |
| 10 | Implement a VAE (Variational Autoencoder) | Hard | 40 min | Reparameterization trick, ELBO, KL divergence | Core generative model; tests probabilistic ML depth | DeepMind, OpenAI, Meta |
| 11 | Implement Grouped-Query Attention (GQA) | Medium | 25 min | Key-value head sharing, memory reduction | Efficiency technique in modern Transformers | Google, Meta |
| 12 | Implement DDPM Noise Schedule and Forward Process | Hard | 35 min | Gaussian noise addition, noise schedule, variance schedule | Diffusion models are a major research area | Google, OpenAI, Stability AI |
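For problem #12, the forward process has a closed form that lets you jump from a clean sample to any noise level in one step. The sketch below assumes a linear beta schedule from 1e-4 to 0.02 over 1000 steps and 0-based timestep indexing; the function names `linear_beta_schedule` and `q_sample` are illustrative, not taken from any particular codebase.

```python
import numpy as np


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Variance schedule beta_1..beta_T, linearly spaced (assumed defaults)."""
    return np.linspace(beta_start, beta_end, T)


def q_sample(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form.

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I)
    """
    eps = rng.standard_normal(x0.shape)
    signal_scale = np.sqrt(alpha_bar[t])
    noise_scale = np.sqrt(1.0 - alpha_bar[t])
    return signal_scale * x0 + noise_scale * eps, eps


betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of alpha_t = 1 - beta_t

rng = np.random.default_rng(0)
x0 = np.ones((4, 8))                 # stand-in for a batch of data
x_t, eps = q_sample(x0, t=500, alpha_bar=alpha_bar, rng=rng)
print(x_t.shape, float(alpha_bar[0]), float(alpha_bar[-1]))
```

Returning `eps` alongside `x_t` is a convenience: the usual training objective regresses the network output onto that same noise.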

:::warning Paper Implementation Tips

  • Read the math first, then the code. Do not jump to implementation.
  • Identify the core computation (usually 3-5 lines of math).
  • Implement a naive version first, then optimize.
  • Always verify dimensions with small examples.
  • Comment your code with equation references.

:::
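The tips above can be sketched on a tiny example: write the loop version straight from the equation, write the vectorized version, and check both the shape and the values on small inputs. The scaled dot-product score is used here purely as an illustration, and the deliberately different sizes (3 queries, 7 keys, dimension 5) are an assumption chosen so that a wrong transpose fails loudly instead of passing by accident.

```python
import numpy as np


def scores_naive(Q, K):
    """Direct transcription of S[i, j] = <q_i, k_j> / sqrt(d)."""
    n, d = Q.shape
    m, _ = K.shape
    S = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            S[i, j] = Q[i] @ K[j] / np.sqrt(d)
    return S


def scores_fast(Q, K):
    """Vectorized form of the same equation: S = Q K^T / sqrt(d)."""
    return Q @ K.T / np.sqrt(Q.shape[-1])


rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 5))  # n = 3 queries, d = 5
K = rng.standard_normal((7, 5))  # m = 7 keys,    d = 5

S = scores_fast(Q, K)
assert S.shape == (3, 7)                   # dimension check
assert np.allclose(S, scores_naive(Q, K))  # optimized version matches naive one
```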

Section 2: Mathematical Reasoning (12 Problems)

These problems test your ability to reason mathematically about ML concepts -- a core requirement for research roles.

Linear Algebra & Optimization

| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
| --- | --- | --- | --- | --- | --- | --- |
| 13 | Derive the Gradient of Softmax Cross-Entropy Loss | Medium | 25 min | Chain rule, Jacobian of softmax | Nearly every classification network uses this; derivation tests understanding | All AI Labs |
| 14 | Prove That SVD Gives the Best Low-Rank Approximation | Hard | 30 min | Eckart-Young theorem, Frobenius norm minimization | Foundation of dimensionality reduction and compression | DeepMind, Google |
| 15 | Derive the Update Rules for Adam Optimizer | Medium | 25 min | Exponential moving averages, bias correction | Most common optimizer; understanding internals matters | All |
| 16 | Explain and Derive the Reparameterization Trick | Medium | 25 min | Pathwise gradient estimation, sampling from distributions | Critical for VAEs and stochastic computation graphs | DeepMind, OpenAI, Meta |
| 17 | Derive the Gradient of Attention with Respect to Queries | Hard | 30 min | Matrix calculus, softmax Jacobian, chain rule | Deep understanding of Transformer training | AI Labs |
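As a reference point for problem #13, the derivation can be compressed to three lines. This sketch assumes logits z, probabilities p = softmax(z), and a target vector y whose entries sum to 1 (one-hot or soft labels); the symbols are chosen here for illustration.

```latex
\begin{align}
p_k &= \frac{e^{z_k}}{\sum_j e^{z_j}}, \qquad L = -\sum_k y_k \log p_k \\
\frac{\partial p_k}{\partial z_i} &= p_k\,(\delta_{ki} - p_i) \\
\frac{\partial L}{\partial z_i}
  &= -\sum_k \frac{y_k}{p_k}\, p_k\,(\delta_{ki} - p_i)
   = -y_i + p_i \sum_k y_k
   = p_i - y_i
\end{align}
```

The last step uses the assumption that the targets sum to 1. The result, "prediction minus target", is why frameworks fuse softmax and cross-entropy into one numerically stable operation.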

Probability & Information Theory

| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
| --- | --- | --- | --- | --- | --- | --- |
| 18 | Derive the ELBO (Evidence Lower Bound) for VAEs | Hard | 30 min | Jensen's inequality, KL divergence decomposition | Foundation of variational inference | DeepMind, OpenAI, Meta |
| 19 | Prove That KL Divergence Is Non-Negative | Medium | 20 min | Jensen's inequality, convexity of -log | Basic information theory; tests mathematical rigor | All AI Labs |
| 20 | Calculate the Entropy of a Mixture of Gaussians | Medium | 25 min | Mixture model entropy bounds, Monte Carlo estimation | Mixture models appear throughout ML | DeepMind, Google |
| 21 | Derive the Bias-Variance Decomposition for MSE | Medium | 20 min | Expectation algebra, law of total expectation | Foundational ML theory | All |
| 22 | Explain Why Dropout Works as Approximate Bayesian Inference | Hard | 30 min | Monte Carlo dropout, model uncertainty | Connects practical technique to theory | DeepMind, Google |
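For problem #19, one standard route goes through Jensen's inequality. The sketch below is written for discrete distributions p and q on the same support (an assumption made to keep the notation short; the continuous case replaces sums with integrals).

```latex
\begin{align}
-D_{\mathrm{KL}}(p \,\|\, q)
  &= \sum_{x:\,p(x)>0} p(x) \log \frac{q(x)}{p(x)} \\
  &\le \log \sum_{x:\,p(x)>0} p(x)\,\frac{q(x)}{p(x)}
     \qquad \text{(Jensen, since $\log$ is concave)} \\
  &= \log \sum_{x:\,p(x)>0} q(x) \;\le\; \log 1 = 0
\end{align}
```

Hence the divergence is non-negative. Because the logarithm is strictly concave, equality holds only when q(x)/p(x) is constant on the support of p, which together with normalization gives p = q.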

Analysis & Convergence

| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
| --- | --- | --- | --- | --- | --- | --- |
| 23 | Prove Convergence of SGD Under Convexity Assumptions | Hard | 35 min | Learning rate schedule, expected loss decrease | Optimization theory for deep learning | DeepMind, Google |
| 24 | Analyze the Computational Complexity of Transformer Self-Attention | Medium | 20 min | O(n^2 * d) time, O(n^2) memory for the attention matrix, alternatives | Understanding scaling is critical for LLM research | All AI Labs |
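To make the quadratic scaling in problem #24 concrete, here is a rough back-of-the-envelope counter. It is a sketch under simplifying assumptions: a single head, the two matrix products only (no softmax, projections, or batching), and one multiply plus one add counted as two FLOPs. The helper name `attention_cost` is hypothetical.

```python
def attention_cost(n, d):
    """Rough cost of single-head attention over n tokens with head dimension d."""
    flops = 2 * n * n * d   # Q @ K^T: n*n dot products of length d
    flops += 2 * n * n * d  # weights @ V: n*d outputs, each a length-n dot product
    score_entries = n * n   # entries in the (n, n) score matrix
    return flops, score_entries


for n in (1024, 2048, 4096):
    flops, entries = attention_cost(n, d=64)
    print(f"n={n}: {flops:,} FLOPs, {entries:,} score entries")
```

Doubling the sequence length quadruples both numbers, which is the motivation for tiled, sparse, and linear attention variants.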

Section 3: Research-Flavored Coding (10 Problems)

These problems test strong CS fundamentals with a research twist -- the kind of algorithmic thinking needed to make research ideas work in practice.

| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
| --- | --- | --- | --- | --- | --- | --- |
| 25 | Implement Efficient Top-K Selection Without Full Sort | Medium | 20 min | Quickselect, partial sort | Used in beam search, top-K sampling, retrieval | All |
| 26 | Implement a Bloom Filter for Deduplication | Medium | 25 min | Probabilistic data structure, false positive analysis | Used in training data deduplication, web crawling | Google, Meta |
| 27 | Implement A* Search for Shortest Path | Medium | 30 min | Heuristic search, priority queue | Planning in agents, graph-based reasoning | DeepMind, Google |
| 28 | Implement Sparse Matrix Multiplication | Hard | 35 min | CSR/CSC format, efficient iteration | Sparse operations are critical for large-scale models | Google, DeepMind |
| 29 | Implement the Hungarian Algorithm for Optimal Assignment | Hard | 40 min | Bipartite matching, augmenting paths | Used in DETR (object detection), evaluation metrics | DeepMind, Meta |
| 30 | Implement Dijkstra's Algorithm with Decrease-Key | Medium | 25 min | Priority queue with updates | Graph reasoning, shortest path problems | All |
| 31 | Implement a KD-Tree for Nearest Neighbor Search | Hard | 35 min | Space partitioning, recursive construction | Efficient search in embedding spaces | Google, DeepMind |
| 32 | Implement Online Learning (Perceptron with Mistake Bound) | Medium | 25 min | Online update, mistake-driven learning | Foundation of online/streaming ML | DeepMind, Google |
| 33 | Implement Parallel Prefix Sum (Scan) | Medium | 25 min | Work-efficient parallel algorithm | Foundation of GPU programming, parallel reductions | AI Labs |
| 34 | Implement Consistent Hashing for Distributed Data | Medium | 25 min | Hash ring, virtual nodes | Distributed training data partitioning | Google, Meta |
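As one worked example from this section, here is a minimal Bloom filter sketch for problem #26. The sizing uses the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. Deriving the k indices from two halves of a single SHA-256 digest (double hashing) is an implementation choice assumed here for brevity, not the only option, and the class and parameter names are illustrative.

```python
import hashlib
import math


class BloomFilter:
    """Set membership with no false negatives and a tunable false positive rate."""

    def __init__(self, capacity, error_rate=0.01):
        # m bits and k hash functions from the standard sizing formulas.
        self.m = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Double hashing: index_i = (h1 + i * h2) mod m, with h2 forced odd.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


bf = BloomFilter(capacity=1000, error_rate=0.01)
for i in range(1000):
    bf.add(f"doc-{i}")

print("doc-42" in bf)  # True: items that were added are always reported present
```

For deduplication, a "probably seen" answer means a small fraction of new documents get dropped by mistake, so the error rate is a data-loss budget rather than a correctness bug.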

Section 4: Paper Discussion & Research Taste (11 Problems)

These problems test your ability to read, critique, and extend research -- the hallmark of a strong research engineer.

Paper Analysis

| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
| --- | --- | --- | --- | --- | --- | --- |
| 35 | Critique the Experimental Setup of "Attention Is All You Need" | Medium | 25 min | Ablation design, baseline selection, evaluation metrics | The most important modern ML paper | All AI Labs |
| 36 | Compare BERT vs. GPT Pre-Training Approaches: Strengths and Weaknesses | Medium | 20 min | Masked LM vs. autoregressive, bidirectional vs. unidirectional | Foundation of modern NLP | All |
| 37 | Explain Why Scaling Laws Matter for LLM Development | Medium | 25 min | Chinchilla scaling, compute-optimal training | Drives billion-dollar resource allocation decisions | OpenAI, Anthropic, DeepMind |
| 38 | Analyze the RLHF Pipeline: What Can Go Wrong? | Hard | 30 min | Reward hacking, distribution shift, reward model limitations | Core technique for aligning LLMs | Anthropic, OpenAI, DeepMind |
| 39 | Explain Chain-of-Thought Prompting: Why Does It Work? | Medium | 20 min | Implicit computation, reasoning traces | Emergent ability with practical implications | All |

Research Direction Questions

| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
| --- | --- | --- | --- | --- | --- | --- |
| 40 | What Are the Most Important Open Problems in AI Safety? | Medium | 25 min | Alignment, interpretability, robustness | Safety research is a top priority at AI labs | Anthropic, OpenAI, DeepMind |
| 41 | Propose an Approach to Make Transformers Handle Long Contexts Efficiently | Hard | 30 min | Efficient attention, memory mechanisms, retrieval augmentation | Active area of research with direct practical impact | All AI Labs |
| 42 | How Would You Evaluate Whether an LLM Truly "Understands" Language? | Hard | 30 min | Behavioral tests, probing, mechanistic interpretability | Philosophical but practically relevant | Anthropic, DeepMind |
| 43 | Design an Experiment to Test Whether Larger Models Are More Calibrated | Medium | 25 min | Experimental design, calibration metrics, controlled comparisons | Tests ability to design rigorous experiments | All AI Labs |
| 44 | What Research Would You Pursue Given Unlimited Compute for 6 Months? | Medium | 20 min | Research vision, feasibility assessment, impact estimation | Tests research taste and ambition | All AI Labs |
| 45 | Critique a Recent Paper of Your Choice and Propose an Extension | Hard | 30 min | Critical reading, identifying limitations, creative extension | The ultimate research engineer test | All AI Labs |
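For problem #43, it helps to have one concrete calibration metric in hand before discussing experimental design. The sketch below computes expected calibration error (ECE) with equal-width confidence bins. The choice of 10 bins and right-closed bin edges is an assumption, and ECE is only one of several metrics a candidate might propose.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Right-closed bins, so a confidence of exactly 1.0 lands in the last bin.
    bin_ids = np.clip(np.digitize(confidences, edges[1:-1], right=True), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the fraction of samples in the bin
    return ece


# Fully confident but right only half the time: ECE is 0.5.
print(expected_calibration_error([1.0, 1.0, 1.0, 1.0], [1, 0, 1, 0]))
```

A controlled comparison would compute this on the same evaluation set for each model size, and report how sensitive the conclusion is to the number of bins.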

:::note Research Taste Questions
There are no "right" answers to research taste questions. Interviewers are looking for:

  • Awareness of the current research landscape
  • Critical thinking about what matters and what doesn't
  • Originality in proposing new directions
  • Feasibility assessment -- wild ideas are fine if you acknowledge the challenges
  • Depth on at least one area you care about deeply

:::

6-Week Research Engineer Study Plan

Research Engineer preparation takes longer due to the mathematical depth required.

| Week | Focus | Problems | Daily Load |
| --- | --- | --- | --- |
| Week 1 | Core implementations | #1-6 | 1 implementation/day |
| Week 2 | Advanced implementations | #7-12 | 1 implementation/day |
| Week 3 | Mathematics | #13-24 | 2 proofs/derivations per day |
| Week 4 | Research coding | #25-34 | 2 problems/day |
| Week 5 | Paper discussion | #35-45 | 2 problems/day + read papers |
| Week 6 | Integration + mock | Mixed | 1 deep problem + 1 mock/day |

Daily Practice Format for Research Engineers

[Figure: Research Engineer daily practice format, broken down into morning, afternoon, and evening]

:::tip Building Research Taste
Research taste is built over months, not days. Start reading papers now, even if you are months from interviews:

  • Subscribe to arXiv daily digests for your subfield
  • Follow AI researchers on Twitter/X for commentary
  • Attend online reading groups
  • Write brief summaries of papers you read

:::

Essential Math Reference

Linear Algebra Core

| Concept | Where It Appears | Must Know |
| --- | --- | --- |
| Matrix multiplication | Attention, linear layers | Dimensions, complexity |
| Eigendecomposition | PCA, spectral methods | Eigenvalues, eigenvectors |
| SVD | Compression, LoRA | Truncated SVD, rank |
| Matrix calculus | Backpropagation | Jacobian, chain rule |
| Positive definiteness | Kernel methods, covariance | Cholesky, eigenvalue test |

Probability & Statistics Core

| Concept | Where It Appears | Must Know |
| --- | --- | --- |
| Bayes' theorem | Bayesian inference, posteriors | Prior, likelihood, posterior |
| KL divergence | VAEs, RLHF, distillation | Properties, computation |
| Entropy | Information theory, cross-entropy loss | Bits, nats, relationship to loss |
| Gaussian distribution | Everything | PDF, MLE, conjugate prior |
| Law of large numbers | SGD convergence | Weak vs. strong |

Optimization Core

| Concept | Where It Appears | Must Know |
| --- | --- | --- |
| Gradient descent | All training | Learning rate, convergence |
| Convexity | Loss landscape analysis | Convex functions, local vs. global |
| Lagrange multipliers | Constrained optimization, SVM | KKT conditions |
| Stochastic optimization | SGD, Adam | Variance, bias, convergence |
| Second-order methods | L-BFGS, natural gradient | Hessian, Fisher information |

Problem Deep Dive: Implement Multi-Head Self-Attention

This is the single most important implementation problem for research roles. Here is how to approach it:

The Math

Attention(Q, K, V) = softmax(Q @ K^T / sqrt(d_k)) @ V

Multi-Head:
  For each head i:
    Q_i = X @ W_Q_i   (project to head dimension d_k)
    K_i = X @ W_K_i
    V_i = X @ W_V_i
    head_i = Attention(Q_i, K_i, V_i)

  Output = Concat(head_1, ..., head_h) @ W_O

Implementation Skeleton (NumPy)

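import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def multi_head_attention(X, W_Q, W_K, W_V, W_O, n_heads):
    # X: (batch, seq_len, d_model); every W_*: (d_model, d_model)
    batch, seq_len, d_model = X.shape
    assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
    d_k = d_model // n_heads

    # Project to Q, K, V (all heads at once)
    Q = X @ W_Q  # (batch, seq_len, d_model)
    K = X @ W_K
    V = X @ W_V

    # Reshape for multi-head: (batch, n_heads, seq_len, d_k)
    Q = Q.reshape(batch, seq_len, n_heads, d_k).transpose(0, 2, 1, 3)
    K = K.reshape(batch, seq_len, n_heads, d_k).transpose(0, 2, 1, 3)
    V = V.reshape(batch, seq_len, n_heads, d_k).transpose(0, 2, 1, 3)

    # Scaled dot-product attention
    scores = Q @ K.transpose(0, 1, 3, 2) / np.sqrt(d_k)  # (batch, heads, seq, seq)
    weights = softmax(scores, axis=-1)
    context = weights @ V  # (batch, heads, seq, d_k)

    # Concatenate heads: back to (batch, seq_len, d_model)
    context = context.transpose(0, 2, 1, 3).reshape(batch, seq_len, d_model)

    # Output projection
    output = context @ W_O
    return output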

What Interviewers Check

  1. Correct scaling by sqrt(d_k) -- prevents softmax saturation
  2. Correct reshape and transpose for multi-head splitting
  3. Correct concatenation order after attention
  4. Numerically stable softmax (subtract max before exp)
  5. Discussion of masking for decoder (causal mask)
  6. Complexity analysis: O(n^2 * d) time; O(n^2) memory per head for the attention matrix
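Items 4 and 5 of the checklist can be sketched together in a few lines. This is one possible way to write them, assuming scores of shape (..., seq, seq) and a boolean upper-triangular mask; the helper names `stable_softmax` and `causal_attention_weights` are illustrative.

```python
import numpy as np


def stable_softmax(x, axis=-1):
    """Softmax that subtracts the per-row max so exp() never overflows."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def causal_attention_weights(scores):
    """Mask future positions, then normalize.

    scores: (..., seq, seq). Query position i may attend only to keys j <= i.
    """
    seq = scores.shape[-1]
    future = np.triu(np.ones((seq, seq), dtype=bool), k=1)  # True strictly above the diagonal
    masked = np.where(future, -np.inf, scores)              # exp(-inf) = 0 after softmax
    return stable_softmax(masked, axis=-1)


print(stable_softmax(np.array([1000.0, 1000.0])))    # no overflow despite large logits
print(causal_attention_weights(np.zeros((1, 3, 3)))[0])
```

Every row keeps at least its diagonal entry unmasked, so the row maximum stays finite and the subtraction never produces NaN.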

Difficulty Distribution

| Difficulty | Problems | Count |
| --- | --- | --- |
| Easy | (none) | 0 |
| Medium | #2, #3, #8, #11, #13, #15, #16, #19, #20, #21, #24, #25, #26, #27, #30, #32, #33, #34, #35, #36, #37, #39, #40, #43, #44 | 25 |
| Hard | #1, #4, #5, #6, #7, #9, #10, #12, #14, #17, #18, #22, #23, #28, #29, #31, #38, #41, #42, #45 | 20 |

:::danger Research Engineer Problems Are Hard
Notice: there are zero Easy problems. Research Engineer interviews are among the hardest in AI/ML. If you are not comfortable with Medium-difficulty problems, build a stronger foundation with the Core 50 and Medium Tier first.
:::

Progress Tracker

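| # | Problem | Status | Date | Time | Notes |
| --- | --- | --- | --- | --- | --- |
| 1 | Multi-Head Self-Attention | [ ] | | | |
| 2 | BPE Tokenizer | [ ] | | | |
| 3 | Beam Search | [ ] | | | |
| 4 | GAN Training Loop | [ ] | | | |
| 5 | SimCLR Contrastive Learning | [ ] | | | |
| 6 | REINFORCE Policy Gradient | [ ] | | | |
| 7 | Flash Attention (Simplified) | [ ] | | | |
| 8 | LoRA Implementation | [ ] | | | |
| 9 | Rotary Position Embeddings | [ ] | | | |
| 10 | VAE with Reparameterization | [ ] | | | |
| 11 | Grouped-Query Attention | [ ] | | | |
| 12 | DDPM Noise Schedule | [ ] | | | |
| 13 | Softmax CE Gradient | [ ] | | | |
| 14 | SVD Low-Rank Proof | [ ] | | | |
| 15 | Adam Optimizer Derivation | [ ] | | | |
| 16 | Reparameterization Trick | [ ] | | | |
| 17 | Attention Gradient | [ ] | | | |
| 18 | ELBO Derivation | [ ] | | | |
| 19 | KL Non-Negativity Proof | [ ] | | | |
| 20 | Mixture of Gaussians Entropy | [ ] | | | |
| 21 | Bias-Variance Decomposition | [ ] | | | |
| 22 | Dropout as Bayesian Inference | [ ] | | | |
| 23 | SGD Convergence Proof | [ ] | | | |
| 24 | Transformer Complexity Analysis | [ ] | | | |
| 25 | Efficient Top-K Selection | [ ] | | | |
| 26 | Bloom Filter | [ ] | | | |
| 27 | A* Search | [ ] | | | |
| 28 | Sparse Matrix Multiplication | [ ] | | | |
| 29 | Hungarian Algorithm | [ ] | | | |
| 30 | Dijkstra with Decrease-Key | [ ] | | | |
| 31 | KD-Tree | [ ] | | | |
| 32 | Online Perceptron | [ ] | | | |
| 33 | Parallel Prefix Sum | [ ] | | | |
| 34 | Consistent Hashing | [ ] | | | |
| 35 | Critique "Attention Is All You Need" | [ ] | | | |
| 36 | BERT vs GPT Comparison | [ ] | | | |
| 37 | Scaling Laws | [ ] | | | |
| 38 | RLHF Pipeline Analysis | [ ] | | | |
| 39 | Chain-of-Thought Analysis | [ ] | | | |
| 40 | AI Safety Open Problems | [ ] | | | |
| 41 | Long-Context Transformers | [ ] | | | |
| 42 | LLM Understanding Evaluation | [ ] | | | |
| 43 | Calibration Experiment Design | [ ] | | | |
| 44 | Research Vision Question | [ ] | | | |
| 45 | Paper Critique & Extension | [ ] | | | |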


© 2026 EngineersOfAI. All rights reserved.