Paper Discussion Round - Show Your Research Taste

Reading time: ~16 min | Interview relevance: Critical (RE), High (MLE) | Roles: RE, some MLE

The Real Interview Moment

"Tell me about a paper you've read recently that you found interesting."

You pick a paper. You spend 10 minutes summarizing every section. The interviewer interrupts: "What's the key limitation of this approach?" You stammer - you focused on understanding the paper, not critiquing it. The interviewer follows up: "If you had unlimited compute and data, what experiment would you run to improve on this?" You don't have an answer.

The paper discussion round tests research taste, not reading speed. Can you identify what matters, what doesn't, and what comes next?

What You Will Master

How to structure a paper presentation in 8-10 minutes
What interviewers are actually evaluating (it's not summarization)
How to critique papers constructively
How to handle follow-up questions you haven't prepared for
The 10 papers you should be ready to discuss

Part 1 - The Presentation Framework

The 5-Part Structure (8-10 minutes)

Paper Discussion - 5-Part Presentation Structure

Problem (1 min): What problem does this paper solve? Why does it matter? What existed before?
Core Contribution (2-3 min): What's the key technical idea? Explain the method clearly. Use a diagram if helpful.
Key Experiments (2 min): What are the main results? Are they convincing? What's the most important table/figure?
Limitations (1-2 min): What doesn't this paper address? Where might it fail? What assumptions does it make?
Extensions (1-2 min): What would you do next? What experiment would test the limitations? How could this be improved?

Interviewer's Perspective

The single strongest signal in a paper discussion is unprompted limitations analysis. When a candidate says "This paper is great, but here are three things it doesn't address..." - that tells me they have real research taste. Summarization shows reading ability. Critique shows thinking ability.

Part 2 - What Gets Scored

Criterion	No Hire	Lean Hire	Strong Hire
Clarity	Can't explain the paper clearly	Clear summary but reads like a textbook	Explains with intuition, adapts to audience
Depth	Surface-level understanding	Understands the method	Can discuss implementation details and math
Critique	"It's a great paper" (no critique)	Identifies 1 limitation	3+ thoughtful limitations with proposed experiments
Extension	No ideas for follow-up	Generic ideas ("more data")	Specific, feasible, novel extensions
Q&A	Can't handle probing questions	Answers with some depth	Reasons from first principles when unsure

Part 3 - The 10 Papers You Should Know

Papers That Come Up Most Often

Paper	Year	Why It's Asked	Key Concepts
Attention Is All You Need	2017	Foundation of modern AI	Self-attention, multi-head attention, positional encoding
BERT	2018	Pre-training + fine-tuning paradigm	Masked language modeling, next sentence prediction
GPT-3	2020	In-context learning, scaling laws	Few-shot prompting, emergent abilities
LoRA	2021	Efficient fine-tuning	Low-rank adaptation, parameter efficiency
InstructGPT / RLHF	2022	Alignment, human feedback	Reward modeling, PPO, preference learning
RAG	2020	Retrieval-augmented generation	Combining retrieval with generation
FlashAttention	2022	Efficient attention	IO-aware, tiling, SRAM utilization
DPO	2023	Simpler alignment	Direct preference optimization vs. RLHF
Mixture of Experts	Various	Efficient scaling	Sparse activation, routing, expert specialization
Vision Transformers (ViT)	2020	Transformers for vision	Patch embedding, position encoding for images

How Deep Should You Go?

For your chosen paper (the one you'll present): Read every section, understand the math, reimplement if possible.

For other papers: Understand the key contribution, 1-2 limitations, and how it relates to other work. You don't need to know the math.

Part 4 - Handling Follow-Up Questions

Common Follow-Up Patterns

Question Type	Example	How to Handle
"Go deeper on X"	"Explain the attention computation step by step"	Walk through the math: Q, K, V projections, dot product scaled by sqrt(d_k), softmax over keys, weighted sum of V
"What if you changed X?"	"What if you used cosine similarity instead of dot product?"	Reason from first principles. "Cosine normalizes for magnitude, so..."
"Why not use Y instead?"	"Why not use RNNs for this task?"	Compare trade-offs. "RNNs process tokens sequentially, so training can't be parallelized across time steps, which limits training speed..."
"How would you improve this?"	"How would you make this work for longer sequences?"	Propose specific ideas: sparse attention, chunking, etc.
"What's the impact?"	"Has this paper actually changed practice?"	Connect to downstream impact: "Every major LLM uses this..."
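
To rehearse the "Go deeper" and "What if you changed X?" rows above, it can help to carry a tiny worked example in your head. The sketch below is a minimal single-head attention in plain Python, not code from any paper. The function names, the `similarity` switch, and the toy Q, K, V values are all assumptions made up for illustration, and it leaves out masking, multiple heads, and the learned projection matrices.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention(Q, K, V, similarity="dot"):
    """Single-head attention over lists of vectors.

    similarity="dot"    -> scaled dot product, as in the original Transformer
    similarity="cosine" -> illustrative variant for the "what if" question
                           (assumes no zero-length query or key vectors)
    """
    d_k = len(Q[0])
    outputs, all_weights = [], []
    for q in Q:
        if similarity == "dot":
            scores = [dot(q, k) / math.sqrt(d_k) for k in K]
        else:
            scores = [
                dot(q, k) / (math.sqrt(dot(q, q)) * math.sqrt(dot(k, k)))
                for k in K
            ]
        weights = softmax(scores)  # one weight per key, summing to 1
        # Each output is the weight-averaged value vector.
        out = [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy inputs (hypothetical): 2 queries, 3 keys/values, d_k = d_v = 2.
Q = [[1.0, 0.0], [0.0, 2.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

out_dot, w_dot = attention(Q, K, V, similarity="dot")
out_cos, w_cos = attention(Q, K, V, similarity="cosine")

# Multiply every query by 10 and recompute the attention weights.
Q_scaled = [[10.0 * x for x in q] for q in Q]
_, w_dot_scaled = attention(Q_scaled, K, V, similarity="dot")
_, w_cos_scaled = attention(Q_scaled, K, V, similarity="cosine")
```

Scaling the queries sharpens the dot-product weights but leaves the cosine weights unchanged. That is the first-principles point behind "cosine normalizes for magnitude": the model loses the ability to express confidence through vector length unless a separate temperature is added.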

When You Don't Know the Answer

Script: "I haven't thought about that specific angle. Let me reason through it... [think out loud]. If we changed X, I'd expect Y because of Z. But I'm not certain - this would be a great experiment to run."

Practice Problems

Problem 1: Paper Critique

Present the "Attention Is All You Need" paper in 8 minutes. Then answer: "What are the three biggest limitations of the original transformer architecture?"

Full Answer + Rubric

Key limitations:

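Quadratic attention complexity: Self-attention costs O(N²) time and memory in sequence length N, which becomes prohibitive for long sequences. Led to sparse attention and linear attention variants, and to FlashAttention, which keeps attention exact but cuts memory traffic.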
Fixed context window: The model can only attend to a fixed number of tokens. Led to techniques like RoPE, ALiBi, and context extension methods.
No explicit recurrence or memory: Each forward pass is independent - no way to carry state across sequences without external memory. Led to work on retrieval-augmented models and memory-augmented transformers.

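Bonus: The sinusoidal position encoding was a heuristic choice; the paper reports that learned position embeddings performed about the same. Later schemes such as RoPE and ALiBi revisited how position should be encoded.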
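
A quick back-of-the-envelope calculation makes the quadratic limitation concrete in an interview. The sketch below assumes the full N x N score matrix is materialized once per head in 2-byte precision. The function name and its defaults are illustrative assumptions, not figures from the paper, and memory-efficient implementations avoid storing this matrix at all.

```python
def attention_score_bytes(seq_len, num_heads=1, bytes_per_element=2):
    """Bytes needed to hold the full seq_len x seq_len attention score matrix.

    Assumes one matrix per head and 2-byte (fp16/bf16) elements; both
    defaults are illustrative assumptions.
    """
    return seq_len * seq_len * num_heads * bytes_per_element


for n in (1_024, 8_192, 65_536):
    mib = attention_score_bytes(n) / 2**20
    print(f"N={n:>6}: {mib:>6.0f} MiB per head")
```

Under these assumptions the matrix grows from 2 MiB at N=1,024 to 128 MiB at N=8,192 and 8,192 MiB at N=65,536: a sequence 8x longer costs 64x the memory.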

Scoring:

Strong Hire: 3+ specific limitations with references to follow-up work
Lean Hire: 1-2 correct limitations
No Hire: Can't identify any limitation

Interview Cheat Sheet

Phase	What to Say	Time
Opening	"I'd like to discuss [paper name], published in [year] by [lab]. It addresses the problem of..."	0-1 min
Core idea	"The key contribution is [specific technique], which works by..."	1-4 min
Results	"The main result shows [X]% improvement on [benchmark]. The most convincing experiment is..."	4-6 min
Critique	"However, there are limitations: [1], [2], [3]"	6-8 min
Extension	"If I were continuing this work, I'd explore..."	8-10 min

Spaced Repetition Checkpoints

Day 0: Choose your presentation paper. Read it thoroughly.
Day 3: Present the paper to a friend in 10 minutes. Get feedback.
Day 7: Read 2 more papers from the must-know list. Write 3 limitations for each.
Day 14: Have someone quiz you with follow-up questions on your chosen paper.
Day 21: Present a different paper you've never discussed before. Can you do it well with less prep?

What's Next

For full paper discussion prep → Paper Discussion
Behavioral Round - The soft skills round
Take-Home Assessment - Practical project evaluation
