Paper Discussion Round - Show Your Research Taste

Reading time: ~16 min | Interview relevance: Critical (RE), High (MLE) | Roles: RE, some MLE

The Real Interview Moment

"Tell me about a paper you've read recently that you found interesting."

You pick a paper. You spend 10 minutes summarizing every section. The interviewer interrupts: "What's the key limitation of this approach?" You stammer - you focused on understanding the paper, not critiquing it. The interviewer follows up: "If you had unlimited compute and data, what experiment would you run to improve on this?" You don't have an answer.

The paper discussion round tests research taste, not reading speed. Can you identify what matters, what doesn't, and what comes next?

What You Will Master

  • How to structure a paper presentation in 8-10 minutes
  • What interviewers are actually evaluating (it's not summarization)
  • How to critique papers constructively
  • How to handle follow-up questions you haven't prepared for
  • The 10 papers you should be ready to discuss

Part 1 - The Presentation Framework

The 5-Part Structure (8-10 minutes)

  1. Problem (1 min): What problem does this paper solve? Why does it matter? What existed before?
  2. Core Contribution (2-3 min): What's the key technical idea? Explain the method clearly. Use a diagram if helpful.
  3. Key Experiments (2 min): What are the main results? Are they convincing? What's the most important table/figure?
  4. Limitations (1-2 min): What doesn't this paper address? Where might it fail? What assumptions does it make?
  5. Extensions (1-2 min): What would you do next? What experiment would test the limitations? How could this be improved?

Interviewer's Perspective

The single strongest signal in a paper discussion is unprompted limitations analysis. When a candidate says "This paper is great, but here are three things it doesn't address..." - that tells me they have real research taste. Summarization shows reading ability. Critique shows thinking ability.

Part 2 - What Gets Scored

| Criterion | No Hire | Lean Hire | Strong Hire |
| --- | --- | --- | --- |
| Clarity | Can't explain the paper clearly | Clear summary but reads like a textbook | Explains with intuition, adapts to audience |
| Depth | Surface-level understanding | Understands the method | Can discuss implementation details and math |
| Critique | "It's a great paper" (no critique) | Identifies 1 limitation | 3+ thoughtful limitations with proposed experiments |
| Extension | No ideas for follow-up | Generic ideas ("more data") | Specific, feasible, novel extensions |
| Q&A | Can't handle probing questions | Answers with some depth | Reasons from first principles when unsure |

Part 3 - The 10 Papers You Should Know

Papers That Come Up Most Often

| Paper | Year | Why It's Asked | Key Concepts |
| --- | --- | --- | --- |
| Attention Is All You Need | 2017 | Foundation of the transformer architecture behind modern LLMs | Self-attention, multi-head attention, positional encoding |
| BERT | 2018 | Pre-training + fine-tuning paradigm | Masked language modeling, next sentence prediction |
| GPT-3 | 2020 | In-context learning, scaling | Few-shot prompting, emergent abilities |
| LoRA | 2021 | Efficient fine-tuning | Low-rank adaptation, parameter efficiency |
| InstructGPT / RLHF | 2022 | Alignment, human feedback | Reward modeling, PPO, preference learning |
| RAG | 2020 | Retrieval-augmented generation | Combining retrieval with generation |
| FlashAttention | 2022 | Efficient attention | IO-aware, tiling, SRAM utilization |
| DPO | 2023 | Simpler alignment | Direct preference optimization vs. RLHF |
| Mixture of Experts | Various | Efficient scaling | Sparse activation, routing, expert specialization |
| Vision Transformers (ViT) | 2020 | Transformers for vision | Patch embedding, position encoding for images |
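
To make the "parameter efficiency" entry for LoRA concrete, here is a minimal sketch of the parameter arithmetic for one weight matrix. It assumes the usual low-rank setup in which the update to a d_out x d_in weight is factored into two matrices of rank r; the function names and the 4096 / rank 8 sizes are illustrative assumptions, not figures from any particular model.

```python
def full_update_params(d_out, d_in):
    """Trainable parameters when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def low_rank_update_params(d_out, d_in, rank):
    """Trainable parameters for a rank-r update B @ A.

    B has shape (d_out, rank) and A has shape (rank, d_in),
    so the count is rank * (d_out + d_in).
    """
    return rank * (d_out + d_in)


# Illustrative sizes only (assumed, not taken from a specific model).
d_out, d_in, rank = 4096, 4096, 8

full = full_update_params(d_out, d_in)
low_rank = low_rank_update_params(d_out, d_in, rank)

print(f"full update:     {full:,} parameters")
print(f"low-rank update: {low_rank:,} parameters")
print(f"reduction:       {full / low_rank:.0f}x")
```

Being able to produce this arithmetic on a whiteboard is usually enough depth for a paper you are not presenting.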

How Deep Should You Go?

For your chosen paper (the one you'll present): Read every section, understand the math, reimplement if possible.

For other papers: Understand the key contribution, 1-2 limitations, and how it relates to other work. You don't need to know the math.

Part 4 - Handling Follow-Up Questions

Common Follow-Up Patterns

| Question Type | Example | How to Handle |
| --- | --- | --- |
| "Go deeper on X" | "Explain the attention computation step by step" | Walk through the math: Q, K, V projections, dot product scaled by √d_k, softmax, weighted sum of V |
| "What if you changed X?" | "What if you used cosine similarity instead of dot product?" | Reason from first principles. "Cosine normalizes for magnitude, so..." |
| "Why not use Y instead?" | "Why not use RNNs for this task?" | Compare trade-offs. "RNNs process tokens one at a time, so training can't be parallelized across the sequence, which limits training speed..." |
| "How would you improve this?" | "How would you make this work for longer sequences?" | Propose specific ideas: sparse attention, chunking, etc. |
| "What's the impact?" | "Has this paper actually changed practice?" | Connect to downstream impact: "Every major LLM uses this..." |
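
The "Go deeper" and "What if you changed X?" rows are easier to answer if you have written the computation out once. The sketch below is a dependency-free, single-head version in plain Python. The `use_cosine` flag is a hypothetical variant added only to illustrate the cosine-similarity question; it is not part of the original paper, and the toy vectors are made up.

```python
import math


def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention(Q, K, V, use_cosine=False):
    """Single-head attention over lists of vectors.

    Default scores are q . k / sqrt(d_k), as in scaled dot-product attention.
    use_cosine=True (hypothetical variant) divides by the vector norms instead,
    so it assumes no query or key is the zero vector.
    """
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        if use_cosine:
            q_norm = math.sqrt(dot(q, q))
            scores = [dot(q, k) / (q_norm * math.sqrt(dot(k, k))) for k in K]
        else:
            scores = [dot(q, k) / math.sqrt(d_k) for k in K]
        w = softmax(scores)  # one weight per key, summing to 1
        # Each output is the weighted average of the value vectors.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))]
        )
        weights.append(w)
    return outputs, weights


# Toy inputs: one query, two keys, two values.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
K_scaled = [[5 * x for x in k] for k in K]  # same directions, larger magnitude

_, w_dot = attention(Q, K, V)
_, w_dot_big = attention(Q, K_scaled, V)
_, w_cos = attention(Q, K, V, use_cosine=True)
_, w_cos_big = attention(Q, K_scaled, V, use_cosine=True)

print("dot product, original keys:", w_dot[0])
print("dot product, scaled keys:  ", w_dot_big[0])
print("cosine, original keys:     ", w_cos[0])
print("cosine, scaled keys:       ", w_cos_big[0])
```

Scaling the keys sharpens the dot-product weights but leaves the cosine weights unchanged, which is the "cosine normalizes for magnitude" argument in runnable form. A natural follow-up to raise yourself: cosine scores are bounded in [-1, 1], so without a temperature the softmax can never become very peaked.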

When You Don't Know the Answer

Script: "I haven't thought about that specific angle. Let me reason through it... [think out loud]. If we changed X, I'd expect Y because of Z. But I'm not certain - this would be a great experiment to run."

Practice Problems

Problem 1: Paper Critique

Present the "Attention Is All You Need" paper in 8 minutes. Then answer: "What are the three biggest limitations of the original transformer architecture?"

Full Answer + Rubric

Key limitations:

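  1. Quadratic attention complexity: Self-attention costs O(N²) time and memory in sequence length N, which becomes prohibitive for long sequences. Led to sparse and linear attention variants, which reduce the asymptotic cost, and to FlashAttention, which keeps exact attention but cuts memory traffic by never materializing the full N×N matrix.
  2. Fixed context window: The model is trained with a maximum sequence length and generalizes poorly beyond it. Led to position schemes such as RoPE and ALiBi, and to context extension methods.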
  3. No explicit recurrence or memory: Each forward pass is independent - no way to carry state across sequences without external memory. Led to work on retrieval-augmented models and memory-augmented transformers.
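
For the quadratic-complexity point, it helps to have a rough number ready. The following back-of-the-envelope sketch estimates the memory needed to store the full attention score matrix for one layer at batch size 1. The head count of 16 and the 2 bytes per value (fp16/bf16) are assumptions chosen for illustration.

```python
def attention_score_bytes(seq_len, num_heads=1, bytes_per_value=2):
    """Bytes needed to hold every head's seq_len x seq_len score matrix.

    Assumes batch size 1 and one layer; bytes_per_value=2 corresponds
    to fp16/bf16 storage.
    """
    return num_heads * seq_len * seq_len * bytes_per_value


for n in (1_024, 8_192, 65_536):
    gib = attention_score_bytes(n, num_heads=16) / 2**30
    print(f"N={n:>6}: {gib:8.2f} GiB")
```

Each 8x increase in sequence length multiplies the requirement by 64, which is why naive attention stops being practical long before compute does.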

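Bonus: The sinusoidal position encoding was a fairly ad hoc design choice; the paper itself reports that learned position embeddings gave nearly identical results. RoPE and ALiBi later proposed relative-position alternatives.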
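
If the interviewer probes the position-encoding point, you may be asked what the sinusoidal scheme actually computes. A minimal sketch for a single position, following the paper's definition (sine on even dimensions, cosine on odd dimensions, wavelengths growing geometrically with base 10000); the function name is an assumption for illustration.

```python
import math


def sinusoidal_position_encoding(position, d_model):
    """Encoding vector for one position; d_model is assumed to be even."""
    encoding = []
    for i in range(0, d_model, 2):
        # i is the even dimension index, so the exponent is i / d_model.
        angle = position / (10000 ** (i / d_model))
        encoding.append(math.sin(angle))  # dimension i
        encoding.append(math.cos(angle))  # dimension i + 1
    return encoding


for pos in (0, 1, 2):
    print(pos, [round(x, 4) for x in sinusoidal_position_encoding(pos, 4)])
```

Nothing here is learned, which makes it a good starting point for discussing why later schemes moved toward relative positions.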

Scoring:

  • Strong Hire: 3+ specific limitations with references to follow-up work
  • Lean Hire: 1-2 correct limitations
  • No Hire: Can't identify any limitation

Interview Cheat Sheet

| Phase | What to Say | Time |
| --- | --- | --- |
| Opening | "I'd like to discuss [paper name], published in [year] by [lab]. It addresses the problem of..." | 0-1 min |
| Core idea | "The key contribution is [specific technique], which works by..." | 1-4 min |
| Results | "The main result shows [X]% improvement on [benchmark]. The most convincing experiment is..." | 4-6 min |
| Critique | "However, there are limitations: [1], [2], [3]" | 6-8 min |
| Extension | "If I were continuing this work, I'd explore..." | 8-10 min |

Spaced Repetition Checkpoints

  • Day 0: Choose your presentation paper. Read it thoroughly.
  • Day 3: Present the paper to a friend in 10 minutes. Get feedback.
  • Day 7: Read 2 more papers from the must-know list. Write 3 limitations for each.
  • Day 14: Have someone quiz you with follow-up questions on your chosen paper.
  • Day 21: Present a different paper you've never discussed before. Can you do it well with less prep?

© 2026 EngineersOfAI. All rights reserved.