Paper Discussion Round - Show Your Research Taste
Reading time: ~16 min | Interview relevance: Critical (RE), High (MLE) | Roles: RE, some MLE
The Real Interview Moment
"Tell me about a paper you've read recently that you found interesting."
You pick a paper. You spend 10 minutes summarizing every section. The interviewer interrupts: "What's the key limitation of this approach?" You stammer - you focused on understanding the paper, not critiquing it. The interviewer follows up: "If you had unlimited compute and data, what experiment would you run to improve on this?" You don't have an answer.
The paper discussion round tests research taste, not reading speed. Can you identify what matters, what doesn't, and what comes next?
What You Will Master
- How to structure a paper presentation in 8-10 minutes
- What interviewers are actually evaluating (it's not summarization)
- How to critique papers constructively
- How to handle follow-up questions you haven't prepared for
- The 10 papers you should be ready to discuss
Part 1 - The Presentation Framework
The 5-Part Structure (8-10 minutes)
- Problem (1 min): What problem does this paper solve? Why does it matter? What existed before?
- Core Contribution (2-3 min): What's the key technical idea? Explain the method clearly. Use a diagram if helpful.
- Key Experiments (2 min): What are the main results? Are they convincing? What's the most important table/figure?
- Limitations (1-2 min): What doesn't this paper address? Where might it fail? What assumptions does it make?
- Extensions (1-2 min): What would you do next? What experiment would test the limitations? How could this be improved?
The single strongest signal in a paper discussion is unprompted limitations analysis. When a candidate says "This paper is great, but here are three things it doesn't address..." - that tells me they have real research taste. Summarization shows reading ability. Critique shows thinking ability.
Part 2 - What Gets Scored
| Criterion | No Hire | Lean Hire | Strong Hire |
|---|---|---|---|
| Clarity | Can't explain the paper clearly | Clear summary but reads like a textbook | Explains with intuition, adapts to audience |
| Depth | Surface-level understanding | Understands the method | Can discuss implementation details and math |
| Critique | "It's a great paper" (no critique) | Identifies 1-2 limitations | 3+ thoughtful limitations with proposed experiments |
| Extension | No ideas for follow-up | Generic ideas ("more data") | Specific, feasible, novel extensions |
| Q&A | Can't handle probing questions | Answers with some depth | Reasons from first principles when unsure |
Part 3 - The 10 Papers You Should Know
Papers That Come Up Most Often
| Paper | Year | Why It's Asked | Key Concepts |
|---|---|---|---|
| Attention Is All You Need | 2017 | Introduced the Transformer, the basis of modern LLMs | Self-attention, multi-head attention, positional encoding |
| BERT | 2018 | Pre-training + fine-tuning paradigm | Masked language modeling, next sentence prediction |
| GPT-3 | 2020 | In-context learning, scaling laws | Few-shot prompting, emergent abilities |
| LoRA | 2021 | Efficient fine-tuning | Low-rank adaptation, parameter efficiency |
| InstructGPT / RLHF | 2022 | Alignment, human feedback | Reward modeling, PPO, preference learning |
| RAG | 2020 | Retrieval-augmented generation | Combining retrieval with generation |
| FlashAttention | 2022 | Efficient attention | IO-aware, tiling, SRAM utilization |
| DPO | 2023 | Simpler alignment | Direct preference optimization vs. RLHF |
| Mixture of Experts | Various | Efficient scaling | Sparse activation, routing, expert specialization |
| Vision Transformers (ViT) | 2020 | Transformers for vision | Patch embedding, position encoding for images |
How Deep Should You Go?
For your chosen paper (the one you'll present): Read every section, understand the math, reimplement if possible.
For other papers: Understand the key contribution, 1-2 limitations, and how it relates to other work. You don't need to know the math.
Part 4 - Handling Follow-Up Questions
Common Follow-Up Patterns
| Question Type | Example | How to Handle |
|---|---|---|
| "Go deeper on X" | "Explain the attention computation step by step" | Walk through the math: Q, K, V projections, scaled dot product, softmax, weighted sum of values |
| "What if you changed X?" | "What if you used cosine similarity instead of dot product?" | Reason from first principles. "Cosine normalizes for magnitude, so..." |
| "Why not use Y instead?" | "Why not use RNNs for this task?" | Compare trade-offs. "RNNs process tokens sequentially, so training can't be parallelized across time steps, which limits training speed..." |
| "How would you improve this?" | "How would you make this work for longer sequences?" | Propose specific ideas: sparse attention, chunking, etc. |
| "What's the impact?" | "Has this paper actually changed practice?" | Connect to downstream impact: "Every major LLM uses this..." |
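
For the "Go deeper on X" and "What if you changed X?" rows, it can help to have the attention computation in your head as code. The following is a minimal single-head sketch in NumPy, not the paper's implementation: the function names, the random weights, and the `use_cosine` flag are all illustrative assumptions, and masking, multiple heads, and batching are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def attention(x, w_q, w_k, w_v, use_cosine=False):
    """Single-head self-attention over x with shape (seq_len, d_model)."""
    q = x @ w_q  # queries, shape (seq_len, d_k)
    k = x @ w_k  # keys, shape (seq_len, d_k)
    v = x @ w_v  # values, shape (seq_len, d_v)
    if use_cosine:
        # Hypothetical variant for the "what if" question: unit-normalize
        # q and k so every score is a cosine similarity in [-1, 1].
        q = q / np.linalg.norm(q, axis=-1, keepdims=True)
        k = k / np.linalg.norm(k, axis=-1, keepdims=True)
        scores = q @ k.T
    else:
        # Scaled dot product: divide by sqrt(d_k) to keep scores moderate.
        scores = (q @ k.T) / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights         # output is a weighted sum of values

# Toy inputs (assumed sizes, random weights instead of trained ones).
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

out, weights = attention(x, w_q, w_k, w_v)
cos_out, cos_weights = attention(x, w_q, w_k, w_v, use_cosine=True)
```

One observation you could reason toward in the interview: because cosine scores are bounded in [-1, 1], the softmax over them cannot become very peaked unless a temperature or learned scale is added, whereas dot-product scores can grow with vector magnitude.
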
When You Don't Know the Answer
Script: "I haven't thought about that specific angle. Let me reason through it... [think out loud]. If we changed X, I'd expect Y because of Z. But I'm not certain - this would be a great experiment to run."
Practice Problems
Problem 1: Paper Critique
Present the "Attention Is All You Need" paper in 8 minutes. Then answer: "What are the three biggest limitations of the original transformer architecture?"
Full Answer + Rubric
Key limitations:
- Quadratic attention complexity: Self-attention costs O(N²) time and memory in sequence length N, which becomes prohibitive for long sequences. Led to sparse and linear attention variants, which reduce the asymptotic cost, and to FlashAttention, which keeps exact attention but avoids materializing the N×N score matrix in GPU high-bandwidth memory.
- Fixed context window: The model is trained on sequences up to a maximum length and generalizes poorly beyond it. Led to techniques like RoPE, ALiBi, and context extension methods.
- No explicit recurrence or memory: Each forward pass is independent - no way to carry state across sequences without external memory. Led to work on retrieval-augmented models and memory-augmented transformers.
Bonus: The sinusoidal position encoding was a heuristic choice; the paper itself reports nearly identical results with learned position embeddings. Later work such as RoPE and ALiBi encodes relative position instead.
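
To make the quadratic-cost limitation concrete, a back-of-the-envelope calculation is often more convincing than quoting O(N²). The sketch below assumes the full N×N score matrix is materialized in 16-bit floats (2 bytes per element) for a single head and a single sequence; the function name and defaults are illustrative, not taken from the paper.

```python
def attention_score_bytes(seq_len, num_heads=1, bytes_per_element=2):
    """Bytes needed to store the full (seq_len x seq_len) attention score
    matrix for one layer and one sequence, assuming it is materialized."""
    return num_heads * seq_len * seq_len * bytes_per_element

for n in (1024, 8192, 32768):
    mib = attention_score_bytes(n) / 2**20
    print(f"N={n:>6}: {mib:,.0f} MiB per head")
```

Under these assumptions, 1,024 tokens need 2 MiB per head while 32,768 tokens need 2 GiB per head: a 32x longer sequence costs 1,024x more memory. Multiply by heads, layers, and batch size to see why long contexts motivated the follow-up work listed above.
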
Scoring:
- Strong Hire: 3+ specific limitations with references to follow-up work
- Lean Hire: 1-2 correct limitations
- No Hire: Can't identify any limitation
Interview Cheat Sheet
| Phase | What to Say | Time |
|---|---|---|
| Opening | "I'd like to discuss [paper name], published in [year] by [lab]. It addresses the problem of..." | 0-1 min |
| Core idea | "The key contribution is [specific technique], which works by..." | 1-4 min |
| Results | "The main result shows [X]% improvement on [benchmark]. The most convincing experiment is..." | 4-6 min |
| Critique | "However, there are limitations: [1], [2], [3]" | 6-8 min |
| Extension | "If I were continuing this work, I'd explore..." | 8-10 min |
Spaced Repetition Checkpoints
- Day 0: Choose your presentation paper. Read it thoroughly.
- Day 3: Present the paper to a friend in 10 minutes. Get feedback.
- Day 7: Read 2 more papers from the must-know list. Write 3 limitations for each.
- Day 14: Have someone quiz you with follow-up questions on your chosen paper.
- Day 21: Present a different paper you've never discussed before. Can you do it well with less prep?
What's Next
- For full paper discussion prep → Paper Discussion
- Behavioral Round - The soft skills round
- Take-Home Assessment - Practical project evaluation
