DeepMind Interviews - The Complete Playbook
Reading time: ~40 min | Interview relevance: Critical | Roles: Research Scientist, Research Engineer, Staff Research Scientist, Applied Scientist
The Real Interview Moment
You are in a virtual interview room with two DeepMind researchers. One is a lead author on the landmark AlphaFold paper. The other specializes in reinforcement learning and has three best paper awards at ICML. The first researcher speaks: "We'd like you to present a paper of your choosing for 20 minutes. Not one of your own papers - a paper you find interesting that is relevant to our work. After your presentation, we will discuss it for 25 minutes. We are interested in your taste, your critical analysis, and your ability to identify the paper's strengths, weaknesses, and extensions."
You chose to present the Decision Transformer paper. You walk through the key insight - framing reinforcement learning as a sequence modeling problem - and explain why you find it compelling. Then the questions begin. "What are the fundamental limitations of this approach compared to classical RL?" "How does this connect to in-context learning in large language models?" "If you had unlimited compute, how would you extend this work? What experiment would you run first?" "Can you derive the loss function mathematically and show why it differs from a standard policy gradient approach?"
This is DeepMind's paper discussion round. It is not testing whether you read papers. It is testing whether you have research taste - the ability to evaluate, critique, and extend scientific work at the frontier. Every question probes deeper. Every answer reveals whether you think like a DeepMind researcher or simply consume research passively.
At DeepMind, the bar is not "can you do ML." The bar is "can you advance the field."
What You Will Master
- The complete DeepMind interview pipeline and how it differs from other AI labs
- Research expectations and the PhD question (is it required?)
- The paper discussion round - how to select, present, and defend a paper
- Mathematical rigor expectations and how to prepare
- Research taste - what it is and how to demonstrate it
- The Google integration dynamic and how it affects your work
- Team landscape: fundamental research, applied research, and engineering
- Compensation, career trajectory, and life at DeepMind
Part 1 - The DeepMind Interview Pipeline
Overview
DeepMind's interview process is one of the most selective in AI. Acceptance rates are estimated at 1-3% for research roles, comparable to the most competitive PhD programs. The process is thorough, research-focused, and designed to find people who will make fundamental contributions to AI.
Timeline
| Stage | Duration | Typical Wait After |
|---|---|---|
| Application to recruiter screen | 2-8 weeks | - |
| Recruiter screen | 30 min | 1-2 weeks |
| Research phone screen | 60 min | 2-3 weeks |
| Paper discussion / research presentation | 45 min | 2-3 weeks |
| Onsite loop | 1-2 days (4-6 rounds) | 2-4 weeks |
| Research committee | 2-4 weeks | 1-2 weeks |
| Team matching | 2-6 weeks | 1 week |
| Total | 12-24 weeks | - |
DeepMind's process is slow - often 3-6 months from application to offer. This is partly because of the research committee review process and partly because DeepMind is extremely selective. Do not interpret slow response times as rejection. Many successful candidates waited 3-4 weeks between stages. If you have competing deadlines, communicate them early.
Part 2 - Who DeepMind Hires
The PhD Question
Is a PhD required for DeepMind?
| Role | PhD Required? | What Substitutes |
|---|---|---|
| Research Scientist | Strongly preferred (90%+ have PhDs) | Exceptional publication record without PhD (very rare) |
| Senior/Staff Research Scientist | Effectively required | Nothing - these are senior researcher roles |
| Research Engineer | Not required | Strong engineering skills + ML research understanding |
| Applied Scientist | Preferred but not required | Industry experience deploying ML at scale |
| Software Engineer | Not required | Standard engineering skills |
The honest answer: For Research Scientist roles, a PhD from a strong program with publications at top venues (NeurIPS, ICML, ICLR, CVPR, ACL) is the standard path. DeepMind occasionally hires exceptional candidates without PhDs, but this is the exception, not the rule. For Research Engineer roles, a strong ML engineering background without a PhD is common.
What DeepMind Values in Candidates
| Quality | What It Looks Like | How Interviewers Test It |
|---|---|---|
| Research taste | Can identify important problems and promising approaches | Paper discussion round, research proposal questions |
| Mathematical rigor | Can derive, prove, and reason mathematically | Math and theory round, whiteboard derivations |
| Technical depth | Deep expertise in at least one area of AI/ML | Technical deep dive, publication discussion |
| Intellectual curiosity | Genuine excitement about open problems | Quality of questions, breadth of research awareness |
| Collaboration | Works well in research teams | Behavioral questions, team interaction during onsite |
| Communication | Can explain complex ideas clearly | Research presentation, paper discussion |
| Implementation ability | Can turn research ideas into working code | Coding round (yes, DeepMind has one) |
Part 3 - Stage-by-Stage Breakdown
Stage 1: Recruiter Screen (30 min)
What happens: A recruiter assesses basic fit and logistics.
DeepMind-specific details:
- The recruiter will ask about your research interests and how they align with DeepMind's work
- They will discuss locations (London is the primary hub; also Mountain View, Paris, Montreal)
- They will explain the process and set expectations about timeline
- If you have publications, they will note these for the research committee
Stage 2: Research Phone Screen (60 min)
What happens: A DeepMind researcher interviews you on ML fundamentals and research understanding.
This round is split approximately:
- 30 min: ML theory and fundamentals (deeper than Google or Meta)
- 15 min: Your research experience and interests
- 15 min: Discussion of open problems or recent research
DeepMind's ML fundamentals questions go deeper than Big Tech's:
Google/Meta phone screen: "Explain how attention mechanisms work."
DeepMind phone screen: "Derive the attention computation from first principles. Why is it called 'attention' - what is the connection to information theory? What is the computational complexity, and why does it matter? How does multi-head attention differ mathematically from a single head with the same total dimension?"
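As a concrete companion to the attention question above, here is a minimal sketch of single-head scaled dot-product attention. It uses plain Python (standard library only) so that the nested loops make the quadratic cost in sequence length visible. The function names `softmax` and `scaled_dot_product_attention` and the `causal` flag are illustrative choices, not taken from any particular library, and the sketch assumes Q and K have the same number of rows. In an interview you would more likely write this with NumPy or JAX matrix operations.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V, causal=False):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are lists of rows (one row per sequence position).
    With causal=True, position i may only attend to positions j <= i.
    Returns (outputs, attention_weights).
    """
    d_k = len(K[0])
    scale = 1.0 / math.sqrt(d_k)
    weights = []
    for i, q in enumerate(Q):
        # n queries x n keys x d_k multiplies: O(n^2 * d_k) overall.
        scores = [
            float("-inf") if (causal and j > i)
            else scale * sum(a * b for a, b in zip(q, k))
            for j, k in enumerate(K)
        ]
        weights.append(softmax(scores))
    d_v = len(V[0])
    outputs = [
        [sum(w[j] * V[j][c] for j in range(len(V))) for c in range(d_v)]
        for w in weights
    ]
    return outputs, weights
```

Masked positions receive a score of negative infinity, so their softmax weight is exactly zero. The `1/sqrt(d_k)` scaling keeps the variance of the dot products from growing with dimension, which is one of the points an interviewer may ask you to justify.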
Topics that come up frequently:
| Topic | Expected Depth | Example Question |
|---|---|---|
| Optimization | Derive SGD, Adam; understand convergence theory | "Prove that SGD converges for convex functions. What changes for non-convex?" |
| Probability | Bayesian inference, variational methods, sampling | "Derive the ELBO. Why is it a lower bound? When is it tight?" |
| Information theory | KL divergence, mutual information, entropy | "What is the connection between KL divergence and maximum likelihood?" |
| Deep learning theory | Generalization, double descent, lottery ticket | "Why do overparameterized networks generalize? What does that tell us about the loss landscape?" |
| Reinforcement learning | Bellman equations, policy gradient, exploration | "Derive the policy gradient theorem. What is the variance problem and how do baselines help?" |
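To illustrate the depth expected for the reinforcement learning row, here is a sketch of the argument for why a baseline does not bias the policy gradient. The notation is an assumption for this example: a policy $\pi_\theta$, a state distribution $d^{\pi_\theta}$, an action-value function $Q^{\pi_\theta}$, and a baseline $b(s)$ that depends only on the state. The sum over actions is written for a discrete action space.

```latex
\begin{aligned}
\nabla_\theta J(\theta)
  &\propto \mathbb{E}_{s \sim d^{\pi_\theta},\, a \sim \pi_\theta}
     \big[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a)\big] \\
\mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}
     \big[\nabla_\theta \log \pi_\theta(a \mid s)\, b(s)\big]
  &= b(s) \sum_a \nabla_\theta \pi_\theta(a \mid s)
   = b(s)\, \nabla_\theta \sum_a \pi_\theta(a \mid s)
   = b(s)\, \nabla_\theta 1 = 0
\end{aligned}
```

Because the baseline term has zero expectation, replacing $Q^{\pi_\theta}(s, a)$ with $Q^{\pi_\theta}(s, a) - b(s)$ leaves the gradient unbiased. A well-chosen baseline, such as an estimate of the state value, can still reduce the variance of the estimator substantially.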
The key difference between a DeepMind phone screen and a Google phone screen is mathematical depth. At Google, explaining how Adam works conceptually is sufficient. At DeepMind, you should be able to derive the Adam update rule, explain the bias correction terms mathematically, and discuss why adaptive learning rates help on certain loss landscapes.
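The Adam update and its bias-correction terms can be sketched in a few lines. This is a minimal scalar version in plain Python. The helper names `adam_step` and `minimize_quadratic` are hypothetical, and the default hyperparameters are the commonly used ones rather than anything specific to DeepMind.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter. t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # EMA of gradients
    v = beta2 * v + (1.0 - beta2) * grad * grad   # EMA of squared gradients
    # Both averages start at zero, so early estimates are biased toward zero.
    # Dividing by (1 - beta^t) removes that bias.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


def minimize_quadratic(steps=2000, lr=0.05):
    """Minimize f(x) = (x - 3)^2 from x = 0 using adam_step."""
    x, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * (x - 3.0)
        x, m, v = adam_step(x, grad, m, v, t, lr=lr)
    return x
```

One property worth being able to state: on the very first step the bias-corrected ratio is approximately `grad / |grad|`, so the parameter moves by about `lr` regardless of the gradient's scale. That scale invariance is part of why adaptive learning rates help on badly conditioned loss landscapes.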
Stage 3: Paper Discussion / Research Presentation (45 min)
This is DeepMind's most distinctive interview round and the one that most clearly separates DeepMind from other companies.
Format options (varies by role and team):
Option A - Paper Discussion (most common for Research Scientist):
- You choose a paper to present (not your own)
- 20 min presentation
- 25 min discussion and Q&A
Option B - Research Presentation (common for Senior+ roles):
- You present your own research (1-2 papers)
- 25-30 min presentation
- 15-20 min Q&A
How to Select a Paper for the Paper Discussion:
| Selection Criterion | Good Choice | Bad Choice |
|---|---|---|
| Relevance | Related to DeepMind's research areas | A paper in a completely unrelated field |
| Depth | Has interesting mathematical or theoretical content | A purely empirical paper with no theory |
| Recency | Published in the last 2-3 years | A textbook result from 20 years ago |
| Non-obvious | A paper that most people have not read | The most famous paper in the field (GPT, AlphaGo) |
| Debatable | Has clear strengths AND weaknesses | A paper you think is perfect (no room for discussion) |
| Extendable | You can propose interesting extensions | A closed result with no obvious next steps |
Do not choose a DeepMind paper to present. Interviewers know their own work intimately, and presenting it back to them is awkward and risky. Similarly, do not choose the most famous paper in the field (Attention Is All You Need, AlphaGo, etc.) - everyone presents these, and you will not differentiate yourself. Choose a paper that is excellent but not obvious, relevant but not from DeepMind.
How to present the paper:
What interviewers are evaluating:
- Research taste: Did you choose an interesting paper? Can you articulate why it matters?
- Critical analysis: Can you identify the paper's strengths AND weaknesses?
- Depth of understanding: Do you understand the math, not just the concepts?
- Scientific judgment: Can you evaluate the experimental methodology?
- Creativity: Can you propose interesting extensions or variations?
- Communication: Can you explain complex ideas clearly and concisely?
Common paper discussion follow-up questions:
- "What is the strongest claim this paper makes? Is it justified?"
- "If you had to replicate this work, what would be the hardest part?"
- "What experiment would you run that the authors did not?"
- "How does this relate to [seemingly unrelated topic]?"
- "If the authors' assumptions are wrong, what breaks?"
- "Can you derive [key equation] on the whiteboard?"
Many candidates prepare a polished presentation but cannot handle follow-up questions that go deeper than the paper's content. The presentation is 20 minutes; the discussion is 25 minutes. Prepare for the discussion by re-deriving all key equations, identifying at least 3 weaknesses, proposing at least 2 extensions, and understanding how the paper connects to the broader research landscape.
Stage 4: Onsite Loop (4-6 Rounds)
The DeepMind onsite for Research Scientist roles:
| Round | Duration | Type | What It Tests |
|---|---|---|---|
| Round 1 | 60 min | Coding | Implementation ability, algorithms |
| Round 2 | 60 min | ML Theory / Math | Mathematical rigor, derivations |
| Round 3 | 60 min | Research Depth | Deep expertise in your research area |
| Round 4 | 60 min | Research Breadth | Awareness of ML landscape, connections between fields |
| Round 5 | 45-60 min | Collaboration / Behavioral | Teamwork, communication, research mentality |
| Round 6 (Senior+) | 60 min | Research Vision | Can you lead research direction? |
Part 4 - Technical Rounds in Detail
Coding Round
DeepMind coding rounds have unique characteristics:
- Problems often have a research flavor - implementing algorithms from papers, efficient computation of mathematical quantities
- Python is the standard language; JAX/NumPy fluency is valued
- The algorithmic bar is lower than in a Google SWE loop, but you are still expected to produce correct, working code
- Clean, readable code matters - your colleagues will read and build on your code
Common coding problem types at DeepMind:
| Type | Example | Research Connection |
|---|---|---|
| Algorithm implementation | Implement beam search with diverse decoding | Used in sequence models |
| Matrix operations | Efficiently compute attention scores with masking | Core to transformer implementation |
| Dynamic programming | Find optimal policy in a grid world with constraints | RL foundations |
| Graph algorithms | Message passing on an arbitrary graph | Graph neural networks |
| Sampling | Implement Metropolis-Hastings sampling | Bayesian ML |
| Optimization | Implement gradient descent with momentum and learning rate schedule | Training loops |
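As an example of the sampling row in the table above, here is a minimal random-walk Metropolis-Hastings sketch in plain Python. The function name `metropolis_hastings`, its parameters, and the Gaussian proposal are assumptions chosen for illustration. An interviewer may specify a different proposal or target.

```python
import math
import random


def metropolis_hastings(log_density, x0, n_samples, step_size=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a 1-D target.

    log_density: unnormalized log p(x); the normalizer cancels in the ratio.
    Returns (samples, acceptance_rate).
    """
    rng = random.Random(seed)
    x = x0
    log_p = log_density(x)
    samples = []
    n_accepted = 0
    for _ in range(n_samples):
        # Symmetric Gaussian proposal, so the proposal ratio cancels.
        proposal = x + rng.gauss(0.0, step_size)
        log_p_new = log_density(proposal)
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
            n_accepted += 1
        samples.append(x)  # a rejected proposal repeats the current state
    return samples, n_accepted / n_samples
```

For instance, `metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000)` targets a standard normal. Working in log space avoids underflow, and appending the current state on rejection is the detail candidates most often get wrong.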
Math and Theory Round
This is where DeepMind interviews diverge most sharply from Big Tech. You may be asked to:
- Derive loss functions from first principles
- Prove convergence bounds
- Work through variational inference derivations
- Analyze computational complexity of ML algorithms
- Connect information theory to ML concepts
Example questions:
Probability and Statistics:
- "Derive Bayes' theorem. Now derive the posterior for a Gaussian likelihood with a Gaussian prior."
- "What is the connection between maximum likelihood estimation and KL divergence minimization?"
- "Explain the reparameterization trick used in VAEs. Why is it necessary? Derive it."
Optimization:
- "Why does batch normalization help training? Give a mathematical argument."
- "Derive the natural gradient. How does it differ from standard gradient descent? When does it matter?"
- "What is the connection between Adam and natural gradient methods?"
Deep Learning Theory:
- "What is the Neural Tangent Kernel? What does it tell us about training dynamics?"
- "Explain the lottery ticket hypothesis. What are its implications for model compression?"
- "Why do large models generalize despite being overparameterized? Discuss the double descent phenomenon."
Reinforcement Learning:
- "Derive the policy gradient theorem step by step."
- "What is the bias-variance trade-off in TD learning vs Monte Carlo returns?"
- "Explain model-based RL. When does it outperform model-free methods? What are the failure modes?"
The math round at DeepMind is significantly harder than at any other company in this guide. Google and Meta expect you to understand ML concepts and explain them clearly. DeepMind expects you to derive them from scratch on a whiteboard. If you are coming from an industry background without recent mathematical practice, budget extra preparation time for this round.
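As an example of the expected level, the question above about maximum likelihood and KL divergence has a short answer you should be able to write down unprompted. The notation here is an assumption for illustration: $p_{\text{data}}$ is the true data distribution, $p_\theta$ is the model, and $x_1, \dots, x_N$ are i.i.d. samples.

```latex
\begin{aligned}
\mathrm{KL}\big(p_{\text{data}} \,\|\, p_\theta\big)
  &= \mathbb{E}_{x \sim p_{\text{data}}}\big[\log p_{\text{data}}(x)\big]
   - \mathbb{E}_{x \sim p_{\text{data}}}\big[\log p_\theta(x)\big] \\
\arg\min_\theta \mathrm{KL}\big(p_{\text{data}} \,\|\, p_\theta\big)
  &= \arg\max_\theta \mathbb{E}_{x \sim p_{\text{data}}}\big[\log p_\theta(x)\big]
   \approx \arg\max_\theta \frac{1}{N} \sum_{i=1}^{N} \log p_\theta(x_i)
\end{aligned}
```

The first term does not depend on $\theta$, so minimizing the KL divergence from the data distribution to the model is equivalent to maximizing the expected log-likelihood. The sample average on the right is the usual maximum likelihood objective.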
Research Depth Round
This round explores your area of expertise in extreme detail:
- If your thesis is on graph neural networks, expect 60 minutes of graph neural network questions - from foundational theory to cutting-edge results
- The interviewers will likely be experts in your area (or adjacent areas)
- They will push you to the boundary of current knowledge
- They want to see that you have genuine expertise, not surface-level familiarity
How depth probes work at DeepMind:
Area: Reinforcement Learning
Level 1: "What is the difference between on-policy and off-policy RL?"
Level 2: "Derive the importance sampling correction for off-policy evaluation."
Level 3: "What are the variance issues with importance sampling as the behavior and target policies diverge? How do methods like V-trace address this?"
Level 4: "How does the choice of off-policy correction interact with function approximation? When does the deadly triad manifest?"
Level 5: "Given unlimited compute, how would you design an RL system that avoids the deadly triad while maintaining off-policy sample efficiency? What trade-offs are you making?"
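Levels 2 and 3 can be made concrete with a one-step (bandit) example. The sketch below estimates the value of a target policy from actions sampled under a behavior policy, with optional truncation of the importance weights. Truncated weights are the mechanism V-trace relies on, but this is not an implementation of V-trace. The function name `off_policy_value` and its parameters are hypothetical.

```python
import random


def off_policy_value(target, behavior, rewards, n_samples, clip=None, seed=0):
    """Estimate E_target[r(a)] from actions sampled under `behavior`.

    target, behavior: lists of action probabilities (same length).
    rewards: deterministic reward per action (kept simple for the sketch).
    clip: if set, each weight is truncated to min(clip, rho), which lowers
          variance at the price of bias.
    """
    rng = random.Random(seed)
    actions = list(range(len(behavior)))
    total = 0.0
    for _ in range(n_samples):
        a = rng.choices(actions, weights=behavior)[0]
        rho = target[a] / behavior[a]  # importance weight pi(a) / mu(a)
        if clip is not None:
            rho = min(clip, rho)
        total += rho * rewards[a]
    return total / n_samples
```

With `target=[0.9, 0.1]`, `behavior=[0.5, 0.5]`, and `rewards=[1.0, 0.0]`, the unclipped estimator is unbiased for the true value 0.9, while `clip=1.0` has expectation 0.5. The weights grow as the two policies diverge, which is the variance problem Level 3 asks about, and truncation trades that variance for bias.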
Research Breadth Round
This round tests your awareness of the broader ML landscape:
- Can you connect ideas across different areas of ML?
- Are you aware of recent important results outside your specialty?
- Can you evaluate whether a research direction is promising?
- Do you read broadly, not just in your niche?
Example questions:
- "What are the most important open problems in AI right now?"
- "How does scaling-laws research inform how we should think about model development?"
- "What is the connection between diffusion models and score matching? How does this relate to variational inference?"
- "If you had to start a new research project tomorrow, what would you work on and why?"
- "A colleague proposes training a 1T parameter model. What questions would you ask before investing compute?"
Part 5 - DeepMind Team Landscape
Research Areas
Team Comparison
| Team Area | Research Freedom | Publication Rate | Google Product Impact | Interview Focus |
|---|---|---|---|---|
| Fundamental Research | Very High | Very High | Indirect | Deep theory, math, research taste |
| AI for Science | High | High (Nature, Science) | Low | Domain knowledge + ML |
| Language/Multimodal (Gemini) | Medium | Medium (some restricted) | Very High | LLM depth, scaling, evaluation |
| Safety & Alignment | High | High | Growing | Alignment theory, evaluation, interpretability |
| Research Engineering | Medium | Low | High | Systems engineering, ML infrastructure |
Part 6 - The Google Integration Dynamic
What Changed After the Merger
Google Brain and DeepMind merged into "Google DeepMind" in 2023. This affects your interview and career:
How it affects interviews:
- The interview process is still largely DeepMind-style for DeepMind-branded roles
- Some roles are now shared with Google - check whether you are interviewing for a "Google DeepMind" role or a "Google" role with DeepMind collaboration
- Team matching may include both legacy DeepMind and legacy Brain teams
- Compensation follows Google's band system
How it affects day-to-day work:
| Dimension | Pre-Merger DeepMind | Post-Merger Google DeepMind |
|---|---|---|
| Publication freedom | Very high | Still high, but some restrictions for Gemini-related work |
| Research autonomy | Very high | Still high, but more product pressure |
| Infrastructure | DeepMind's own systems | Google infrastructure (TPUs, etc.) |
| Compute access | Good | Better (Google's resources) |
| Product pressure | Low | Medium (Gemini is a priority) |
| Compensation | DeepMind-specific | Google bands (generally equivalent or better) |
The Google DeepMind merger means that some roles that were previously "pure research" now have product expectations (particularly anything related to Gemini). If you are interviewing for a Gemini-related role, expect questions about productionization, latency, and serving - not just research. If you are interviewing for a fundamental research role (RL, neuroscience, theory), the interview remains heavily research-focused.
Part 7 - Level Expectations and Compensation
DeepMind / Google DeepMind Levels
Post-merger, DeepMind uses Google's level system:
| Level | Title | Typical Background | Scope | Interview Bar |
|---|---|---|---|---|
| L4 | Research Engineer | MS + strong coding | Implement research, build infrastructure | Strong coding, ML understanding |
| L5 | Research Scientist / Senior RE | PhD + publications | Conduct independent research | Research depth, mathematical rigor, coding |
| L6 | Senior Research Scientist | PhD + strong record | Lead research direction for a team | Research vision, significant publications |
| L7 | Staff Research Scientist | PhD + exceptional record | Define research agenda for an area | Industry-leading expertise, major publications |
| L8+ | Principal / Distinguished | World-class reputation | Shape the field | Extraordinary contributions |
2025/2026 Google DeepMind Compensation (US / UK)
US (Mountain View):
| Level | Base Salary | Stock (Annual) | Bonus | Total Comp (Annual) |
|---|---|---|---|---|
| L4 | $155-195K | $80-150K | 15% | $290-410K |
| L5 | $195-260K | $160-300K | 15-20% | $420-640K |
| L6 | $260-330K | $300-550K | 20-25% | $640-980K |
| L7 | $330-420K | $550K-1M+ | 25-30% | $1M-1.5M+ |
UK (London):
| Level | Base Salary | Stock (Annual) | Bonus | Total Comp (Annual) |
|---|---|---|---|---|
| L4 | GBP 70-90K | GBP 30-60K | 15% | GBP 115-170K |
| L5 | GBP 90-120K | GBP 60-120K | 15-20% | GBP 170-270K |
| L6 | GBP 120-160K | GBP 120-220K | 20-25% | GBP 275-430K |
| L7 | GBP 160-210K | GBP 220-400K+ | 25-30% | GBP 430-700K+ |
Key compensation details:
- UK compensation is lower in absolute terms but competitive for London tech salaries
- Stock follows Google's 4-year vesting schedule (33% Year 1, then monthly)
- Annual refresher grants are significant, especially at L6+
- DeepMind offers sometimes include signing bonuses of $50-200K+
- Relocation support is generous for moves between London, Mountain View, and other offices
If you are deciding between DeepMind London and DeepMind Mountain View, consider: London offers a lower cost of living (relative to Bay Area), a stronger research community density (most DeepMind researchers are in London), and a more established research culture. Mountain View offers higher absolute compensation and closer integration with Google product teams.
Part 8 - Common Mistakes and How to Avoid Them
The Top 10 DeepMind Interview Mistakes
| Mistake | Why It Happens | How to Avoid |
|---|---|---|
| 1. Choosing a DeepMind paper for the paper discussion | Wanting to show alignment | Choose a great non-DeepMind paper that connects to their work |
| 2. Surface-level math | Industry background without recent math practice | Re-derive key results from scratch; practice whiteboard math |
| 3. No research opinion | Fear of being wrong | Have opinions about open problems and defend them |
| 4. Presenting too many papers | Wanting to show breadth | Go deep on one paper rather than shallow on three |
| 5. Cannot code | Pure theorist background | Practice Python/JAX/NumPy implementation of ML algorithms |
| 6. No weaknesses identified in paper discussion | Wanting to seem positive | Every paper has weaknesses; identifying them shows critical thinking |
| 7. Not connecting to broader landscape | Narrow focus | Read broadly; connect your work to 2-3 other research areas |
| 8. Treating it like a Big Tech interview | Over-preparing for LeetCode | Shift preparation toward math, theory, and research discussion |
| 9. Not understanding DeepMind's mission | Applying broadly | DeepMind's long-stated mission is to "solve intelligence to advance science and benefit humanity" - articulate how your work aligns |
| 10. Expecting fast decisions | Accustomed to startup speed | DeepMind takes 3-6 months; be patient and communicate competing timelines |
What Ex-DeepMind Interviewers Say
"The paper discussion round is where most candidates succeed or fail. The ones who fail present a paper like a textbook summary - accurate but lifeless. The ones who succeed present a paper like a research collaborator - with opinions, critiques, and ideas for extensions. I want to see that you have research taste, not just research knowledge."
"Mathematical rigor is non-negotiable at DeepMind. I have seen candidates who are excellent ML engineers - they can build anything - but they cannot derive the loss function they are optimizing. At DeepMind, understanding why something works is as important as making it work."
"The question I ask myself after every interview is: 'If I gave this person a research problem with no clear solution, would they make progress?' Some candidates are excellent at executing well-defined plans but struggle with ambiguity. At DeepMind, most of the important work starts with ambiguity."
Part 9 - Preparation Strategies
The 6-Week DeepMind Prep Plan
DeepMind interviews require more preparation than typical Big Tech interviews, especially for the math and research components.
Weeks 1-2: Mathematical Foundations
- Re-derive key results: SGD convergence, policy gradient theorem, ELBO, attention computation
- Review probability theory: Bayesian inference, conjugate priors, variational methods
- Review linear algebra: eigenvalues, SVD, matrix calculus
- Review optimization theory: convexity, convergence rates, natural gradient
- Practice whiteboard math: explain derivations out loud while writing
Week 3: Research Depth
- Go extremely deep in your specialty area (5-7 levels of depth)
- Re-read the 10 most important papers in your area
- Identify 3 open problems you have opinions about
- Prepare to discuss your own research for 30 minutes with detailed Q&A
Week 4: Paper Discussion Preparation
- Select your paper for the paper discussion round
- Prepare a 20-minute presentation with clear slides or whiteboard plan
- Identify 5 strengths and 5 weaknesses of the paper
- Prepare 3 extensions or future work ideas
- Practice presenting to colleagues and handling tough questions
Week 5: Coding and Breadth
- Solve 30 coding problems with research flavor (algorithm implementation, matrix operations)
- Practice coding in Python with NumPy/JAX
- Read 10 papers outside your specialty to build breadth
- Practice connecting ideas across research areas
Week 6: Integration and Mock Interviews
- 2 full mock interviews in DeepMind format
- Practice the paper discussion round with researchers if possible
- Review your weakest areas (math, coding, or research breadth)
- Research DeepMind teams and identify which ones interest you
- Prepare questions for interviewers
DeepMind Interview Preparation Checklist
6 Weeks Out
- Re-derive 15 key ML results from first principles
- Review probability, linear algebra, optimization theory
- Select paper for paper discussion round
- Identify 3 open problems in your research area with your own opinions
- Begin reading broadly (10 papers outside your specialty)
3 Weeks Out
- Prepare 20-min paper presentation with strengths, weaknesses, extensions
- Practice whiteboard math derivations
- Solve 30 research-flavored coding problems
- Prepare to discuss your own research for 30 minutes
- Read DeepMind's recent publications relevant to your target team
1 Week Out
- Do 2 full mock interviews (paper discussion + math + coding + research depth)
- Practice presenting your chosen paper to colleagues
- Review your research narrative: why this area, why DeepMind, what will you work on?
- Research DeepMind teams and team leads
- Prepare thoughtful questions for interviewers
Day Before
- Light review of your paper presentation
- Review key derivations one final time
- Do not cram - trust your preparation
- Get 8 hours of sleep
Day Of
- Arrive early (onsite) or test your setup (virtual)
- Bring a notebook for sketching during discussions
- Be genuinely curious - ask questions that interest you
- Show excitement about research - DeepMind wants passionate researchers
- Remember: they want you to succeed; the interview is a research conversation, not an interrogation
Part 10 - Sample Questions and Answers
Paper Discussion Sample
Paper chosen: "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" (Power et al., 2022)
Presentation highlights (20 min):
"I chose this paper because it challenges a fundamental assumption in deep learning - that models either generalize or overfit, and you can tell which from the training dynamics. Grokking shows that models can first memorize (overfit), and then, with much more training, suddenly generalize - long after the training loss has plateaued.
The paper demonstrates this on small algorithmic tasks like modular arithmetic. The model perfectly memorizes the training set early in training, then shows sudden generalization on the test set thousands of epochs later.
Strengths: (1) The phenomenon is reproducible across a range of algorithmic tasks and training settings. (2) The paper provides clean, reproducible experiments. (3) It opens a genuinely new research direction - understanding delayed generalization.
Weaknesses: (1) The tasks are small and synthetic - it is unclear whether grokking occurs on natural datasets at scale. (2) The paper does not provide a mechanistic explanation for why grokking happens. (3) The practical implications are unclear - should practitioners always train longer?
Extensions I would pursue: (1) Test whether grokking occurs in fine-tuning of LLMs on small datasets. (2) Use mechanistic interpretability to understand what changes in the network during the grokking transition. (3) Investigate the connection between grokking and the lottery ticket hypothesis - does grokking involve finding a sparse subnetwork?"
Follow-up questions and answers:
Q: "What is the most compelling mechanistic explanation for grokking that has been proposed since this paper?"
A: "Nanda et al. (2023) showed through mechanistic interpretability that the network learns a clean, generalizing algorithm for modular addition built on discrete Fourier transforms and trigonometric identities, but early in training the memorization circuit dominates. With continued training and weight decay, the memorization circuit decays and the generalizing circuit takes over. Weight decay appears to be important - without it, grokking is less likely."
Math Round Sample
Question: "Derive the ELBO (Evidence Lower Bound) and explain why it is a lower bound on the log evidence."
Expected derivation approach:
"We want to compute log p(x), the log evidence. We introduce a variational distribution q(z|x) and write:
log p(x) = log integral of p(x,z) dz
We multiply and divide by q(z|x):
log p(x) = log integral of [p(x,z)/q(z|x)] * q(z|x) dz
By Jensen's inequality (since log is concave):
log p(x) >= integral of q(z|x) * log[p(x,z)/q(z|x)] dz
This is the ELBO. We can rewrite it as:
ELBO = E_q[log p(x|z)] - KL(q(z|x) || p(z))
The first term is the expected reconstruction log-likelihood, and the second is the KL divergence between the approximate posterior and the prior.
Why it is a lower bound: The gap between log p(x) and the ELBO is exactly KL(q(z|x) || p(z|x)) - the KL divergence between the approximate and true posterior. Since KL divergence is non-negative, the ELBO is always a lower bound. It is tight when q(z|x) = p(z|x), the true posterior."
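An interviewer will often ask you to show the gap identity rather than assert it. The derivation below is a sketch using the same symbols as the answer above, $q(z \mid x)$, $p(x, z)$, and $p(z \mid x)$. It also shows the step from the Jensen form of the ELBO to the reconstruction-minus-KL form.

```latex
\begin{aligned}
\log p(x)
  &= \mathbb{E}_{q(z \mid x)}\big[\log p(x)\big]
   = \mathbb{E}_{q(z \mid x)}\!\left[\log \frac{p(x, z)}{p(z \mid x)}\right] \\
  &= \mathbb{E}_{q(z \mid x)}\!\left[\log \frac{p(x, z)}{q(z \mid x)}\right]
   + \mathbb{E}_{q(z \mid x)}\!\left[\log \frac{q(z \mid x)}{p(z \mid x)}\right] \\
  &= \mathrm{ELBO} + \mathrm{KL}\big(q(z \mid x) \,\|\, p(z \mid x)\big) \\[6pt]
\mathrm{ELBO}
  &= \mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big]
   + \mathbb{E}_{q(z \mid x)}\!\left[\log \frac{p(z)}{q(z \mid x)}\right]
   = \mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big]
   - \mathrm{KL}\big(q(z \mid x) \,\|\, p(z)\big)
\end{aligned}
```

The first line uses the fact that $\log p(x)$ does not depend on $z$, together with $p(x) = p(x, z) / p(z \mid x)$. The last line uses the factorization $p(x, z) = p(x \mid z)\, p(z)$.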
Next Steps
DeepMind interviews are the most research-intensive in the AI industry. If you are a researcher with deep theoretical foundations and genuine research taste, DeepMind offers the opportunity to work on some of the hardest and most important problems in AI.
Next, see how all the top AI companies compare side-by-side across every dimension that matters: Company Comparison Matrix.
