LLM Interviews - The Complete 2026 Preparation Roadmap

Reading time: ~30 min | Interview relevance: Critical | Roles: MLE, AI Eng, LLM Eng, Research Eng, Applied Scientist

The Real Interview Moment

You are sitting in the first round of a Series B AI startup interview. The hiring manager leans forward and says: "We get 500 applications a week from people who say they have LLM experience. Most of them have called an API and written some prompts. Walk me through how you would build our fine-tuned model from scratch - from data collection to production deployment with guardrails."

This is the reality of LLM interviews in 2026. Every software engineer's resume now lists "LLM experience." The bar has shifted dramatically. Interviewers no longer ask "What is a Transformer?" - they ask "Why does LLaMA 3 use GQA instead of MHA, and what are the memory savings at 128K context length?" They do not want API callers. They want engineers who understand the full stack, from pretraining data curation to inference optimization.

This section is your complete preparation guide. It covers 11 interconnected topics, each with the depth expected at top AI labs and LLM-focused startups.

What You Will Master

  • Map the complete LLM interview landscape across 11 core topics
  • Assess your current level and identify the highest-ROI study areas
  • Choose the right study path for your target role and timeline
  • Understand what separates a "strong hire" from "everyone else" in 2026
  • Track your preparation progress with spaced repetition checkpoints

Self-Assessment: Where Are You Now?

Rate yourself honestly on each topic. This is your starting point - you will reassess after studying each chapter.

Scale: 1 = Never Seen | 2 = Read About | 3 = Can Explain | 4 = Can Derive/Build | 5 = Can Teach

# | Topic | Your Score
1 | Transformer Internals for LLMs | ___
2 | LLM Pretraining | ___
3 | Fine-Tuning (LoRA, QLoRA, Adapters) | ___
4 | RLHF and Alignment | ___
5 | RAG Systems | ___
6 | Prompt Engineering | ___
7 | LLM Evaluation | ___
8 | Inference Optimization | ___
9 | Agent Architectures | ___
10 | Safety and Guardrails | ___
11 | LLM Interview Questions (Capstone) | ___

Scoring guide:

  • 40+ total: You are well-prepared. Focus on weak spots and practice under time pressure.
  • 25-39: Solid foundation. Work through the chapters in dependency order.
  • Under 25: Start from Chapter 1 and work sequentially. Allow 4-6 weeks.

Why LLM Interviews Are Different in 2026

The "Everyone Claims LLM Experience" Problem

In 2023, listing "LLM experience" on your resume was a differentiator. By 2026, it is table stakes - and the signal-to-noise ratio has collapsed. Here is what changed:

[Diagram: LLM Experience Evolution]

Instant Rejection

Saying "I built an LLM application" when you mean "I called the OpenAI API with a system prompt" will end your interview. Interviewers in 2026 probe immediately: "What model did you use? Why? What was your evaluation framework? How did you handle hallucinations?"

What Top Companies Actually Test

The interview landscape has stratified into distinct tiers:

Tier | Companies | What They Test | Depth Expected
Tier 1 - Frontier Labs | Anthropic, OpenAI, Google DeepMind, Meta FAIR | Pretraining, architecture research, alignment theory | Can derive from first principles, propose novel approaches
Tier 2 - LLM Infrastructure | Databricks, Anyscale, Modal, Together AI, Fireworks | Training infrastructure, inference optimization, serving | Can build training pipelines, optimize serving stacks
Tier 3 - AI-Native Products | Cursor, Replit, Notion AI, Harvey, Glean | RAG, agents, evaluation, fine-tuning for domain | Can build end-to-end LLM features, measure quality
Tier 4 - Enterprise AI | Big tech AI teams, consulting, finance | RAG, prompt engineering, safety, cost optimization | Can deploy reliably at scale with guardrails

Company Variation

Anthropic and OpenAI will ask you to derive attention complexity from scratch. A Series A startup building an AI code editor will ask you to design a RAG pipeline that handles 50K-file codebases. Both are "LLM interviews" but they test completely different depths.
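
As a reference point for the "derive attention complexity" style of question, here is one way the standard derivation can be written out. It assumes a single head with sequence length $n$ and head dimension $d$; these symbols are introduced here for illustration and are not defined elsewhere in this overview.

```latex
\begin{aligned}
\mathrm{Attention}(Q, K, V) &= \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d}}\right) V,
  \qquad Q, K, V \in \mathbb{R}^{n \times d} \\
QK^\top &:\ n^2 \text{ scores, each a } d\text{-term dot product} \;\Rightarrow\; O(n^2 d) \\
\mathrm{softmax} &:\ \text{row-wise over an } n \times n \text{ matrix} \;\Rightarrow\; O(n^2) \\
(\cdot)\,V &:\ (n \times n)(n \times d) \;\Rightarrow\; O(n^2 d) \\
\text{Total per head} &:\ O(n^2 d) \text{ time}, \quad O(n^2) \text{ memory for the score matrix}
\end{aligned}
```

The quadratic term in $n$ is why long-context questions tend to lead into KV cache and memory discussions.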

The Five Competencies Interviewers Probe

Every LLM interview question maps to one or more of these competencies:

[Diagram: LLM Engineering Competencies]

Part 1 - The 11-Topic Roadmap

Topic Dependency Diagram

The topics in this section are not independent. Study them in dependency order to build understanding layer by layer:

[Diagram: Topic Dependency Map]

Legend: Red = foundational (start here) | Yellow = core training topics | Blue = application layer | Green = advanced integration

Topic-by-Topic Summary

Chapter 1: Transformer Internals for LLMs

Why it matters: Every other topic builds on this. You cannot discuss pretraining, fine-tuning, or inference optimization without understanding the architecture.

Key concepts: Decoder-only architecture, causal masking, RoPE positional encoding, Grouped Query Attention (GQA), SwiGLU FFN, RMSNorm, KV cache mechanics, parameter counting, FLOP estimation.

Interview frequency: Asked in 95% of LLM interviews. Frontier labs expect derivation-level depth.
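
A rough sketch of the KV cache arithmetic behind questions like "GQA vs MHA at 128K context". The configuration below (32 layers, 32 query heads, 8 KV heads, head dimension 128, 2-byte cache values, batch size 1) is an assumption chosen to resemble an 8B-class decoder; swap in the real numbers for whichever model you are asked about.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_value=2, batch_size=1):
    """Bytes needed to cache keys and values for every layer and position."""
    # Factor of 2: one tensor for keys, one for values.
    return 2 * batch_size * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_value


GIB = 1024 ** 3
SEQ_LEN = 128 * 1024  # "128K" context taken as 131,072 tokens

# Assumed config: MHA caches one K/V pair per query head (32),
# GQA shares each K/V head across a group of query heads (8 KV heads).
mha = kv_cache_bytes(SEQ_LEN, n_layers=32, n_kv_heads=32, head_dim=128)
gqa = kv_cache_bytes(SEQ_LEN, n_layers=32, n_kv_heads=8, head_dim=128)

print(f"MHA: {mha / GIB:.0f} GiB, GQA: {gqa / GIB:.0f} GiB, ratio: {mha // gqa}x")
```

Under these assumptions the cache shrinks from 64 GiB to 16 GiB per sequence, which is the kind of concrete number interviewers look for.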

Chapter 2: LLM Pretraining

Why it matters: Understanding pretraining separates engineers who can build foundation models from those who only consume them.

Key concepts: Data collection and filtering pipelines, tokenization (BPE, SentencePiece, tiktoken), training objectives (causal LM, prefix LM, fill-in-the-middle), scaling laws (Chinchilla, inference-aware), compute budgets, 3D parallelism, checkpointing, fault tolerance.

Interview frequency: Asked at Tier 1 and Tier 2 companies. Tier 3 asks lighter versions focused on data quality.
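
For compute-budget questions, a back-of-envelope estimate is usually enough. The sketch below uses two widely quoted approximations, roughly 6 FLOPs per parameter per training token and roughly 20 training tokens per parameter as a Chinchilla-style rule of thumb. The accelerator peak throughput and utilization in the example are placeholder assumptions, not measurements.

```python
def chinchilla_tokens(n_params, tokens_per_param=20):
    """Rule-of-thumb compute-optimal token count (assumed ~20 tokens/param)."""
    return tokens_per_param * n_params


def training_flops(n_params, n_tokens):
    """Approximate training compute: ~6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens


def gpu_days(total_flops, peak_flops_per_gpu, utilization):
    """GPU-days at a given peak throughput and achieved utilization (MFU)."""
    seconds = total_flops / (peak_flops_per_gpu * utilization)
    return seconds / 86_400


n_params = 7_000_000_000
n_tokens = chinchilla_tokens(n_params)      # 140B tokens
flops = training_flops(n_params, n_tokens)  # ~5.9e21 FLOPs

# Hypothetical accelerator: 1e15 FLOP/s peak, 40% utilization.
days = gpu_days(flops, peak_flops_per_gpu=1e15, utilization=0.4)
print(f"{n_tokens:.2e} tokens, {flops:.2e} FLOPs, ~{days:.0f} GPU-days")
```

Dividing the GPU-days by the cluster size gives wall-clock time, and multiplying by an hourly price gives a first cost estimate.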

Chapter 3: Fine-Tuning LLMs

Why it matters: The most practical interview topic. Every company that uses LLMs has fine-tuning decisions to make.

Key concepts: Full fine-tuning, LoRA/QLoRA math, prefix tuning, adapter layers, instruction tuning, data formatting, quality vs quantity tradeoffs, when to fine-tune vs prompt engineer vs RAG, catastrophic forgetting, cost analysis.

Interview frequency: Asked in 85% of LLM interviews. Expect to compare approaches with concrete cost numbers.
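
Memory is often the first "concrete cost number" in a fine-tuning comparison. The sketch below assumes a common mixed-precision Adam recipe (fp16 weights and gradients, fp32 master weights, two fp32 Adam moments) and ignores activations; other recipes will change the per-parameter byte count.

```python
GB = 1e9

# Assumed bytes per parameter for mixed-precision full fine-tuning with Adam.
FULL_FINETUNE_BYTES = {
    "fp16 weights": 2,
    "fp16 gradients": 2,
    "fp32 master weights": 4,
    "fp32 Adam first moment": 4,
    "fp32 Adam second moment": 4,
}


def training_state_gb(n_params, bytes_per_param):
    """Memory for per-parameter training state, excluding activations."""
    return n_params * bytes_per_param / GB


n_params = 7_000_000_000
adam_bytes = (FULL_FINETUNE_BYTES["fp32 Adam first moment"]
              + FULL_FINETUNE_BYTES["fp32 Adam second moment"])

adam_only_gb = training_state_gb(n_params, adam_bytes)
full_total_gb = training_state_gb(n_params, sum(FULL_FINETUNE_BYTES.values()))

print(f"Adam moments alone: {adam_only_gb:.0f} GB, all training state: {full_total_gb:.0f} GB")
```

With parameter-efficient methods the gradient and optimizer terms apply only to the small set of trainable parameters, which is where most of the savings come from.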

Chapter 4: RLHF and Alignment

Why it matters: This is what turns raw language models into useful assistants. Alignment is one of the most active research areas in AI.

Key concepts: Reward model training, PPO for LLMs, DPO and its variants, Constitutional AI, RLAIF, preference data collection, reward hacking, alignment tax.

Interview frequency: Critical at frontier labs. Tier 3-4 companies ask conceptual questions.
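
A minimal sketch of the DPO objective for a single preference pair, in case you are asked to write it down. It takes summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model. The function name, argument names, and the default `beta=0.1` are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair: -log sigmoid(beta * margin)."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy raises the chosen response and lowers the rejected one: loss drops.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

Note that no reward model appears anywhere: the log-ratio against the reference plays the role of an implicit reward, which is the main contrast with the reward-model-plus-PPO pipeline.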

Chapter 5: RAG Systems

Why it matters: RAG is the most deployed LLM pattern in production. If you are interviewing at any company building LLM products, expect RAG questions.

Key concepts: Chunking strategies, embedding models, vector databases, hybrid search, reranking, query transformation, multi-hop RAG, evaluation (faithfulness, relevance, recall).

Interview frequency: Asked in 90% of applied AI interviews. System design rounds often center on RAG.
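
Hybrid search needs some way to combine a keyword ranking with a dense-retrieval ranking. One common choice is reciprocal rank fusion (RRF), sketched below. The document IDs and the two input rankings are made up, and `k=60` is the constant conventionally used with RRF rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs; returns IDs sorted by fused score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for the documents it returned.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword (BM25) retriever and a dense retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

RRF uses only rank positions, so it avoids having to normalize BM25 scores against cosine similarities. A reranker would then typically rescore the top of the fused list.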

Chapter 6: Prompt Engineering

Why it matters: The gap between amateur and expert prompting is enormous. Companies need engineers who can systematically optimize prompts.

Key concepts: Chain-of-thought, few-shot design, system prompt architecture, structured outputs, prompt injection defense, A/B testing prompts, prompt versioning.

Interview frequency: Asked everywhere, but depth varies. Frontier labs test understanding of why techniques work.

Chapter 7: LLM Evaluation

Why it matters: "How do you know it works?" is the question that separates production engineers from demo builders.

Key concepts: Perplexity and its limitations, benchmark suites (MMLU, HumanEval, MT-Bench), human evaluation design, LLM-as-judge, contamination detection, domain-specific eval, A/B testing in production.

Interview frequency: Increasingly common. Every serious company asks about evaluation strategy.
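
Two metrics from the list above are short enough to write from memory: perplexity from per-token log-probabilities, and the unbiased pass@k estimator used with HumanEval-style code benchmarks. The sketch below uses only the standard library; the function names are illustrative.

```python
import math


def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood (natural-log probabilities)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pass_at_k(n, c, k):
    """Unbiased pass@k: n samples generated, c of them correct.

    Equals 1 - C(n - c, k) / C(n, k), the probability that a random
    size-k subset of the n samples contains at least one correct one.
    """
    if n - c < k:
        return 1.0  # Every size-k subset must include a correct sample.
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


# A model that assigns probability 0.25 to every token has perplexity 4.
print(perplexity([math.log(0.25)] * 4))
# 10 samples, 3 correct.
print(pass_at_k(n=10, c=3, k=1), pass_at_k(n=10, c=3, k=5))
```

Perplexity only measures how well the model predicts held-out text, which is why it is paired with task-level metrics such as pass@k when the goal is correctness.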

Chapter 8: Inference Optimization

Why it matters: Serving LLMs at scale is expensive. Companies need engineers who can reduce latency and cost by 10x.

Key concepts: KV cache optimization, continuous batching, speculative decoding, quantization (GPTQ, AWQ, GGUF), PagedAttention/vLLM, tensor parallelism for serving, prefill vs decode optimization.

Interview frequency: Critical at Tier 2 infrastructure companies. Asked at all tiers for senior roles.
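
A quick way to reason about decode speed is to treat single-sequence decoding as memory-bandwidth bound: every generated token requires reading all the weights once. The sketch below gives the resulting upper bound. It ignores KV cache reads, batching, and kernel overheads, and the 2 TB/s bandwidth is a placeholder assumption rather than a spec for any particular accelerator.

```python
def decode_tokens_per_second_upper_bound(n_params, bytes_per_param,
                                         mem_bandwidth_bytes_per_s):
    """Upper bound on batch-size-1 decode speed when weight reads dominate."""
    weight_bytes = n_params * bytes_per_param
    return mem_bandwidth_bytes_per_s / weight_bytes


N_PARAMS = 8_000_000_000
BANDWIDTH = 2e12  # Assumed 2 TB/s of memory bandwidth.

fp16_bound = decode_tokens_per_second_upper_bound(N_PARAMS, 2, BANDWIDTH)
int4_bound = decode_tokens_per_second_upper_bound(N_PARAMS, 0.5, BANDWIDTH)

print(f"fp16: <= {fp16_bound:.0f} tok/s, 4-bit: <= {int4_bound:.0f} tok/s")
```

This framing explains why weight quantization speeds up decoding even when the arithmetic is unchanged, and why batching helps: the same weight read is amortized across more sequences.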

Chapter 9: Agent Architectures

Why it matters: Agents are the frontier of LLM applications. Companies are racing to build reliable autonomous systems.

Key concepts: ReAct pattern, tool use, planning and decomposition, memory systems, multi-agent coordination, error recovery, evaluation of agent systems.

Interview frequency: Growing rapidly. Most common at AI-native product companies.
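
The ReAct pattern is easier to discuss with a concrete loop in front of you. The sketch below is a toy: `scripted_model` is a hypothetical stand-in for a real LLM call, the `Action: tool[input]` text format is an assumed convention, and the only tool is an integer adder. The control flow (think, act, observe, repeat, with a step limit and tool-error handling) is the part that carries over.

```python
def calculator(expression):
    """Toy tool: adds two integers written as 'a + b'."""
    left, right = expression.split("+")
    return str(int(left) + int(right))


TOOLS = {"calculator": calculator}


def scripted_model(transcript):
    """Hypothetical stand-in for an LLM: chooses the next step from the transcript."""
    if "Observation:" not in transcript:
        return "Thought: I need to add the numbers.\nAction: calculator[2 + 3]"
    observation = transcript.rsplit("Observation: ", 1)[1].strip()
    return f"Thought: I have the result.\nFinal Answer: {observation}"


def react_loop(question, model, tools, max_steps=5):
    """Alternate model steps and tool calls until a final answer or the step limit."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        if "Action:" not in step:
            observation = "error: no action found"
        else:
            action = step.split("Action:", 1)[1].strip()
            tool_name, _, tool_input = action.rstrip("]").partition("[")
            if tool_name not in tools:
                observation = f"error: unknown tool {tool_name!r}"
            else:
                try:
                    observation = tools[tool_name](tool_input)
                except Exception as exc:  # Feed tool failures back to the model.
                    observation = f"error: {exc}"
        transcript += f"Observation: {observation}\n"
    return None  # Step budget exhausted without a final answer.


print(react_loop("What is 2 + 3?", scripted_model, TOOLS))
```

Returning errors as observations instead of raising is one simple form of error recovery: the model gets a chance to correct its own action on the next step.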

Chapter 10: Safety and Guardrails

Why it matters: No production LLM ships without safety. Regulatory pressure is increasing globally.

Key concepts: Prompt injection and jailbreaks, output filtering, content classification, constitutional approaches, red teaming, safety benchmarks, regulatory compliance (EU AI Act).

Interview frequency: Asked at every company deploying LLMs to users. Frontier labs go deep on alignment theory.

Chapter 11: LLM Interview Questions (Capstone)

Why it matters: Integrative questions that span multiple topics, simulating real interview pressure.

Key concepts: Cross-topic system design, rapid-fire concept questions, debugging scenarios, paper discussion, whiteboard architecture.

Interview frequency: This IS the interview.

Part 2 - Study Paths by Role and Timeline

Path Selection Guide

[Diagram: Study Path Selection]

Detailed Study Paths

The Deep Path (Research Engineer / Scientist) - 6 weeks

Target companies: Anthropic, OpenAI, Google DeepMind, Meta FAIR

Week | Topics | Focus
1 | Ch 1: Transformer Internals | Derive attention, implement from scratch, parameter counting
2 | Ch 2: Pretraining | Scaling laws derivation, data pipeline design, training infrastructure
3 | Ch 4: RLHF and Alignment | Reward modeling math, DPO derivation, alignment research landscape
4 | Ch 3: Fine-Tuning + Ch 7: Evaluation | LoRA theory, benchmark design, contamination
5 | Ch 10: Safety + Ch 8: Inference | Safety-alignment connection, efficient inference theory
6 | Ch 11: Capstone + Mock Interviews | Timed practice, paper discussions

Interviewer's Perspective

At frontier labs, we expect candidates to go beyond reciting facts. We want to hear you reason about tradeoffs, propose experiments, and identify limitations in existing approaches. Practice explaining your reasoning out loud.

The Full Stack Path (MLE / LLM Engineer) - 5 weeks

Target companies: Databricks, Scale AI, Cohere, AI startups with training pipelines

Week | Topics | Focus
1 | Ch 1: Transformer Internals | Architecture comparison, KV cache math, memory estimation
2 | Ch 2: Pretraining + Ch 3: Fine-Tuning | End-to-end training, LoRA implementation, data pipelines
3 | Ch 4: RLHF + Ch 8: Inference | Post-training pipeline, serving optimization
4 | Ch 5: RAG + Ch 7: Evaluation | Production RAG, evaluation frameworks
5 | Ch 11: Capstone + Mock Interviews | System design, cross-topic questions

The Applied Path (AI / Applied Engineer) - 4 weeks

Target companies: Cursor, Notion AI, Harvey, Glean, enterprise AI teams

Week | Topics | Focus
1 | Ch 1: Transformer Internals (lighter) + Ch 3: Fine-Tuning | Practical architecture knowledge, when/how to fine-tune
2 | Ch 5: RAG + Ch 6: Prompt Engineering | Production RAG design, systematic prompting
3 | Ch 9: Agents + Ch 7: Evaluation | Agent architectures, measuring quality
4 | Ch 10: Safety + Ch 11: Capstone | Guardrails, end-to-end system design

The Infra Path (ML Platform / Infra Engineer) - 4 weeks

Target companies: Together AI, Fireworks, Modal, Anyscale, cloud AI teams

Week | Topics | Focus
1 | Ch 1: Transformer Internals + Ch 8: Inference | Memory math, KV cache, quantization, serving frameworks
2 | Ch 2: Pretraining | 3D parallelism, FSDP, checkpointing, fault tolerance
3 | Ch 5: RAG + Ch 9: Agents | Infrastructure for retrieval and agent systems
4 | Ch 7: Evaluation + Ch 11: Capstone | Eval infrastructure, system design

Part 3 - How to Use Each Chapter

Chapter Structure

Every chapter in this section follows a consistent structure designed for interview preparation:

Section | Purpose | How to Use
The Real Interview Moment | Sets the stakes with a realistic scenario | Read once to understand what you are preparing for
What You Will Master | Learning objectives checklist | Use as a progress tracker
Self-Assessment | Honest skill evaluation | Take before and after studying
Core Content (Parts 1-3+) | Deep technical material with diagrams | Study actively - draw diagrams, derive equations
Practice Problems | Graduated difficulty with hints | Attempt before looking at hints; time yourself
Interview Cheat Sheet | Quick-reference table | Review before interviews
Spaced Repetition Checkpoints | Retention schedule | Follow the Day 0/3/7/14/21 schedule strictly

Study Techniques That Work

60-Second Answer

For every concept, practice giving a 60-second explanation. Time yourself. Interviewers judge clarity and conciseness as much as correctness. If you cannot explain KV cache in 60 seconds, you will ramble for 5 minutes and lose the interviewer.

Active recall beats passive reading. After reading a section:

  1. Close the page
  2. Write down everything you remember on a blank sheet
  3. Reopen and check what you missed
  4. Focus your review on the gaps

Teach it to someone. Explain each concept to a friend, a rubber duck, or a voice recorder. If you stumble, you do not know it well enough.

Solve problems under time pressure. Real interviews give you 5-10 minutes per question. Practice with a timer.

Part 4 - The 2026 Interview Landscape

What Changed from 2024

Aspect | 2024 | 2026
Baseline expectation | "Have you used an LLM?" | "Have you trained or fine-tuned a model?"
Architecture depth | "Explain attention" | "Compare GQA vs MQA memory savings at 128K context"
RAG questions | "What is RAG?" | "Design a RAG system with hybrid search, reranking, and evaluation"
Evaluation | Rarely asked | Standard question: "How would you evaluate this?"
Agents | Cutting-edge topic | Expected knowledge for senior roles
Safety | Nice to know | Required - regulatory pressure (EU AI Act)
Cost awareness | Optional | Required - "What does this cost to train/serve?"
Open-source knowledge | Bonus | Expected - LLaMA, Mistral, Qwen ecosystem

Common Interview Formats for LLM Roles

[Diagram: Interview Formats]

Common Trap

Many candidates over-prepare for coding and under-prepare for system design. LLM system design rounds are where most candidates fail because they cannot reason about tradeoffs between RAG, fine-tuning, prompt engineering, and agents for a given use case.

The Questions That Separate Candidates

These cross-cutting questions appear in almost every LLM interview. If you can answer all of them confidently, you are well-prepared:

  1. "Walk me through the full LLM stack from pretraining to production." Tests breadth. Can you connect all 11 topics?

  2. "When would you fine-tune vs use RAG vs prompt engineer?" Tests judgment. The answer is always "it depends" - but you need to say on WHAT.

  3. "How would you evaluate whether your LLM feature is working?" Tests evaluation maturity. Most candidates have no answer beyond "vibes."

  4. "What are the failure modes of this system?" Tests safety and reliability thinking. Can you enumerate what goes wrong?

  5. "What would this cost to train/serve at our scale?" Tests cost awareness. Interviewers want back-of-envelope numbers, not "it depends."

Part 5 - Building Your LLM Portfolio

What Makes a Strong LLM Portfolio in 2026

Calling APIs is not a portfolio. Here is what actually impresses:

Project Type | Impact Level | Example
Fine-tuned a model on custom data | High | Fine-tuned LLaMA 3 8B on legal documents, measured 23% improvement on domain QA
Built a production RAG system | High | RAG pipeline with hybrid search, reranking, and automated eval suite
Reproduced a paper | Very High | Implemented DPO from scratch, reproduced key results on TL;DR summarization
Built evaluation infrastructure | High | Automated eval framework comparing 5 models across 3 domain-specific benchmarks
Open-source contribution | Very High | Contributed to vLLM, LangChain, or similar projects
Called an API with a prompt | None | This is not a portfolio project

Interviewer's Perspective

When I review LLM portfolios, I look for three things: (1) Did they measure something? (2) Did they make a tradeoff decision and explain why? (3) Did they encounter a real problem and solve it? A fine-tuning project that reports "the model got better" is worthless. One that reports "LoRA rank 16 with $\alpha = 32$ on attention layers gave 12% improvement on our held-out set, while rank 64 caused overfitting after 2 epochs" - that is a hire signal.

Practice Problems

Problem 1: Study Plan Design

You have 3 weeks before an interview at an AI-native startup building a coding assistant (similar to Cursor). They told you the interview includes: LLM system design, coding (Python), and a technical deep-dive. Design your study plan.

Hint 1 - Direction

Think about what a coding assistant company cares about most. Which of the 11 topics are most relevant? Which can you skip or cover lightly?

Hint 2 - Insight

A coding assistant company cares deeply about: RAG (searching codebases), inference speed (real-time suggestions), evaluation (code correctness), and prompt engineering (structured outputs). They care less about pretraining from scratch or RLHF theory.

Hint 3 - Full Solution + Rubric

Optimal 3-week plan:

Week 1: Transformer Internals (2 days, focus on KV cache and inference) + Fine-Tuning (2 days, focus on LoRA and when to fine-tune) + RAG Systems (1 day, start the chapter)

Week 2: RAG Systems (3 days, deep focus - this is their core product) + Prompt Engineering (1 day, structured outputs and code prompting) + Inference Optimization (1 day, speculative decoding and batching)

Week 3: Agent Architectures (1 day - coding assistants are agents) + Evaluation (1 day - code eval is specific) + Capstone Questions (2 days) + Mock Interviews (1 day)

Scoring Rubric:

Criterion | Strong Hire | Lean Hire | No Hire
Prioritized RAG and inference | Correctly identified as top priorities | Mentioned but did not prioritize | Focused on pretraining or RLHF
Included evaluation | Specific to code quality metrics | Generic "test it" | Not mentioned
Realistic time allocation | Matches 3-week constraint | Slightly overloaded | Tried to cover everything equally
Included practice/mocks | Dedicated time for timed practice | Mentioned briefly | All reading, no practice

Problem 2: Role Classification

For each scenario, identify the most likely interview focus areas (top 3 chapters):

  • (a) Anthropic - Research Engineer
  • (b) Databricks - ML Engineer on Model Serving
  • (c) Harvey (legal AI) - Applied AI Engineer
  • (d) A bank - Senior ML Engineer for internal tools

Hint 1 - Direction

Think about what each company builds and what problems they solve at their core. Map those problems to our 11 chapters.

Hint 2 - Insight

Anthropic builds frontier models and studies alignment. Databricks serves models at scale. Harvey applies LLMs to legal workflows. A bank needs reliable, safe, cost-effective internal tools.

Hint 3 - Full Solution + Rubric

(a) Anthropic - Research Engineer:

  1. Ch 1: Transformer Internals (derivation-level)
  2. Ch 4: RLHF and Alignment (core mission)
  3. Ch 2: Pretraining (scaling laws, training dynamics)

(b) Databricks - ML Engineer on Model Serving:

  1. Ch 8: Inference Optimization (core job)
  2. Ch 1: Transformer Internals (memory math)
  3. Ch 2: Pretraining (training infrastructure, parallelism)

(c) Harvey - Applied AI Engineer:

  1. Ch 5: RAG Systems (legal document retrieval)
  2. Ch 3: Fine-Tuning (domain adaptation)
  3. Ch 7: Evaluation (legal accuracy measurement)

(d) Bank - Senior ML Engineer:

  1. Ch 5: RAG Systems (internal document search)
  2. Ch 10: Safety and Guardrails (regulatory compliance)
  3. Ch 6: Prompt Engineering (reliable outputs)

Scoring Rubric:

Criterion | Strong Hire | Lean Hire | No Hire
Matched company to correct chapters | 4/4 correct or close | 2-3/4 correct | Generic answers for all
Justified choices | Explained reasoning tied to company mission | Gave answers without reasoning | Could not connect topics to roles
Recognized company-specific needs | Mentioned specific products/challenges | Generic role mapping | No company awareness

Problem 3: Evaluate a Candidate

You are the interviewer. A candidate for an LLM Engineer role gives this answer to "Explain how LoRA works":

"LoRA is a technique where you freeze the base model and add small trainable matrices. It reduces the number of parameters you need to train. You can use it with QLoRA which also quantizes the model. It is more efficient than full fine-tuning."

Rate this answer. What is missing? What would make it a Strong Hire answer?

Hint 1 - Direction

The answer is factually correct but shallow. What specific technical details would an interviewer expect?

Hint 2 - Insight

A strong answer would include: the low-rank decomposition math, rank and alpha parameters, which modules to target, memory savings calculation, and when NOT to use LoRA.

Hint 3 - Full Solution + Rubric

Assessment: Lean No-Hire. The answer is correct but could come from reading a blog post summary. It demonstrates recognition, not understanding.

What is missing:

  1. Math: LoRA decomposes the weight update $\Delta W$ into $BA$, where $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times d}$, with rank $r \ll d$
  2. Parameters: No mention of rank $r$, scaling factor $\alpha$, or the relationship $\frac{\alpha}{r}$
  3. Target modules: Which layers get LoRA adapters (typically Q, K, V projections; sometimes all linear layers)
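  4. Memory math: For a 7B model, full fine-tuning with Adam needs ~56 GB for the fp32 optimizer states alone (two 4-byte moments per parameter), before counting weights and gradients; LoRA at rank 16 trains ~20M params (~80 MB in fp32)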
  5. Tradeoffs: When LoRA is insufficient (significant domain shift), when to increase rank
  6. Merging: LoRA weights can be merged back into base model at inference time with zero overhead

Strong Hire answer would cover all 6 points in about 2 minutes, with specific numbers.
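
The decomposition, scaling, and merging points in the list above can be checked numerically. This is a pure-Python sketch on tiny matrices, not how a training library implements LoRA. The parameter-count example assumes square 4096 x 4096 attention projections (Q, K, V, O) across 32 layers, which is an illustrative configuration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters for one adapted matrix: B is d_out x r, A is r x d_in."""
    return rank * (d_out + d_in)


def merge_lora(W, B, A, alpha, rank):
    """Fold the adapter into the base weight: W + (alpha / rank) * B @ A."""
    scale = alpha / rank
    BA = matmul(B, A)
    return [[w + scale * u for w, u in zip(w_row, u_row)]
            for w_row, u_row in zip(W, BA)]


# Tiny example: d = 2, rank = 1, alpha = 2.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[3.0, 4.0]]
x = [[1.0], [1.0]]

# Adapter path: W x + (alpha / r) * B (A x), computed without forming B @ A.
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
unmerged = [[b[0] + 2.0 * a[0]] for b, a in zip(base_out, adapter_out)]

# Merged path: a single matmul with the folded weight, so no extra inference cost.
merged = matmul(merge_lora(W, B, A, alpha=2, rank=1), x)
print(unmerged, merged)

# Assumed config: rank 16 on four 4096 x 4096 projections in each of 32 layers.
total_trainable = lora_param_count(4096, 4096, 16) * 4 * 32
print(f"{total_trainable:,} trainable parameters")
```

Both paths produce the same output, which is the basis for the merging claim, and the parameter count lands in the same range as the figure quoted in the memory-math point.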

Criterion | Strong Hire | Lean Hire | No Hire
Includes math | Writes decomposition, explains rank | Mentions "low-rank" without math | No math at all
Concrete numbers | Memory savings, parameter counts | Vague "more efficient" | No numbers
Tradeoffs | When to use, when not to | Only benefits | "Always use LoRA"
Implementation details | Target modules, alpha/rank tuning | Generic description | Sounds like API docs

Interview Cheat Sheet

Topic | Core Question | 60-Second Answer Must Include
Transformer Internals | "How does attention work in modern LLMs?" | Scaled dot-product, causal masking, GQA, RoPE, KV cache
Pretraining | "How are LLMs trained?" | Data pipeline, causal LM objective, scaling laws, 3D parallelism
Fine-Tuning | "When and how do you fine-tune?" | LoRA math, rank/alpha, full vs parameter-efficient, cost comparison
RLHF | "How do you align an LLM?" | SFT then RM then PPO (or DPO), preference data, reward hacking risks
RAG | "How do you add knowledge to an LLM?" | Chunk, embed, retrieve, rerank, generate, evaluate faithfulness
Prompt Engineering | "How do you optimize prompts?" | CoT, few-shot, structured output, systematic testing, version control
Evaluation | "How do you know your LLM works?" | Task-specific metrics, human eval, LLM-as-judge, benchmark contamination
Inference | "How do you serve LLMs efficiently?" | KV cache, continuous batching, quantization, speculative decoding
Agents | "How do you build LLM agents?" | ReAct loop, tool use, planning, memory, error recovery, evaluation
Safety | "How do you make LLMs safe?" | Input/output filtering, prompt injection defense, red teaming, monitoring

Spaced Repetition Checkpoints

Use this schedule to retain what you learn. Each checkpoint should take 15-20 minutes.

Day 0 (After reading this overview)

  • Draw the topic dependency diagram from memory
  • Write down the 5 competencies interviewers probe
  • Identify your study path and target timeline
  • Complete the self-assessment table honestly

Day 3

  • Without looking, list all 11 topics in order
  • For each topic, write one sentence about what it covers
  • Recite the 5 cross-cutting questions that separate candidates
  • Review your study plan - are you on track?

Day 7

  • Explain to someone (or a recorder) why LLM interviews are different in 2026
  • For your target company tier, list the top 5 topics to prioritize
  • Quiz yourself: for each of the 10 cheat sheet topics, give a 60-second answer
  • Adjust your study plan based on which topics felt weakest

Day 14

  • Redo the self-assessment. Compare scores to Day 0
  • Do a mock interview: have someone ask you 5 random cheat sheet questions
  • Time yourself: can you explain each topic in under 60 seconds?
  • Identify your top 3 weak areas and schedule extra review

Day 21

  • Final self-assessment. All scores should be 4+
  • Full mock interview simulation (30 min, mixed topics)
  • Review the practice problems - can you solve them without hints?
  • Prepare your "LLM story" - the 2-minute narrative of your LLM experience

What Comes Next

Start with Chapter 1: Transformer Internals for LLMs. This is the foundation everything else builds on. Even if you have studied Transformers before, the LLM-specific details (GQA, RoPE, SwiGLU, KV cache math) are what interviewers test in 2026.

If you scored 4+ on Transformer Internals in your self-assessment, you can move quickly through Chapter 1 and spend more time on your weaker areas. But do not skip it - the practice problems will reveal gaps you did not know you had.

© 2026 EngineersOfAI. All rights reserved.