AI Engineer Problem List

Reading time: ~40 min | Interview relevance: Critical | Roles: AI Engineer, LLM Engineer, GenAI Engineer, Applied AI Engineer

The AI Engineer role barely existed two years ago. Now it is one of the hottest positions in tech, with companies scrambling to build LLM-powered products. But because the role is new, interview formats vary wildly. One company might ask you to build a RAG pipeline live, another might quiz you on prompt engineering strategies, and a third might focus on traditional coding with an LLM twist.

This list of 45 problems covers the full spectrum of what AI Engineer candidates face. It emphasizes the unique skills that define this role: working with large language models, building retrieval-augmented systems, designing AI agents, and shipping production AI applications.

AI Engineer Interview Structure

| Round | Duration | What They Test | Weight |
| --- | --- | --- | --- |
| Coding | 45-60 min | DSA + API design + LLM integration | 20-25% |
| LLM / AI Depth | 45-60 min | Prompt engineering, RAG, fine-tuning, evaluation | 25-30% |
| System Design | 45-60 min | Production AI systems, architecture | 25-30% |
| Take-Home / Live Build | 2-4 hours | Build an AI feature end-to-end | 15-20% |
| Behavioral | 30-45 min | Product sense, collaboration, shipping velocity | 10% |

:::tip The AI Engineer Differentiator
AI Engineers are builders, not researchers. Interviewers care about: Can you ship an AI-powered feature? Can you make it reliable? Can you iterate quickly? Deep ML theory is less important than practical system-building skills.
:::

Round 1: Coding & API Problems (12 Problems)

AI Engineer coding rounds emphasize API design, data transformation, and LLM integration over pure algorithms.

Core Coding

| # | Problem | Difficulty | Time | Key Pattern | Why AI Engineers Need It | Company Tags |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Design a REST API for a Chat Application | Medium | 25 min | RESTful design, WebSocket for streaming | AI apps need clean APIs for model interaction | OpenAI, Anthropic, Startups |
| 2 | Implement a Token Counter and Text Chunker | Medium | 20 min | Tokenization, sliding window chunking | Core building block of any RAG system | AI Labs, Big Tech |
| 3 | Build a Retry Handler with Exponential Backoff | Easy | 15 min | Error handling, backoff strategy | LLM APIs fail; robust error handling is essential | All |
| 4 | Implement a Concurrent API Call Manager | Medium | 25 min | Async/await, rate limiting, batching | Parallel LLM calls for throughput | Startups, Big Tech |
| 5 | Parse and Validate Structured Output from an LLM | Medium | 20 min | JSON parsing, schema validation, error recovery | LLMs return messy outputs; parsing is critical | All |
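
As one way to approach problem 3, here is a minimal sketch of a retry handler with exponential backoff. The function name `retry_with_backoff` and all of its parameters are illustrative assumptions, not part of any LLM SDK; the `sleep` parameter exists only so the wait can be swapped out in tests.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=8.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on a retryable error, wait and try again.

    The wait ceiling doubles on every failed attempt (base_delay * 2**attempt),
    is capped at max_delay, and the actual wait is drawn uniformly from
    [0, ceiling] ("full jitter") so many clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

In an interview it is worth saying out loud which errors are retryable (timeouts, rate limits) and which are not (invalid requests), and passing only the former in `retry_on`.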

LLM-Flavored Coding

| # | Problem | Difficulty | Time | Key Pattern | Why AI Engineers Need It | Company Tags |
| --- | --- | --- | --- | --- | --- | --- |
| 6 | Implement a Prompt Template Engine | Medium | 25 min | String interpolation, variable injection, escaping | Prompts are code; they need proper templating | Startups, AI Labs |
| 7 | Build a Simple Vector Search with Cosine Similarity | Easy | 20 min | Embedding comparison, top-K retrieval | Foundation of semantic search and RAG | All |
| 8 | Implement a Conversation Memory Manager | Medium | 25 min | Sliding window, summarization triggers, token budgeting | Long conversations exceed context windows | Anthropic, OpenAI, Startups |
| 9 | Build a Function-Calling Router | Medium | 30 min | Intent classification, argument extraction, dispatch | Agentic systems need function routing | OpenAI, Anthropic, Cohere |
| 10 | Implement a Response Streaming Handler | Medium | 20 min | Server-sent events, token-by-token processing | Streaming is standard for LLM UX | All |
| 11 | Build a Simple Evaluation Harness | Medium | 25 min | Test case management, metric computation, result aggregation | Evaluation drives AI product quality | AI Labs, Startups |
| 12 | Implement Document Deduplication with MinHash | Hard | 30 min | Locality-sensitive hashing, Jaccard similarity | Data quality for RAG knowledge bases | Google, Databricks |
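
Problem 7 is small enough to sketch in full. The version below is a brute-force top-K search in plain Python; the names `cosine_similarity` and `top_k`, and the choice of a dict from document id to vector, are assumptions made for illustration. A real system would likely use a vector library or index instead of a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=3):
    """Score every document against the query and return the k best.

    corpus maps a document id to its embedding vector.
    Returns a list of (doc_id, score) pairs, highest score first.
    """
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The scan is O(N) per query, which is usually the follow-up question: be ready to discuss normalizing vectors once up front and moving to an approximate nearest-neighbor index as the corpus grows.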

Round 2: LLM & AI Depth Problems (18 Problems)

These problems test your understanding of how LLMs work, how to use them effectively, and how to build reliable AI systems.

Prompt Engineering & Design

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
| --- | --- | --- | --- | --- | --- | --- |
| 13 | Design a Prompt Chain for Complex Document Summarization | Medium | 25 min | Multi-step prompting, context management | Complex tasks require decomposition | All |
| 14 | Implement Few-Shot Classification with Dynamic Example Selection | Medium | 25 min | Embedding similarity for example retrieval | Static examples underperform dynamic selection | AI Labs, Startups |
| 15 | Design a Prompt for Structured Data Extraction from Unstructured Text | Medium | 20 min | Output formatting, schema enforcement | Information extraction is a top AI use case | All |
| 16 | Build a Self-Correcting Prompt Pipeline | Hard | 30 min | Output validation, error feedback, retry with corrections | LLMs make mistakes; self-correction improves reliability | Anthropic, OpenAI |
| 17 | Compare and Evaluate Prompt Strategies for a Classification Task | Medium | 25 min | A/B testing prompts, metric comparison | Systematic prompt optimization beats guessing | All |
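
The pattern named in problem 16 (validate the output, feed the error back, retry) can be sketched without committing to any provider. In the sketch below, `llm` is assumed to be any callable that takes a prompt string and returns the model's text; `extract_with_self_correction` and its parameters are hypothetical names, and the "schema" is reduced to a list of required keys to keep the example short.

```python
import json


def extract_with_self_correction(llm, prompt, required_keys, max_rounds=3):
    """Ask for JSON, validate it, and re-prompt with the error until it passes.

    llm: assumed callable, prompt text -> reply text.
    Raises ValueError if no reply validates within max_rounds.
    """
    message = prompt
    for _ in range(max_rounds):
        raw = llm(message)
        try:
            data = json.loads(raw)  # raises JSONDecodeError, a ValueError subclass
            if not isinstance(data, dict):
                raise ValueError("expected a JSON object")
            missing = [key for key in required_keys if key not in data]
            if missing:
                raise ValueError(f"missing keys: {missing}")
            return data
        except ValueError as exc:
            # Feed the rejected reply and the reason back to the model.
            message = (
                f"{prompt}\n\nYour previous reply was:\n{raw}\n"
                f"It was rejected because: {exc}\n"
                "Reply with a single valid JSON object only."
            )
    raise ValueError("no valid output after retries")
```

Capping the rounds matters: each correction is another paid model call, so a hard limit plus a fallback path is usually part of the answer.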

RAG Systems

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
| --- | --- | --- | --- | --- | --- | --- |
| 18 | Design a RAG Pipeline for a Technical Documentation System | Medium | 35 min | Chunking, embedding, retrieval, generation | The canonical AI Engineer system design problem | All |
| 19 | Implement Hybrid Search (Keyword + Semantic) | Medium | 25 min | BM25 + embedding search, reciprocal rank fusion | Semantic search alone misses keyword matches | Google, Startups |
| 20 | Design a Multi-Document QA System with Source Attribution | Hard | 35 min | Cross-document retrieval, citation generation | Users need to verify AI answers | Anthropic, Google, Startups |
| 21 | Handle Stale and Conflicting Information in a RAG System | Hard | 30 min | Temporal filtering, conflict resolution, freshness scoring | Real knowledge bases have contradictions | All |
| 22 | Evaluate RAG Quality: Design Metrics and Test Suite | Medium | 25 min | Faithfulness, relevance, completeness metrics | You cannot improve what you cannot measure | AI Labs, Big Tech |
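
For problem 19, the fusion step is the part most often asked for in code. The sketch below implements reciprocal rank fusion over already-ranked lists of document ids; it assumes the keyword (for example BM25) and semantic rankings were produced elsewhere. The function name is illustrative, and `k=60` is a commonly used default, not a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc ids into one.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Only ranks are used, so the raw keyword and
    embedding scores never need to be put on a common scale.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


keyword_hits = ["d1", "d2", "d3"]   # hypothetical keyword ranking
semantic_hits = ["d3", "d1", "d4"]  # hypothetical embedding ranking
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
# fused == ["d1", "d3", "d2", "d4"]
```

Documents that appear in both lists (`d1`, `d3`) rise above those found by only one retriever, which is the behavior hybrid search is after.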

AI Agents & Tools

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
| --- | --- | --- | --- | --- | --- | --- |
| 23 | Design an AI Agent with Tool Use Capabilities | Hard | 35 min | Planning, tool selection, execution, error recovery | Agents are the next frontier of AI applications | Anthropic, OpenAI, Startups |
| 24 | Implement a ReAct-Style Reasoning Agent | Medium | 30 min | Thought-action-observation loop | Most popular agent architecture | AI Labs, Startups |
| 25 | Design a Multi-Agent Collaboration System | Hard | 35 min | Agent roles, communication protocol, consensus | Complex tasks benefit from specialized agents | AI Labs, Startups |
| 26 | Build an Agent with Memory and Learning | Hard | 30 min | Short-term context, long-term storage, retrieval | Persistent agents need memory management | Anthropic, OpenAI |
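
The thought-action-observation loop from problem 24 can be shown with the model stubbed out. Everything in this sketch is an assumption made for illustration: `llm_step` stands in for a real model call and is expected to return a dict with a `thought` plus either an `action` and `input` or a final `answer`; `scripted_llm` is a hard-coded stand-in, not a model.

```python
def run_react(llm_step, tools, question, max_steps=5):
    """Run a ReAct-style loop until the model answers or max_steps is reached.

    Returns (answer, transcript); answer is None if the step budget runs out.
    """
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        step = llm_step(transcript)
        transcript.append(f"Thought: {step['thought']}")
        if "answer" in step:
            return step["answer"], transcript
        tool = tools.get(step["action"])
        if tool is None:
            observation = f"unknown tool: {step['action']}"
        else:
            try:
                observation = tool(step["input"])
            except Exception as exc:  # report tool failures back to the model
                observation = f"tool error: {exc}"
        transcript.append(f"Action: {step['action']}[{step['input']}]")
        transcript.append(f"Observation: {observation}")
    return None, transcript


def scripted_llm(transcript):
    """Stand-in policy: call the add tool once, then answer with its result."""
    if not any(line.startswith("Observation:") for line in transcript):
        return {"thought": "I should compute this.", "action": "add", "input": "2,3"}
    return {"thought": "I have the result.", "answer": transcript[-1].split(": ", 1)[1]}


tools = {"add": lambda text: str(sum(int(x) for x in text.split(",")))}
answer, transcript = run_react(scripted_llm, tools, "What is 2 + 3?")
```

The step limit and the error-as-observation handling are the two details interviewers tend to probe, since an unbounded loop or an unhandled tool exception is how agents fail in production.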

Fine-Tuning & Model Customization

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
| --- | --- | --- | --- | --- | --- | --- |
| 27 | Design a Fine-Tuning Pipeline for Domain Adaptation | Medium | 30 min | Data preparation, training config, evaluation | When prompting is not enough, fine-tuning is next | AI Labs, Big Tech |
| 28 | Compare Fine-Tuning vs. RAG vs. Prompt Engineering for a Use Case | Medium | 25 min | Tradeoff analysis: cost, quality, latency, maintenance | The most common AI architecture decision | All |
| 29 | Implement LoRA Fine-Tuning Configuration and Explain the Approach | Medium | 25 min | Parameter-efficient fine-tuning | LoRA is the standard for efficient fine-tuning | AI Labs, Startups |
| 30 | Design a Training Data Curation Pipeline for Fine-Tuning | Medium | 25 min | Data quality, diversity, deduplication, formatting | Garbage in, garbage out -- especially for fine-tuning | All |

:::warning The RAG vs. Fine-Tuning Question
This comes up in nearly every AI Engineer interview. Have a clear, nuanced framework:

- RAG when: knowledge changes frequently, need source attribution, want to avoid training
- Fine-tuning when: need consistent style/format, domain-specific reasoning, latency matters
- Both when: domain knowledge + retrieval needed (fine-tuned model with RAG)
:::

Round 3: System Design Problems (15 Problems)

AI Engineer system design focuses on production AI applications -- how to make them reliable, fast, and cost-effective.

Production AI Systems

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
| --- | --- | --- | --- | --- | --- | --- |
| 31 | Design a Customer Support Chatbot System | Medium | 40 min | Intent routing, knowledge retrieval, escalation | The most common AI Engineer project | All |
| 32 | Design a Code Review Assistant | Medium | 35 min | Code parsing, context retrieval, LLM analysis | Developer tools are a hot AI application area | GitHub, Anthropic, Google |
| 33 | Design a Content Generation Pipeline | Medium | 35 min | Template system, LLM generation, quality checks, human review | Content generation at scale needs guardrails | Jasper, Copy.ai, Big Tech |
| 34 | Design an AI-Powered Search Engine | Hard | 45 min | Query understanding, hybrid retrieval, LLM-augmented results | Next-gen search combines retrieval and generation | Google, Perplexity, You.com |
| 35 | Design a Document Processing and Analysis System | Medium | 35 min | OCR, parsing, extraction, classification, summarization | Enterprise AI = document processing | Amazon, Google, Startups |

Reliability & Safety

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
| --- | --- | --- | --- | --- | --- | --- |
| 36 | Design a Guardrails System for LLM Output | Hard | 35 min | Input validation, output filtering, safety classification | Safety is non-negotiable for production AI | Anthropic, OpenAI, All |
| 37 | Design an LLM Evaluation and Monitoring Platform | Hard | 40 min | Automated eval, human eval, drift detection, alerting | Production AI needs continuous evaluation | AI Labs, Big Tech |
| 38 | Design a Cost Optimization Strategy for LLM Applications | Medium | 30 min | Caching, model routing, prompt optimization, batching | LLM API costs can spiral quickly | Startups, All |
| 39 | Design a Fallback and Degradation Strategy for AI Features | Medium | 25 min | Graceful degradation, fallback models, cached responses | AI systems fail; plan for it | All |
| 40 | Design a Prompt Versioning and Deployment System | Medium | 25 min | Version control, A/B testing, rollback | Prompts are code and need software engineering practices | AI Labs, Startups |

Scaling & Infrastructure

| # | Problem | Difficulty | Time | Key Pattern | Why It Matters | Company Tags |
| --- | --- | --- | --- | --- | --- | --- |
| 41 | Design a Multi-Tenant AI Platform | Hard | 40 min | Isolation, resource allocation, custom models per tenant | B2B AI products serve many customers | Startups, Big Tech |
| 42 | Design a Semantic Caching Layer for LLM Responses | Medium | 30 min | Embedding-based cache keys, similarity threshold, invalidation | Reduce cost and latency for similar queries | Startups, AI Labs |
| 43 | Design an AI-Powered Data Pipeline | Medium | 30 min | LLM for transformation, validation, error handling | AI augments data engineering workflows | Databricks, Startups |
| 44 | Design a Real-Time AI Translation System | Hard | 40 min | Streaming translation, context preservation, quality monitoring | Real-time AI needs careful latency management | Google, Meta, Startups |
| 45 | Design an AI Feature Flag System | Medium | 25 min | Feature flags for AI capabilities, gradual rollout, metrics | Ship AI features safely with controlled exposure | All |
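
The core of problem 42 (embedding-based keys plus a similarity threshold) fits in a few lines. This is a sketch under stated assumptions: `embed` is any callable that maps text to a list of floats, `SemanticCache` and its methods are hypothetical names, the `0.9` threshold is a placeholder, and lookup is a linear scan. Invalidation, the third part of the problem, is left out.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, embed, threshold=0.9):
        self.embed = embed          # assumed: callable mapping text -> list of floats
        self.threshold = threshold  # minimum similarity that counts as a hit
        self.entries = []           # (embedding, response) pairs, scanned linearly

    def get(self, query):
        """Return the cached response for the most similar past query, or None."""
        query_vec = self.embed(query)
        best_score, best_response = -1.0, None
        for vec, response in self.entries:
            score = cosine(query_vec, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, query, response):
        """Store a response under the embedding of the query that produced it."""
        self.entries.append((self.embed(query), response))
```

The threshold is the main design lever: set it too low and users get answers to questions they did not ask, too high and the cache rarely hits. A full design would also cover expiry and invalidation when the underlying knowledge changes.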

4-Week AI Engineer Study Plan

| Week | Focus | Problems | Daily Load |
| --- | --- | --- | --- |
| Week 1 | Coding + API design | #1-12 | 2 problems/day |
| Week 2 | LLM depth + RAG | #13-22 | 2 problems/day |
| Week 3 | Agents + System Design | #23-35 | 1-2 per day |
| Week 4 | Reliability + Polish | #36-45 + review | 1 problem + 1 mock/day |

Daily Practice Format

[Figure: AI Engineer Daily Practice Format - Morning Study and Evening Build]

:::tip Build, Don't Just Study
AI Engineer interviews often include live builds or take-home projects. Set up a development environment with an LLM API and actually build small projects as you study. Reading about RAG is not the same as building one.
:::

Key Frameworks for AI Engineer Interviews

RAG Architecture Decision Framework

| Dimension | Simple RAG | Advanced RAG | Fine-Tuned + RAG |
| --- | --- | --- | --- |
| Setup cost | Low | Medium | High |
| Maintenance | Low | Medium | High |
| Quality | Good for simple queries | Better for complex queries | Best for domain-specific |
| Latency | Medium (retrieval + generation) | Higher (multi-step) | Lower (no retrieval for common patterns) |
| When to use | MVP, proof of concept | Production systems | High-stakes, domain-specific |

LLM Selection Framework

| Factor | Small Models (7-13B) | Medium Models (30-70B) | Large Models (100B+) | API Models (GPT-4, Claude) |
| --- | --- | --- | --- | --- |
| Latency | Very low | Low | Medium | Variable |
| Cost | Low (self-hosted) | Medium | High | Per-token |
| Quality | Good for narrow tasks | Good for most tasks | Great for complex tasks | State-of-the-art |
| Privacy | Full control | Full control | Full control | Data sent to provider |
| Best for | High-volume, simple | Balanced | Research, complex | Quick iteration, quality |

Evaluation Framework for AI Applications

| Metric | What It Measures | How to Compute |
| --- | --- | --- |
| Faithfulness | Does the output match retrieved context? | LLM-as-judge or NLI model |
| Relevance | Is the output useful for the user's query? | Human eval or LLM-as-judge |
| Completeness | Does the output cover all relevant points? | Checklist comparison |
| Harmlessness | Is the output safe and appropriate? | Safety classifier + human review |
| Latency | How fast is the response? | P50, P95, P99 timing |
| Cost | How much does each query cost? | Token counting + API pricing |
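
The last two rows of the table are plain arithmetic and can be sketched directly. The helper names below are illustrative, the percentile uses the nearest-rank definition (one of several in common use), and the per-1K-token prices in the example are made-up placeholders, not any provider's real pricing.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def query_cost(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Cost of one call when input and output tokens are priced separately per 1,000 tokens."""
    return (input_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k


latencies_ms = list(range(1, 101))  # hypothetical sample: 1 ms through 100 ms
p50 = percentile(latencies_ms, 50)  # 50
p95 = percentile(latencies_ms, 95)  # 95
p99 = percentile(latencies_ms, 99)  # 99

# Placeholder prices: 0.01 per 1K input tokens, 0.03 per 1K output tokens.
cost = query_cost(1200, 300, 0.01, 0.03)  # about 0.021
```

Reporting P95 and P99 alongside P50 matters because LLM latency is long-tailed: the median can look fine while a meaningful share of users wait several times longer.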

AI Engineer vs. MLE: Problem Differences

| Dimension | AI Engineer Focus | MLE Focus |
| --- | --- | --- |
| Coding | API design, integrations, parsing | Algorithm implementation, optimization |
| Models | Using LLMs effectively, prompt engineering | Training models, architecture design |
| Data | Retrieval, chunking, embedding | Feature engineering, data pipelines |
| Systems | AI application architecture | ML infrastructure, training systems |
| Evaluation | Output quality, safety, user satisfaction | Model metrics (AUC, F1, NDCG) |
| Deployment | API serving, caching, cost management | Model optimization, A/B testing |

Progress Tracker

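| # | Problem | Status | Date | Time | Notes |
| --- | --- | --- | --- | --- | --- |
| 1 | Chat API Design | [ ] | | | |
| 2 | Token Counter & Chunker | [ ] | | | |
| 3 | Retry with Backoff | [ ] | | | |
| 4 | Concurrent API Manager | [ ] | | | |
| 5 | Structured Output Parser | [ ] | | | |
| 6 | Prompt Template Engine | [ ] | | | |
| 7 | Vector Search | [ ] | | | |
| 8 | Conversation Memory | [ ] | | | |
| 9 | Function-Calling Router | [ ] | | | |
| 10 | Response Streaming | [ ] | | | |
| 11 | Evaluation Harness | [ ] | | | |
| 12 | Document Deduplication | [ ] | | | |
| 13 | Prompt Chain Summarization | [ ] | | | |
| 14 | Dynamic Few-Shot | [ ] | | | |
| 15 | Data Extraction Prompt | [ ] | | | |
| 16 | Self-Correcting Pipeline | [ ] | | | |
| 17 | Prompt Strategy Evaluation | [ ] | | | |
| 18 | RAG Pipeline Design | [ ] | | | |
| 19 | Hybrid Search | [ ] | | | |
| 20 | Multi-Doc QA with Citations | [ ] | | | |
| 21 | Stale Info in RAG | [ ] | | | |
| 22 | RAG Quality Metrics | [ ] | | | |
| 23 | Agent with Tools | [ ] | | | |
| 24 | ReAct Agent | [ ] | | | |
| 25 | Multi-Agent System | [ ] | | | |
| 26 | Agent Memory | [ ] | | | |
| 27 | Fine-Tuning Pipeline | [ ] | | | |
| 28 | FT vs RAG vs Prompting | [ ] | | | |
| 29 | LoRA Configuration | [ ] | | | |
| 30 | Training Data Curation | [ ] | | | |
| 31 | Support Chatbot | [ ] | | | |
| 32 | Code Review Assistant | [ ] | | | |
| 33 | Content Generation Pipeline | [ ] | | | |
| 34 | AI Search Engine | [ ] | | | |
| 35 | Document Processing | [ ] | | | |
| 36 | LLM Guardrails | [ ] | | | |
| 37 | Eval & Monitoring Platform | [ ] | | | |
| 38 | Cost Optimization | [ ] | | | |
| 39 | Fallback Strategy | [ ] | | | |
| 40 | Prompt Versioning | [ ] | | | |
| 41 | Multi-Tenant AI Platform | [ ] | | | |
| 42 | Semantic Caching | [ ] | | | |
| 43 | AI Data Pipeline | [ ] | | | |
| 44 | Real-Time Translation | [ ] | | | |
| 45 | AI Feature Flags | [ ] | | | |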

Next Steps

After completing the AI Engineer problem list:

© 2026 EngineersOfAI. All rights reserved.