Agent Architectures

Agentic AI is the fastest-growing area in production LLM systems. Interviewers want to know that you understand not just the patterns (ReAct, plan-and-execute) but the engineering realities - how to make agents reliable, debuggable, and safe. This page covers the full spectrum from single-agent tool use to multi-agent orchestration.

Why Interviewers Care

Interviewer's Perspective

"Everyone can build a demo agent. I want to see whether you understand why agents fail in production, how to evaluate them, and how to architect systems that degrade gracefully. The difference between a demo and a product is reliability - and that is what I am hiring for."

1. Foundations: What Makes an Agent

An agent is an LLM-powered system that can:

Observe its environment (read inputs, tool outputs, memory).
Reason about what to do next.
Act on the environment (call tools, write outputs).
Loop until the task is complete or a stop condition is met.

Agent Loop

Agent vs. Chain vs. RAG

Feature	Chain	RAG	Agent
Fixed steps	Yes	Yes	No (dynamic)
Tool use	No	Retriever only	Multiple tools
Loops	No	No	Yes
Planning	No	No	Yes
Failure modes	Predictable	Predictable	Complex
Latency	Low	Medium	High (variable)

60-Second Answer

"An agent is an LLM system with a dynamic control flow - it decides at each step whether to call a tool, reason further, or return a final answer. Unlike chains (fixed sequence) or RAG (retrieve-then-generate), agents can loop, backtrack, and adapt their strategy. The core loop is observe-reason-act-repeat."

2. ReAct: Reasoning + Acting

The ReAct Pattern

ReAct (Yao et al., 2022) interleaves reasoning traces with actions in a single prompt:

Thought: I need to find the current stock price of NVIDIA.
Action: search("NVIDIA stock price today")
Observation: NVIDIA (NVDA) is trading at $892.43 as of March 6, 2026.
Thought: Now I need to calculate the market cap. I know they have about 24.5B shares outstanding.
Action: calculator("892.43 * 24500000000")
Observation: 21,864,535,000,000
Thought: The market cap is approximately $21.86 trillion. I have enough information to answer.
Final Answer: NVIDIA's current market cap is approximately $21.86 trillion.

Why ReAct Works

Reasoning traces help the LLM decompose complex problems step by step.
Actions ground the reasoning in real data, reducing hallucination.
Observations provide feedback that guides subsequent reasoning.

ReAct vs. Chain-of-Thought vs. Act-Only

Approach	Reasoning	Acting	Accuracy	Interpretability
Act-only	No explicit	Yes	Low	Low
CoT-only	Yes	No	Medium (hallucinates facts)	High
ReAct	Yes	Yes	High	High

Common Trap

Candidates often conflate ReAct with "just prompting the LLM to use tools." ReAct's key insight is that explicit reasoning traces before each action dramatically improve tool selection accuracy and multi-step planning. Without the "Thought" step, agents make more errors in tool selection and argument construction.

3. Function Calling and Tool Use

Function Calling Mechanism

Modern LLMs support native function calling where the model outputs a structured tool invocation rather than free-text:

Function Calling

Tool Definition Best Practices

{
  "name": "search_database",
  "description": "Search the product database. Use this when the user asks about product availability, pricing, or specifications. Do NOT use for general knowledge questions.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Natural language search query"
      },
      "category": {
        "type": "string",
        "enum": ["electronics", "clothing", "food", "all"],
        "description": "Product category to filter by. Use 'all' if unsure."
      },
      "max_results": {
        "type": "integer",
        "description": "Maximum results to return (1-20)",
        "default": 5
      }
    },
    "required": ["query"]
  }
}

Key principles:

Descriptive names: search_database not tool_1.
When-to-use guidance: Tell the model when to use and when NOT to use the tool.
Parameter descriptions: Each parameter should explain what it does and valid values.
Sensible defaults: Reduce the decisions the model needs to make.
Constrained types: Use enums, ranges, and required fields to prevent invalid calls.

Parallel Tool Calling

Many APIs support calling multiple tools simultaneously:

User: Compare the weather in Tokyo and London.
Assistant: [
  {"tool": "get_weather", "args": {"city": "Tokyo"}},
  {"tool": "get_weather", "args": {"city": "London"}}
]

This reduces latency by executing independent tool calls concurrently.

60-Second Answer

"Function calling lets the LLM output structured tool invocations instead of free-text. The key to reliable function calling is well-designed tool definitions \text{---} clear names, descriptions that specify when to use the tool, constrained parameter types, and sensible defaults. Parallel tool calling further reduces latency by executing independent calls concurrently."

4. Model Context Protocol (MCP)

What Is MCP

MCP (Anthropic, 2024) is an open standard for connecting LLMs to external tools and data sources. It defines a client-server protocol where:

MCP Servers expose tools, resources, and prompts via a standardized interface.
MCP Clients (LLM applications) discover and invoke these capabilities.
Transport: JSON-RPC over stdio or HTTP+SSE.

MCP Architecture

MCP Capabilities

Capability	Description	Example
Tools	Functions the LLM can call	`query_database`, `send_email`
Resources	Data the LLM can read	File contents, database schemas
Prompts	Pre-built prompt templates	"Summarize this codebase"
Sampling	Server-initiated LLM calls	Server asks LLM to analyze a result

Why MCP Matters

Standardization: One protocol to connect any LLM to any tool.
Composability: Mix and match servers (database + file system + API).
Security: Servers control access; clients request capabilities.
Ecosystem: Growing library of pre-built MCP servers.

Company Variation

Anthropic expects deep MCP knowledge \text{---} it is their protocol. OpenAI may ask you to compare MCP with their plugin/function calling approach. Startups care about practical integration \text{---} how to build and deploy MCP servers. Google may ask about comparison with their Genkit tooling.

5. Multi-Agent Systems

Why Multiple Agents

Single agents struggle with:

Complex tasks requiring diverse expertise (research + coding + review).
Long-running tasks where context windows overflow.
Reliability \text{---} a single point of failure.

Multi-agent systems decompose work across specialized agents that communicate and coordinate.

Architectures

Multi-Agent Architectures

Architecture	When to Use	Strengths	Weaknesses
Hierarchical	Complex multi-domain tasks	Clear accountability, scalable	Single point of failure at manager
Peer-to-peer	Debate, brainstorming	Diverse perspectives	Coordination overhead, circular discussions
Pipeline	Sequential refinement	Predictable flow	No parallelism, long latency
Dynamic	Unpredictable tasks	Flexible	Hard to debug

Framework Comparison

Framework	Architecture	Key Feature	Best For
AutoGen	Flexible (conversation-based)	Agent-to-agent chat	Research, prototyping
CrewAI	Role-based hierarchical	Role + goal + backstory per agent	Business workflows
LangGraph	Graph-based state machine	Explicit state and transitions	Production systems
OpenAI Swarm	Handoff-based	Lightweight agent transfers	Customer service routing
Anthropic MCP	Server-based tool providers	Standardized protocol	Tool integration

LangGraph Deep Dive

LangGraph models agents as state machines with explicit nodes and edges:

from langgraph.graph import StateGraph, END

# Define state
class AgentState(TypedDict):
    messages: list
    plan: str
    results: list

# Define nodes (each is a function)
def planner(state):
    # LLM call to create a plan
    ...

def executor(state):
    # Execute the plan step by step
    ...

def reviewer(state):
    # Check quality, decide if done
    ...

# Build graph
graph = StateGraph(AgentState)
graph.add_node("plan", planner)
graph.add_node("execute", executor)
graph.add_node("review", reviewer)

graph.add_edge("plan", "execute")
graph.add_edge("execute", "review")
graph.add_conditional_edges(
    "review",
    should_continue,  # function that returns "plan" or END
    {"plan": "plan", "end": END}
)

Why LangGraph for production:

Persistence: State can be saved and resumed (human-in-the-loop).
Streaming: Token-level streaming from any node.
Debugging: Full state history and visualization.
Checkpointing: Roll back to any previous state.

60-Second Answer

"For production multi-agent systems, I favor LangGraph because it models agents as explicit state machines with typed state, conditional edges, and built-in persistence. AutoGen is great for research prototyping with its conversation-based approach. CrewAI excels at business workflows with its role-based abstraction. The key architectural decision is whether you need a hierarchical controller, peer-to-peer communication, or a sequential pipeline."

6. Planning Strategies

Task Decomposition

Break complex tasks into manageable sub-tasks:

Task Decomposition

Plan-and-Execute

Separate the planning step from execution:

Planner: Generate a full plan before taking any action.
Executor: Execute each step, feeding results back.
Replanner: After each step, optionally revise the remaining plan based on new information.

Advantages over pure ReAct:

More coherent multi-step strategies.
The plan provides a progress tracker.
Easier to add human approval at the plan stage.

Disadvantages:

Initial plan may be wrong \text{---} requires replanning.
Slower to start (must plan before acting).
Planning LLM call may be expensive.

Tree-of-Thought (ToT)

Explore multiple reasoning paths simultaneously:

Tree-of-Thought for Agents

Key components:

Thought generator: Propose multiple candidate next steps.
Evaluator: Score each candidate (LLM-as-judge or heuristic).
Search strategy: BFS (explore breadth) or DFS (explore depth).
Pruning: Discard low-scoring branches early.

When to use ToT:

Problems with multiple valid approaches (math, coding, planning).
When the cost of exploring is less than the cost of backtracking.
NOT for simple, well-defined tasks (overkill).

Reflection and Self-Critique

After each action or plan step, the agent reflects on its work:

Action: Wrote function to parse CSV file.
Reflection: The function doesn't handle quoted commas or
encoding issues. It also lacks error handling for missing
files. I should revise it before moving on.
Revised Action: Rewrote function with proper CSV parsing,
UTF-8 handling, and try/except blocks.

Reflexion (Shinn et al., 2023) formalizes this as a loop:

Act on the task.
Evaluate the result (test, LLM judge, or environment feedback).
If failed, generate a verbal reflection on what went wrong.
Store the reflection in memory and retry with that context.

7. Memory Systems

Memory Types

Agent Memory Types

Memory Type	Storage	Retrieval	Persistence	Use Case
Short-term	Context window	Implicit (in prompt)	Session only	Current conversation
Long-term (semantic)	Vector DB	Similarity search	Persistent	Knowledge base, past conversations
Long-term (structured)	Graph DB / SQL	Query	Persistent	User profiles, relationships
Episodic	Vector DB + metadata	Similarity + filter	Persistent	Learning from past tasks

Short-Term Memory Management

The context window is finite. Strategies for managing it:

Sliding window: Keep only the last $N$ messages. Simple but loses early context.
Summarization: Periodically summarize older messages into a compact summary.
Token counting: Track token usage and compress when approaching the limit.
Importance scoring: Keep high-importance messages (tool results, key decisions) and drop filler.

Long-Term Memory with Vector Stores

# Store a memory
memory_text = "User prefers Python over JavaScript for backend tasks."
embedding = embed(memory_text)
vector_store.upsert(id="mem_001", vector=embedding,
                     metadata={"type": "preference", "user": "alice"})

# Retrieve relevant memories
query = "What language should I use for the API?"
query_embedding = embed(query)
results = vector_store.search(query_embedding, top_k=5,
                               filter={"user": "alice"})

Episodic Memory

Store traces of past agent executions to learn from experience:

{
  "task": "Deploy ML model to production",
  "outcome": "success",
  "steps_taken": 12,
  "key_decisions": [
    "Used Docker instead of bare metal - faster iteration",
    "Added health check endpoint - caught OOM early"
  ],
  "mistakes": [
    "Initially forgot to set resource limits - caused OOM"
  ],
  "lessons": [
    "Always set CPU/memory limits in container configs"
  ]
}

When facing a similar task, the agent retrieves relevant episodes and incorporates lessons learned into its planning.

Common Trap

Candidates often describe memory as "just a vector store." Interviewers want to hear about the write side (what to store, when to update, how to handle contradictions) as much as the read side (retrieval). Memory management - deciding what to remember and what to forget - is the hard engineering problem.

8. Agent Evaluation and Debugging

Why Agent Evaluation Is Hard

Non-deterministic: Same input can produce different action sequences.
Multi-step: Errors compound over multiple steps.
Tool-dependent: Tool failures are not the agent's fault but affect outcomes.
Subjective: "Good" agent behavior is often domain-specific.

Evaluation Dimensions

Dimension	Metric	How to Measure
Task completion	Success rate	Automated tests, human evaluation
Efficiency	Steps taken, tokens used, latency	Instrumentation
Tool accuracy	Correct tool selection, valid arguments	Log analysis
Reasoning quality	Coherence of thought chains	LLM-as-judge
Safety	No harmful actions, respects permissions	Red teaming
Cost	Total API cost per task	Token tracking

Evaluation Frameworks

Agent Evaluation Frameworks

Trajectory evaluation (not just outcome):

Did the agent take reasonable steps?
Did it recover from errors?
Did it avoid unnecessary tool calls?
Did it use the right tools for the right reasons?

Debugging Techniques

Trace logging: Log every LLM call, tool call, and state transition.
State snapshots: Save the full state at each step for replay.
Step-through execution: Pause after each step for human inspection.
Counterfactual analysis: "What if the agent had chosen a different tool at step 3?"
Failure categorization: Classify failures as planning errors, tool errors, or reasoning errors.

Common Agent Failure Modes

Failure	Symptom	Fix
Infinite loop	Agent repeats the same action	Max iteration limit, loop detection
Wrong tool selection	Agent uses search when it should use calculator	Better tool descriptions, few-shot examples
Argument hallucination	Agent makes up tool arguments	Constrained schemas, validation
Context overflow	Agent loses track of earlier steps	Summarization, scratchpad
Premature termination	Agent stops before completing the task	Completion criteria in prompt
Over-planning	Agent plans extensively but never acts	Planning budget, force action after N thoughts

Interviewer's Perspective

"I care more about how you debug agents than how you build them. Walk me through how you would diagnose why an agent that works 90% of the time fails on the other 10%. Show me your systematic approach to failure analysis."

9. Safety: Sandboxing and Permission Systems

Why Agent Safety Matters

Agents can:

Execute arbitrary code.
Access databases and file systems.
Call external APIs with side effects.
Spend money (API calls, purchases).
Leak sensitive information.

Defense in Depth

Defense in Depth for Agents

Permission Systems

Principle of least privilege: Agents should only have access to the tools and data they need for the current task.

Permission Level	Description	Example
Read-only	Can read data, no side effects	Search, lookup
Write with approval	Can propose writes, human approves	File edits, emails
Write with limits	Can write within constraints	Under 100 lines, approved directories
Full access	Unrestricted (dangerous)	Admin tasks, emergency response

Sandboxing Strategies

Container isolation: Run code in Docker containers with resource limits.
VM isolation: Separate virtual machines for untrusted code (E2B, Modal).
API allowlisting: Only permit calls to approved API endpoints.
File system restrictions: Chroot or namespace isolation.
Network restrictions: Block outbound network access by default.
Time and resource limits: Kill processes that exceed CPU, memory, or time budgets.

Human-in-the-Loop Patterns

Pattern	When to Use	Trade-off
Approve every action	High-stakes, early deployment	Safe but slow
Approve high-risk actions only	Medium-stakes, established tools	Balanced
Notify and proceed	Low-stakes, trusted agent	Fast but less control
Full autonomy with audit	Well-tested, low-risk tasks	Fastest but requires monitoring

Instant Rejection

"We can just let the agent run freely and review the outputs later." This shows a fundamental misunderstanding of agent safety. Agents with write access can cause irreversible damage - deleting data, sending emails, or leaking secrets. Defense in depth is not optional.

Practice Problems

Problem 1: Design an Agent Architecture

Design an agent that can help a data analyst write SQL queries, execute them against a database, visualize results, and iterate based on feedback. What tools, memory, and safety mechanisms would you include?

Hint 1 - Direction

Think about the tools needed (SQL executor, chart generator), memory (conversation + past queries), and safety (read-only DB access, query validation).

Hint 2 - Insight

Consider a plan-and-execute architecture with a reviewer step. The SQL executor should be sandboxed (read-only, query timeout, result size limits). Memory should include the schema and past successful queries.

Full Solution + Rubric

Architecture: Plan-and-Execute with LangGraph

Tools:

get_schema(table_name) - Returns table schema (columns, types, relationships).
execute_sql(query) - Runs SQL query against read-only replica.
create_chart(data, chart_type, title) - Generates visualization.
explain_query(query) - Returns query execution plan.

Memory:

Short-term: Conversation history + current query context.
Long-term: Database schema (cached), past successful queries (vector store for semantic search).
Working memory: Current data results, intermediate analysis.

Safety:

Read-only database replica (no INSERT/UPDATE/DELETE).
Query timeout (30 seconds max).
Result size limit (10K rows max).
SQL injection prevention (parameterized queries).
PII detection on output before showing to user.

Flow:

User describes what they want to analyze.
Planner creates a multi-step SQL analysis plan.
For each step: generate SQL, explain the plan, execute, validate results.
Reviewer checks if results answer the question.
If not, replan with the new information.
Generate visualization and narrative summary.

Scoring:

Strong Hire: Complete architecture with tools, memory, safety, and error handling. Discusses read-only replicas, query optimization, and how to handle ambiguous user requests.
Lean Hire: Reasonable architecture but missing safety mechanisms or memory strategy.
No Hire: Just describes "an LLM that writes SQL" without agent architecture.

Problem 2: Multi-Agent Coordination

You need to build a system that reviews pull requests: checks code quality, runs tests, verifies documentation, and provides a summary. Design the multi-agent architecture.

Hint 1 - Direction

Consider which tasks can run in parallel (code review, test running, doc checking) and which are sequential (summary must come last).

Hint 2 - Insight

A manager agent with three specialist agents (code reviewer, test runner, doc checker) followed by a synthesizer. Use LangGraph with a fan-out/fan-in pattern.

Full Solution + Rubric

Architecture: Hierarchical with fan-out/fan-in

Manager Agent
  |
  +-- Code Review Agent (parallel)
  |     - Analyzes diff for bugs, style, security
  |     - Tools: get_diff, search_codebase, lint
  |
  +-- Test Agent (parallel)
  |     - Runs test suite, analyzes failures
  |     - Tools: run_tests, get_coverage, analyze_failure
  |
  +-- Doc Agent (parallel)
  |     - Checks if docs match code changes
  |     - Tools: get_docs, compare_api_changes
  |
  +-- Synthesizer Agent (after all complete)
        - Combines all reviews into a coherent summary
        - Assigns overall risk level
        - Provides approval recommendation

State Machine (LangGraph):

Parse PR (get diff, files changed, PR description).
Fan-out: Send to Code, Test, and Doc agents in parallel.
Fan-in: Collect all three reviews.
Synthesize: Combine into final review with risk assessment.
Human approval gate: Present summary for human sign-off before posting.

Error handling:

If a specialist agent fails, the synthesizer notes it as "not reviewed."
Timeout per agent (5 minutes max).
Retry once on failure, then skip with warning.

Scoring:

Strong Hire: Parallel execution design, error handling, human-in-the-loop, specific tools per agent, and state machine definition.
Lean Hire: Sequential design or missing error handling but reasonable agent decomposition.
No Hire: Single agent that tries to do everything, or no concrete architecture.

Problem 3: Memory System Design

Your customer support agent handles 10K conversations per day. Users often return with follow-up questions days later. Design the memory system.

Hint 1 - Direction

You need both per-conversation memory and cross-conversation user memory. Think about what to store, how to retrieve, and how to handle stale information.

Hint 2 - Insight

Layer the memory: conversation buffer (short-term), user profile (structured long-term), conversation summaries (semantic long-term), and a resolution knowledge base (episodic).

Full Solution + Rubric

Memory Architecture:

Conversation buffer (Redis, TTL 24h):
- Full message history for active conversations.
- Token count tracking for context window management.
User profile (PostgreSQL):
- Structured data: name, account type, subscription, past issues.
- Updated after each conversation.
- Retrieved at conversation start.
Conversation summaries (Vector DB):
- After each conversation ends, generate a summary.
- Store with user ID, timestamp, topic tags, resolution status.
- Retrieved when user returns: "Looks like we last spoke about X on date Y."
Resolution knowledge base (Vector DB):
- Successful resolutions stored as templates.
- When a new issue matches a past resolution, suggest it.
- Updated by support team (human-in-the-loop).
Entity memory (Knowledge graph):
- Track relationships: user owns product X, product X has known issue Y.
- Update when new information is learned.

Retrieval strategy at conversation start:

Fetch user profile (SQL lookup).
Fetch last 5 conversation summaries (vector search + user filter).
Inject into system prompt: "Returning user. Previous interactions: ..."

Memory hygiene:

Conversation summaries older than 1 year are archived.
PII is encrypted at rest.
Users can request memory deletion (GDPR).
Contradictions are resolved by recency (newer info wins).

Scoring:

Strong Hire: Layered architecture with specific storage choices, retrieval strategy, PII handling, and staleness management.
Lean Hire: Vector store + user profile but missing conversation summaries or staleness handling.
No Hire: "Just use a vector store for everything."

Interview Cheat Sheet

Topic	Key Fact	Typical Question
ReAct	Interleave reasoning traces with actions	"What is ReAct and why does it work?"
Function Calling	Structured tool invocations; well-defined schemas	"How do you design reliable tool definitions?"
MCP	Anthropic's open standard; JSON-RPC; tools + resources + prompts	"What is MCP and why does it matter?"
Multi-Agent	Hierarchical, peer, pipeline architectures	"When would you use multiple agents?"
LangGraph	Graph-based state machine; persistence; checkpointing	"How do you build a production agent?"
Planning	Plan-and-execute separates planning from execution	"Plan-and-execute vs. ReAct?"
Tree-of-Thought	Explore multiple reasoning paths; evaluate and prune	"When is ToT better than CoT?"
Memory (short)	Context window; sliding window or summarization	"How do you manage conversation memory?"
Memory (long)	Vector store + structured store; write side matters	"How do you build long-term agent memory?"
Episodic Memory	Past task traces with lessons learned	"How can agents learn from experience?"
Evaluation	Trajectory + outcome; LLM-as-judge + automated	"How do you evaluate agent quality?"
Safety	Least privilege; sandboxing; human-in-the-loop	"How do you make agents safe?"

Spaced Repetition Checkpoints

Day 0 (Today)

Explain the ReAct pattern with an example
List 5 principles for designing good tool definitions
Describe the three types of agent memory
Name 4 common agent failure modes and their fixes

Day 3

Compare AutoGen, CrewAI, and LangGraph architectures
Explain plan-and-execute vs. ReAct trade-offs
Describe MCP and its four capability types
Design a permission system for an agent with database access

Day 7

Design a multi-agent code review system
Explain tree-of-thought and when to use it
Describe episodic memory and how it enables agent learning
List evaluation dimensions for agent quality

Day 14

Whiteboard a complete agent architecture for a customer support system
Explain how LangGraph handles persistence and checkpointing
Design a memory system for an agent handling 10K daily conversations
Discuss agent sandboxing strategies with trade-offs

Day 21

Present a 30-minute deep dive on agent reliability engineering
Compare MCP with alternative tool integration approaches
Design an agent evaluation pipeline with trajectory scoring
Critique a given multi-agent architecture and propose improvements

Cross-References

RAG Systems - Retrieval as a tool in agent systems
Prompt Engineering - Prompt design for agent system prompts and tool descriptions
LLM Evaluation - Evaluation methods applicable to agent trajectories
Inference Optimization - Optimizing multi-step agent latency
Safety and Guardrails - Deep dive on agent safety mechanisms
LLM Interview Questions Bank - Additional agent architecture questions

Why Interviewers Care​

1. Foundations: What Makes an Agent​

Agent vs. Chain vs. RAG​

2. ReAct: Reasoning + Acting​

The ReAct Pattern​

Why ReAct Works​

ReAct vs. Chain-of-Thought vs. Act-Only​

3. Function Calling and Tool Use​

Function Calling Mechanism​

Tool Definition Best Practices​

Parallel Tool Calling​

4. Model Context Protocol (MCP)​

What Is MCP​

MCP Capabilities​

Why MCP Matters​

5. Multi-Agent Systems​

Why Multiple Agents​

Architectures​

Framework Comparison​

LangGraph Deep Dive​

6. Planning Strategies​

Task Decomposition​

Plan-and-Execute​

Tree-of-Thought (ToT)​

Reflection and Self-Critique​

7. Memory Systems​

Memory Types​

Short-Term Memory Management​

Long-Term Memory with Vector Stores​

Episodic Memory​

8. Agent Evaluation and Debugging​

Why Agent Evaluation Is Hard​

Evaluation Dimensions​

Evaluation Frameworks​

Debugging Techniques​

Common Agent Failure Modes​

9. Safety: Sandboxing and Permission Systems​

Why Agent Safety Matters​

Defense in Depth​

Permission Systems​

Sandboxing Strategies​

Human-in-the-Loop Patterns​

Practice Problems​

Problem 1: Design an Agent Architecture​

Problem 2: Multi-Agent Coordination​

Problem 3: Memory System Design​

Interview Cheat Sheet​

Spaced Repetition Checkpoints​

Day 0 (Today)​

Day 3​

Day 7​

Day 14​

Day 21​

Cross-References​

Why Interviewers Care

1. Foundations: What Makes an Agent

Agent vs. Chain vs. RAG

2. ReAct: Reasoning + Acting

The ReAct Pattern

Why ReAct Works

ReAct vs. Chain-of-Thought vs. Act-Only

3. Function Calling and Tool Use

Function Calling Mechanism

Tool Definition Best Practices

Parallel Tool Calling

4. Model Context Protocol (MCP)

What Is MCP

MCP Capabilities

Why MCP Matters

5. Multi-Agent Systems

Why Multiple Agents

Architectures

Framework Comparison

LangGraph Deep Dive

6. Planning Strategies

Task Decomposition

Plan-and-Execute

Tree-of-Thought (ToT)

Reflection and Self-Critique

7. Memory Systems

Memory Types

Short-Term Memory Management

Long-Term Memory with Vector Stores

Episodic Memory

8. Agent Evaluation and Debugging

Why Agent Evaluation Is Hard

Evaluation Dimensions

Evaluation Frameworks

Debugging Techniques

Common Agent Failure Modes

9. Safety: Sandboxing and Permission Systems

Why Agent Safety Matters

Defense in Depth

Permission Systems

Sandboxing Strategies

Human-in-the-Loop Patterns

Practice Problems

Problem 1: Design an Agent Architecture

Problem 2: Multi-Agent Coordination

Problem 3: Memory System Design

Interview Cheat Sheet

Spaced Repetition Checkpoints

Day 0 (Today)

Day 3

Day 7

Day 14

Day 21

Cross-References