Multi-Agent Architectures
A Production Scenario
Your company is building a system to automatically generate engineering blog posts from pull request data. The workflow seems straightforward: take a PR, write a blog post. You build a single agent with access to code reading tools, web search, and a text generation pipeline. You run it. The results are technically accurate but read like machine-generated content: flat, jargon-heavy, with no narrative arc.
You add better prompting. You make the system prompt longer. You add few-shot examples. Better, but still mediocre. The problem is that "analyze code deeply" and "write engagingly for engineers" are different modes of work that interfere with each other when crammed into one agent. A prompt and context tuned for understanding technical details are poorly suited to producing compelling prose.
You split the work. One agent reads the PR and extracts the technical story: what changed, why, what the impact is, what tradeoffs were made. A second agent takes that technical summary and writes a draft blog post with narrative arc. A third agent critiques the draft for technical accuracy. A fourth revises based on the critique. Suddenly you have something worth publishing.
This is the core insight of multi-agent architectures: specialization works. Just as you would not ask the same engineer to do code review, write marketing copy, and manage the sprint all at once, you should not ask a single agent to hold every concern in mind simultaneously. Multiple agents, each focused on one job, with their outputs coordinated, often outperform a single generalist agent on complex tasks.
But multi-agent systems come with new challenges: communication protocols, shared state management, debugging distributed reasoning, and controlling costs when each agent generates its own token stream. This lesson covers the architectures, the tradeoffs, and the production engineering.
Why This Exists
The Context Window Is Not Just a Size Problem
A single agent with a large context window can technically hold all the information needed for a complex task. But context window size is not the only bottleneck. There is also attention dilution: the model's attention is spread across everything in context, and the more there is, the harder it is to maintain focus on any specific aspect.
When you ask one agent to play researcher, writer, critic, and editor simultaneously, you are asking it to hold all four perspectives at once. The "writer" persona is trying to produce good prose. The "critic" persona is trying to find flaws. These perspectives require different things from the model, and having them active simultaneously degrades both.
Separate agents with separate contexts can each maintain focus. The researcher's context is full of technical details and search results. The writer's context is full of narrative examples and style guides. Neither is diluted by the other.
Specialization Enables Better Tool Sets
A research agent needs access to search, database queries, and document readers. A code execution agent needs a Python sandbox and a linter. A review agent needs diff comparison tools. If you give all these tools to one agent, it has to decide which to use from a very large menu, and its tool descriptions become noisy.
Separate agents with focused tool sets make cleaner decisions. Each agent's tool set makes sense for its role, and the model's attention is on a smaller set of relevant options.
Parallelism
For tasks with independent sub-problems, multiple agents can work in parallel. A single agent researching five companies serially takes roughly five times as long as five agents each researching one company at the same time. The orchestrating agent collects the results once they are all ready.
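The fan-out described above can be sketched with `asyncio`. This is a minimal illustration, not a production pattern: `research_company` is a hypothetical stand-in for a real research agent, and the sleep simulates LLM and tool latency.

```python
import asyncio
import time


async def research_company(name: str, delay_s: float = 0.1) -> str:
    """Hypothetical research agent; the sleep stands in for LLM/tool latency."""
    await asyncio.sleep(delay_s)
    return f"notes on {name}"


async def research_all(companies: list[str]) -> dict[str, str]:
    """Fan out one agent per company and collect results when all are ready."""
    notes = await asyncio.gather(*(research_company(c) for c in companies))
    return dict(zip(companies, notes))


def run_parallel(companies: list[str]) -> tuple[dict[str, str], float]:
    """Run the fan-out and report elapsed wall-clock seconds."""
    start = time.perf_counter()
    results = asyncio.run(research_all(companies))
    return results, time.perf_counter() - start
```

With five companies, the elapsed time should be close to one agent's latency rather than five times it, because the waits overlap.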
Historical Context
Society of Mind (Marvin Minsky, 1986) - the philosophical predecessor. Intelligence emerges from the interaction of many simple agents, each capable of limited tasks. No single agent understands the whole system.
AutoGen (Microsoft Research, 2023) - one of the first production-grade multi-agent frameworks for LLMs. Introduced conversational multi-agent architectures where agents communicate through a chat interface. Notable for its GroupChat abstraction.
CrewAI (2023) - role-based multi-agent framework with explicit agent personas, goals, and backstories. Emphasizes the "crew" metaphor: each agent has a role, each role has tasks.
LangGraph (LangChain team, 2024) - graph-based multi-agent workflows with persistent state, conditional routing, and explicit cycle support. More flexible than CrewAI, better suited for complex conditional workflows.
The Three Core Patterns
Pattern 1: Orchestrator-Worker
A manager agent receives the task, decomposes it into subtasks, assigns each subtask to a specialized worker agent, collects results, and synthesizes a final answer.
Best for: Tasks with clear decomposition, where the manager can identify what each worker needs without seeing their intermediate output.
Risk: The orchestrator is a single point of failure. If it misdecomposes the task, all workers do the wrong thing.
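A minimal sketch of the orchestrator-worker flow, using plain functions in place of LLM-backed agents. All names here (`decompose`, `WORKERS`, `orchestrate`) are hypothetical; in a real system the decomposition and the synthesis would each be model calls.

```python
from typing import Callable


# Hypothetical workers: plain functions standing in for specialized agents.
def research_worker(subtask: str) -> str:
    return f"research({subtask})"


def code_worker(subtask: str) -> str:
    return f"code({subtask})"


WORKERS: dict[str, Callable[[str], str]] = {
    "research": research_worker,
    "code": code_worker,
}


def decompose(task: str) -> list[tuple[str, str]]:
    """Stand-in for the manager's LLM call: returns (worker_name, subtask) pairs."""
    return [
        ("research", f"background for {task}"),
        ("code", f"prototype for {task}"),
    ]


def orchestrate(task: str) -> str:
    """Decompose, dispatch each subtask to its worker, then combine results."""
    results = []
    for worker_name, subtask in decompose(task):
        if worker_name not in WORKERS:
            # A bad decomposition is the main risk, so fail loudly.
            raise ValueError(f"No worker for role: {worker_name}")
        results.append(WORKERS[worker_name](subtask))
    # Synthesis would normally be another LLM call; here we just join.
    return " | ".join(results)
```

Rejecting unknown worker names is one cheap guard against the single-point-of-failure risk: a malformed plan stops before any worker runs.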
Pattern 2: Pipeline
Agents arranged in sequence, each transforming the output of the previous one. Agent 1 produces raw material; Agent 2 refines it; Agent 3 reviews it; Agent 4 formats it.
Best for: Multi-stage content production, data transformation pipelines, code generation followed by testing followed by documentation.
Risk: Errors early in the pipeline propagate and amplify. Agent 3 cannot fix a fundamental mistake made by Agent 1.
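As a rough sketch, a pipeline is function composition over stages. The four stage functions below are hypothetical string transforms standing in for agents; they only illustrate the data flow.

```python
from functools import reduce
from typing import Callable

Stage = Callable[[str], str]


def produce(text: str) -> str:
    """Agent 1 stand-in: produce raw material."""
    return text.strip()


def refine(text: str) -> str:
    """Agent 2 stand-in: refine the raw material."""
    return text.capitalize()


def review(text: str) -> str:
    """Agent 3 stand-in: can only fix what it can see in its input."""
    return text if text.endswith(".") else text + "."


def fmt(text: str) -> str:
    """Agent 4 stand-in: format for output."""
    return f"## Draft\n{text}"


def run_pipeline(stages: list[Stage], initial: str) -> str:
    """Feed each stage the previous stage's output, in order."""
    return reduce(lambda acc, stage: stage(acc), stages, initial)
```

Each stage sees only its predecessor's output, which is why an upstream mistake cannot be recovered downstream unless the stage is given the original input as well.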
Pattern 3: Peer-to-Peer (Debate/Voting)
Multiple agents process the same input independently and then interact - debating, critiquing each other, voting, or averaging their outputs.
Best for: Tasks where diverse perspectives reduce error, adversarial tasks (red team / blue team), and tasks where consensus indicates higher confidence.
Risk: Can produce mediocre outputs that represent the average of all perspectives rather than the best one. Debate loops can run long and get expensive.
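A minimal voting sketch, assuming each agent's output has already been reduced to a short, comparable answer. The function name `majority_vote` is hypothetical; the agreement ratio is one simple way to express "consensus indicates higher confidence".

```python
from collections import Counter


def majority_vote(answers: list[str]) -> tuple[str, float]:
    """Return the most common answer and the fraction of agents that agreed."""
    if not answers:
        raise ValueError("need at least one answer")
    # Normalize so trivial formatting differences do not split the vote.
    normalized = [a.strip().lower() for a in answers]
    winner, count = Counter(normalized).most_common(1)[0]
    return winner, count / len(normalized)
```

A low agreement ratio is a signal to escalate (for example to a debate round or a human) rather than to trust the winner. A real system also needs an explicit tie-break rule, which this sketch does not define.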
Building a Research + Writing + Review Pipeline with LangGraph
This example builds a three-agent pipeline for generating technical content from a topic.
from langgraph.graph import StateGraph, END
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage
from typing import TypedDict, Annotated
import operator
llm = ChatAnthropic(model="claude-opus-4-6", temperature=0)
class ContentPipelineState(TypedDict):
    topic: str
    research_notes: str
    draft: str
    review_feedback: str
    final_content: str
    revision_count: int
# ── Agent 1: Researcher ───────────────────────────────────────────────────────
RESEARCHER_SYSTEM = """You are a technical researcher. Your job is to gather and organize
key information about a topic. Focus on:
- Core concepts and their relationships
- Real-world applications and examples
- Common misconceptions to address
- Key statistics or benchmarks if relevant
Output structured research notes in markdown."""
def researcher_node(state: ContentPipelineState) -> dict:
    print("[RESEARCHER] Gathering research...")
    response = llm.invoke([
        SystemMessage(content=RESEARCHER_SYSTEM),
        HumanMessage(content=f"Research topic: {state['topic']}\n\nProvide comprehensive research notes.")
    ])
    return {"research_notes": response.content}
# ── Agent 2: Writer ───────────────────────────────────────────────────────────
WRITER_SYSTEM = """You are a technical writer. You transform research notes into
engaging, clear technical content. Your writing:
- Opens with a concrete problem or scenario
- Explains the WHY before the WHAT
- Uses code examples where appropriate
- Avoids jargon without explanation
- Maintains a conversational but authoritative tone"""
def writer_node(state: ContentPipelineState) -> dict:
    print("[WRITER] Drafting content...")
    # On a revision pass, include the reviewer's feedback and the previous draft
    feedback_section = ""
    if state.get("review_feedback"):
        feedback_section = f"\n\nRevision feedback to address:\n{state['review_feedback']}"
        if state.get("draft"):
            feedback_section += f"\n\nPrevious draft:\n{state['draft']}"
    response = llm.invoke([
        SystemMessage(content=WRITER_SYSTEM),
        HumanMessage(content=(
            f"Topic: {state['topic']}\n\n"
            f"Research Notes:\n{state['research_notes']}"
            f"{feedback_section}\n\n"
            f"Write a comprehensive technical article (800+ words)."
        ))
    ])
    return {
        "draft": response.content,
        "revision_count": state.get("revision_count", 0) + 1
    }
# ── Agent 3: Reviewer ─────────────────────────────────────────────────────────
REVIEWER_SYSTEM = """You are a senior technical editor. Review content for:
1. Technical accuracy - are the facts correct?
2. Clarity - will the target audience understand this?
3. Completeness - are important aspects missing?
4. Structure - does the content flow logically?
Output a JSON object:
{
"approved": true/false,
"score": 1-10,
"feedback": "specific actionable feedback if not approved",
"issues": ["list of specific issues"]
}"""
def reviewer_node(state: ContentPipelineState) -> dict:
    import json
    import re

    print("[REVIEWER] Reviewing draft...")
    response = llm.invoke([
        SystemMessage(content=REVIEWER_SYSTEM),
        HumanMessage(content=(
            f"Topic: {state['topic']}\n\n"
            f"Content to review:\n{state['draft']}"
        ))
    ])
    # Models often wrap JSON in extra prose, so extract the outermost {...} span
    json_match = re.search(r'\{.*\}', response.content, re.DOTALL)
    if json_match:
        try:
            review_data = json.loads(json_match.group())
            approved = review_data.get("approved", False)
            feedback = review_data.get("feedback", "")
            score = review_data.get("score", 5)
            print(f"[REVIEWER] Score: {score}/10, Approved: {approved}")
            return {
                "review_feedback": feedback if not approved else "",
                "final_content": state["draft"] if approved else ""
            }
        except json.JSONDecodeError:
            pass
    # Fallback: approve only if the raw text explicitly sets "approved": true.
    # Checking for the words "approved" and "true" separately would also match
    # a rejection that happens to contain "true" elsewhere.
    if re.search(r'"approved"\s*:\s*true', response.content, re.IGNORECASE):
        return {"final_content": state["draft"], "review_feedback": ""}
    return {"review_feedback": response.content, "final_content": ""}
# ── Routing Logic ─────────────────────────────────────────────────────────────
def route_after_review(state: ContentPipelineState) -> str:
    if state.get("final_content"):
        return "done"
    if state.get("revision_count", 0) >= 3:
        # Max revisions - use the last draft
        print("[ROUTER] Max revisions reached, using last draft")
        return "done"
    return "revise"
# ── Build the Graph ───────────────────────────────────────────────────────────
workflow = StateGraph(ContentPipelineState)
workflow.add_node("researcher", researcher_node)
workflow.add_node("writer", writer_node)
workflow.add_node("reviewer", reviewer_node)
workflow.set_entry_point("researcher")
workflow.add_edge("researcher", "writer")
workflow.add_edge("writer", "reviewer")
workflow.add_conditional_edges(
"reviewer",
route_after_review,
{
"revise": "writer", # Loop back to writer with feedback
"done": END
}
)
pipeline = workflow.compile()
# Run the pipeline
result = pipeline.invoke({
"topic": "How vector databases enable semantic search in RAG systems",
"research_notes": "",
"draft": "",
"review_feedback": "",
"final_content": "",
"revision_count": 0
})
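# If the revision cap was hit without approval, final_content is empty,
# so fall back to the last draft.
final = result["final_content"] or result["draft"]
print(f"\nFinal content ({len(final)} chars):")
print(final[:500])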
CrewAI: Role-Based Agent Coordination
CrewAI makes multi-agent systems feel like hiring a team. Each agent has a role, a goal, a backstory, and assigned tasks.
from crewai import Agent, Task, Crew, Process
from crewai.tools import tool
from langchain_anthropic import ChatAnthropic
llm = ChatAnthropic(model="claude-opus-4-6")
# ── Define Tools ──────────────────────────────────────────────────────────────
@tool("Web Search")
def web_search(query: str) -> str:
    """Search the web for information about a topic."""
    # Placeholder: replace with a call to a real search API
    return f"[Search results for: {query}]"

@tool("Code Analysis")
def analyze_code(code: str) -> str:
    """Analyze a code snippet for quality, patterns, and issues."""
    # Placeholder: replace with a real analysis step
    return "[Code analysis for provided snippet]"
# ── Define Agents ─────────────────────────────────────────────────────────────
researcher = Agent(
role="Senior Research Analyst",
goal="Uncover comprehensive, accurate information about technical topics",
backstory=(
"You are a meticulous researcher with 10 years of experience in AI/ML. "
"You verify claims, cite sources, and identify gaps in understanding."
),
tools=[web_search],
llm=llm,
verbose=True,
allow_delegation=False,
max_iter=5
)
writer = Agent(
role="Technical Content Writer",
goal="Transform research into clear, engaging technical documentation",
backstory=(
"You are a former engineer turned writer. You understand the technical details "
"but excel at explaining them clearly to fellow engineers without dumbing things down."
),
llm=llm,
verbose=True,
allow_delegation=False,
)
reviewer = Agent(
role="Senior Technical Editor",
goal="Ensure technical accuracy and writing quality before publication",
backstory=(
"You are a rigorous editor who has reviewed hundreds of technical articles. "
"You catch factual errors, unclear explanations, and structural problems."
),
llm=llm,
verbose=True,
allow_delegation=True, # Can delegate back to writer
)
# ── Define Tasks ──────────────────────────────────────────────────────────────
research_task = Task(
description=(
"Research the topic: {topic}\n"
"Produce detailed notes covering: core concepts, practical applications, "
"common pitfalls, and best practices. Include specific examples."
),
expected_output="Comprehensive research notes in markdown (500+ words)",
agent=researcher,
)
writing_task = Task(
description=(
"Using the research notes, write a technical article about: {topic}\n"
"The article should: open with a real problem, explain WHY before WHAT, "
"include code examples, and end with practical takeaways."
),
expected_output="Complete technical article (800+ words, markdown format)",
agent=writer,
context=[research_task] # Writer receives researcher's output
)
review_task = Task(
description=(
"Review the technical article for: {topic}\n"
"Check: technical accuracy, clarity, completeness, code correctness.\n"
"Either approve it or provide specific revision instructions."
),
expected_output="Either 'APPROVED' with summary, or specific revision feedback",
agent=reviewer,
context=[writing_task] # Reviewer receives writer's output
)
# ── Create and Run the Crew ───────────────────────────────────────────────────
crew = Crew(
agents=[researcher, writer, reviewer],
tasks=[research_task, writing_task, review_task],
process=Process.sequential, # Tasks run in order
verbose=True,
max_rpm=10 # Rate limit: max 10 requests per minute
)
result = crew.kickoff(inputs={"topic": "Transformer attention mechanisms"})
print(result.raw)
Communication Protocols
Multi-agent systems need a way for agents to share information. The main approaches:
Shared State (LangGraph approach)
A central state object is passed between nodes. Each agent reads from and writes to a specific subset of the state. This is the cleanest pattern for deterministic pipelines.
# In LangGraph, the TypedDict state is the communication protocol
class SharedAgentState(TypedDict):
    task: str
    research: str   # Written by researcher, read by writer
    draft: str      # Written by writer, read by reviewer
    feedback: str   # Written by reviewer, read by writer
    approved: bool  # Written by reviewer, read by router
Message Passing (AutoGen approach)
Agents communicate by sending messages to each other. Each agent maintains its own conversation history. An orchestrator routes messages between agents.
# Simplified AutoGen-style message passing
from dataclasses import dataclass, field
from typing import Callable
@dataclass
class AgentMessage:
    from_agent: str
    to_agent: str
    content: str
    metadata: dict = field(default_factory=dict)

class MessageBus:
    """Routes messages between agents."""

    def __init__(self):
        self._agents: dict[str, Callable] = {}
        self._message_log: list[AgentMessage] = []

    def register(self, name: str, handler: Callable):
        self._agents[name] = handler

    def send(self, message: AgentMessage):
        # Log first so failed deliveries still appear in the conversation log
        self._message_log.append(message)
        if message.to_agent in self._agents:
            return self._agents[message.to_agent](message)
        raise ValueError(f"Unknown agent: {message.to_agent}")

    def get_conversation_log(self) -> list[AgentMessage]:
        return self._message_log.copy()
# Usage
bus = MessageBus()
def researcher_handler(msg: AgentMessage) -> AgentMessage:
    # Research the topic in msg.content
    result = f"Research results for: {msg.content}"
    # Reply to whichever agent sent the request
    return AgentMessage(
        from_agent="researcher",
        to_agent=msg.from_agent,
        content=result
    )
bus.register("researcher", researcher_handler)
Coordination Challenges
Conflicting Outputs
Two agents may produce contradictory analyses. You need a resolution mechanism: priority ordering (Agent A's output takes precedence), voting, or a tie-breaking agent.
from collections import Counter

def resolve_conflict(
    outputs: list[str],
    resolution_strategy: str = "synthesize"
) -> str:
    """Resolve conflicting agent outputs."""
    if not outputs:
        raise ValueError("No outputs to resolve")
    if resolution_strategy == "vote":
        # Each output gets a vote; majority wins.
        # Assumes each output is already a short, comparable decision
        # (e.g. "approve" / "reject"), not free-form prose.
        votes = Counter(o.strip().lower() for o in outputs)
        return votes.most_common(1)[0][0]
    elif resolution_strategy == "synthesize":
        # Use an LLM to synthesize the best of all outputs
        combined = "\n\n---\n\n".join(
            f"Agent {i+1} output:\n{o}" for i, o in enumerate(outputs)
        )
        response = llm.invoke([
            HumanMessage(content=(
                f"Multiple agents produced these outputs. Synthesize the best answer:\n\n{combined}"
            ))
        ])
        return response.content
    return outputs[0]  # Default: first agent wins
Preventing Deadlocks
In peer-to-peer systems where Agent A waits for Agent B and Agent B waits for Agent A, you get a deadlock. Use timeouts and break cycles with a supervisor.
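One way to apply the timeout advice is to have a supervisor bound every wait with `asyncio.wait_for`. The sketch below is illustrative: the agent, the queues, and the fallback strings are all hypothetical, and the two inboxes are deliberately never filled to simulate A and B waiting on each other.

```python
import asyncio


async def agent_waiting_on_peer(inbox: asyncio.Queue) -> str:
    """Hypothetical agent that blocks until a peer sends it a message."""
    return await inbox.get()


async def supervised(inbox: asyncio.Queue, timeout_s: float, fallback: str) -> str:
    """Supervisor: bound the wait and break the cycle with a fallback."""
    try:
        return await asyncio.wait_for(agent_waiting_on_peer(inbox), timeout=timeout_s)
    except asyncio.TimeoutError:
        return fallback


async def demo() -> list[str]:
    # A waits on B's message and B waits on A's: neither queue is ever filled.
    inbox_a: asyncio.Queue = asyncio.Queue()
    inbox_b: asyncio.Queue = asyncio.Queue()
    return await asyncio.gather(
        supervised(inbox_a, 0.05, "A: proceeding without B"),
        supervised(inbox_b, 0.05, "B: proceeding without A"),
    )
```

Without the timeout, `demo()` would never return. With it, each agent falls back after 50 ms and the system makes progress.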
Cost Control
Each agent generates its own tokens. A 5-agent system can easily cost 5x what a single agent costs, and often more, because each agent also consumes upstream outputs as input tokens. Track costs per agent and implement budgets.
class CostTracker:
    def __init__(self, budget_usd: float = 1.0):
        self.budget_usd = budget_usd
        self._spent: dict[str, float] = {}

    def record(self, agent_name: str, input_tokens: int, output_tokens: int):
        # Illustrative rates in USD per million tokens ($3 input, $15 output).
        # Substitute the actual pricing for the model you use.
        cost = (input_tokens * 3 + output_tokens * 15) / 1_000_000
        self._spent[agent_name] = self._spent.get(agent_name, 0) + cost

    def total_spent(self) -> float:
        return sum(self._spent.values())

    def check_budget(self) -> bool:
        """Return True while total spend is still under budget."""
        return self.total_spent() < self.budget_usd

    def report(self) -> str:
        lines = [f"Total: ${self.total_spent():.4f} / ${self.budget_usd:.2f}"]
        # Most expensive agents first
        for agent, cost in sorted(self._spent.items(), key=lambda x: -x[1]):
            lines.append(f"  {agent}: ${cost:.4f}")
        return "\n".join(lines)
Debugging Multi-Agent Systems
Multi-agent systems are harder to debug than single agents. Key techniques:
1. Trace every agent call: Log agent name, input, output, token count, latency.
2. Assign correlation IDs: Every root request gets a unique ID that propagates through all agent calls. This lets you reconstruct the full execution graph for any request.
3. Isolate agents for testing: Test each agent in isolation before testing the full pipeline. A bug in Agent 3 is usually obvious when Agent 3 is tested alone but easy to miss when it is buried in a 5-agent system.
4. Monitor inter-agent communication: Log all messages passing between agents. Unexpected message formats are a common failure mode.
import uuid
import time
import logging
from typing import Callable

logger = logging.getLogger(__name__)

def traced_agent_call(
    agent_name: str,
    agent_fn: Callable,
    state: dict,
    correlation_id: str | None = None
) -> dict:
    """Wrapper that logs every agent call with tracing."""
    # Generate an ID only for root calls; nested calls should pass theirs in
    if correlation_id is None:
        correlation_id = str(uuid.uuid4())
    start_time = time.time()
    logger.info(f"[{correlation_id}] Starting agent: {agent_name}")
    try:
        result = agent_fn(state)
        elapsed = time.time() - start_time
        logger.info(
            f"[{correlation_id}] Agent {agent_name} completed in {elapsed:.2f}s"
        )
        return result
    except Exception as e:
        elapsed = time.time() - start_time
        logger.error(
            f"[{correlation_id}] Agent {agent_name} FAILED after {elapsed:.2f}s: {e}"
        )
        # Re-raise so the caller can decide how to handle the failure
        raise
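Passing a correlation ID through every call signature gets tedious as the number of agents grows. One alternative, sketched here with the standard library's `contextvars`, sets the ID once per root request and lets a logging filter attach it to every record. The names `correlation_id_var`, `CorrelationFilter`, `configure_logging`, and `handle_request` are hypothetical, and agents are assumed to be functions that take and return a state dict.

```python
import contextvars
import logging
import uuid

correlation_id_var = contextvars.ContextVar("correlation_id", default="unset")


class CorrelationFilter(logging.Filter):
    """Attach the current correlation ID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id_var.get()
        return True


def configure_logging() -> logging.Handler:
    """Build a handler whose format includes the correlation ID."""
    handler = logging.StreamHandler()
    handler.addFilter(CorrelationFilter())
    handler.setFormatter(logging.Formatter("[%(correlation_id)s] %(message)s"))
    return handler


def handle_request(agents: list, state: dict) -> dict:
    """Set one ID per root request, then run each agent under it."""
    token = correlation_id_var.set(str(uuid.uuid4()))
    try:
        for agent_fn in agents:
            # Merge each agent's partial update into the shared state
            state = {**state, **agent_fn(state)}
        return state
    finally:
        # Restore the previous value so IDs never leak across requests
        correlation_id_var.reset(token)
```

Any code that runs inside `handle_request`, including the agents themselves, can read `correlation_id_var.get()` without the ID appearing in its signature.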
Common Mistakes
:::danger Agents That Can Loop Indefinitely
A reviewer that always sends content back to the writer, and a writer that never satisfies the reviewer, creates an infinite loop. Always cap revisions at a maximum count and accept the latest draft when the limit is reached.
:::
:::danger Too Much Inter-Agent Communication
If Agent A needs Agent B, which needs Agent C, which in turn needs Agent A, before any of them can produce output, the communication overhead dominates and latency explodes. Design agents to be as independent as possible. If A and B need to communicate bidirectionally, consider merging them into one agent.
:::
:::warning Giving All Agents the Same Prompt and Calling It Multi-Agent
Simply running the same LLM call multiple times and calling it "multi-agent" misses the point. Each agent should have a specialized prompt, specialized tools, and a specialized role. The diversity of perspective is what makes multi-agent better than single-agent.
:::
:::warning Not Accounting for Error Propagation
In a sequential pipeline, a garbage output from Agent 2 becomes the input to Agent 3. Agent 3 may produce a confident, well-formed response based on garbage input. Add validation gates between pipeline stages to catch bad outputs before they propagate.
:::
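A validation gate can be as simple as a predicate checked after each stage. The sketch below is one possible shape, not a prescribed design: `StageValidationError`, `min_length`, and `run_with_gates` are hypothetical names, and real gates would check schemas, required sections, or factual constraints rather than just length.

```python
from typing import Callable


class StageValidationError(Exception):
    """Raised when a stage's output fails its gate."""


def min_length(n: int) -> Callable[[str], bool]:
    """Build a validator that requires at least n non-whitespace-padded chars."""
    return lambda text: len(text.strip()) >= n


def run_with_gates(
    stages: list[tuple[str, Callable[[str], str], Callable[[str], bool]]],
    initial: str,
) -> str:
    """Run (name, stage_fn, validator) tuples in order, stopping at the first bad output."""
    value = initial
    for name, stage_fn, validator in stages:
        value = stage_fn(value)
        if not validator(value):
            # Stop here so downstream stages never see the bad output
            raise StageValidationError(f"Stage '{name}' produced invalid output")
    return value
```

Failing at the gate names the stage that went wrong, which is much cheaper to diagnose than a polished final output built on bad input.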
Interview Q&A
Q: What are the main patterns for multi-agent coordination and when do you use each?
Orchestrator-worker is best when a manager can decompose the task upfront and workers operate independently - parallelizable tasks with clear decomposition. Pipeline is best for sequential transformation tasks where each stage processes the previous stage's output - content generation, data cleaning pipelines. Peer-to-peer debate is best when you want to stress-test a solution against adversarial critique or when you want to reduce single-agent bias - red team / blue team exercises, high-stakes decisions.
Q: How does LangGraph differ from CrewAI?
LangGraph is a graph computation framework - you define nodes (functions), edges (transitions), and conditional routing explicitly. It gives fine-grained control over state management and workflow logic. It is lower-level and more flexible. CrewAI is a higher-level abstraction focused on role-based agents - you define agent personas, goals, and tasks, and CrewAI handles coordination. CrewAI is faster to prototype; LangGraph is better for complex conditional workflows with loops and branching.
Q: How do you prevent cost explosion in multi-agent systems?
Several strategies: (1) implement per-agent token budgets with hard stops, (2) use cheaper models for simpler agents (a summarization agent does not need a top-tier model such as claude-opus-4-6), (3) cap total iterations and revision cycles, (4) cache agent outputs: if two agents call the same tool with the same args, the second call should use the cached result, (5) use parallel execution to reduce wall-clock time; note that this lowers latency, not token spend, so it complements the other strategies rather than replacing them.
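Strategy (4) can be sketched as a small shared cache keyed on the tool name and its arguments. `ToolCache` is a hypothetical helper, and it assumes tool arguments are JSON-serializable and that the tool is deterministic enough for reuse to be safe.

```python
import json
from typing import Any, Callable


class ToolCache:
    """Memoize tool calls by (tool name, arguments) so agents share results."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    def call(self, tool_name: str, tool_fn: Callable[..., Any], **kwargs: Any) -> Any:
        # sort_keys makes the cache key independent of argument order
        key = f"{tool_name}:{json.dumps(kwargs, sort_keys=True)}"
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        self._store[key] = tool_fn(**kwargs)
        return self._store[key]
```

Passing one `ToolCache` instance to every agent in a run means a repeated search or lookup costs nothing the second time. Tools with side effects or time-sensitive results should bypass it.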
Q: What is the communication protocol problem in multi-agent systems?
Agents need a shared format for passing information between them. If Agent A produces a JSON dict and Agent B expects a markdown string, they cannot communicate. Solutions: use a strongly-typed shared state (TypedDict in LangGraph), define explicit message schemas, or use a message passing system with validation. The typed shared state approach (LangGraph) is generally most reliable because static type checkers such as mypy or pyright can catch many mismatches at development time. TypedDict annotations are not enforced at runtime, however, so content that comes back from an LLM still needs explicit validation.
Q: How do you debug a multi-agent pipeline when the final output is wrong?
Work backwards: take the final output and identify which agent produced it. Look at that agent's input: was it wrong? Trace the input back to the previous agent. Continue until you find the point of failure. This is much easier with correlation IDs (each request gets a unique ID that propagates through all agent calls) and structured logging that records agent name, input hash, output, and timestamps. LangSmith can record these traces for LangChain/LangGraph workflows once tracing is enabled.
