
AutoGen Conversational Agents

The Debate That Solved the Architecture Problem

The architecture review is stuck. Your team has been designing the data ingestion system for six weeks and cannot reach consensus. Three senior engineers have three distinct approaches: one argues for a streaming pipeline with Kafka, one argues for a batch ETL with Airflow, and one argues for an event-driven Lambda architecture. All three are technically valid. All three have real trade-offs. The meeting ends without a decision, again.

A junior engineer on the team tries something different. She sets up an AutoGen team: an Architect agent that proposes the Kafka approach, a second Architect agent configured to argue for the batch ETL approach, a third for the Lambda architecture, and a Critic agent that pokes holes in every argument. She gives them the system requirements - 50 TB/day ingestion, sub-second query latency, multi-region failover, 99.99% SLA - and tells them to debate.

The conversation runs for forty turns. The three architect agents make their cases. The critic identifies weaknesses. The architects respond to criticism, acknowledge valid points, and refine their positions. By turn 25, the Kafka agent and the Lambda agent have converged on a hybrid approach neither human engineer had proposed: streaming with Kafka for real-time paths, with a secondary Lambda layer for retry and dead-letter processing, and Airflow for the batch reconciliation jobs that run nightly.

The junior engineer prints the conversation transcript and brings it to the next architecture review. Fifteen minutes later, the team has a decision. The debate surfaced trade-offs the humans had not articulated clearly because everyone came in with a position. The agents argued without ego.

This is AutoGen's distinctive strength: structured multi-agent conversation that produces emergent insights through productive disagreement. When your problem benefits from debate, critique, and negotiation between perspectives - not just pipeline execution - AutoGen's conversational model is the right tool.



Why This Exists

The Conversation as Computation

Most agent frameworks model computation as a graph: nodes do work, edges pass data. AutoGen's original insight was different: conversation is a natural model for multi-agent coordination. Humans coordinate through conversation - arguing, agreeing, revising, delegating. Why not structure AI coordination the same way?

The original AutoGen (2023, Microsoft Research) implemented this through ConversableAgent - every agent could send messages to, and receive messages from, every other agent. The simplicity was intentional: two agents talking to each other is the fundamental unit. Group chats, debates, and team structures are built from that primitive.

The framework demonstrated that the chat model produced real benefits: code review between a user proxy and an assistant produces better code than either alone. Debate between agents with opposing positions surfaces blind spots. A critic that can respond to and challenge another agent's reasoning creates a more thorough analysis than a single agent evaluating its own work.
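The two-agent primitive described above can be pictured with a few lines of plain Python. This is a toy sketch, not AutoGen code: `ToyAgent`, `converse`, and the scripted reply functions are hypothetical names invented for illustration, and the "agents" return canned strings instead of calling a model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class ToyAgent:
    """Hypothetical stand-in for a conversable agent: a name plus a reply function."""
    name: str
    reply_fn: Callable[[str], str]
    inbox: List[Tuple[str, str]] = field(default_factory=list)

    def receive(self, sender: str, text: str) -> str:
        # Record the incoming message, then produce a reply to it.
        self.inbox.append((sender, text))
        return self.reply_fn(text)

def converse(first: ToyAgent, second: ToyAgent, opening: str, max_turns: int):
    """Alternate messages between two agents for a fixed number of replies."""
    transcript = [(first.name, opening)]
    speaker, listener, text = first, second, opening
    for _ in range(max_turns):
        text = listener.receive(speaker.name, text)
        transcript.append((listener.name, text))
        speaker, listener = listener, speaker  # swap roles for the next turn
    return transcript

author = ToyAgent("author", lambda text: f"revised after: {text}")
reviewer = ToyAgent("reviewer", lambda text: "please add a test")
transcript = converse(author, reviewer, "def add(x, y): return x + y", max_turns=3)
for source, text in transcript:
    print(f"[{source}] {text}")
```

Everything larger (group chats, debates, teams) can be read as a policy for deciding who is the next `listener`.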

v0.2 vs v0.4: A Complete Redesign

AutoGen v0.2 was the framework that went viral: simple, effective, and used by hundreds of thousands of developers. It was also architecturally limited: synchronous execution, difficult to scale, hard to extend with new agent types, and tightly coupled to the OpenAI API.

AutoGen v0.4 (December 2024) is a complete rewrite. The architecture changed from a synchronous message-passing model to an asynchronous event-driven runtime. Key changes:

| v0.2 | v0.4 |
| --- | --- |
| ConversableAgent base class | Event-driven Agent protocol |
| Synchronous execution | Async first |
| initiate_chat() | run() / run_stream() |
| GroupChatManager | Team types: RoundRobinGroupChat, SelectorGroupChat, Swarm |
| Tight OpenAI coupling | Multi-model support |
| Limited customization | Pluggable runtimes |

The v0.2 to v0.4 migration is significant. The APIs are incompatible. Teams running v0.2 in production need a proper migration plan, not just a version bump.


Historical Context

| Date | Event |
| --- | --- |
| September 2023 | AutoGen v0.1 - first public release, Microsoft Research |
| October 2023 | AutoGen v0.2 - ConversableAgent, GroupChat, code execution |
| January 2024 | AutoGen Studio - visual no-code agent builder |
| November 2024 | Magentic-One - an orchestrator agent coordinating specialist agents |
| December 2024 | AutoGen v0.4 - event-driven rewrite, AgentChat, Teams |
| Early 2025 | AutoGen v0.4 stable, Microsoft Azure AI integration |

The Magentic-One system (November 2024) was particularly notable: a single Orchestrator agent that coordinated multiple specialist agents (WebSurfer, FileSurfer, Coder, ComputerTerminal) to complete complex tasks. Magentic-One achieved strong benchmark results and demonstrated the practical viability of AutoGen's conversational model for difficult, multi-step tasks.


AutoGen v0.4 Architecture

Installing AutoGen v0.4

# Core AgentChat package
pip install autogen-agentchat

# Model clients (quote the extras so shells such as zsh do not expand the brackets)
pip install "autogen-ext[anthropic]"  # For Claude
pip install "autogen-ext[openai]"     # For OpenAI

# Code execution (Docker sandbox)
pip install "autogen-ext[docker]"

# Optional: AutoGen Studio for visual development
pip install autogenstudio

AssistantAgent

The primary reasoning agent:

from autogen_agentchat.agents import AssistantAgent
from autogen_core.tools import FunctionTool  # FunctionTool lives in autogen_core, not autogen_agentchat
from autogen_ext.models.anthropic import AnthropicChatCompletionClient

# Initialize the model client
model_client = AnthropicChatCompletionClient(
    model="claude-opus-4-6",
    api_key="your-anthropic-key",
    max_tokens=4096
)

# Define tools
async def search_web(query: str) -> str:
    """Search the web for information."""
    # Real implementation would call a search API
    return f"Search results for '{query}': [result 1, result 2, result 3]"

async def run_python(code: str) -> str:
    """Execute Python code and return the output."""
    # Runs on the host without sandboxing: acceptable for a demo only
    import subprocess, sys, tempfile, os
    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
        f.write(code)
        tmpfile = f.name
    try:
        result = subprocess.run(
            [sys.executable, tmpfile],
            capture_output=True, text=True, timeout=30
        )
        return result.stdout if result.returncode == 0 else f"Error: {result.stderr}"
    finally:
        os.unlink(tmpfile)

search_tool = FunctionTool(search_web, description="Search the web for information")
code_tool = FunctionTool(run_python, description="Execute Python code")

# Create the assistant
assistant = AssistantAgent(
    name="research_assistant",
    system_message="""You are a research assistant with access to web search and
code execution. When asked to research a topic:
1. Search for current information from multiple angles
2. Use code to analyze or verify numerical claims when appropriate
3. Synthesize findings into a clear, structured response
4. Flag anything you are uncertain about""",
    model_client=model_client,
    tools=[search_tool, code_tool]
)

UserProxyAgent

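Represents a human user in the conversation. In v0.4 the UserProxyAgent only relays human input; automated code execution, which v0.2 folded into the user proxy, is handled by a separate CodeExecutorAgent. In production, a CodeExecutorAgent often replaces the human entirely so that code runs without waiting for input: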

from autogen_agentchat.agents import CodeExecutorAgent, UserProxyAgent
from autogen_ext.code_executors.docker import DockerCommandLineCodeExecutor

# Human proxy - asks for human input at each turn
human_proxy = UserProxyAgent(
    name="human_user",
    input_func=input  # Built-in: asks human via terminal
)

# Automated alternative - no human required
async def create_code_executor():
    """Create and start a Docker-based code executor."""
    executor = DockerCommandLineCodeExecutor(
        image="python:3.11-slim",
        work_dir="/workspace",
        timeout=60
    )
    await executor.start()
    return executor

# Code executor agent - automatically runs code blocks from the assistant
# In production: use Docker for isolation
async def create_code_runner():
    executor = await create_code_executor()
    return CodeExecutorAgent(name="code_runner", code_executor=executor)

Team Types

RoundRobinGroupChat

Each agent takes turns in a fixed round-robin order. Simple and predictable:

import asyncio

from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import MaxMessageTermination, TextMentionTermination

# Create the agents
critic = AssistantAgent(
    name="critic",
    system_message="""You are a rigorous technical critic. When given a solution or proposal:
1. Identify the weakest assumptions
2. Find edge cases that were not considered
3. Propose specific improvements
4. Be concise - one focused critique per turn""",
    model_client=model_client
)

defender = AssistantAgent(
    name="defender",
    system_message="""You defend and improve technical proposals under critique.
When criticized:
1. Acknowledge valid points explicitly
2. Counter invalid points with evidence
3. Propose specific improvements to address valid criticism
4. Revise the proposal when warranted
End with FINAL PROPOSAL when you believe the proposal is ready.""",
    model_client=model_client
)

# Termination: stop when a message mentions FINAL PROPOSAL (the defender's signal)
# or after 10 messages, whichever comes first
termination = (
    TextMentionTermination("FINAL PROPOSAL") |
    MaxMessageTermination(max_messages=10)
)

# Round-robin: critic → defender → critic → defender...
debate_team = RoundRobinGroupChat(
    participants=[critic, defender],
    termination_condition=termination
)

async def run_debate(proposal: str):
    result = await debate_team.run(
        task=f"Evaluate and improve this technical proposal:\n\n{proposal}"
    )
    # Print the full conversation
    for message in result.messages:
        print(f"\n[{message.source}]: {message.content}")
    return result

asyncio.run(run_debate("""
API Design Proposal: All endpoints should accept both JSON and GraphQL queries,
with automatic schema generation from Pydantic models and response caching at 5 minutes.
"""))

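To make the turn order and the two stop conditions concrete, here is a stdlib-only sketch of a round-robin loop. It is an illustration under stated assumptions, not AutoGen's implementation: `run_round_robin` and the scripted reply functions are hypothetical, and the sketch assumes the task counts as the first message toward the message cap.

```python
from itertools import cycle

def run_round_robin(task, agents, stop_text, max_messages):
    """agents is an ordered list of (name, reply_fn); reply_fn sees the transcript."""
    transcript = [("user", task)]  # assumption: the task counts as message one
    for name, reply_fn in cycle(agents):
        if len(transcript) >= max_messages:
            return transcript, "max_messages"
        text = reply_fn(transcript)
        transcript.append((name, text))
        if stop_text in text:
            return transcript, "text_mention"

# Scripted stand-ins for the critic and defender agents.
def toy_critic(transcript):
    return "The cache TTL is unjustified."

def toy_defender(transcript):
    # Concede on the second defender turn (four messages already exchanged).
    return "FINAL PROPOSAL: TTL per endpoint." if len(transcript) >= 4 else "Revised the TTL."

agents = [("critic", toy_critic), ("defender", toy_defender)]
transcript, reason = run_round_robin("Review the API proposal", agents, "FINAL PROPOSAL", 10)
print(reason, [source for source, _ in transcript])
```

Lowering `max_messages` to 3 makes the cap fire first, which is the behavior you want from a safety limit: it ends the run even if no agent ever emits the stop phrase.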
SelectorGroupChat

An LLM selector chooses the next speaker based on the conversation context. More dynamic than round-robin:

from autogen_agentchat.teams import SelectorGroupChat

frontend_expert = AssistantAgent(
    name="frontend_expert",
    system_message="You are a frontend engineer. Speak when frontend concerns are relevant.",
    model_client=model_client
)

backend_expert = AssistantAgent(
    name="backend_expert",
    system_message="You are a backend engineer. Speak when backend/API concerns are relevant.",
    model_client=model_client
)

security_expert = AssistantAgent(
    name="security_expert",
    system_message="You are a security engineer. Speak when security or auth concerns arise.",
    model_client=model_client
)

product_manager = AssistantAgent(
    name="product_manager",
    system_message="""You are the PM. You ask clarifying questions and
synthesize the team's input into a decision. End with DECISION REACHED when consensus is clear.""",
    model_client=model_client
)

termination = TextMentionTermination("DECISION REACHED") | MaxMessageTermination(15)

# Selector picks the most relevant expert for each turn.
# The prompt is a template: {roles}, {history}, and {participants} are filled in
# by SelectorGroupChat, so the selector model actually sees the conversation.
architecture_team = SelectorGroupChat(
    participants=[frontend_expert, backend_expert, security_expert, product_manager],
    model_client=model_client,  # Used by selector to pick next speaker
    termination_condition=termination,
    selector_prompt="""Select the next participant to speak.

{roles}

Conversation so far:
{history}

From {participants}, select the participant who has the most relevant expertise
for the current topic. Return only the name of the participant, nothing else."""
)

async def architecture_review(proposal: str):
    result = await architecture_team.run(
        task=f"Review this architecture proposal as a team:\n\n{proposal}"
    )
    return result
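The selection step can be pictured as two small functions: fill a prompt template with the roles and history, then map the model's free-text reply back to a valid participant. The sketch below is a simplified illustration, not the library's code; `build_selector_prompt`, `parse_selection`, and the fallback rule are assumptions made for this example, and no model is called.

```python
def build_selector_prompt(template, descriptions, history):
    """Fill the {roles}, {participants}, and {history} fields of a selector template."""
    roles = "\n".join(f"{name}: {desc}" for name, desc in descriptions.items())
    transcript = "\n".join(f"{source}: {text}" for source, text in history)
    return template.format(
        roles=roles, participants=list(descriptions), history=transcript
    )

def parse_selection(reply, candidates, previous):
    """Return the single candidate named in the reply, else fall back deterministically."""
    mentioned = [name for name in candidates if name in reply]
    if len(mentioned) == 1:
        return mentioned[0]
    # Ambiguous or empty reply: pick the first candidate who did not just speak.
    return next(name for name in candidates if name != previous)

descriptions = {
    "frontend_expert": "Frontend engineer",
    "backend_expert": "Backend engineer",
    "security_expert": "Security engineer",
}
template = "{roles}\n\nHistory:\n{history}\n\nPick one of {participants}."
prompt = build_selector_prompt(
    template, descriptions, [("user", "How should we store session tokens?")]
)
choice = parse_selection("I pick security_expert.", list(descriptions), "backend_expert")
print(choice)
```

The fallback matters in practice: a selector model sometimes answers with a sentence, several names, or a name that does not exist, and the team still needs a next speaker.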

Swarm

Agents delegate control to each other via handoffs. The currently active agent can hand off to any other agent:

from autogen_agentchat.teams import Swarm
from autogen_agentchat.messages import HandoffMessage

# Swarm agents can hand off to each other
triage_agent = AssistantAgent(
    name="triage",
    system_message="""You triage incoming support requests. Classify the request and hand off to:
- 'billing' for payment/subscription issues
- 'technical' for product bugs or technical questions
- 'escalation' for angry customers or complex multi-issue cases

Always hand off - do not try to resolve requests yourself.""",
    model_client=model_client,
    handoffs=["billing", "technical", "escalation"]
)

billing_agent = AssistantAgent(
    name="billing",
    system_message="""You handle billing and subscription support.
Resolve billing questions. If the issue requires technical knowledge, hand off to 'technical'.""",
    model_client=model_client,
    handoffs=["technical", "escalation"]
)

technical_agent = AssistantAgent(
    name="technical",
    system_message="""You resolve technical support questions with specific instructions.
If the customer is unsatisfied or the issue is unresolvable, hand off to 'escalation'.""",
    model_client=model_client,
    handoffs=["escalation"]
)

escalation_agent = AssistantAgent(
    name="escalation",
    system_message="""You handle escalated cases. You have authority to offer refunds and credits.
Resolve with empathy and a concrete resolution. End with RESOLVED when the case is closed.""",
    model_client=model_client
)

termination = TextMentionTermination("RESOLVED") | MaxMessageTermination(20)

support_swarm = Swarm(
    participants=[triage_agent, billing_agent, technical_agent, escalation_agent],
    termination_condition=termination
)

async def handle_support_ticket(ticket: str):
    result = await support_swarm.run(task=ticket)
    return result
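The handoff lists above form a small directed graph, and it can help to check that graph before any model is involved. The following sketch is a hypothetical helper, not part of AutoGen: `validate_handoffs` and `route` are invented names, and the `decisions` dict stands in for what each agent's model would choose.

```python
HANDOFFS = {
    "triage": {"billing", "technical", "escalation"},
    "billing": {"technical", "escalation"},
    "technical": {"escalation"},
    "escalation": set(),
}

def validate_handoffs(handoffs):
    """Fail fast if any agent can hand off to a name that is not a participant."""
    unknown = {t for targets in handoffs.values() for t in targets if t not in handoffs}
    if unknown:
        raise ValueError(f"Unknown handoff targets: {sorted(unknown)}")

def route(start, decisions, handoffs, max_hops=10):
    """Follow scripted handoff decisions; decisions maps agent -> target or None."""
    path, current = [start], start
    for _ in range(max_hops):
        target = decisions.get(current)
        if target is None:  # the current agent resolves the request itself
            break
        if target not in handoffs[current]:
            raise ValueError(f"{current} is not allowed to hand off to {target}")
        current = target
        path.append(current)
    return path

validate_handoffs(HANDOFFS)
path = route("triage", {"triage": "billing", "billing": "technical"}, HANDOFFS)
print(" -> ".join(path))
```

The `max_hops` cap plays the same role as a message limit: it bounds the damage if two agents keep handing the request back and forth.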

Code Execution: The Killer Feature

AutoGen's most distinctive feature is first-class, sandboxed code execution. The AssistantAgent generates code; the CodeExecutorAgent runs it; the output returns to the assistant for interpretation.

from autogen_agentchat.agents import AssistantAgent, CodeExecutorAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import MaxMessageTermination, TextMentionTermination
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor
import asyncio, tempfile

async def run_coding_session():
    # Local code executor (use Docker for production)
    work_dir = tempfile.mkdtemp()
    executor = LocalCommandLineCodeExecutor(work_dir=work_dir, timeout=60)

    code_executor_agent = CodeExecutorAgent(
        name="code_runner",
        code_executor=executor
    )

    coding_assistant = AssistantAgent(
        name="coding_assistant",
        system_message="""You are a Python data analysis expert.
When given a data analysis task:
1. Write Python code to solve it (always in ```python code blocks)
2. The code will be automatically executed and results returned to you
3. Interpret the results and write more code if needed
4. When the analysis is complete, write ANALYSIS COMPLETE and summarize findings

Always write clean, well-commented code. Handle errors gracefully.""",
        model_client=model_client
    )

    termination = (
        TextMentionTermination("ANALYSIS COMPLETE") |
        MaxMessageTermination(20)
    )

    team = RoundRobinGroupChat(
        participants=[coding_assistant, code_executor_agent],
        termination_condition=termination
    )

    task = """Analyze this dataset and find trends:

sales_data = {
    "Q1_2024": {"revenue": 1200000, "customers": 340, "churn_rate": 0.08},
    "Q2_2024": {"revenue": 1350000, "customers": 380, "churn_rate": 0.07},
    "Q3_2024": {"revenue": 1180000, "customers": 350, "churn_rate": 0.11},
    "Q4_2024": {"revenue": 1580000, "customers": 430, "churn_rate": 0.06}
}

Calculate: revenue per customer per quarter, identify the outlier quarter,
and project Q1 2025 revenue with 95% confidence interval."""

    result = await team.run(task=task)

    print("\n=== Coding Session Transcript ===")
    for msg in result.messages:
        print(f"\n[{msg.source}]")
        print(msg.content[:500])

    return result

asyncio.run(run_coding_session())
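The generate-then-execute loop depends on pulling fenced code out of the assistant's message. A minimal sketch of that extraction step follows; it is an assumption about the general technique, not the parser AutoGen ships, and `extract_code_blocks` is a hypothetical name. The fence marker is built programmatically only so the example can be embedded in this page.

```python
import re

FENCE = "`" * 3  # the triple-backtick marker, built without typing it literally

# Opening fence, optional language tag, newline, lazily matched body, closing fence.
CODE_BLOCK = re.compile(
    re.escape(FENCE) + r"(\w*)[ \t]*\r?\n(.*?)" + re.escape(FENCE),
    re.DOTALL,
)

def extract_code_blocks(message: str):
    """Return (language, code) pairs for every fenced block in a message."""
    return [(lang, code.strip()) for lang, code in CODE_BLOCK.findall(message)]

message = (
    "Here is the analysis:\n"
    f"{FENCE}python\nprint(1200000 / 340)\n{FENCE}\n"
    "And a shell check:\n"
    f"{FENCE}sh\nls\n{FENCE}\n"
)
blocks = extract_code_blocks(message)
for lang, code in blocks:
    print(lang, "->", code)
```

A message with no fenced block yields an empty list, which is why the assistant's system message insists on always using code blocks: prose-only replies give the executor nothing to run.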

Docker-Based Code Execution for Production

from autogen_ext.code_executors.docker import DockerCommandLineCodeExecutor

async def create_production_executor():
    executor = DockerCommandLineCodeExecutor(
        image="python:3.11-slim",
        work_dir="/workspace",
        timeout=120,
        # Mount volumes for data access
        bind_dir="/path/to/local/data"
    )
    await executor.start()

    # Don't forget to stop the executor when done
    try:
        # ... use executor ...
        pass
    finally:
        await executor.stop()

Full Example: Multi-Agent Debate System

import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import MaxMessageTermination, TextMentionTermination
from autogen_ext.models.anthropic import AnthropicChatCompletionClient

model = AnthropicChatCompletionClient(
    model="claude-opus-4-6",
    api_key="your-key"
)

# Create a multi-perspective debate system
def make_advocate(position: str, name: str) -> AssistantAgent:
    return AssistantAgent(
        name=name,
        system_message=f"""You are an expert advocate for the following position:
{position}

When making your case:
- Use specific evidence and data
- Acknowledge the strongest counterarguments
- Update your position when presented with compelling evidence
- Be direct: make your point in 150 words or less per turn

When you believe a final decision has been reached, say CONSENSUS: [decision].""",
        model_client=model
    )

async def run_technical_debate(question: str, positions: list[tuple[str, str]]) -> dict:
    """
    Run a structured debate between multiple positions.

    Args:
        question: The question to debate
        positions: List of (position_description, agent_name) tuples

    Returns:
        Dict with final consensus and full transcript
    """
    advocates = [make_advocate(pos, name) for pos, name in positions]

    moderator = AssistantAgent(
        name="moderator",
        system_message="""You moderate technical debates. Your role:
1. After each round, summarize key points of agreement and disagreement
2. Ask probing questions to surface hidden assumptions
3. When consensus emerges, formalize it as FINAL DECISION: [decision]
Keep debate focused and productive. Prevent repetition.""",
        model_client=model
    )

    all_participants = advocates + [moderator]
    termination = (
        TextMentionTermination("FINAL DECISION") |
        MaxMessageTermination(max_messages=24)
    )

    team = RoundRobinGroupChat(
        participants=all_participants,
        termination_condition=termination
    )

    result = await team.run(task=f"Debate: {question}")

    # Extract the final decision from the transcript
    final_decision = None
    for msg in reversed(result.messages):
        if "FINAL DECISION" in str(msg.content) or "CONSENSUS" in str(msg.content):
            final_decision = msg.content
            break

    return {
        "question": question,
        "final_decision": final_decision,
        "message_count": len(result.messages),
        "transcript": [(m.source, m.content) for m in result.messages],
        "stop_reason": result.stop_reason
    }

async def main():
    result = await run_technical_debate(
        question="Should we use Postgres with JSON columns or a dedicated document store for flexible schema data?",
        positions=[
            ("Postgres with JSONB columns is the right choice. It combines ACID guarantees, "
             "mature tooling, and flexible schema in a single system.", "postgres_advocate"),
            ("A dedicated document store (MongoDB or DynamoDB) is the right choice. "
             "Schema flexibility is a first-class concern, not an afterthought.", "docstore_advocate"),
            ("Neither extreme is correct. The decision depends on query patterns. "
             "Start with Postgres JSONB; migrate to a document store when query complexity demands it.", "pragmatist"),
        ]
    )

    print(f"\nDebate completed in {result['message_count']} messages")
    print(f"\nFinal Decision:\n{result['final_decision']}")

    # Show key moments in the debate
    print("\n=== Key Debate Moments ===")
    for source, content in result['transcript']:
        if len(content) > 50:  # Skip trivial messages
            print(f"\n[{source}]: {content[:300]}...")

asyncio.run(main())

AutoGen Studio

AutoGen Studio provides a visual interface for building and testing AutoGen agents without code. It is useful for prototyping and for non-engineering stakeholders who need to configure agents:

# Install and run AutoGen Studio
pip install autogenstudio
autogenstudio ui --port 8081

AutoGen Studio provides:

  • Visual agent builder (name, system message, tools, model)
  • Team configuration (round-robin, selector, swarm)
  • Playground for testing conversations
  • Session history and replay
  • Export to Python code

For production systems, use AutoGen Studio for prototyping and export to code for deployment. Do not use AutoGen Studio as the production runtime - it is a development tool.


Streaming

from autogen_agentchat.base import TaskResult
from autogen_agentchat.messages import TextMessage, ToolCallExecutionEvent

async def run_with_streaming(question: str):
    """Run a team and print each message as soon as it is produced."""
    team = RoundRobinGroupChat(
        participants=[assistant, critic],
        termination_condition=MaxMessageTermination(6)
    )

    print("Streaming team conversation:\n")
    async for event in team.run_stream(task=question):
        if isinstance(event, TextMessage):
            # Each TextMessage is a complete agent message, not an individual token
            print(f"\n[{event.source}]: {event.content}", flush=True)

        elif isinstance(event, ToolCallExecutionEvent):
            # Results of tool calls made by an agent
            print(f"\n[Tool Result]: {str(event.content)[:200]}")

        elif isinstance(event, TaskResult):
            # The final item in the stream carries the whole transcript
            print(f"\n\n=== Task complete. Stop reason: {event.stop_reason} ===")
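The consumption pattern is an async generator that yields messages as they are produced and a result object last. The stdlib-only sketch below mimics that shape so it can be run without a model; `ToyMessage`, `ToyResult`, and `toy_run_stream` are hypothetical stand-ins, and the assumption that the task itself is yielded first is made for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import List

@dataclass
class ToyMessage:
    source: str
    content: str

@dataclass
class ToyResult:
    messages: List[ToyMessage]
    stop_reason: str

async def toy_run_stream(task: str):
    """Yield each message as it is 'produced', then a final result object."""
    messages = [ToyMessage("user", task)]
    yield messages[0]
    for source, content in [("assistant", "draft answer"), ("critic", "looks fine")]:
        await asyncio.sleep(0)  # stand-in for waiting on a model call
        message = ToyMessage(source, content)
        messages.append(message)
        yield message
    yield ToyResult(messages, "scripted conversation finished")

async def consume(task: str):
    seen, result = [], None
    async for event in toy_run_stream(task):
        if isinstance(event, ToyResult):
            result = event  # last item: the full transcript and stop reason
        else:
            seen.append(event.source)  # handle each message as it arrives
    return seen, result

seen, result = asyncio.run(consume("Explain the trade-off"))
print(seen, "|", result.stop_reason)
```

Dispatching on the event type, with the result handled last, is the part that carries over to real streams: the loop body stays the same as more event types are added.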

AutoGen v0.2 to v0.4 Migration

For teams upgrading from v0.2:

# ─── v0.2 (DO NOT USE FOR NEW CODE) ──────────────────────────────────────────
# from autogen import AssistantAgent, UserProxyAgent
# assistant = AssistantAgent(name="assistant", llm_config={"model": "gpt-4"})
# user_proxy = UserProxyAgent(name="user_proxy", human_input_mode="NEVER")
# user_proxy.initiate_chat(assistant, message="Your task here")

# ─── v0.4 equivalent ──────────────────────────────────────────────────────────
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import MaxMessageTermination
from autogen_ext.models.anthropic import AnthropicChatCompletionClient

model_client = AnthropicChatCompletionClient(model="claude-opus-4-6")

assistant = AssistantAgent(
    name="assistant",
    model_client=model_client,
    system_message="You are a helpful assistant."
)

termination = MaxMessageTermination(max_messages=10)
team = RoundRobinGroupChat([assistant], termination_condition=termination)

async def run():
    result = await team.run(task="Your task here")
    print(result.messages[-1].content)

asyncio.run(run())

Key migration points:

  • initiate_chat() → team.run(task=...)
  • llm_config={"model": ...} → model_client=AnthropicChatCompletionClient(...)
  • GroupChatManager → SelectorGroupChat or RoundRobinGroupChat
  • All execution is now async by default

Production Engineering Notes

Termination Conditions

Always set explicit termination conditions. An unconstrained conversation runs until the token limit or until you manually terminate it:

from autogen_agentchat.conditions import (
    MaxMessageTermination,
    TextMentionTermination,
    TokenUsageTermination,
    TimeoutTermination
)

# Combine termination conditions with | (OR) and & (AND)
termination = (
    TextMentionTermination("TASK COMPLETE") |   # Stop if any agent says this
    MaxMessageTermination(max_messages=20) |    # Stop after 20 messages
    TimeoutTermination(timeout_seconds=300)     # Stop after 5 minutes (takes seconds, not a timedelta)
)
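The `|` and `&` composition works through ordinary operator overloading. Here is a stateless sketch of the idea; it is a simplification under stated assumptions, since the real conditions are stateful and asynchronous, and `Condition`, `max_messages`, and `text_mention` are hypothetical names for this example only.

```python
class Condition:
    """A predicate over the transcript that composes with | and &."""

    def __init__(self, check, label):
        self.check = check
        self.label = label

    def __call__(self, transcript):
        return self.check(transcript)

    def __or__(self, other):
        # Stop when either condition holds.
        return Condition(lambda t: self(t) or other(t), f"({self.label} | {other.label})")

    def __and__(self, other):
        # Stop only when both conditions hold.
        return Condition(lambda t: self(t) and other(t), f"({self.label} & {other.label})")

def max_messages(limit):
    return Condition(lambda t: len(t) >= limit, f"max_messages={limit}")

def text_mention(text):
    return Condition(lambda t: any(text in content for _, content in t), f"mention={text!r}")

stop = text_mention("TASK COMPLETE") | max_messages(3)
print(stop.label)
print(stop([("assistant", "still working")]))
print(stop([("assistant", "TASK COMPLETE")]))
```

Using `|` for safety limits and `&` for "done and verified" style rules keeps the stop logic declarative instead of scattered through the run loop.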

Error Handling

import asyncio
from autogen_agentchat.base import TaskResult

async def safe_team_run(team, task: str, max_retries: int = 2) -> TaskResult:
    """Run a team with retry logic for transient failures."""
    for attempt in range(max_retries + 1):
        try:
            result = await team.run(task=task)
            return result
        except Exception as e:
            if attempt == max_retries:
                raise
            wait_time = 2 ** attempt  # Exponential backoff
            print(f"Attempt {attempt + 1} failed: {e}. Retrying in {wait_time}s...")
            await asyncio.sleep(wait_time)

Observability

import logging
import time

from autogen_agentchat.base import TaskResult

logger = logging.getLogger(__name__)

async def traced_team_run(team, task: str) -> TaskResult:
    """Run with comprehensive logging."""
    start = time.time()

    result = await team.run(task=task)
    elapsed = time.time() - start

    logger.info({
        "event": "team_run_complete",
        "task": task[:100],
        "message_count": len(result.messages),
        "stop_reason": result.stop_reason,
        "elapsed_seconds": round(elapsed, 2),
        "agents": [m.source for m in result.messages]
    })

    return result

:::danger v0.2 and v0.4 Are Incompatible

AutoGen v0.2 and v0.4 cannot coexist in the same Python environment. v0.2 is imported as autogen (typically installed as pyautogen), while v0.4 is split across autogen-agentchat, autogen-core, and autogen-ext, and the package and module names overlap in ways that cause install and import conflicts. If you are upgrading a production v0.2 system, do the migration in a separate environment, test thoroughly, and treat it as a new deployment rather than an upgrade.

Do not try to run v0.2 and v0.4 code in the same process or environment. The chaos is not worth it.

:::

:::warning Code Execution Security

AutoGen's code execution feature runs arbitrary code generated by an LLM. Even with Docker sandboxing, this is a significant security surface. In production:

  • Always use Docker (not local execution) for code executor agents
  • Network-isolate the Docker container - it should not have internet access unless required
  • Set strict resource limits (memory, CPU, disk) on the container
  • Validate the code before execution if your threat model requires it
  • Log all executed code for audit purposes

Never use LocalCommandLineCodeExecutor in production with user-provided inputs. A malicious user who can influence the task description can influence the generated code.

:::
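The container-level items in the checklist above map onto standard `docker run` flags. The sketch below only builds the argument list so the flags are visible; it is not how DockerCommandLineCodeExecutor is configured, and `docker_run_args`, the default limits, and the `/workspace` mount point are assumptions chosen for illustration.

```python
def docker_run_args(image, host_dir, script, memory="512m", cpus="1.0", pids=128):
    """Build a docker run command with network isolation and resource limits."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                      # no internet access from generated code
        "--memory", memory,                       # hard memory cap
        "--cpus", cpus,                           # CPU quota
        "--pids-limit", str(pids),                # bound process count (fork bombs)
        "--read-only",                            # container filesystem is read-only
        "--volume", f"{host_dir}:/workspace:ro",  # code is mounted read-only
        image,
        "python", f"/workspace/{script}",
    ]

args = docker_run_args("python:3.11-slim", "/tmp/agent-run-1", "generated.py")
print(" ".join(args))  # log the exact command for audit purposes
```

A wall-clock limit would still be enforced by the caller, for example with the `timeout` argument of `subprocess.run`, because none of these flags stop a container that simply runs too long.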


Interview Questions and Answers

Q1: What is AutoGen's fundamental architectural model, and how does it differ from frameworks like LangGraph or CrewAI?

AutoGen's model is conversation-centric. Agents are conversational participants - they send and receive messages. The fundamental unit of work is a message exchange between agents. Complex multi-agent systems are built by structuring which agents talk to whom, in what order, with what constraints.

LangGraph is graph-centric. The fundamental unit is a typed state and a node that transforms it. Routing is explicit as conditional edges in a directed graph. State is first-class; conversation is just one possible state field.

CrewAI is role-centric. Agents have roles and goals; tasks have expected outputs; crews orchestrate sequential or hierarchical task execution. The mental model is a human team with defined responsibilities.

AutoGen excels when conversation structure IS the solution - debate, critique, negotiation, collaborative code review. LangGraph excels when complex state routing is the challenge. CrewAI excels when you have a well-defined task pipeline with clear role boundaries.

Q2: Explain the three team types in AutoGen v0.4 and when you would use each.

RoundRobinGroupChat cycles through participants in a fixed order. Use for: structured debates where each perspective should have equal airtime, iterative refinement where an author and reviewer take alternating turns, or simple two-agent interactions where predictable turn order matters.

SelectorGroupChat uses an LLM to choose the next speaker based on the conversation state. Use for: expert panel simulations where the most relevant expert should speak next, or problem-solving sessions where the right next step determines who should speak.

Swarm enables agents to hand off control explicitly to other agents. Use for: support routing systems (triage → specialist → escalation), pipeline systems where each stage is handled by a different agent, or workflows where the agent itself determines who should handle the next step.

Q3: How does AutoGen's code execution work, and what are the security considerations?

When an AssistantAgent includes a code block (triple-backtick marked with a language) in its message, the CodeExecutorAgent extracts and executes that code. The output (stdout, stderr, exit code) is returned as a new message to the conversation, which the assistant agent reads and responds to.

Two executor options: LocalCommandLineCodeExecutor runs code in a subprocess directly on the host machine, with the host's files and credentials within reach - fast but unsafe for production. DockerCommandLineCodeExecutor runs code in an isolated Docker container - slower but sandboxed. The Docker executor supports resource limits, network isolation, and filesystem restrictions.

Security requirements for production: always use Docker execution, isolate the container's network (no internet by default), set memory and CPU limits, set a timeout (30-120 seconds depending on task), log all executed code, and validate that the task source is trusted. Treat any task that users can influence as untrusted even with Docker: prompt injection can cause the assistant to generate malicious code, and the sandbox only limits the damage.

Q4: What termination conditions should every production AutoGen team have, and why?

Every production AutoGen team should have at minimum two termination conditions combined with | (OR):

  1. MaxMessageTermination - hard cap on message count. Prevents runaway conversations that loop indefinitely. Set this based on your expected conversation length times 2-3 for headroom. A team expected to reach consensus in 10 turns should have a max of 20-25.

  2. TimeoutTermination - wall clock time limit. Prevents teams from running forever if agents keep generating unhelpful responses. Combine with your infrastructure timeout (load balancer, Lambda, etc.) to ensure clean termination.

Optional but valuable:

  • TextMentionTermination for natural termination when an agent signals completion
  • TokenUsageTermination for cost control

Never deploy a production AutoGen team without a MaxMessageTermination. An agent that misunderstands its goal can generate hundreds of messages before hitting a context limit, burning significant API budget.

Q5: Compare AutoGen's Swarm to CrewAI's hierarchical process. When would you choose one over the other?

Both enable dynamic routing between agents based on context. The key difference is where routing decisions are made.

In CrewAI's hierarchical process, the routing decision is made by a manager LLM that reads the current task and decides which agent to assign it to. The manager is a separate agent with explicit authority.

In AutoGen's Swarm, the routing decision is made by the currently active agent - it decides to hand off to another agent via a HandoffMessage. There is no separate manager; each agent is responsible for recognizing when it is out of its domain and handing off.

Choose Swarm when: the current agent is the best judge of when to hand off (because it has the context), the routing logic is embedded in domain expertise (a billing agent knows when an issue requires technical support), or you want each agent to be fully autonomous in its routing decisions.

Choose CrewAI hierarchical when: you want a centralized routing authority that can see all agents and all tasks, you need the manager to quality-check each output before routing, or you want to be able to swap routing logic without changing individual agents.

In practice, Swarm is lower overhead and more natural for service routing pipelines. CrewAI hierarchical is more appropriate for complex task orchestration where quality management across stages is important.

© 2026 EngineersOfAI. All rights reserved.