OpenAI's experimental multi-agent framework: agents, handoffs, context variables, and the triage pattern. What it gets right and wrong.

OpenAI Swarm

Reading time: ~32 minutes | Relevance: Foundational for understanding multi-agent handoffs | Target roles: AI Engineer, ML Engineer

The Scenario

You're building a customer support system. A user contacts support with a billing question. The billing agent handles it - but partway through discovers it's actually a technical issue. In a traditional system, this requires the user to re-explain everything to a new agent. The context is lost. The user is frustrated.

OpenAI Swarm introduces the handoff: Agent A passes control to Agent B mid-conversation, transferring all context. The user experiences continuity. Internally, a specialist agent took over - invisibly.

This is Swarm's central idea, and it's elegant. Released in October 2024 as an experimental, educational framework, Swarm is not production-ready. But its simplicity reveals the core ideas of multi-agent coordination in the clearest form of any existing framework.

Why This Exists

OpenAI released Swarm specifically to make multi-agent patterns understandable. The framework is intentionally minimal:

No complex abstractions
No workflow engines
No state machines
Just agents with instructions, tools, and the ability to hand off control

The README describes Swarm as an experimental, educational framework that is not intended for production use. OpenAI has since released the Agents SDK, which the Swarm README presents as its production-ready successor. Swarm remains useful for learning the underlying concepts.

Despite being educational, Swarm's ideas directly influenced production frameworks. The handoff primitive - clean transfer of control between agents - appears in modified form in AutoGen, CrewAI, and LangGraph.

Core Swarm Concepts

Agents

An agent in Swarm has three things:

Instructions - the system prompt defining the agent's role and behavior
Tools - functions the agent can call
Handoffs - other agents it can transfer control to (these are implemented as tools)

from swarm import Swarm, Agent

client = Swarm()

billing_agent = Agent(
    name="Billing Agent",
    instructions=(
        "You handle billing questions: charges, refunds, invoices, payment methods. "
        "If the user's issue is technical (app bugs, login issues), transfer to the Technical Agent. "
        "If the issue needs human review, transfer to the Human Review Agent."
    ),
    # tools and handoffs added below
)

The Handoff Primitive

A handoff is a special tool call that returns an Agent object. When the LLM calls a handoff function, Swarm transfers control to the returned agent - the conversation continues with that agent's instructions and tools active.

def transfer_to_technical():
    """Transfer to Technical Agent for technical issues."""
    return technical_agent  # returns the Agent object

def transfer_to_billing():
    """Transfer to Billing Agent for billing questions."""
    return billing_agent

# Agents declare their handoff capabilities
triage_agent = Agent(
    name="Triage Agent",
    instructions="Route the user to the right specialist agent.",
    functions=[transfer_to_billing, transfer_to_technical]
)

The elegance: the LLM decides when to hand off based on its instructions and the conversation. No external routing logic needed.
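To make the mechanism concrete, here is a minimal, library-free sketch of how a runtime can detect a handoff: it runs the function the model asked for and checks whether the return value is an agent. The MiniAgent class and execute_tool helper are hypothetical names used for illustration; they are not Swarm's actual internals, and no model call is involved.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class MiniAgent:
    """Hypothetical stand-in for an agent: a name plus callable tools."""
    name: str
    functions: dict[str, Callable] = field(default_factory=dict)


def execute_tool(agent: MiniAgent, tool_name: str) -> tuple[str, Optional[MiniAgent]]:
    """Run a tool and return (text for the model, next agent or None)."""
    result = agent.functions[tool_name]()
    if isinstance(result, MiniAgent):
        # The tool returned an agent, so treat the call as a handoff.
        return f"Transferred to {result.name}.", result
    return str(result), None


technical = MiniAgent(name="Technical Agent")
triage = MiniAgent(
    name="Triage Agent",
    functions={
        "transfer_to_technical": lambda: technical,
        "get_time": lambda: "12:00",
    },
)

# Simulate the model choosing the handoff tool.
text, next_agent = execute_tool(triage, "transfer_to_technical")
active = next_agent or triage  # switch only when a handoff happened
print(active.name)
```

The runtime needs no routing table: the only signal is the type of the value the tool returned.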

Context Variables

Shared state that persists across agent handoffs. Any agent can read from and write to the context:

def get_user_account(context_variables: dict) -> str:
    """Look up user account. Uses context_variables for user_id."""
    user_id = context_variables.get("user_id")
    # In real use: look up in database
    return f"Account for user {user_id}: Premium plan, active since 2023"

def update_ticket_status(context_variables: dict, status: str) -> str:
    """Update support ticket status. Returns confirmation."""
    ticket_id = context_variables.get("ticket_id")
    return f"Ticket {ticket_id} status updated to: {status}"

# Context is passed at the start and updated as agents run
context = {
    "user_id": "usr_12345",
    "ticket_id": "TKT_789",
    "plan": "premium"
}

Context variables solve the state-sharing problem for multi-agent systems: each agent can access shared state without needing to pass it explicitly through every message.
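One way a runtime can provide this is to inspect each tool's signature and inject the shared dictionary only when the function declares a context_variables parameter, so the model never has to supply it. The sketch below assumes that design; call_tool and greet are hypothetical helpers, and Swarm's own implementation may differ in detail (for example, the library lets a function return updated context values through its Result type rather than mutating the dictionary in place).

```python
import inspect
from typing import Any, Callable


def call_tool(fn: Callable, model_args: dict[str, Any], context_variables: dict) -> Any:
    """Call fn with the model's arguments, adding context only if fn asks for it."""
    kwargs = dict(model_args)
    if "context_variables" in inspect.signature(fn).parameters:
        kwargs["context_variables"] = context_variables
    return fn(**kwargs)


def update_ticket_status(context_variables: dict, status: str) -> str:
    ticket_id = context_variables.get("ticket_id")
    context_variables["status"] = status  # write back so later agents see it
    return f"Ticket {ticket_id} status updated to: {status}"


def greet(name: str) -> str:
    # Declares no context parameter, so it receives only the model's arguments.
    return f"Hello, {name}"


context = {"user_id": "usr_12345", "ticket_id": "TKT_789"}
print(call_tool(update_ticket_status, {"status": "open"}, context))
print(call_tool(greet, {"name": "Ada"}, context))
```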

Full Python Code: Swarm Customer Support System

The Swarm library is installed with pip install git+https://github.com/openai/swarm.git and needs an OpenAI API key. The implementation below does not depend on the library: it reproduces the same ideas (triage routing, handoffs, shared context) directly on Anthropic's SDK:

"""
swarm_support_system.py

A 4-agent customer support system demonstrating:
- Triage routing
- Agent handoffs
- Context variables
- Specialized agents

Implements the Swarm pattern using Anthropic SDK directly
(no Swarm library dependency - pure concept demonstration).
"""

import json
import anthropic
from dataclasses import dataclass, field
from typing import Optional, Callable

client = anthropic.Anthropic()
MODEL = "claude-opus-4-5"


# ─── Agent Definition ─────────────────────────────────────────────────────────

@dataclass
class SwarmAgent:
    """
    Swarm-style agent: instructions + tools + handoff capabilities.
    Handoffs are implemented as special tool functions that return agent names.
    """
    name: str
    instructions: str
    tools: list[dict] = field(default_factory=list)
    tool_fns: dict[str, Callable] = field(default_factory=dict)
    can_handoff_to: list[str] = field(default_factory=list)


@dataclass
class SwarmMessage:
    role: str
    content: str
    agent_name: Optional[str] = None


@dataclass
class SwarmContext:
    """Shared state that persists across agent handoffs."""
    user_id: str
    ticket_id: str
    plan: str = "free"
    history: list[dict] = field(default_factory=list)
    resolved: bool = False
    handoff_count: int = 0


# ─── Tool Implementations ─────────────────────────────────────────────────────

def get_account_info(context: SwarmContext) -> str:
    return (
        f"Account: {context.user_id} | Plan: {context.plan} | "
        f"Ticket: {context.ticket_id} | Active: Yes"
    )

def process_refund(context: SwarmContext, amount: float, reason: str) -> str:
    return f"Refund of ${amount:.2f} approved for '{reason}'. Processing in 3-5 business days."

def check_system_status(context: SwarmContext, component: str) -> str:
    statuses = {"api": "operational", "login": "degraded", "payments": "operational"}
    return f"{component}: {statuses.get(component.lower(), 'unknown')}"

def create_support_ticket(context: SwarmContext, issue: str, priority: str = "medium") -> str:
    return f"Ticket {context.ticket_id} updated: '{issue}' (priority: {priority}). Engineering notified."

def mark_resolved(context: SwarmContext) -> str:
    context.resolved = True
    return "Ticket marked as resolved. User will receive confirmation email."


# ─── Agent Registry ───────────────────────────────────────────────────────────

def build_agents(context: SwarmContext) -> dict[str, SwarmAgent]:
    """Build all agents with their tools and handoff capabilities."""

    # ── Triage Agent ──────────────────────────────────────────────
    triage = SwarmAgent(
        name="Triage Agent",
        instructions=(
            "You are the first point of contact for customer support. "
            "Assess the user's issue and route to the right specialist:\n"
            "- Billing issues (charges, refunds, invoices) → Billing Agent\n"
            "- Technical issues (bugs, login, performance) → Technical Agent\n"
            "- Complex issues needing human review → Escalation Agent\n"
            "Get a brief description of the issue, then route immediately."
        ),
        tools=[
            {
                "name": "handoff_to_billing",
                "description": "Transfer to Billing Agent for billing/payment issues",
                "input_schema": {"type": "object", "properties": {}, "required": []}
            },
            {
                "name": "handoff_to_technical",
                "description": "Transfer to Technical Agent for technical/product issues",
                "input_schema": {"type": "object", "properties": {}, "required": []}
            },
            {
                "name": "handoff_to_escalation",
                "description": "Transfer to Escalation Agent for complex/sensitive issues",
                "input_schema": {"type": "object", "properties": {}, "required": []}
            }
        ],
        tool_fns={
            "handoff_to_billing": lambda: "Billing Agent",
            "handoff_to_technical": lambda: "Technical Agent",
            "handoff_to_escalation": lambda: "Escalation Agent"
        },
        can_handoff_to=["Billing Agent", "Technical Agent", "Escalation Agent"]
    )

    # ── Billing Agent ─────────────────────────────────────────────
    billing = SwarmAgent(
        name="Billing Agent",
        instructions=(
            "You handle billing questions: charges, refunds, plan changes, invoices. "
            "Always check account info first. You can process refunds up to $500. "
            "For technical issues that arise, transfer to Technical Agent. "
            "Resolve simple billing questions directly without escalating."
        ),
        tools=[
            {
                "name": "get_account_info",
                "description": "Get user account details and billing status",
                "input_schema": {"type": "object", "properties": {}, "required": []}
            },
            {
                "name": "process_refund",
                "description": "Process a refund for the user",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "amount": {"type": "number", "description": "Refund amount in dollars"},
                        "reason": {"type": "string", "description": "Reason for refund"}
                    },
                    "required": ["amount", "reason"]
                }
            },
            {
                "name": "mark_resolved",
                "description": "Mark the ticket as resolved",
                "input_schema": {"type": "object", "properties": {}, "required": []}
            },
            {
                "name": "handoff_to_technical",
                "description": "Transfer to Technical Agent if a technical issue is discovered",
                "input_schema": {"type": "object", "properties": {}, "required": []}
            }
        ],
        tool_fns={
            "get_account_info": lambda: get_account_info(context),
            "process_refund": lambda amount, reason: process_refund(context, amount, reason),
            "mark_resolved": lambda: mark_resolved(context),
            "handoff_to_technical": lambda: "Technical Agent"
        }
    )

    # ── Technical Agent ───────────────────────────────────────────
    technical = SwarmAgent(
        name="Technical Agent",
        instructions=(
            "You handle technical issues: bugs, login problems, performance, API issues. "
            "Check system status for relevant components. Create tickets for bugs. "
            "For billing questions that arise, transfer to Billing Agent. "
            "For issues needing engineering review, escalate."
        ),
        tools=[
            {
                "name": "check_system_status",
                "description": "Check operational status of a system component",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "component": {"type": "string", "description": "Component name (api, login, payments)"}
                    },
                    "required": ["component"]
                }
            },
            {
                "name": "create_support_ticket",
                "description": "Create/update a technical support ticket",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "issue": {"type": "string"},
                        "priority": {"type": "string", "enum": ["low", "medium", "high", "critical"]}
                    },
                    "required": ["issue"]
                }
            },
            {
                "name": "mark_resolved",
                "description": "Mark the ticket as resolved",
                "input_schema": {"type": "object", "properties": {}, "required": []}
            },
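            {
                "name": "handoff_to_escalation",
                "description": "Transfer to Escalation Agent for complex issues",
                "input_schema": {"type": "object", "properties": {}, "required": []}
            },
            {
                # The instructions tell this agent to transfer billing questions,
                # so it needs a matching handoff tool.
                "name": "handoff_to_billing",
                "description": "Transfer to Billing Agent if a billing question comes up",
                "input_schema": {"type": "object", "properties": {}, "required": []}
            }
        ],
        tool_fns={
            "check_system_status": lambda component: check_system_status(context, component),
            "create_support_ticket": lambda issue, priority="medium": create_support_ticket(context, issue, priority),
            "mark_resolved": lambda: mark_resolved(context),
            "handoff_to_escalation": lambda: "Escalation Agent",
            "handoff_to_billing": lambda: "Billing Agent"
        }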
    )

    # ── Escalation Agent ──────────────────────────────────────────
    escalation = SwarmAgent(
        name="Escalation Agent",
        instructions=(
            "You handle escalated issues requiring human judgment or executive action. "
            "Always create a detailed ticket. Set priority appropriately. "
            "Promise the user a response within 24 hours from a human agent. "
            "You do not resolve issues yourself - you ensure proper human follow-up."
        ),
        tools=[
            {
                "name": "create_support_ticket",
                "description": "Create a high-priority escalation ticket",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "issue": {"type": "string"},
                        "priority": {"type": "string", "enum": ["high", "critical"]}
                    },
                    "required": ["issue", "priority"]
                }
            }
        ],
        tool_fns={
            "create_support_ticket": lambda issue, priority="high": create_support_ticket(context, issue, priority)
        }
    )

    return {
        "Triage Agent": triage,
        "Billing Agent": billing,
        "Technical Agent": technical,
        "Escalation Agent": escalation
    }


# ─── Swarm Runtime ────────────────────────────────────────────────────────────

class SwarmRuntime:
    """Manages agent execution and handoffs."""

    def __init__(self, agents: dict[str, SwarmAgent], context: SwarmContext):
        self._agents = agents
        self._context = context
        self._message_history: list[dict] = []
        self._current_agent_name: str = "Triage Agent"
        self._handoff_log: list[str] = []

    def _execute_tool(self, agent: SwarmAgent, tool_name: str, tool_input: dict) -> tuple[str, Optional[str]]:
        """
        Execute a tool. Returns (result_string, new_agent_name_or_None).
        If result is an agent name, it's a handoff.
        """
        fn = agent.tool_fns.get(tool_name)
        if not fn:
            return f"Unknown tool: {tool_name}", None

        result = fn(**tool_input)

        # Check if result is a handoff (agent name)
        if isinstance(result, str) and result in self._agents:
            self._context.handoff_count += 1
            self._handoff_log.append(
                f"[{agent.name}] → [{result}] (handoff #{self._context.handoff_count})"
            )
            return f"Transferring you to {result} who can better help with this issue.", result

        return str(result) if result is not None else "Done.", None

    def run_turn(self, user_message: str) -> str:
        """Process one user message, potentially with handoffs."""
        self._message_history.append({"role": "user", "content": user_message})

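        max_agent_switches = 5  # cap on handoffs within one turn
        max_steps = 20  # cap on model calls so repeated tool use cannot loop forever
        switches = 0
        steps = 0

        while switches < max_agent_switches and steps < max_steps:
            steps += 1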
            agent = self._agents[self._current_agent_name]
            print(f"\n[Swarm] Active agent: {agent.name}")

            response = client.messages.create(
                model=MODEL,
                max_tokens=600,
                system=f"You are {agent.name}.\n\n{agent.instructions}",
                tools=agent.tools if agent.tools else [],
                messages=self._message_history
            )

            # Handle tool use
            if response.stop_reason == "tool_use":
                new_agent_name = None
                tool_results = []

                for block in response.content:
                    if block.type == "tool_use":
                        tool_result, handoff_target = self._execute_tool(
                            agent, block.name, block.input
                        )
                        print(f"  [Tool] {block.name}({block.input}) → {tool_result[:60]}")

                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": tool_result
                        })

                        if handoff_target:
                            new_agent_name = handoff_target

                # Add assistant + tool results to history
                self._message_history.append({"role": "assistant", "content": response.content})
                self._message_history.append({"role": "user", "content": tool_results})

                if new_agent_name and new_agent_name != self._current_agent_name:
                    print(f"  [Handoff] {self._current_agent_name} → {new_agent_name}")
                    self._current_agent_name = new_agent_name
                    switches += 1
                    continue

            else:
                # Text response - final answer
                final_text = ""
                for block in response.content:
                    if hasattr(block, "text"):
                        final_text += block.text

                self._message_history.append({"role": "assistant", "content": final_text})
                return final_text

        return "I'm sorry, I wasn't able to resolve your issue in this session."

    def print_handoff_log(self):
        if self._handoff_log:
            print("\n[Handoff Log]")
            for entry in self._handoff_log:
                print(f"  {entry}")


# ─── Usage ────────────────────────────────────────────────────────────────────

def main():
    context = SwarmContext(
        user_id="usr_88421",
        ticket_id="TKT_20250115",
        plan="premium"
    )
    agents = build_agents(context)
    runtime = SwarmRuntime(agents, context)

    user_message = (
        "I was charged twice for my premium subscription last month. "
        "Also, the login page keeps giving me a 500 error."
    )

    print("[Support System] Starting conversation...")
    print(f"User: {user_message}")

    # This message may trigger: Triage → Billing (refund) → Technical (login issue)
    response = runtime.run_turn(user_message)

    print(f"\n[Agent Response]: {response}")
    runtime.print_handoff_log()
    print(f"\n[Context] Resolved: {context.resolved}, Handoffs: {context.handoff_count}")


if __name__ == "__main__":
    main()

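Swarm Handoff Flow

In the implementation above, the conversation starts at the Triage Agent, which routes to the Billing, Technical, or Escalation Agent. Specialists can hand off onward (for example, Billing to Technical, or Technical to Escalation), and the active agent's text reply ends the turn.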

Swarm vs Production Frameworks

| Feature | Swarm | AutoGen | CrewAI | LangGraph |
| --- | --- | --- | --- | --- |
| Handoffs | Native | Via messages | Via tasks | Via graph edges |
| Context vars | Shared dict | Conversation history | Task outputs | State schema |
| Persistence | None | None (basic) | None (basic) | Checkpointers |
| Human-in-loop | No | Yes | Limited | Yes (native) |
| Streaming | Yes (stream=True) | No | No | Yes |
| Production ready | No | Moderate | Yes | Yes |
| Learning curve | Minimal | Low | Low | High |

What Swarm gets right: The handoff primitive is genuinely elegant. Context variables are practical. The framework is small enough to read in an hour.

What Swarm gets wrong: No persistence. No built-in human-in-the-loop controls. No observability. It is built on the OpenAI Chat Completions client, so other models work only through OpenAI-compatible endpoints. Not actively maintained beyond educational use.

Implementing Swarm Patterns Without the Library

The core Swarm pattern - agent with instructions + tools + handoffs as special tool returns - doesn't require the library. The implementation above demonstrates this. Any framework where tools can return agent references can implement Swarm-style handoffs.

Key idea to extract: handoffs are just tool calls that return control to a different agent. This is implementable in any multi-agent system:

from typing import Optional

# The core handoff pattern in any framework
def handoff_tool_result(target_agent_name: str) -> dict:
    """Signal that control should transfer to another agent."""
    return {
        "type": "handoff",
        "target": target_agent_name,
        "reason": "Routing to specialist"
    }

def process_tool_result(result: object, agent_registry: dict) -> Optional[str]:
    """Return the handoff target's name if the result is a valid handoff, else None."""
    if isinstance(result, dict) and result.get("type") == "handoff":
        target = result.get("target")
        # Only hand off to agents that actually exist in the registry
        if target in agent_registry:
            return target
    return None

Production Notes

Swarm is educational, not production-ready: Use it to understand the handoff concept. For production customer support, use LangGraph (with full state persistence and human-in-loop) or CrewAI (with hierarchical process).

Handoff cycles are a real risk: Agent A hands off to Agent B, which hands off back to Agent A. Implement maximum handoff depth (the code above uses max_agent_switches = 5).

Context variable schema discipline: If multiple agents write to context variables, define the schema strictly. Agents writing different key names for the same concept leads to lost information.
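A small guard can enforce this discipline. The sketch below assumes a fixed set of allowed keys and rejects writes to anything else; the key names and the set_context helper are illustrative assumptions, not part of Swarm.

```python
# Hypothetical schema: allowed context keys and the type each must hold.
ALLOWED_KEYS = {"user_id": str, "ticket_id": str, "plan": str, "issue_summary": str}


def set_context(context: dict, key: str, value: object) -> None:
    """Write to shared context only if the key and type match the schema."""
    if key not in ALLOWED_KEYS:
        raise KeyError(f"Unknown context key: {key!r}")
    if not isinstance(value, ALLOWED_KEYS[key]):
        raise TypeError(f"{key!r} expects {ALLOWED_KEYS[key].__name__}")
    context[key] = value


context = {"user_id": "usr_12345", "ticket_id": "TKT_789"}
set_context(context, "issue_summary", "Charged twice in March")

try:
    # A second agent using a different name for the same concept is rejected
    # instead of silently creating a parallel key.
    set_context(context, "issue_desc", "Charged twice")
except KeyError as err:
    print(err)
```

Routing every write through one function makes key drift fail loudly at the point of the mistake instead of surfacing later as missing information.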

:::warning Swarm is Not Production Ready OpenAI explicitly states Swarm is experimental and educational. It has no persistence (conversation lost on restart), no built-in observability, and is built around the OpenAI Chat Completions client. Do not use Swarm for production systems. Use the concepts and implement them in a production framework. :::

Interview Q&A

Q: What is the handoff primitive in OpenAI Swarm and why is it valuable?

A: A handoff is a tool function that, when called by an LLM, returns an Agent object (or agent reference) rather than a string result. The runtime treats this as a signal to switch which agent is handling the conversation, transferring full context. It's valuable because it allows clean specialization - a triage agent routes to a billing agent mid-conversation without losing context, and the user experiences continuity.

Q: What are context variables in Swarm and what problem do they solve?

A: Context variables are a shared mutable dictionary passed to every agent in a Swarm session. Any agent can read from or write to it. They solve the state-sharing problem: when Agent A hands off to Agent B, Agent B needs information Agent A already gathered (user ID, account status, issue description). Rather than passing this through messages, it's stored in context and automatically available to all agents.

Q: Why would you implement Swarm patterns without the Swarm library?

A: Swarm is built on the OpenAI client, lacks production features such as persistence and observability, and is no longer actively developed. But the patterns - handoffs as special tool returns, context variables, triage routing - are worth implementing in your own stack. Using Anthropic's SDK + your own routing logic, you get the same patterns with your choice of model and full control over production concerns.

Q: How do you prevent handoff loops in Swarm-style systems?

A: Hard limit on handoff depth (the max_agent_switches counter). Track visited agents in the current session - if Agent A has already handled this conversation and Agent B wants to hand back to A, redirect to Escalation instead. Design agent instructions to minimize unnecessary handoffs: agents should try to resolve issues themselves before handing off.
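The first two safeguards in this answer can be sketched as a single routing function. This is a minimal illustration under stated assumptions: route_handoff, MAX_HANDOFFS, and the fallback agent name are hypothetical, and a real system would also log why a handoff was redirected.

```python
from typing import Optional

MAX_HANDOFFS = 5
FALLBACK_AGENT = "Escalation Agent"  # hypothetical fallback name


def route_handoff(target: str, visited: list[str]) -> Optional[str]:
    """Decide where a requested handoff should go.

    visited lists the agents that have handled the conversation so far, in
    order, including the current one. Returns None once the budget is spent.
    """
    if len(visited) - 1 >= MAX_HANDOFFS:
        return None  # hard limit reached: stop handing off
    if target in visited and target != FALLBACK_AGENT:
        target = FALLBACK_AGENT  # would revisit an agent: escalate instead
    visited.append(target)
    return target


visited = ["Billing Agent"]
first = route_handoff("Technical Agent", visited)  # new agent: allowed
second = route_handoff("Billing Agent", visited)   # ping-pong: redirected
print(first, "|", second)
```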
