AI Letters #26 - Multi-Agent Orchestration: 16 vs 19 vs 23 Lines (And Three Completely Different Mental Models)

EngineersOfAI · AI Engineering Education · 9 min read

"Three frameworks, three different answers to the same question: who decides when one agent hands work to the next?"

A single agent with tools handles most tasks. But some workflows need specialisation - a researcher producing facts, a writer turning facts into prose, a reviewer checking the output. That chain of specialised agents is where the frameworks stop converging and start showing what they actually believe about software design.

Notebook #18 of the LLM Showdown measured the same 2-agent sequential pipeline across SynapseKit, LangChain (via LangGraph), and LlamaIndex. Researcher feeds Writer. Both call an LLM. The orchestrator wires them together. Simple enough that you can count the lines. Complex enough that the design philosophy underneath becomes visible.

The LoC numbers tell part of the story. The orchestration pattern matrix tells the rest.

What the Numbers Say

The benchmark task was identical across all three: wire a Researcher agent and a Writer agent in sequence. Researcher gets a topic, produces bullet points. Writer receives those bullet points, produces a paragraph.

Lines of code - imports + setup to a working 2-agent pipeline:

Framework     Imports   Functional   Total
-------------------------------------------
SynapseKit       3          13         16
LlamaIndex       3          16         19
LangChain        4          19         23

SynapseKit wins on LoC. The gap between SynapseKit (16) and LangChain (23) looks large, but read the next section before drawing conclusions.

Orchestration patterns supported:

Pattern            SynapseKit   LangChain   LlamaIndex
-------------------------------------------------------
Sequential            Yes          Yes         Yes
Parallel              Yes          Yes         No
Supervisor            Yes          Yes         No
Handoff chain         Yes          No          Yes
Graph / DAG           Yes          Yes         No
Shared state          Yes          Yes         Yes
-------------------------------------------------------
Score (out of 6)       6            5           3

SynapseKit and LangChain are nearly tied. LlamaIndex trails significantly - its AgentWorkflow supports sequential handoffs and shared state, but no parallel execution and no supervisor routing.

The Three Mental Models

This is the part that matters more than LoC.

Who controls the handoff?

SynapseKit             LangChain (LangGraph)     LlamaIndex
──────────────────     ─────────────────────     ──────────────────
Framework              You (graph edges)         The LLM

Task-centric:          Graph-centric:            Agent-centric:
define WHAT each       define HOW data           agents decide WHEN
agent should do        flows between nodes       to pass the baton

crew.run()             app.invoke(state)         workflow.run()
executes the           executes the              lets the LLM
task sequence          graph                     improvise

SynapseKit is task-centric. You define what each agent should produce (expected_output) and what context it needs (context_from). The framework manages the sequencing. You don't write the routing logic - you declare the dependency graph and let the Crew executor handle it.
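
A minimal sketch of that shape, assuming SynapseKit's API matches the names used in this piece (CrewAgent, Task, Crew, expected_output, context_from) - treat the import path and exact signatures as illustrative, not definitive:

```python
# Hypothetical sketch - assumes SynapseKit exposes CrewAgent, Task, and Crew
# with the expected_output / context_from fields named in this article.
from synapsekit import CrewAgent, Task, Crew  # assumed import path

researcher = CrewAgent(role="Researcher", goal="Produce bullet-point facts on a topic")
writer = CrewAgent(role="Writer", goal="Turn facts into a polished paragraph")

research_task = Task(
    agent=researcher,
    description="Research the topic and list the key facts",
    expected_output="5-8 bullet points",      # you declare WHAT, not HOW
)
write_task = Task(
    agent=writer,
    description="Write one paragraph from the research",
    expected_output="A single coherent paragraph",
    context_from=[research_task],             # a declared dependency, not routing code
)

crew = Crew(agents=[researcher, writer], tasks=[research_task, write_task])
result = crew.run()  # the Crew executor handles sequencing
```

Note what's absent: there is no routing logic anywhere. The context_from dependency is the entire orchestration.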

LangChain (LangGraph) is graph-centric. You define nodes (functions) and edges (transitions). The LLM is just a function inside a node - it has no special status. This means the orchestration logic is entirely under your control. Want to add a conditional branch that routes to a fact-checker if confidence is low? That's one add_conditional_edges call. Want to loop back to the researcher if the writer rejects the output? Same. LangGraph doesn't care what's inside each node.
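
For the concrete shape, here is roughly what the benchmark pipeline looks like against LangGraph's actual API; the model choice and prompts are placeholders:

```python
from typing import TypedDict
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START, END

llm = ChatOpenAI(model="gpt-4o-mini")  # any chat model works here

class PipelineState(TypedDict):  # the contract between nodes
    topic: str
    facts: str
    draft: str

def researcher(state: PipelineState) -> dict:
    facts = llm.invoke(f"List key facts about: {state['topic']}").content
    return {"facts": facts}

def writer(state: PipelineState) -> dict:
    draft = llm.invoke(f"Write one paragraph from these facts:\n{state['facts']}").content
    return {"draft": draft}

graph = StateGraph(PipelineState)
graph.add_node("researcher", researcher)
graph.add_node("writer", writer)
graph.add_edge(START, "researcher")
graph.add_edge("researcher", "writer")  # every transition is a line you wrote
graph.add_edge("writer", END)
app = graph.compile()

result = app.invoke({"topic": "multi-agent orchestration", "facts": "", "draft": ""})
```

Every add_edge call is a routing decision you own; swapping one for add_conditional_edges is exactly how the fact-checker branch described above would land.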

LlamaIndex is agent-centric. Agents decide when to hand off via tool calls. The AgentWorkflow sets up which agents can hand to whom (can_handoff_to), then runs the root agent and lets the LLM drive. The orchestration is emergent - which means it's also less predictable. If the researcher agent decides not to call handoff_to_writer, the workflow stalls.
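
A sketch of the agent-centric shape using LlamaIndex's AgentWorkflow - the model, prompts, and stand-in tools are placeholders, and the handoff itself is a tool the framework injects for agents with can_handoff_to set:

```python
import asyncio
from llama_index.core.agent.workflow import AgentWorkflow, FunctionAgent
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")  # placeholder model choice

def record_facts(facts: str) -> str:
    """Record researched facts. (Stand-in tool so the agent has one.)"""
    return f"Recorded: {facts}"

def submit_paragraph(paragraph: str) -> str:
    """Submit the final paragraph. (Stand-in tool so the agent has one.)"""
    return "Submitted."

researcher = FunctionAgent(
    name="Researcher",
    description="Gathers bullet-point facts on a topic.",
    system_prompt="Collect bullet-point facts, then hand off to the Writer.",
    llm=llm,
    tools=[record_facts],
    can_handoff_to=["Writer"],  # who this agent MAY hand to - not when
)
writer = FunctionAgent(
    name="Writer",
    description="Turns researched facts into prose.",
    system_prompt="Write one coherent paragraph from the facts you received.",
    llm=llm,
    tools=[submit_paragraph],
)

workflow = AgentWorkflow(agents=[researcher, writer], root_agent="Researcher")

async def main():
    # The LLM decides when - and whether - the handoff tool call happens.
    result = await workflow.run(user_msg="multi-agent orchestration frameworks")
    print(result)

asyncio.run(main())
```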

What the LoC Gap Actually Costs

LangChain's 23 lines include 4 lines of TypedDict state definition, 2 function definitions with LLM calls, and 6 lines of graph wiring. None of that is boilerplate you can skip in a real pipeline - the TypedDict is your contract between nodes, the functions are your agent logic, the graph wiring is your orchestration.

SynapseKit's 16 lines hide that complexity inside the framework. CrewAgent, Task, and Crew are opinionated abstractions. The question isn't whether the code is shorter - it is. The question is what you lose when the abstraction doesn't fit your use case.

Custom tool cost from the previous benchmark (#25): SynapseKit requires subclassing BaseTool. LangChain requires a decorator. If you're building a pipeline where the agents need tools the framework doesn't provide, that cost repeats for every tool.
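
To make that cost concrete: the LangChain half below uses the real @tool decorator from langchain_core; the SynapseKit half is a hypothetical sketch following the BaseTool subclass pattern described above, with an assumed import path and method name:

```python
# LangChain: a decorator turns a plain function into a tool.
from langchain_core.tools import tool

@tool
def lookup_price(ticker: str) -> str:
    """Return the latest price for a ticker symbol."""
    return f"{ticker}: 101.25"  # stand-in for real lookup logic

# SynapseKit (hypothetical sketch): the same tool as a BaseTool subclass.
from synapsekit.tools import BaseTool  # assumed import path

class LookupPriceTool(BaseTool):
    name: str = "lookup_price"
    description: str = "Return the latest price for a ticker symbol."

    def _run(self, ticker: str) -> str:  # assumed method name
        return f"{ticker}: 101.25"  # stand-in for real lookup logic
```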

The Parallel and Supervisor Gap

LlamaIndex's 3/6 pattern score is the number that should influence framework choice.

If your multi-agent system ever needs to run two agents simultaneously - a web-searcher and a database-querier both working on different subtasks, then merging results - LlamaIndex requires you to build that yourself. AgentWorkflow executes agents in sequence via handoffs. There is no built-in parallel branch.
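
For contrast, here is what a built-in parallel branch looks like in LangGraph: two nodes fan out from START and a join node waits for both. The node bodies are stand-ins for real search and query logic:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    query: str
    web_results: str
    db_results: str
    merged: str

def web_searcher(state: State) -> dict:
    return {"web_results": f"web hits for {state['query']}"}  # stand-in

def db_querier(state: State) -> dict:
    return {"db_results": f"db rows for {state['query']}"}    # stand-in

def merger(state: State) -> dict:
    return {"merged": state["web_results"] + "\n" + state["db_results"]}

g = StateGraph(State)
g.add_node("web", web_searcher)
g.add_node("db", db_querier)
g.add_node("merge", merger)
g.add_edge(START, "web")            # fan out: both branches run in the same step
g.add_edge(START, "db")
g.add_edge(["web", "db"], "merge")  # join: merge waits for both branches
g.add_edge("merge", END)
app = g.compile()

print(app.invoke({"query": "Q3 revenue", "web_results": "", "db_results": "", "merged": ""}))
```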

Supervisor routing is similar. If you need a routing agent that decides which specialist to call based on query type, you're writing that logic yourself on LlamaIndex. SynapseKit ships SupervisorAgent(llm, workers). LangChain gives you a supervisor node pattern in LangGraph.
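
A minimal version of that LangGraph supervisor node pattern, with a keyword check standing in for an LLM-based router:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    query: str
    answer: str

def supervisor(state: State) -> dict:
    return {}  # routing happens in the conditional edge below

def route(state: State) -> str:
    # Stand-in for an LLM classifier deciding which specialist to call.
    return "sql_agent" if "database" in state["query"] else "search_agent"

def search_agent(state: State) -> dict:
    return {"answer": f"searched: {state['query']}"}  # stand-in specialist

def sql_agent(state: State) -> dict:
    return {"answer": f"queried: {state['query']}"}   # stand-in specialist

g = StateGraph(State)
g.add_node("supervisor", supervisor)
g.add_node("search_agent", search_agent)
g.add_node("sql_agent", sql_agent)
g.add_edge(START, "supervisor")
g.add_conditional_edges("supervisor", route, ["search_agent", "sql_agent"])
g.add_edge("search_agent", END)
g.add_edge("sql_agent", END)
app = g.compile()
```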

For simple sequential pipelines, LlamaIndex's limitation doesn't matter. For anything with conditional branching, parallel execution, or dynamic routing, the 3/6 score is a constraint you'll hit.

What This Means for Engineers

  1. SynapseKit's Crew is the fastest path for linear pipelines. Researcher → Writer → Reviewer in sequence, with context passing? 16 lines, one crew.run() call. If that's the pattern, use it.

  2. LangGraph's graph-centric model is not verbosity - it's explicitness. Every edge in your multi-agent graph is a line of code you wrote. That means every routing decision is auditable, testable, and reproducible. When the pipeline behaves unexpectedly, you read the graph.

  3. LlamaIndex's emergent handoff is a bet on the LLM. The agent decides when to pass work to the next agent. That's elegant when it works. When the LLM misses the handoff signal or calls it at the wrong point in the task, you're debugging LLM behaviour rather than framework behaviour. Plan for it.

  4. Parallel execution is not a nice-to-have. Any pipeline that can decompose work across independent agents - and most real workflows can - benefits from parallel execution. The latency difference between sequential and parallel runs compounds as agent count grows.

  5. The custom tool cost from #25 still applies here. Multi-agent pipelines need agents with tools. The LoC advantage SynapseKit holds on agent setup shrinks once you're writing custom tools that don't fit their BaseTool subclass pattern.

The Thing Most People Miss

The LoC benchmarks consistently show SynapseKit winning on setup conciseness. This is real. It is also the least important property of a multi-agent system in production.

What matters in production:

  • Can you inspect the state between agents?
  • Can you replay a failed run from a specific node?
  • Can you test individual agents in isolation?
  • Can you add a conditional branch without rewriting the pipeline?

LangGraph answers all four yes. SynapseKit answers the first two only partially - Crew has no equivalent of the return_intermediate_steps flag that LangChain's AgentExecutor exposes. LlamaIndex answers all four, with varying degrees of difficulty.
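
A sketch of how LangGraph earns those yeses on the first two questions - it assumes the two-node graph object from the earlier LangGraph sketch and uses the built-in MemorySaver checkpointer:

```python
from langgraph.checkpoint.memory import MemorySaver

# Recompile the earlier two-node graph with a checkpointer so every
# superstep's state is recorded.
app = graph.compile(checkpointer=MemorySaver())
config = {"configurable": {"thread_id": "run-1"}}
app.invoke({"topic": "multi-agent orchestration", "facts": "", "draft": ""}, config)

# Inspect the state that passed between agents.
print(app.get_state(config).values)

# Walk the checkpoint history to find the node to replay from.
for snapshot in app.get_state_history(config):
    print(snapshot.next, list(snapshot.values))
```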

The framework that wins the LoC race is the one you spend the least time setting up. The framework that wins the production race is the one you spend the least time debugging. Those are different frameworks, and the benchmark is measuring the wrong thing if you're building something that runs for more than a sprint.

Three Things Worth Doing This Week

  1. Map your current multi-agent system to the pattern matrix. Which of the six patterns does it actually use? If the answer is only "sequential" and "shared state", LlamaIndex's 3/6 is irrelevant to you.

  2. Build one conditional branch into an existing sequential pipeline. Take any two-step agent pipeline and add a condition: "if output confidence is low, loop back". That's where LangGraph's graph-centric model pays for its verbosity - see the sketch after this list.

  3. Check whether your handoffs are deterministic. If your agents hand off via LLM tool calls (LlamaIndex model), run the same pipeline five times and check whether the handoff happens at the same point each time. If it doesn't, you have a reliability problem you may not have noticed yet.
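
A sketch of that loop-back condition in LangGraph, with hard-coded node bodies standing in for real agents:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    facts: str
    draft: str
    confidence: float

def researcher(state: State) -> dict:
    # Stand-in: a real node would call an LLM and score its own output.
    return {"facts": "...", "confidence": state.get("confidence", 0.0) + 0.5}

def writer(state: State) -> dict:
    return {"draft": "..."}  # stand-in

def check(state: State) -> str:
    # The one condition you add: loop back while confidence is low.
    return "researcher" if state["confidence"] < 0.8 else END

g = StateGraph(State)
g.add_node("researcher", researcher)
g.add_node("writer", writer)
g.add_edge(START, "researcher")
g.add_edge("researcher", "writer")
g.add_conditional_edges("writer", check, ["researcher", END])
app = g.compile()

print(app.invoke({"facts": "", "draft": "", "confidence": 0.0}))
```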

The LoC race is over by the second week of production. The debuggability race never ends.

Engineers of AI

Read more: www.engineersofai.com

If this was useful, forward it to one engineer who should be reading it.

Want to Think Like an AI Architect?

Join engineers receiving weekly breakdowns of AI systems, production failures, and architectural decisions.