
AI Letters #31 - Graph Workflows: When Chains Break and DAGs Take Over

· 10 min read
EngineersOfAI
AI Engineering Education

A linear chain handles most tasks. Research, generate, done. But production workflows branch. If the query is complex, run a deeper research step. If it is simple, take the fast path. If quality is insufficient, loop back. This requires a graph, not a chain. Notebook #23 of the LLM Showdown tests which frameworks ship graph primitives - and which force you to build infrastructure from scratch.

"The difference between a framework with graph primitives and one without is the difference between declaring your workflow and implementing your workflow engine."

A chain is a sequence. Step 1 feeds step 2. Step 2 feeds step 3. No decisions. No branches. No loops. For a simple RAG pipeline - retrieve, augment, generate - a chain is all you need.

Then requirements arrive. Route complex queries to a deep research path and simple queries to a fast path. Retry if the answer confidence is below a threshold. Run web search and database lookup in parallel, then merge results. Pause for human approval before executing a tool call.

Each of these patterns requires a directed acyclic graph (or a cyclic one, for loops). You need nodes, edges, conditional routing, state that persists across steps, and an execution engine that handles branching and merging. The question is whether your framework ships this as a primitive or whether you build it yourself.

Notebook #23 builds the same conditional 3-node workflow in all three frameworks: a research node, a conditional router that branches to either a detailed or quick answer path, and terminal nodes. Same logic, same behavior, different APIs.

The results split cleanly into two tiers.

What We Measured

Each framework implements a conditional pipeline: research -> router -> (detailed answer OR quick answer). The router branches based on query length (a proxy for complexity). We measured four things.
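
The router itself is small. A minimal sketch, with illustrative node names (not any specific framework's API), of the length-based routing described above:

```python
# Toy router: query length as a cheap proxy for complexity.
# Node names ("detailed_answer", "quick_answer") are illustrative.
def router(state: dict) -> str:
    """Pick the next node based on how long the query is."""
    return "detailed_answer" if len(state["query"]) > 20 else "quick_answer"
```

In all three frameworks the branching decision is this same function; what differs is how it gets wired into the workflow.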

Metric            What it captures
---------------------------------------------------------
Lines of code     LoC to build the conditional 3-node graph
Feature coverage  7 graph capabilities: StateGraph, conditional
                  edges, parallel branches, cycles,
                  checkpointing, streaming, visualization
API clarity       How readable is the graph definition?
Native support    Does the framework ship graph primitives or
                  require manual Python?

Frameworks: SynapseKit 1.4 (StateGraph), LangChain 1.2 + LangGraph (StateGraph), LlamaIndex Core 0.14 (manual routing)


The Numbers

Lines of code: Conditional 3-node graph

Framework Imports Code Total
----------------------------------------
SynapseKit 1 19 20
LangChain 2 18 20
LlamaIndex 3 12 15

LlamaIndex has the fewest lines. But those 15 lines implement only the happy path - manual if/else routing with no state schema, no checkpointing, no streaming, no visualization. Fewer lines of application code, more lines of infrastructure you will write later.

SynapseKit and LangChain are identical at 20 lines each. The APIs are so similar that porting code from one to the other takes minutes.


The Feature Matrix

This is the real story.

Graph Feature Support (7 features):

Feature SynapseKit LangChain LlamaIndex
---------------------------------------------------------
StateGraph primitive Yes Yes No
Conditional edges Yes Yes No
Parallel branches Yes Yes No
Cycle / loop support Yes Yes No
Built-in checkpointing Yes Yes No
Stream graph events Yes Yes No
Graph visualization Yes Yes No
---------------------------------------------------------
Score 7/7 7/7 0/7

SynapseKit: 7 out of 7. LangChain: 7 out of 7. LlamaIndex: 0 out of 7.

This is not a close race with a narrow winner. This is a binary split. Two frameworks ship a complete graph runtime. One framework ships nothing.


The API Comparison

The most surprising finding: SynapseKit and LangGraph have nearly identical APIs.

SynapseKit:
graph = StateGraph(schema)
graph.add_node('research', research_fn)
graph.add_conditional_edge('research', router, mapping)
graph.add_edge('detailed_answer', END)
app = graph.compile()
result = app.run_sync(initial_state)

LangGraph:
graph = StateGraph(State)
graph.add_node('research', research_fn)
graph.add_conditional_edges('research', router, mapping)
graph.add_edge('detailed_answer', END)
app = graph.compile()
result = app.invoke(initial_state)

The differences: add_conditional_edge (singular) vs add_conditional_edges (plural). run_sync vs invoke. TypedState(fields={...}) vs TypedDict. That is it. The graph definition pattern is identical.

LlamaIndex:
research_result = research_fn(query)
if len(query) > 20:
    result = detailed_fn(research_result)
else:
    result = quick_fn(research_result)

No graph object. No state schema. No conditional edge declaration. Just Python control flow. This works for the simple case. But when you need to add checkpointing, streaming, parallel branches, or cycle detection, you are building a graph engine, not using one.
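
For a sense of what "building a graph engine" means, here is a toy hand-rolled runner for the same conditional workflow. All names are illustrative, and it deliberately omits checkpointing, streaming, cycle detection, and visualization - exactly the pieces a framework primitive would provide:

```python
# Toy graph runner: nodes are functions over state, edges are either a
# terminal marker or a routing function. Illustrative only.
END = "__end__"

def run_graph(nodes, edges, state, entry):
    current = entry
    while current != END:
        state = nodes[current](state)       # execute the node
        route = edges[current]              # find the outgoing edge
        current = route(state) if callable(route) else route
    return state

nodes = {
    "research": lambda s: {**s, "notes": f"notes on {s['query']}"},
    "quick": lambda s: {**s, "result": "short answer"},
    "detailed": lambda s: {**s, "result": "long answer"},
}
edges = {
    "research": lambda s: "detailed" if len(s["query"]) > 20 else "quick",
    "quick": END,
    "detailed": END,
}
final = run_graph(nodes, edges, {"query": "hi"}, "research")
```

Even this skeleton is already as long as the benchmark implementations, and it still does none of the seven features in the matrix above.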


The One Meaningful Difference

Where SynapseKit and LangChain diverge is state definition.

LangGraph uses a plain TypedDict:

class State(TypedDict):
    query: str
    result: str

SynapseKit uses TypedState with explicit StateField declarations:

schema = TypedState(fields={
    'query': StateField(default=''),
    'result': StateField(default=''),
})

For simple last-write-wins state, LangGraph's TypedDict is cleaner and more Pythonic. For parallel branches that merge state - where two nodes independently append to a shared list, for example - SynapseKit's StateField reducers handle the merge logic declaratively. You define how concurrent writes resolve instead of writing merge code.

If your workflows are linear with conditional branches, LangGraph's state model is simpler. If your workflows have parallel fan-out/fan-in patterns, SynapseKit's reducer model prevents merge bugs.
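
The reducer idea itself is framework-independent. A minimal sketch in plain Python - hypothetical helper names, not SynapseKit's or LangGraph's actual API - of how per-field reducers resolve concurrent writes:

```python
import operator

# Hypothetical illustration of reducer-based state merging: each field
# declares how concurrent writes combine; fields without a reducer fall
# back to last-write-wins.
def merge_states(base, updates, reducers):
    merged = dict(base)
    for update in updates:
        for key, value in update.items():
            reducer = reducers.get(key)
            merged[key] = reducer(merged[key], value) if reducer else value
    return merged

base = {"sources": [], "answer": ""}
branch_a = {"sources": ["web:result1"]}   # parallel web-search branch
branch_b = {"sources": ["db:result2"]}    # parallel DB-lookup branch
merged = merge_states(base, [branch_a, branch_b], {"sources": operator.add})
```

With `operator.add` as the reducer, "sources" accumulates both branches' writes; without one, the second branch would silently overwrite the first.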


When You Need a Graph

Not every pipeline needs graph primitives. A simple retrieve-augment-generate chain is fine as a chain. Reach for a graph when one of the following patterns appears.

Pattern Example
---------------------------------------------------------
Conditional routing Route to different models by query
complexity or topic domain

Retry loops Re-run generation if confidence < 0.8,
up to 3 times

Parallel branches Web search + DB lookup simultaneously,
merge results before generation

Human-in-the-loop Pause at review node, wait for
approval, resume or reject

Quality gates Evaluate output against criteria,
loop back to improve if insufficient

Multi-step agents Agent reasons, acts, observes, decides
whether to continue or terminate

If none of these patterns apply to your workflow, a chain is simpler, easier to debug, and sufficient. Do not adopt graph complexity for linear pipelines.
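
The retry-loop pattern from the table above can be sketched in a few lines of plain Python. `generate` and `score` are stand-ins for a real model call and confidence estimator:

```python
# Stand-in generation: returns a draft labeled by attempt number.
def generate(attempt):
    return f"draft-{attempt}"

# Stand-in confidence score that improves with each attempt.
def score(draft):
    return 0.5 + 0.2 * int(draft.split("-")[1])

def generate_with_retries(threshold=0.8, max_attempts=3):
    """Re-run generation until confidence clears the threshold."""
    for attempt in range(1, max_attempts + 1):
        draft = generate(attempt)
        if score(draft) >= threshold:
            return draft, attempt
    return draft, max_attempts
```

In a graph framework this becomes a cycle: an edge from the quality-gate node back to the generation node, with the attempt counter living in shared state.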


What This Means for Engineers

  1. SynapseKit and LangChain tie on graph workflows. Both ship a complete StateGraph primitive with 7/7 features. The APIs are nearly identical. If graph workflows are your primary concern, both frameworks are equivalent choices.

  2. LlamaIndex has no graph primitive. Zero out of 7 features. If your workflow requires conditional routing, loops, or parallel branches, you will build the orchestration layer yourself. This is a significant gap for complex pipeline architectures.

  3. LangGraph's TypedDict state is simpler for basic cases. Plain Python TypedDict with no special imports. For last-write-wins state, this is cleaner than SynapseKit's StateField approach.

  4. SynapseKit's StateField reducers win for parallel merging. When two branches write to the same state key concurrently, reducers define how to merge. Without reducers, you write merge logic manually and hope you handle every edge case.

  5. Fewer lines does not mean simpler. LlamaIndex's 15-line implementation has less code but also less capability. The missing 5 lines buy you state schemas, streaming, checkpointing, visualization, and cycle detection - things you will eventually build by hand.


The Thing Most People Miss

Graph workflows are not about replacing chains. They are about making conditional logic declarative instead of imperative.

You can build any graph workflow in raw Python. If/else for routing. While loops for retries. Threading for parallel branches. A dict for state. It works. But the moment you need to debug a failed run at 3am, you want to see the graph structure, replay from a checkpoint, stream events to a dashboard, and visualize where the execution went.

Raw Python gives you none of that. A graph primitive gives you all of it.

The engineer who reaches for a StateGraph is not the one who cannot write if/else statements. They are the one who has debugged enough production workflows to know that the execution infrastructure matters more than the business logic. The business logic is 15 lines. The observability, checkpointing, streaming, and error handling around it is 150 lines. A framework graph primitive absorbs those 150 lines so you write the 15.

SynapseKit and LangChain both understand this. LlamaIndex, for now, does not.

Week 4 continues: cost tracking, guardrails, MCP support, and the final scorecard. The graph benchmark gives both SynapseKit and LangChain a point. The cumulative race holds steady.


Three Things Worth Doing This Week

  1. Audit your pipeline for hidden conditional logic. Search for if/else branches that route between different processing paths. Each one is a candidate for a graph node with a conditional edge. Declare the routing, do not embed it in procedural code.

  2. Add checkpointing to any workflow that takes more than 30 seconds. If a 5-node pipeline fails at node 4, you should resume from node 3, not restart from node 1. Both SynapseKit and LangGraph ship checkpointers. Use them.

  3. Visualize your graph before deploying it. Both SynapseKit (app.get_mermaid()) and LangGraph (app.get_graph().draw_mermaid()) export Mermaid diagrams. Generate the diagram, review the edges, confirm the routing logic matches your intent. A graph you can see is a graph you can debug.

The best workflow architecture is the one where adding a new branch takes one line, not a refactor. Graph primitives make that possible. Raw Python makes it a project.


Engineers of AI

Read more: www.engineersofai.com

If this was useful, forward it to one engineer who should be reading it.

Want to Think Like an AI Architect?

Join engineers receiving weekly breakdowns of AI systems, production failures, and architectural decisions.