AI Letters #24 - ReAct Agents: Six Lines vs Nineteen (And What You Lose in Between)
"Six lines to build a working ReAct agent sounds like a win. It is - until your agent starts looping and you have no idea why."
The ReAct loop is the first pattern every engineer reaches for when they need an agent. Thought, Action, Observation. Repeat until done. It's elegant on paper. In production it breaks in exactly the ways you'd expect: infinite loops, wrong tool selection, hallucinated tool calls that return nothing useful.
The question isn't whether ReAct agents work. It's whether your framework lets you see inside the loop when things go wrong.
Notebook #15 of the LLM Showdown measured three things: lines of code to build a working ReAct agent with two tools, the built-in tool inventory available without writing any tool code, and loop control parameters exposed to the caller. SynapseKit wins on LoC. LangChain wins on observability. LlamaIndex sits between them on code and matches LangChain on observability. The numbers are not the story. The tradeoff they reveal is.
What ReAct Actually Requires
A minimal working ReAct agent needs four things: an LLM, at least one tool with a schema, a prompt that formats Thought/Action/Observation, and a loop that parses the model's output and dispatches tool calls. Getting all four wired together is where the frameworks diverge.
The benchmark task was identical across all three: define a calculator tool and a datetime tool, build a ReAct agent, run one query that requires at least one tool call.
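Before the numbers, it helps to see the loop with no framework at all. The sketch below is illustrative only - the prompt text, the regex, and the llm callable are placeholders, not any library's API - but it shows the four moving parts and where loop control has to live.
    import re
    from datetime import datetime

    # Illustrative only: a bare-bones ReAct loop with no framework.
    # `llm` is assumed to be any callable mapping a prompt string to a completion string.
    TOOLS = {
        "calculator": lambda expr: str(eval(expr)),      # demo only; never eval untrusted input
        "datetime": lambda _: datetime.now().isoformat(),
    }

    PROMPT = """Answer the question, using tools when needed.
    Thought: <reasoning>
    Action: <tool>[<input>]
    Observation: <tool result>
    (repeat as needed, then)
    Final Answer: <answer>
    Question: {question}
    {scratchpad}"""

    def react(question, llm, max_iterations=5):
        scratchpad = ""
        for _ in range(max_iterations):                  # loop cap lives here
            output = llm(PROMPT.format(question=question, scratchpad=scratchpad))
            if "Final Answer:" in output:
                return output.split("Final Answer:")[-1].strip()
            match = re.search(r"Action:\s*(\w+)\[(.*)\]", output)
            if match is None:                            # parsing-error handling lives here too
                scratchpad += "\nObservation: could not parse an action; follow the format."
                continue
            name, arg = match.groups()
            observation = TOOLS.get(name, lambda _: "unknown tool")(arg)
            scratchpad += f"\n{output.strip()}\nObservation: {observation}"
        return "Stopped after max_iterations without a final answer."  # early stop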
The Evidence
Lines of code - imports + setup to a working agent:
Framework     Imports   Functional   Total
--------------------------------------------
SynapseKit       3           3          6
LlamaIndex       3          10         13
LangChain        5          14         19
SynapseKit gets to 6 lines because CalculatorTool and DateTimeTool are shipped in the library. You import them like any other class. There is no tool-definition code because there is nothing to define.
LangChain's 19 lines include two @tool-decorated functions - that's 10 lines of the gap right there. Strip those and LangChain's agent setup is 9 lines. The decorator approach is not verbose; it's complete. The tool code is what you'd write in any framework.
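For reference, a sketch of that shape of setup - not the notebook's exact code. It assumes langchain 0.1+ with langchain-openai and langchainhub installed; the model name and the query are placeholders.
    from datetime import datetime
    from langchain import hub
    from langchain.agents import AgentExecutor, create_react_agent
    from langchain_core.tools import tool
    from langchain_openai import ChatOpenAI

    @tool
    def calculator(expression: str) -> str:
        """Evaluate a basic arithmetic expression."""
        return str(eval(expression))  # demo only

    @tool
    def current_datetime(query: str) -> str:
        """Return the current date and time."""
        return datetime.now().isoformat()

    tools = [calculator, current_datetime]
    llm = ChatOpenAI(model="gpt-4o-mini")
    prompt = hub.pull("hwchase17/react")               # the stock ReAct prompt
    agent = create_react_agent(llm, tools, prompt)
    executor = AgentExecutor(agent=agent, tools=tools)
    result = executor.invoke({"input": "What is 17 * 23, and what day is it today?"})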
LlamaIndex at 13 lines uses FunctionTool.from_defaults() - plain Python functions wrapped into tool objects. Slightly more explicit than LangChain's decorator, slightly less so than SynapseKit's class hierarchy.
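The LlamaIndex equivalent, again as a sketch rather than the notebook's exact code - this assumes llama-index 0.10.x, where ReActAgent.from_tools is the entry point, plus the OpenAI integration package.
    from datetime import datetime
    from llama_index.core.agent import ReActAgent
    from llama_index.core.tools import FunctionTool
    from llama_index.llms.openai import OpenAI

    def calculator(expression: str) -> str:
        """Evaluate a basic arithmetic expression."""
        return str(eval(expression))  # demo only

    def current_datetime() -> str:
        """Return the current date and time."""
        return datetime.now().isoformat()

    tools = [FunctionTool.from_defaults(fn=calculator),
             FunctionTool.from_defaults(fn=current_datetime)]
    agent = ReActAgent.from_tools(tools, llm=OpenAI(model="gpt-4o-mini"), verbose=True)
    response = agent.chat("What is 17 * 23, and what day is it today?")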
Custom tool definition - what it costs when built-ins don't cover your use case:
SynapseKit 6 lines (subclass BaseTool, implement async run())
LangChain 5 lines (@tool decorator on any annotated function)
LlamaIndex 5 lines (plain function + FunctionTool.from_defaults())
SynapseKit's advantage evaporates here. The moment you need a tool that isn't in their library, you're writing more code than the alternatives, not less. The subclass pattern is also more rigid - you're tied to their async interface, their error handling convention, their schema format.
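To make the commitment concrete: the notebook only tells us the pattern is "subclass BaseTool, implement async run()", so everything below - the import path, the attribute names, the conversion logic - is an illustration of that pattern, not SynapseKit's verified API.
    from synapsekit.tools import BaseTool   # hypothetical import path

    class CurrencyTool(BaseTool):
        name = "currency"
        description = "Convert an amount using a fixed exchange rate."

        async def run(self, amount: float, rate: float) -> str:
            # your logic is now tied to the framework's async interface and schema format
            return str(round(amount * rate, 2))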
Built-in tool inventory (no tool code required):
Framework     Built-in tools
------------------------------
SynapseKit         18
LangChain          15
LlamaIndex          9
SynapseKit leads: web scraping, arXiv, PubMed, SQL, shell, Python REPL, translation, sentiment - all importable. LangChain has 15, but many require third-party API keys (Tavily, Brave, Google). LlamaIndex's 9 are mostly retrieval-oriented, which makes sense given its RAG-first heritage.
Loop control parameters exposed to the caller:
Parameter                    SynapseKit   LangChain   LlamaIndex
-----------------------------------------------------------------
max_iterations                   Yes          Yes         Yes
early stop                       Yes          Yes         Yes
handle_parsing_errors            Yes          Yes         Yes
verbose                          No           Yes         Yes
return_intermediate_steps        No           Yes         Yes
async support                    Yes          Yes         Yes
Score (out of 6)                  4            6           6
This is the number that matters in production.
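In LangChain, every one of those knobs is a constructor argument on AgentExecutor - a sketch, reusing the agent and tools from the earlier example.
    executor = AgentExecutor(
        agent=agent,
        tools=tools,
        max_iterations=5,                  # cap the Thought/Action/Observation loop
        early_stopping_method="force",     # return a best-effort answer when the cap is hit
        handle_parsing_errors=True,        # feed malformed model output back as an observation
        verbose=True,                      # print each step as it runs
        return_intermediate_steps=True,    # keep (action, observation) pairs in the result
    )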
The Contrast
ReAct Loop - What You Can Observe
SynapseKit                 LangChain / LlamaIndex
──────────────────────     ──────────────────────────────
[Thought]                  [Thought]       <- verbose logs
    |                          |
[Action]                   [Action]        <- intermediate steps
    |                          |
[Observation]              [Observation]   <- response.sources
    |                          |
[Answer]                   [Answer]
    ^ opaque                   ^ full trace available
SynapseKit's loop runs. You get the final answer. What happened in between - which tools were called, in what order, with what arguments, what they returned - is not surfaced by default. There is no verbose=True. There is no return_intermediate_steps. If the agent gives you a wrong answer, your debugging path is: re-run with print statements you've injected manually, or read source code.
LangChain gives you return_intermediate_steps=True on AgentExecutor. Every thought, every tool call, every observation is accessible in the response object. LlamaIndex surfaces the same through response.sources. This is not a nice-to-have. It is the difference between an agent you can ship and an agent you can't explain.
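Reading the trace back out looks like this, assuming the executors from the sketches above; the attribute names are the ones both libraries expose for agent steps and tool outputs.
    # LangChain: each intermediate step is an (AgentAction, observation) pair.
    result = executor.invoke({"input": "What is 17 * 23?"})
    for action, observation in result["intermediate_steps"]:
        print(action.tool, action.tool_input, "->", observation)

    # LlamaIndex: every tool call and its raw output is on response.sources.
    response = agent.chat("What is 17 * 23?")
    for source in response.sources:
        print(source.tool_name, "->", source.raw_output)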
What This Means for Engineers
- The 6-line number is real but context-dependent. If your use case fits SynapseKit's 18 built-in tools, you genuinely write less code. If it doesn't, you write more.
- Observability is not optional in production. The first time a ReAct agent gives a customer a wrong answer, you will need to reconstruct exactly what it thought and did. SynapseKit makes that hard by default.
- LangChain's verbosity is load-bearing. return_intermediate_steps, verbose, handle_parsing_errors - these aren't academic features. They are the handles you grab during an incident.
- LlamaIndex at 13 lines is the quiet winner. FunctionTool is clean. response.sources gives you the trace. The tool count (9 built-in) is lower, but the RAG-tool integration is first-class. If you're already using LlamaIndex for retrieval, adding agents costs almost nothing structurally.
- The custom tool cost comparison exposes the real architecture. SynapseKit's BaseTool subclass is not burdensome at 6 lines - but it is a commitment. LangChain's @tool decorator composes with any Python function you already wrote. The closer your existing codebase is to plain Python, the more that matters.
The Thing Most People Miss
The benchmark measured the cost to build a ReAct agent. It didn't measure the cost to debug one. Debugging cost scales with agent complexity, agent usage, and how long the loop runs. A 6-line setup that produces an opaque loop will cost you more time over a quarter than a 19-line setup with full observability - assuming the agent actually runs in production. Most of them do, eventually.
The frameworks that win on setup lines tend to lose on debuggability. This is not a coincidence. It is the fundamental tradeoff in API design: the more you hide, the less you write. The more you expose, the more you can see.
Three Things Worth Doing This Week
- Check your current agent setup for return_intermediate_steps or equivalent. If you can't reconstruct the last 10 agent traces from your logs, you don't have production observability yet.
- Audit your tool definitions. If they are tightly coupled to a framework's base class, write one clean Python function that does the same thing. Keep framework-agnostic logic separate from framework integration - see the sketch after this list.
- Run notebook #15 yourself against your own framework of choice: github.com/engineersofai/llm-showdown. The task is simple enough to replicate in 20 minutes. The loop control gaps show up immediately.
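For the second item, one way to keep the seam clean - convert_currency stands in for your own business logic, and the two wrappers reuse the LangChain and LlamaIndex APIs from the earlier sketches.
    from langchain_core.tools import tool
    from llama_index.core.tools import FunctionTool

    def convert_currency(amount: float, rate: float) -> float:
        """Pure logic: no framework imports, unit-testable on its own."""
        return round(amount * rate, 2)

    # Thin adapters at the edge; only these two lines know which framework you chose.
    lc_tool = tool(convert_currency)                             # LangChain
    li_tool = FunctionTool.from_defaults(fn=convert_currency)    # LlamaIndex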
The conciseness race is worth running. Just know what you're trading away when you win it.
Engineers of AI
Read more: www.engineersofai.com
If this was useful, forward it to one engineer who should be reading it.
