ReAct Agent Pattern
A Production Scenario
Your company has built a customer success agent. It can answer product questions, look up orders, and handle refund requests. In testing it performs brilliantly. Then you push it to production and the first complex support ticket arrives: a customer wants to know why their bulk order of 150 units was charged a different rate than last month's order of 120 units, whether the current pricing matches what the sales team quoted, and whether the difference qualifies for a credit under the loyalty program terms.
The agent tries to answer this in one shot. It generates a confident explanation about volume pricing tiers and loyalty credits - but it has never actually checked your pricing database, the sales quote system, or the loyalty program rules. Every number it cites comes from pattern matching on its training data, not your live systems. The customer gets a detailed, well-written, completely wrong response.
You add tool use - the agent can now call get_pricing_history(), get_order_details(), get_sales_quote(), and get_loyalty_terms(). But here is the new problem: the agent calls every tool in one pass, gets back a pile of data, and tries to reason over it all at once. Sometimes it gets the right answer. Often it misinterprets the data, draws an incorrect conclusion, or ignores a relevant fact buried in the tool results.
The issue is that tool use alone does not make an agent smart. The agent is still planning every call before it has seen a single result, then interpreting every result in one pass. What it needs is a tighter loop: think a little, act on that thought, observe what happened, think again. That is ReAct - and it changes how the agent approaches this problem.
With ReAct, the agent first thinks "I need to understand the current pricing tiers." It calls get_pricing_history() and looks only at that result. Then it thinks "The tier boundary changed in November, which means last month and this month have different rates. Now I need to confirm the sales quote." It calls get_sales_quote(). Each thought is grounded in the previous observation. The reasoning is traceable. And when something goes wrong, you can read the trace and see exactly where the agent went off the rails.
Why This Exists
The Hallucination Problem: Ungrounded Reasoning
Chain-of-thought prompting (Wei et al., 2022) demonstrated that asking LLMs to show their reasoning dramatically improves their performance on complex tasks. If you prompt an LLM to "think step by step," it produces better answers. This was a genuine breakthrough.
But chain-of-thought has a fundamental failure mode: all the reasoning happens in the model's head. It can reason its way to a completely wrong answer without ever checking whether its premises are true. The model might chain together a beautiful logical argument based on a false initial assumption, and every subsequent step compounds the error.
The classic demonstration: ask an LLM "How many words are in the Wikipedia article on relativity?" and have it think step by step. It will reason confidently about this, produce a number, and be wrong - because it has no access to the article's current text and cannot count words it never saw. The thinking is disconnected from the facts.
Why Tool Use Alone Is Not Enough
You might think the solution is just to add tool use: let the model call a search tool and get the Wikipedia article, then count. But this still fails on complex multi-step tasks, for a different reason.
When a model has tool use but no structured reasoning loop, it tends to plan everything upfront, execute a batch of tool calls, then try to synthesize the results. This front-loaded planning approach breaks on tasks where the right next action depends on what you discovered in the previous step. The agent cannot adapt its plan mid-execution because it already committed to the plan before executing anything.
What ReAct Solves
ReAct (Reasoning + Acting) by Yao et al. (2022) interleaves reasoning traces with tool actions in a tight loop. The model does not plan everything upfront. Instead, it takes one small step: it thinks about what it currently knows and what it needs to know, takes one action based on that thought, observes the result, and then thinks again.
This loop grounds every reasoning step in real observations. If the model's assumption was wrong, the tool result corrects it immediately. If a tool fails, the model can reason about the failure and try a different approach. The reasoning is not decoration - it is mechanistically connected to the actions and observations.
Historical Context and Key Papers
Chain-of-Thought Prompting (Wei et al., 2022, Google Brain) - showed that prompting models to reason step by step dramatically improves performance on math, logic, and commonsense tasks. The insight: the reasoning trace itself is useful, not just the final answer.
ReAct: Synergizing Reasoning and Acting (Yao et al., 2022, Princeton + Google Research) - the paper that defined the pattern. ReAct was evaluated on HotpotQA, FEVER, and interactive decision-making tasks (ALFWorld, WebShop). On FEVER it outperformed chain-of-thought; on HotpotQA it was competitive with chain-of-thought but hallucinated less, and the strongest results came from combining the two. On ALFWorld and WebShop it outperformed act-only prompting and the imitation and reinforcement learning baselines. The key insight: interleaving reasoning and acting is better than doing either alone.
Reflexion (Shinn et al., 2023) - extended ReAct with self-reflection: after a trajectory fails, the model reflects on what went wrong and revises its strategy before trying again. This is a form of episodic memory applied to agent learning.
LATS (Zhou et al., 2023) - Language Agent Tree Search, combining ReAct-style traces with Monte Carlo Tree Search. The agent explores multiple reasoning paths and uses a value function to guide search. More powerful but much more expensive than basic ReAct.
The ReAct Loop in Detail
Every iteration of the ReAct loop has exactly three components:
Thought: The agent's internal reasoning. What do I know? What do I need? What tool should I call? This is not shown to the user - it is the agent's scratchpad.
Action: The tool the agent decides to call, with its arguments. One tool call per iteration (classic ReAct).
Observation: The result of the tool call. This is what the real world reported back.
The loop repeats until the agent produces a Final Answer instead of an action.
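The three components can be sketched without any model at all. The following is a minimal illustration, assuming a hypothetical `scripted_policy` function that stands in for the LLM and two stub tools (`lookup`, `multiply`). None of these names come from a library, and a real agent would replace the policy with a model call.

```python
from typing import Callable, Optional


def lookup(query: str) -> str:
    """Stub tool: always returns the height used elsewhere in this article."""
    return "330 meters"


def multiply(expression: str) -> str:
    """Stub tool: multiplies two numbers written as 'a * b'."""
    left, right = expression.split("*")
    return f"{float(left) * float(right):.1f}"


def scripted_policy(transcript: list[str]) -> tuple:
    """Hypothetical stand-in for the LLM: picks the next step from the transcript.

    Returns ("act", thought, tool_name, argument) or ("finish", thought, answer).
    """
    observations = [line for line in transcript if line.startswith("Observation: ")]
    if not observations:
        return ("act", "I need the height in meters first.", "lookup", "eiffel tower height")
    if len(observations) == 1:
        return ("act", "Now convert meters to feet.", "multiply", "330 * 3.28084")
    answer = observations[-1][len("Observation: "):]
    return ("finish", "I have the converted value.", answer)


def run_react(
    question: str,
    policy: Callable[[list[str]], tuple],
    tools: dict[str, Callable[[str], str]],
    max_iterations: int = 5,
) -> tuple[Optional[str], list[str]]:
    """Run Thought -> Action -> Observation until the policy finishes."""
    transcript = [f"Question: {question}"]
    for _ in range(max_iterations):
        step = policy(transcript)
        transcript.append(f"Thought: {step[1]}")
        if step[0] == "finish":
            transcript.append(f"Final Answer: {step[2]}")
            return step[2], transcript
        _, _, tool_name, argument = step
        transcript.append(f"Action: {tool_name}[{argument}]")
        # One tool call per iteration, as in classic ReAct
        transcript.append(f"Observation: {tools[tool_name](argument)}")
    return None, transcript  # iteration budget exhausted


if __name__ == "__main__":
    answer, trace = run_react(
        "How tall is the Eiffel Tower in feet?",
        scripted_policy,
        {"lookup": lookup, "multiply": multiply},
    )
    print("\n".join(trace))
```

The transcript is the only state the policy sees, which is why each thought can depend on the previous observation.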
Implementing ReAct From Scratch
The cleanest way to understand ReAct is to implement it from scratch, parsing the thought/action/observation structure from raw LLM output. This is the "old school" approach before native tool use APIs existed, and it reveals what every framework is doing under the hood.
import re
import anthropic
client = anthropic.Anthropic()
# ─── Tool Definitions ─────────────────────────────────────────────────────────
def wikipedia_search(query: str) -> str:
    """Search Wikipedia (stub)."""
    results = {
        "python programming": "Python is a high-level, interpreted programming language created by Guido van Rossum in 1991.",
        "eiffel tower": "The Eiffel Tower is a wrought-iron lattice tower in Paris, 330 meters tall, completed in 1889.",
        "large language model": "Large language models are neural networks trained on vast text corpora to predict and generate text.",
    }
    for key, value in results.items():
        if key in query.lower():
            return value
    return f"No Wikipedia article found for: {query}"


def calculator(expression: str) -> str:
    """Evaluate a mathematical expression."""
    try:
        # Emptying __builtins__ narrows what eval can reach, but it is not a
        # sandbox. Do not expose this to untrusted input as-is.
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Calculation error: {e}"


TOOLS = {
    "wikipedia": wikipedia_search,
    "calculator": calculator,
}

TOOL_DESCRIPTIONS = """
You have access to the following tools:
wikipedia[query]: Search Wikipedia for information about a topic.
calculator[expression]: Evaluate a mathematical expression. Example: calculator[2 * (3 + 4)]
To use a tool, respond with:
Thought: [your reasoning about what to do]
Action: tool_name[argument]
After receiving an observation, continue with:
Thought: [reasoning about the observation]
Action: tool_name[argument]
When you have the final answer, respond with:
Thought: [I now have enough information]
Final Answer: [your answer to the original question]
"""
# ─── ReAct Parser ─────────────────────────────────────────────────────────────
def parse_react_output(text: str) -> tuple[str, str | None, str | None, str | None]:
    """
    Parse a ReAct-format LLM output into components.
    Returns: (thought, action_tool, action_input, final_answer)
    """
    thought = ""
    action_tool = None
    action_input = None
    final_answer = None

    # Extract thought
    thought_match = re.search(r"Thought:\s*(.+?)(?=\nAction:|\nFinal Answer:|$)", text, re.DOTALL)
    if thought_match:
        thought = thought_match.group(1).strip()

    # Extract action (format: tool_name[input])
    action_match = re.search(r"Action:\s*(\w+)\[(.+?)\]", text, re.DOTALL)
    if action_match:
        action_tool = action_match.group(1).strip()
        action_input = action_match.group(2).strip()

    # Extract final answer
    final_match = re.search(r"Final Answer:\s*(.+?)$", text, re.DOTALL)
    if final_match:
        final_answer = final_match.group(1).strip()

    return thought, action_tool, action_input, final_answer


# ─── ReAct Agent ──────────────────────────────────────────────────────────────
def react_agent_scratch(question: str, max_iterations: int = 8) -> str:
    """
    Implement a ReAct agent from scratch using text parsing.
    This is how it worked before native tool use APIs.
    """
    system_prompt = f"""You are a helpful assistant that uses tools to answer questions accurately.
{TOOL_DESCRIPTIONS}
Important rules:
- Always use tools when you need external information
- Never make up facts - use tools to verify
- One tool call per step
- Think carefully about what information you need before calling a tool
"""
    conversation = f"Question: {question}\n"
    trajectory = []  # Store the full trace for observability

    for iteration in range(max_iterations):
        print(f"\n--- Iteration {iteration + 1} ---")
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=1024,
            system=system_prompt,
            # Stop generation before the model can invent its own observation
            stop_sequences=["Observation:"],
            messages=[{"role": "user", "content": conversation}],
        )
        llm_output = response.content[0].text
        print(f"LLM output:\n{llm_output}")

        thought, action_tool, action_input, final_answer = parse_react_output(llm_output)
        trajectory.append({
            "iteration": iteration + 1,
            "thought": thought,
            "action_tool": action_tool,
            "action_input": action_input,
        })

        if final_answer:
            trajectory[-1]["final_answer"] = final_answer
            print(f"\n=== Final Answer: {final_answer} ===")
            print(f"\nFull trajectory: {trajectory}")
            return final_answer

        if action_tool and action_input:
            # Execute the tool
            if action_tool in TOOLS:
                observation = TOOLS[action_tool](action_input)
            else:
                observation = f"Unknown tool: {action_tool}. Available: {list(TOOLS.keys())}"
            print(f"Observation: {observation}")
        else:
            # Model didn't produce a valid action or final answer
            observation = "Invalid format. Use: Action: tool_name[input] or Final Answer: [answer]"

        # Record the observation and append it to the conversation for the next iteration
        trajectory[-1]["observation"] = observation
        conversation += f"{llm_output}\nObservation: {observation}\n"

    return "Maximum iterations reached. Unable to complete the task."


# Test it
if __name__ == "__main__":
    answer = react_agent_scratch(
        "How tall is the Eiffel Tower in feet? (It's 330 meters tall)"
    )
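The `calculator` tool above relies on `eval`, which is convenient for a demo but risky once a model chooses the input. One possible alternative, sketched here under the assumption that only basic arithmetic is needed, walks the parsed expression with Python's `ast` module and rejects everything else. The name `safe_calculator` and the allowed operator set are illustrative choices, not part of the original agent.

```python
import ast
import operator

# Whitelisted operators; exponentiation is left out on purpose because
# inputs like 9**9**9 can consume unbounded time and memory.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def _evaluate(node: ast.AST):
    """Recursively evaluate a node, allowing only numbers and whitelisted operators."""
    if isinstance(node, ast.Constant):
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
        raise ValueError("Only numeric literals are allowed")
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError(f"Unsupported expression element: {type(node).__name__}")


def safe_calculator(expression: str) -> str:
    """Evaluate basic arithmetic and return the result (or an error) as text."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_evaluate(tree.body))
    except (SyntaxError, ValueError, ZeroDivisionError, OverflowError) as e:
        return f"Calculation error: {e}"


if __name__ == "__main__":
    print(safe_calculator("2 * (3 + 4)"))
    print(safe_calculator("__import__('os')"))
```

Errors come back as observation text rather than exceptions, so the agent can read the failure and try a different expression.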
ReAct with Native Tool Use (Modern Approach)
The text-parsing approach above is educational but fragile. The modern approach uses the API's native tool use, where the model produces structured tool calls instead of text to parse. The logic is the same - thought, action, observation, repeat - but the mechanics are cleaner.
import anthropic
import json
from dataclasses import dataclass, field

client = anthropic.Anthropic()


@dataclass
class AgentStep:
    """One step in the ReAct trajectory."""
    iteration: int
    thought: str = ""
    tool_name: str = ""
    tool_input: dict = field(default_factory=dict)
    observation: str = ""
    is_final: bool = False
    final_answer: str = ""


@dataclass
class AgentResult:
    """Complete result of an agent run."""
    final_answer: str
    trajectory: list[AgentStep]
    total_iterations: int
    success: bool


def react_agent(
    question: str,
    tools: list[dict],
    tool_registry: dict,
    max_iterations: int = 10,
    system_prompt: str = "You are a helpful assistant. Use tools to find accurate information.",
) -> AgentResult:
    """
    Production-grade ReAct agent using native tool use API.
    """
    messages = [{"role": "user", "content": question}]
    trajectory = []

    for iteration in range(1, max_iterations + 1):
        step = AgentStep(iteration=iteration)

        # ── Call the LLM ───────────────────────────────────────────────────
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=4096,
            system=system_prompt,
            tools=tools,
            messages=messages,
        )

        # Add assistant response to conversation
        messages.append({"role": "assistant", "content": response.content})

        # ── Extract thought (first non-empty text block) ───────────────────
        for block in response.content:
            if block.type == "text" and block.text.strip():
                step.thought = block.text.strip()
                break

        # ── Final Answer: no tool call ─────────────────────────────────────
        if response.stop_reason == "end_turn":
            step.is_final = True
            step.final_answer = step.thought
            trajectory.append(step)
            return AgentResult(
                final_answer=step.final_answer,
                trajectory=trajectory,
                total_iterations=iteration,
                success=bool(step.final_answer),  # an empty reply is not an answer
            )

        # ── Any other non-tool stop (e.g. "max_tokens"): fail cleanly ──────
        if response.stop_reason != "tool_use":
            trajectory.append(step)
            return AgentResult(
                final_answer=step.thought,
                trajectory=trajectory,
                total_iterations=iteration,
                success=False,
            )

        # ── Tool Call: execute tool and observe ────────────────────────────
        tool_results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            step.tool_name = block.name
            step.tool_input = block.input
            print(f"[{iteration}] Thought: {step.thought[:80]}...")
            print(f"[{iteration}] Action: {block.name}({block.input})")

            # Execute the tool
            if block.name in tool_registry:
                try:
                    result = tool_registry[block.name](**block.input)
                    observation = json.dumps(result) if isinstance(result, dict) else str(result)
                    is_error = False
                except Exception as e:
                    observation = json.dumps({"error": str(e)})
                    is_error = True
            else:
                observation = json.dumps({"error": f"Unknown tool: {block.name}"})
                is_error = True

            step.observation = observation
            print(f"[{iteration}] Observation: {observation[:200]}")
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": observation,
                "is_error": is_error,
            })

        trajectory.append(step)
        # Inject observations back into conversation
        messages.append({
            "role": "user",
            "content": tool_results,
        })

    # Max iterations exceeded
    return AgentResult(
        final_answer="Unable to complete task within iteration limit.",
        trajectory=trajectory,
        total_iterations=max_iterations,
        success=False,
    )
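`react_agent` takes `tools` as a list of tool definitions and `tool_registry` as a mapping from tool name to a Python callable. A minimal sketch of that pairing follows, assuming a hypothetical `get_order_details` tool backed by made-up stub data. The `name` / `description` / `input_schema` layout is the shape the Anthropic Messages API uses for tool definitions; the `missing_handlers` check is an illustrative helper, not part of any library.

```python
def get_order_details(order_id: str) -> dict:
    """Hypothetical order lookup, stubbed with invented data for illustration."""
    orders = {
        "A-1001": {"order_id": "A-1001", "units": 150},
        "A-0990": {"order_id": "A-0990", "units": 120},
    }
    return orders.get(order_id, {"error": f"No order found for id {order_id}"})


# Tool definitions sent to the model: one JSON schema per tool.
tools = [
    {
        "name": "get_order_details",
        "description": (
            "Look up a single order by its id. Use this when you need the unit "
            "count for a specific order. Do not use it for pricing rules."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "Order id, e.g. A-1001"},
            },
            "required": ["order_id"],
        },
    },
]

# Registry used locally to execute whatever tool the model asks for.
tool_registry = {"get_order_details": get_order_details}


def missing_handlers(tool_defs: list[dict], registry: dict) -> list[str]:
    """Return names of tools the model could request but the registry cannot run."""
    return [t["name"] for t in tool_defs if t["name"] not in registry]


if __name__ == "__main__":
    print(missing_handlers(tools, tool_registry))
    # With the agent defined above, a call would look like:
    # result = react_agent("How many units were in order A-1001?", tools, tool_registry)
```

Checking the two structures against each other at startup catches the "Unknown tool" error path before a user request ever reaches it.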
Stopping Conditions
A production ReAct agent needs clear rules for when to stop.
class StoppingCondition:
    """Configurable stopping conditions for a ReAct agent."""

    def __init__(
        self,
        max_iterations: int = 10,
        max_tool_calls: int = 20,
        error_threshold: int = 3,  # Stop if this many consecutive errors
    ):
        self.max_iterations = max_iterations
        self.max_tool_calls = max_tool_calls
        self.error_threshold = error_threshold
        self._total_tool_calls = 0
        self._consecutive_errors = 0

    def should_stop(self, trajectory: list[AgentStep]) -> tuple[bool, str]:
        """Returns (should_stop, reason)."""
        if len(trajectory) >= self.max_iterations:
            return True, f"Max iterations ({self.max_iterations}) reached"
        if self._total_tool_calls >= self.max_tool_calls:
            return True, f"Max tool calls ({self.max_tool_calls}) reached"
        if self._consecutive_errors >= self.error_threshold:
            return True, f"Too many consecutive errors ({self._consecutive_errors})"
        # Detect infinite loops: same tool called with same args 3+ times
        if len(trajectory) >= 3:
            last_three = trajectory[-3:]
            if (
                len(set(s.tool_name for s in last_three)) == 1
                and len(set(str(s.tool_input) for s in last_three)) == 1
            ):
                return True, "Detected infinite loop: same tool called repeatedly"
        return False, ""

    def record_tool_call(self, is_error: bool):
        self._total_tool_calls += 1
        if is_error:
            self._consecutive_errors += 1
        else:
            self._consecutive_errors = 0
ReAct vs Direct Prompting: When Does the Loop Help?
ReAct is not always better than a single-shot prompt. The loop adds latency and cost. Here is a decision framework:
| Scenario | Use ReAct? | Why |
|---|---|---|
| Simple factual question | No | One retrieval suffices; loop wastes time |
| Question requiring 2-3 tool calls | Maybe | ReAct helps if calls are sequential/dependent |
| Complex multi-step research | Yes | Each step's result shapes the next question |
| Math calculation | No | Call calculator directly; no reasoning loop needed |
| Debugging code (unknown cause) | Yes | Diagnosis must adapt to each observation |
| Filling out a form with known fields | No | Just call tools in order; no reasoning needed |
The threshold is roughly: if you can write a deterministic plan before seeing any tool results, use a pipeline. If the plan must adapt to what you discover, use ReAct.
Traces and Observability
One of ReAct's underappreciated benefits is observability. Every thought-action-observation triple is a debugging artifact. When an agent gives a wrong answer, you can read the trace and identify exactly where the reasoning broke down.
def format_trajectory(result: AgentResult) -> str:
    """Format a ReAct trajectory for logging or debugging."""
    lines = [
        f"=== Agent Trajectory ({'SUCCESS' if result.success else 'FAILED'}) ===",
        f"Total iterations: {result.total_iterations}",
        "",
    ]
    for step in result.trajectory:
        lines.append(f"── Iteration {step.iteration} ──")
        if step.thought:
            lines.append(f"  Thought: {step.thought}")
        if step.tool_name:
            lines.append(f"  Action: {step.tool_name}({step.tool_input})")
            lines.append(f"  Obs: {step.observation[:300]}")
        if step.is_final:
            lines.append(f"  FINAL: {step.final_answer}")
        lines.append("")
    return "\n".join(lines)


# In production, send trajectories to your observability stack
import logging

logger = logging.getLogger(__name__)


def run_with_tracing(question: str, tools, registry) -> AgentResult:
    result = react_agent(question, tools, registry)
    trace_text = format_trajectory(result)
    if result.success:
        logger.info("Agent succeeded", extra={
            "iterations": result.total_iterations,
            "trajectory": trace_text,
        })
    else:
        logger.warning("Agent failed", extra={
            "question": question,
            "trajectory": trace_text,
        })
    return result
Failure Modes
Infinite Loops
The agent calls the same tool with the same arguments repeatedly. This happens when the model interprets the tool result as "I need more of the same information" rather than "the information is not available." Fix: detect repeated (tool, args) pairs in the stopping condition and break out.
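One way to detect repeated (tool, args) pairs is to count canonical call signatures across the whole run instead of comparing only the last few steps. The sketch below assumes tool arguments are JSON-serializable dicts; `call_signature` and `find_repeated_call` are hypothetical helper names.

```python
import json
from collections import Counter
from typing import Optional


def call_signature(tool_name: str, tool_input: dict) -> tuple[str, str]:
    """Build a comparable key; sort_keys makes argument order irrelevant."""
    return tool_name, json.dumps(tool_input, sort_keys=True, default=str)


def find_repeated_call(
    calls: list[tuple[str, dict]], limit: int = 3
) -> Optional[tuple[str, str]]:
    """Return the first signature seen `limit` times, or None if no call repeats that often."""
    counts: Counter = Counter()
    for tool_name, tool_input in calls:
        signature = call_signature(tool_name, tool_input)
        counts[signature] += 1
        if counts[signature] >= limit:
            return signature
    return None


if __name__ == "__main__":
    history = [
        ("search", {"q": "refund policy", "n": 1}),
        ("calculator", {"expression": "1 + 1"}),
        ("search", {"n": 1, "q": "refund policy"}),  # same call, different key order
        ("search", {"q": "refund policy", "n": 1}),
    ]
    print(find_repeated_call(history))
```

Counting over the full history also catches loops that alternate between two calls, which a check on consecutive steps would miss.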
Wrong Tool Selection
The model picks a plausible-sounding tool that is not the right one for the task. This is a description quality problem - improve the tool description to be more specific about when to use it and when not to.
Reasoning That Contradicts Observations
The model's next thought does not incorporate the observation it just received. This is a serious failure: the observation said X, but the model continues reasoning as if it said Y. It is often caused by observations that are too long - the model loses track of the key facts. Fix: summarize long observations before injecting them.
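Summarization needs another model call, so a cheaper first step is plain truncation that keeps the start and end of the observation and states how much was dropped. This is a minimal sketch; `truncate_observation` is a hypothetical helper, and the character limits are assumed stand-ins for a real token budget.

```python
def truncate_observation(text: str, max_chars: int = 2000, tail_chars: int = 300) -> str:
    """Keep the head and tail of a long observation and mark the omitted middle.

    Characters are used as a rough proxy for tokens; tune the limits for
    your own model and tools.
    """
    if not 0 < tail_chars < max_chars:
        raise ValueError("tail_chars must be positive and smaller than max_chars")
    if len(text) <= max_chars:
        return text
    head_chars = max_chars - tail_chars
    omitted = len(text) - max_chars
    return (
        f"{text[:head_chars]}\n"
        f"[... {omitted} characters omitted ...]\n"
        f"{text[-tail_chars:]}"
    )


if __name__ == "__main__":
    long_result = "row\n" * 2000 + "TOTAL: 42"
    print(truncate_observation(long_result, max_chars=200, tail_chars=40))
```

Keeping the tail matters for tools that put totals or error messages at the end of their output, and the explicit marker tells the model that the observation is incomplete.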
Premature Termination
The model produces a "Final Answer" before it has actually answered the question, often when it encounters a confusing tool result. Fix: validate the final answer against the original question with a separate model pass.
LangChain's AgentExecutor
LangChain provides AgentExecutor which implements the ReAct loop for you, with configurable stopping conditions, callbacks, and error handling.
from langchain_anthropic import ChatAnthropic
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool


# Define tools using LangChain's @tool decorator
@tool
def search_web(query: str) -> str:
    """Search the web for current information about a topic."""
    return f"Search results for '{query}': [stub result]"


@tool
def calculator(expression: str) -> str:
    """Evaluate a mathematical expression. Example: calculator('2 * (3 + 4)')"""
    try:
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Error: {e}"


tools = [search_web, calculator]

# Create the model
llm = ChatAnthropic(model="claude-opus-4-6", temperature=0)

# Create the prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use tools to find accurate information."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

# Create the agent
agent = create_tool_calling_agent(llm, tools, prompt)

# Create the executor with production settings
executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,                    # Log every step
    max_iterations=10,               # Stop after 10 rounds
    max_execution_time=60.0,         # 60-second overall timeout
    return_intermediate_steps=True,  # Include trajectory in output
    handle_parsing_errors=True,      # Don't crash on malformed output
)

result = executor.invoke({
    "input": "What is 15% of $127.40? Also search for the current Python version."
})
print(f"Answer: {result['output']}")
print(f"Steps taken: {len(result['intermediate_steps'])}")
Production Engineering Notes
Circuit Breakers
Wrap your ReAct agent in a circuit breaker that disables it when error rates spike. If the failure rate over a recent window of requests crosses a threshold (the sketch below uses 50% of the last 20 requests), which is often caused by a broken tool or a model regression, fall back to a simpler response.
class AgentCircuitBreaker:
    def __init__(
        self,
        failure_threshold: float = 0.5,
        window_size: int = 20,
        min_samples: int = 5,
    ):
        self.failure_threshold = failure_threshold
        self.window_size = window_size
        self.min_samples = min_samples
        self._recent_results: list[bool] = []
        self._open = False

    def record(self, success: bool):
        self._recent_results.append(success)
        if len(self._recent_results) > self.window_size:
            self._recent_results.pop(0)
        # Wait for a few samples so a single early failure cannot open the breaker
        if len(self._recent_results) < self.min_samples:
            return
        failure_rate = 1 - (sum(self._recent_results) / len(self._recent_results))
        if failure_rate > self.failure_threshold:
            self._open = True

    def is_open(self) -> bool:
        return self._open

    def reset(self):
        self._open = False
        self._recent_results = []
Fallback to Direct Response
When the agent fails or the circuit breaker opens, fall back to a direct LLM response without tools. This is worse but far better than an error.
# Share one breaker across requests. Creating it inside the function would
# reset its history on every call, so it could never open.
_breaker = AgentCircuitBreaker()


def run_with_fallback(question: str, tools, registry) -> str:
    if not _breaker.is_open():
        try:
            result = react_agent(question, tools, registry)
            _breaker.record(result.success)
            if result.success:
                return result.final_answer
        except Exception:
            _breaker.record(False)

    # Fallback: direct response without tools
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": question}],
        system="Answer based on your training knowledge. Be clear if you are uncertain.",
    )
    return response.content[0].text
Common Mistakes
:::danger Parsing ReAct Output With Fragile Regex
If you implement ReAct with text parsing (the "from scratch" approach), do not use simple regex on raw model output. The model will deviate from your format. Use robust parsers with fallbacks, or better yet, use native tool use APIs which give you structured output.
:::
:::danger No Maximum Iterations
A ReAct agent without a max iteration limit can loop indefinitely when a tool is broken or the model gets confused. This burns tokens, costs money, and blocks the request thread. Always set max_iterations.
:::
:::warning Showing All Tool Results Unfiltered
Injecting large tool results directly into the conversation fills the context window fast. For any tool that might return more than ~500 tokens, implement truncation or summarization before injection.
:::
:::warning Conflating Thought With Response
In ReAct, the "Thought" is the model's internal scratchpad. Do not show raw thoughts to users - they often contain uncertain reasoning, dead ends, and partial information. Always extract only the "Final Answer" for the user-facing response.
:::
Interview Q&A
Q: What is the core innovation of ReAct compared to chain-of-thought prompting?
Chain-of-thought prompting asks the model to reason before producing an answer, but all the reasoning happens inside the model with no external grounding. ReAct interleaves reasoning with actual tool calls - the model thinks, acts, observes the real result, then thinks again. This grounds each reasoning step in actual observations rather than the model's internal beliefs, which reduces hallucination on fact-intensive tasks because a wrong premise is corrected by the next tool result instead of being carried through the rest of the chain.
Q: When would you choose ReAct over a fixed pipeline of tool calls?
Use a fixed pipeline when you know exactly what tools to call and in what order before the task starts - for example, "always retrieve context, then generate, then check citations." Use ReAct when the right sequence of tool calls depends on what you discover along the way - research tasks, debugging, open-ended question answering. ReAct's loop adapts to intermediate results; a pipeline cannot.
Q: How do you handle a ReAct agent that gets stuck in a loop?
Detect loops by tracking (tool_name, serialized_args) pairs across iterations. If the same pair appears three or more times, the agent is looping. Break out by: (1) injecting a "you are in a loop" message as a fake observation, (2) switching to a different tool, or (3) terminating with a graceful failure message. The stopping condition implementation should check this at every iteration.
Q: What information would you log from a ReAct trajectory for debugging?
Log the full trajectory: every thought, every tool call with its arguments, every observation, and the final answer. Also log: total iterations, total tokens consumed, which tools were called and how many times, whether the run succeeded or hit the iteration limit, wall-clock time for each step, and the original user question. This gives you everything you need to diagnose failures and measure efficiency.
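A structured record makes those fields queryable in a log pipeline. The sketch below is one possible layout, not a standard: `StepRecord`, `RunRecord`, and `to_json_line` are hypothetical names, and the `tokens` field assumes the caller copies usage numbers from the model response.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class StepRecord:
    iteration: int
    thought: str
    tool_name: str          # empty string when the step made no tool call
    tool_input: dict
    observation: str
    duration_s: float       # wall-clock time for this step
    tokens: int = 0         # filled in by the caller if usage data is available


@dataclass
class RunRecord:
    question: str
    success: bool
    steps: list[StepRecord] = field(default_factory=list)

    def summary(self) -> dict:
        """Aggregate the per-run numbers named in the answer above."""
        tool_counts: dict[str, int] = {}
        for step in self.steps:
            if step.tool_name:
                tool_counts[step.tool_name] = tool_counts.get(step.tool_name, 0) + 1
        return {
            "question": self.question,
            "success": self.success,
            "total_iterations": len(self.steps),
            "total_tokens": sum(s.tokens for s in self.steps),
            "total_duration_s": round(sum(s.duration_s for s in self.steps), 3),
            "tool_counts": tool_counts,
        }


def to_json_line(record: RunRecord) -> str:
    """Serialize summary plus full steps as one JSON line for a log sink."""
    payload = {**record.summary(), "steps": [asdict(s) for s in record.steps]}
    return json.dumps(payload, default=str)


if __name__ == "__main__":
    run = RunRecord(question="Why was order A-1001 priced differently?", success=True)
    run.steps.append(StepRecord(1, "Check the order.", "get_order_details", {"order_id": "A-1001"}, "{...}", 0.25, tokens=100))
    run.steps.append(StepRecord(2, "I can answer now.", "", {}, "", 0.5, tokens=50))
    print(to_json_line(run))
```

One JSON line per run keeps the summary fields at the top level for dashboards while preserving the full step list for debugging.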
Q: How does ReAct relate to LangGraph?
LangGraph is a framework for building stateful, graph-based agent workflows. A basic ReAct loop can be expressed as a LangGraph graph with nodes for "reason," "act," and "observe" and edges that cycle until a stopping condition is met. LangGraph adds: persistent state across iterations, branching logic (different actions depending on tool results), parallel execution of independent steps, and checkpointing for recovery from failures. ReAct is the mental model; LangGraph is one way to implement it at production scale.
Q: What is the Reflexion enhancement to ReAct?
Reflexion (Shinn et al., 2023) adds a self-reflection step after a failed ReAct trajectory. After a failed attempt, the agent generates a verbal reflection on what went wrong and why. This reflection is stored in a persistent memory and used to condition the next attempt. The key insight is that failed trajectories are learning signals - the agent should explicitly reason about its failures rather than just retrying with the same approach.
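The retry-with-memory idea can be sketched as a thin wrapper around any agent run. This is a simplified illustration of the control flow only, not the paper's implementation: `attempt` and `reflect` are hypothetical callables (in practice an agent run plus an evaluator, and a model call that writes the reflection), and the demo versions below are hard-coded stubs.

```python
from typing import Callable, Optional

# attempt(task, reflections) -> (succeeded, output)
Attempt = Callable[[str, list[str]], tuple[bool, str]]
# reflect(task, failed_output) -> verbal lesson to carry into the next trial
Reflect = Callable[[str, str], str]


def run_with_reflection(
    task: str, attempt: Attempt, reflect: Reflect, max_trials: int = 3
) -> tuple[Optional[str], list[str]]:
    """Retry a task, feeding accumulated reflections into each new attempt."""
    reflections: list[str] = []  # persistent memory across trials
    for _ in range(max_trials):
        succeeded, output = attempt(task, reflections)
        if succeeded:
            return output, reflections
        reflections.append(reflect(task, output))
    return None, reflections


def demo_attempt(task: str, reflections: list[str]) -> tuple[bool, str]:
    """Stub agent: only converts units after being reminded to."""
    if any("check units" in note for note in reflections):
        return True, "1082.7 feet"
    return False, "330 feet"


def demo_reflect(task: str, failed_output: str) -> str:
    """Stub reflection: a real system would ask a model what went wrong."""
    return f"Answer '{failed_output}' skipped the conversion; check units before answering."


if __name__ == "__main__":
    answer, notes = run_with_reflection(
        "How tall is the Eiffel Tower in feet?", demo_attempt, demo_reflect
    )
    print(answer)
    print(notes)
```

The difference from a plain retry is the `reflections` list: each new attempt sees why the earlier ones failed instead of starting from the same prompt.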
