
ReAct Pattern

The Research Assistant That Couldn't Research

Your company is building an internal AI research assistant. Users ask questions like: "What is the current quarterly revenue for our top 3 competitors, and how does it compare to ours?"

You build it with chain-of-thought. The model reasons beautifully: "I should look up Competitor A's revenue, then Competitor B's, then Competitor C's, then compare..."

And then it makes up numbers. Confidently, fluently, completely wrong.

The problem is fundamental: the model's training data has a cutoff. It can't access the internet. It can't query your internal database. It can't run a calculation it hasn't memorized. It can reason about what to do but has no mechanism to actually do it.

You add a search tool, a database query function, and a calculator. You wire them up using the ReAct pattern. Now the model outputs:

Thought: I need to find Competitor A's most recent quarterly revenue.
Action: search("Competitor A Q4 2025 quarterly revenue")
Observation: According to their Q4 2025 earnings release, Competitor A reported $482M in revenue.

Thought: Now I need Competitor B's revenue.
Action: database_query("SELECT revenue FROM competitor_data WHERE company='B' AND quarter='Q4-2025'")
Observation: $291M

Thought: Now Competitor C.
Action: search("Competitor C Q4 2025 earnings")
Observation: Competitor C reported $167M for Q4 2025.

Thought: I have all three competitor revenues. Now I need our internal Q4 2025 revenue.
Action: database_query("SELECT revenue FROM financials WHERE quarter='Q4-2025'")
Observation: $318M

Thought: I have all the data. Let me compute the comparison.
Action: calculator("482 + 291 + 167")
Observation: 940

Thought: Our revenue is $318M against combined competitor revenue of $940M.
As a percentage: 318/940 = 33.8% of the three competitors' combined revenue, or 318/1258 = 25.3% of the total across all four companies.

Final Answer: [complete analysis]

Every number is traceable to a tool result. The analysis is grounded. The model didn't need to invent figures - it looked each one up.

This is ReAct. It's not just a prompting technique - it's the architectural foundation of modern LLM agents.

Why This Exists: The Grounding Problem

Pure chain-of-thought reasoning is powerful but fundamentally limited: it can only reason about what the model already knows.

This creates several critical failure modes:

  1. Temporal staleness: Knowledge cutoffs mean the model's facts are outdated
  2. Hallucination under uncertainty: When the model doesn't know something, it often makes it up
  3. No real-world effects: The model can reason about sending an email but can't actually send it
  4. No private data access: The model can't access your internal databases, APIs, or file systems

Tool calling - giving models the ability to invoke functions and receive results - addresses all of these. But tool calling alone isn't enough. You need a reasoning framework that decides which tools to call, when to call them, and how to interpret their results.

That framework is ReAct.

Historical Context: Yao et al. 2022

"ReAct: Synergizing Reasoning and Acting in Language Models" (Yao et al., 2022, Princeton & Google Brain) introduced the pattern.

The key contribution was the empirical demonstration that interleaving reasoning and acting beats acting alone and is better grounded than reasoning alone:

  • Pure reasoning (CoT): model thinks but can't access external information
  • Pure acting (tool calls without reasoning): model calls tools but lacks coherent planning
  • ReAct (interleaved): reasoning guides action selection; observations update reasoning

The paper tested ReAct on (figures as reported for few-shot prompting of PaLM-540B):

  • HotpotQA (multi-hop question answering over a Wikipedia search API): ReAct alone scored slightly below CoT (27.4 vs 29.4 exact match); combining ReAct with CoT self-consistency gave the best result (35.1)
  • FEVER (fact verification): 60.9% vs 56.3% accuracy for CoT
  • ALFWorld (interactive household tasks): 71% vs 45% success rate for the act-only baseline

The "aha moment" of the paper is in its error analysis: CoT's failures were dominated by hallucinated facts, while ReAct's failures were mostly reasoning errors or unhelpful search results, both of which are visible in the Thought/Action/Observation trace. CoT fails silently. ReAct fails transparently.

The ReAct Loop

ReAct structures model behavior as a repeating cycle:

$$\text{Thought} \rightarrow \text{Action} \rightarrow \text{Observation} \rightarrow \text{Thought} \rightarrow \ldots$$

Thought: The model reasons about what it knows and what it needs to do next. This is internal reasoning - it doesn't affect the external world.

Action: The model invokes a tool with specific parameters. This is the step that touches the external world, either by reading from it (a search) or by changing it (sending an email).

Observation: The result from the tool call is injected into the context. The model reads it.

The loop continues until the model either has enough information to produce a final answer or determines it cannot solve the problem.
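To make the control flow concrete before any API is involved, here is a minimal sketch of the loop. It assumes a scripted stand-in for the model (`SCRIPT`) and a toy tool (`add_numbers`); both names are hypothetical and exist only for illustration.

```python
from typing import Callable, Optional

# Hypothetical stand-in for an LLM: a fixed script of (thought, action) steps,
# so the loop can run without any API access. An action of None means "finish".
Step = tuple[str, Optional[tuple[str, str]]]
SCRIPT: list[Step] = [
    ("I need the sum of the two revenues.", ("add_numbers", "482 + 318")),
    ("I have the total, so I can answer.", None),
]

def add_numbers(expression: str) -> str:
    """Toy tool: adds integers separated by '+'."""
    return str(sum(int(part) for part in expression.split("+")))

DEMO_TOOLS: dict[str, Callable[[str], str]] = {"add_numbers": add_numbers}

def run_loop(script: list[Step], tools: dict[str, Callable[[str], str]],
             max_steps: int = 5) -> list[str]:
    """Run Thought -> Action -> Observation until the policy stops acting."""
    trace: list[str] = []
    for thought, action in script[:max_steps]:
        trace.append(f"Thought: {thought}")
        if action is None:
            break  # no action requested: the loop ends with a final thought
        name, arg = action
        trace.append(f"Action: {name}({arg!r})")
        trace.append(f"Observation: {tools[name](arg)}")
    return trace

trace = run_loop(SCRIPT, DEMO_TOOLS)
print("\n".join(trace))
```

In a real agent the scripted steps are replaced by model calls, and each observation is appended to the context the model sees on the next turn.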

ReAct vs. Chain-of-Thought

| Dimension | Chain-of-Thought | ReAct |
|---|---|---|
| External access | No | Yes - tools |
| Fact currency | Limited by training | Real-time via search |
| Failure mode | Silent hallucination | Transparent errors |
| Complexity | Low | Medium |
| Cost | Low | Higher (tool calls + more tokens) |
| When to use | Self-contained reasoning | Tasks requiring external data/actions |

Use CoT when: the question can be answered from training knowledge, the task is computational/logical (not factual retrieval), or latency and cost are primary concerns.

Use ReAct when: the answer depends on real-time data, private data, external APIs, or when you need the model to take actions with real-world effects.

Common Tools in ReAct Agents

Real production ReAct agents typically include:

| Tool | What it does | When the model calls it |
|---|---|---|
| `search(query)` | Web search | Real-time facts, news |
| `database_query(sql)` | SQL against internal DB | Private data |
| `calculator(expr)` | Arithmetic evaluation | Avoid hallucinated math |
| `code_interpreter(code)` | Execute Python | Data analysis, complex computation |
| `file_read(path)` | Read a file | Document processing |
| `api_call(endpoint, params)` | External API | CRM, calendar, payment systems |
| `email_send(to, subject, body)` | Send email | Notification, outreach |

The model decides which tools to call based on the reasoning in its Thought step.
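One way to keep a tool list like this maintainable is to hold each tool's name, signature, description, and callable in a single record, then generate the prompt text from it. The sketch below assumes hypothetical names (`ToolSpec`, `render_tool_descriptions`) and placeholder tool bodies; it is not taken from any particular framework.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical record tying a tool's prompt description to its callable."""
    name: str
    signature: str
    description: str
    func: Callable[..., str]

def render_tool_descriptions(specs: list[ToolSpec]) -> str:
    """Build the tool section of a system prompt from the registry."""
    lines = ["Available tools:"]
    for spec in specs:
        lines.append(f"- {spec.name}{spec.signature}: {spec.description}")
    return "\n".join(lines)

# Placeholder implementations; real tools would call a search API, a database, etc.
specs = [
    ToolSpec("search", "(query: str) -> str", "Web search for real-time facts",
             lambda query: f"results for {query}"),
    ToolSpec("calculator", "(expr: str) -> str", "Arithmetic evaluation",
             lambda expr: "not implemented"),
]

prompt_text = render_tool_descriptions(specs)
print(prompt_text)
```

Generating the description from the same record that holds the callable makes it harder for the prompt and the implementation to drift apart.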

Implementing ReAct from Scratch

import anthropic
import re
from datetime import datetime
from typing import Callable

client = anthropic.Anthropic()

# Define tools
def search_web(query: str) -> str:
    """Simulated web search. In production, call Tavily, SerpAPI, etc."""
    # Simulated responses for demonstration
    results = {
        "Python 3.12 release date": "Python 3.12 was released on October 2, 2023.",
        "what is the capital of France": "The capital of France is Paris.",
        "anthropic claude model": "Claude Sonnet 4.6 is Anthropic's latest mid-tier model as of 2025.",
    }
    # Simple keyword matching for simulation: every word of a key must appear
    # in the query, so common words like "is" or "the" cannot trigger a false match
    query_lower = query.lower()
    for key, value in results.items():
        if all(word in query_lower for word in key.lower().split()):
            return value
    return f"No results found for: {query}"


def calculate(expression: str) -> str:
    """Evaluate a basic arithmetic expression."""
    try:
        # Only allow digits, arithmetic operators, parentheses, and spaces
        allowed = set('0123456789+-*/()., ')
        if not all(c in allowed for c in expression):
            return "Error: only basic arithmetic allowed"
        # The whitelist rules out names and function calls, and builtins are
        # disabled as a second layer. Expressions such as 9**9**9 can still be
        # very slow, so use a real expression parser for untrusted input.
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Error: {str(e)}"


def get_current_date() -> str:
    """Get the current date."""
    return datetime.now().strftime("%Y-%m-%d")


# Tool registry
TOOLS: dict[str, Callable] = {
    "search": search_web,
    "calculate": calculate,
    "get_date": get_current_date,
}

TOOL_DESCRIPTIONS = """
Available tools:
- search(query: str) -> str: Search the web for information
- calculate(expression: str) -> str: Evaluate a mathematical expression
- get_date() -> str: Get the current date

To use a tool, output a line in this exact format:
Action: tool_name(arguments)
"""

REACT_SYSTEM_PROMPT = f"""You are a helpful assistant that can use tools to answer questions accurately.

{TOOL_DESCRIPTIONS}

For each step in your reasoning:
1. Write "Thought: [your reasoning about what to do next]"
2. Write "Action: [tool_name]([arguments])" to call a tool
3. The tool result will appear as "Observation: [result]"
4. Repeat until you have enough information
5. Write "Final Answer: [your complete answer]"

Important:
- Always reason before acting
- Use tools when you need real-time or specific information
- Don't make up facts - use the search tool if uncertain
- Stop once you have a definitive answer
"""


def parse_action(text: str) -> tuple[str, str] | None:
    """Parse an Action line into (tool_name, arguments)."""
    # Match up to the last ")" on the Action line so that arguments containing
    # parentheses, such as calculate("(482 + 318) * 0.23"), are captured whole
    match = re.search(r"^\s*Action:\s*(\w+)\((.*)\)\s*$", text, re.MULTILINE)
    if match:
        tool_name = match.group(1)
        args_str = match.group(2).strip().strip('"\'')
        return tool_name, args_str
    return None


def execute_tool(tool_name: str, args: str) -> str:
    """Execute a tool and return its result."""
    if tool_name not in TOOLS:
        return f"Error: unknown tool '{tool_name}'. Available: {list(TOOLS.keys())}"

    tool = TOOLS[tool_name]
    try:
        if args:
            result = tool(args)
        else:
            result = tool()
        return str(result)
    except Exception as e:
        # Return the error as text so the model can read it as an Observation
        return f"Tool error: {str(e)}"


def react_agent(question: str, max_steps: int = 10) -> str:
    """
    ReAct agent that answers questions using tools.

    Args:
        question: The user's question
        max_steps: Maximum number of Thought/Action/Observation cycles

    Returns:
        The agent's final answer
    """
    messages = [
        {"role": "user", "content": question}
    ]

    full_response_parts = []

    for step in range(max_steps):
        # Get next model output. The stop sequence halts generation before the
        # model can write its own "Observation:" line, so every observation in
        # the transcript comes from a real tool call.
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=500,
            temperature=0,
            system=REACT_SYSTEM_PROMPT,
            stop_sequences=["Observation:"],
            messages=messages
        )

        assistant_text = response.content[0].text
        full_response_parts.append(assistant_text)
        print(f"\n--- Step {step + 1} ---")
        print(assistant_text)

        # Check if agent has a final answer
        if "Final Answer:" in assistant_text:
            # Extract the final answer
            match = re.search(r"Final Answer:\s*(.+)", assistant_text, re.DOTALL)
            if match:
                return match.group(1).strip()
            return assistant_text

        # Parse and execute any action
        action = parse_action(assistant_text)
        if action:
            tool_name, tool_args = action
            observation = execute_tool(tool_name, tool_args)
            observation_text = f"Observation: {observation}"
            print(observation_text)

            # Add to conversation: assistant output + observation
            messages.append({
                "role": "assistant",
                "content": assistant_text
            })
            messages.append({
                "role": "user",
                "content": observation_text
            })
        else:
            # No action found - model may be done or confused
            if "Thought:" in assistant_text:
                # Model is still reasoning, continue
                messages.append({
                    "role": "assistant",
                    "content": assistant_text
                })
                messages.append({
                    "role": "user",
                    "content": "Continue your reasoning."
                })
            else:
                break

    # Reached either by exhausting max_steps or by the break above
    return "No final answer produced. Partial response: " + " ".join(full_response_parts)


# Test the agent
questions = [
    "What day of the week was January 1st, 2025? (First, get today's date for context)",
    "If Python 3.12 was released and the next major version adds 18 months, when is Python 3.13 due?",
    "What is 15% of 847, rounded to the nearest dollar?",
]

for q in questions:
    print(f"\n{'='*60}")
    print(f"Question: {q}")
    print('='*60)
    answer = react_agent(q)
    print(f"\nFinal Answer: {answer}")
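The regex-based parsing step is the weakest link in a text-driven loop like this one. As a rough illustration, the sketch below compares two hypothetical patterns (`NAIVE_PATTERN`, which stops at the first closing parenthesis, and `LINE_ANCHORED_PATTERN`, which matches to the last one on the line) against a few plausible model outputs.

```python
import re

# Hypothetical patterns for illustration only.
NAIVE_PATTERN = re.compile(r"Action:\s*(\w+)\(([^)]*)\)")
LINE_ANCHORED_PATTERN = re.compile(r"^\s*Action:\s*(\w+)\((.*)\)\s*$", re.MULTILINE)

def parse(pattern: re.Pattern, text: str):
    """Return (tool_name, raw_arguments) or None if the pattern does not match."""
    match = pattern.search(text)
    return (match.group(1), match.group(2)) if match else None

samples = [
    'Action: search("Python 3.12 release date")',   # well-formed
    'Action: calculate("(482 + 318) * 0.23")',      # nested parentheses
    'Action: search "Python 3.12"',                 # model dropped the parentheses
]

for sample in samples:
    print(sample)
    print("  naive:        ", parse(NAIVE_PATTERN, sample))
    print("  line-anchored:", parse(LINE_ANCHORED_PATTERN, sample))
```

The naive pattern truncates the nested expression, and neither pattern recovers the third sample at all. A stricter regex narrows the problem but cannot remove it, because the format is only a convention the model was asked to follow.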

Instead of parsing text, use Claude's structured tool calling:

import anthropic
import json

client = anthropic.Anthropic()

# Define tools in Claude's tool format
tools = [
    {
        "name": "search",
        "description": "Search the web for current information",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query"
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Mathematical expression to evaluate (e.g., '15 * 847 / 100')"
                }
            },
            "required": ["expression"]
        }
    }
]


def process_tool_call(tool_name: str, tool_input: dict) -> str:
    """Execute the appropriate tool based on the model's request."""
    if tool_name == "search":
        return search_web(tool_input["query"])
    elif tool_name == "calculate":
        return calculate(tool_input["expression"])
    return f"Unknown tool: {tool_name}"


def react_agent_native_tools(question: str, max_steps: int = 10) -> str:
    """
    ReAct agent using Claude's native tool_use feature.
    This is more reliable than text parsing.
    """
    messages = [{"role": "user", "content": question}]

    for step in range(max_steps):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )

        # Add assistant's response to history
        messages.append({"role": "assistant", "content": response.content})

        # Check stop reason
        if response.stop_reason == "end_turn":
            # Return the first text block as the final answer
            for block in response.content:
                if block.type == "text":
                    return block.text
            break

        if response.stop_reason == "tool_use":
            # Process all tool calls
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    tool_result = process_tool_call(block.name, block.input)
                    print(f"Tool: {block.name}({block.input})")
                    print(f"Result: {tool_result}")

                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": tool_result
                    })

            # Add tool results to conversation
            messages.append({"role": "user", "content": tool_results})
        else:
            # Any other stop reason (for example "max_tokens") leaves no user
            # turn to continue from, so stop instead of looping
            return f"Stopped early: stop_reason={response.stop_reason}"

    return "No final answer produced within the step limit."


# Test
result = react_agent_native_tools(
    "What is 23% of the sum of 482 and 318? Show your calculation."
)
print(f"\nFinal: {result}")
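It can help to see the shape of the conversation after one tool round trip. The sketch below writes it out as plain dictionaries (the SDK returns objects with the same fields) and adds a hypothetical helper, `unanswered_tool_calls`, that checks every `tool_use` block has a matching `tool_result`. The id value is a made-up placeholder; real ids are assigned by the API.

```python
# Conversation after one tool round trip, as plain dicts.
history = [
    {"role": "user", "content": "What is 23% of the sum of 482 and 318?"},
    {"role": "assistant", "content": [
        {"type": "text", "text": "I'll compute the sum and take 23% of it."},
        {"type": "tool_use", "id": "toolu_example_1", "name": "calculate",
         "input": {"expression": "(482 + 318) * 0.23"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_example_1", "content": "184"},
    ]},
]

def unanswered_tool_calls(messages: list[dict]) -> set[str]:
    """Return ids of tool_use blocks that have no matching tool_result."""
    requested: set[str] = set()
    answered: set[str] = set()
    for message in messages:
        content = message["content"]
        if isinstance(content, str):
            continue  # plain-text turns carry no tool blocks
        for block in content:
            if block.get("type") == "tool_use":
                requested.add(block["id"])
            elif block.get("type") == "tool_result":
                answered.add(block["tool_use_id"])
    return requested - answered

print(unanswered_tool_calls(history))      # complete round trip
print(unanswered_tool_calls(history[:2]))  # result not yet sent back
```

The text block in the assistant turn plays the role of the Thought, the `tool_use` block is the Action, and the `tool_result` block is the Observation.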

Error Recovery in ReAct

Real-world tool calls fail. The ReAct loop naturally handles errors - the model reads the error in the Observation and adapts:

Thought: I'll search for the company's latest revenue figures.
Action: search("Acme Corp 2025 annual revenue")
Observation: Error: Search API rate limit exceeded

Thought: The search failed due to rate limiting. I'll try a different approach
and look for this in our internal database instead.
Action: database_query("SELECT revenue FROM companies WHERE name='Acme Corp' AND year=2025")
Observation: {"revenue": 142000000, "year": 2025}

Thought: Got the data from the database. Revenue is $142M.
Final Answer: Acme Corp's 2025 revenue was $142 million.

This graceful degradation is one of ReAct's practical strengths. Build your agents to expect and handle tool failures.
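Model-driven recovery only works if failures reach the model as readable observations rather than crashing the loop. One possible tool-layer complement is sketched below; `call_with_fallback`, `flaky_search`, and `db_lookup` are hypothetical names, and the retry counts are arbitrary.

```python
import time
from typing import Callable

def call_with_fallback(
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    arg: str,
    retries: int = 2,
    backoff_seconds: float = 0.0,
) -> str:
    """Try `primary` up to `retries` times, then `fallback`; never raise."""
    last_error: Exception | None = None
    for attempt in range(retries):
        try:
            return primary(arg)
        except Exception as exc:
            last_error = exc
            time.sleep(backoff_seconds * (2 ** attempt))  # exponential backoff
    try:
        return fallback(arg)
    except Exception as exc:
        # Both paths failed: report it as text the model can reason about
        return f"Error: primary failed ({last_error}); fallback failed ({exc})"

def flaky_search(query: str) -> str:
    raise RuntimeError("Search API rate limit exceeded")

def db_lookup(query: str) -> str:
    return '{"revenue": 142000000, "year": 2025}'

observation = call_with_fallback(flaky_search, db_lookup, "Acme Corp 2025 annual revenue")
print(observation)
```

Whether the fallback is chosen by the tool layer, as here, or by the model in its next Thought is a design choice; the important property is that every failure ends up as an Observation string.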

Production Considerations

1. Timeouts Are Non-Negotiable

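import asyncio

async def react_with_timeout(question: str, timeout_seconds: float = 30.0) -> str:
    try:
        # Run the blocking agent in a worker thread and stop waiting after the
        # timeout. The thread itself is not interrupted and may keep running in
        # the background, so individual tool calls need their own timeouts too.
        return await asyncio.wait_for(
            asyncio.to_thread(react_agent_native_tools, question),
            timeout=timeout_seconds
        )
    except asyncio.TimeoutError:
        return "Request timed out. Please try a simpler question or try again."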

2. Limit Tool Call Depth

Agents can get into infinite loops. Always enforce a hard maximum:

MAX_TOOL_CALLS = 10  # Hard limit regardless of max_steps
tool_call_count = 0  # Module-level counter: reset it at the start of each agent run

def execute_tool_with_limit(tool_name: str, tool_input: dict) -> str:
    global tool_call_count
    tool_call_count += 1
    if tool_call_count > MAX_TOOL_CALLS:
        raise ValueError("Maximum tool call limit reached")
    return process_tool_call(tool_name, tool_input)
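A module-level counter like the one above is shared by every request in the process, so it has to be reset between runs and is awkward under concurrency. One alternative, sketched here with hypothetical names (`ToolBudget`, `run_tool`), is to give each agent run its own budget object.

```python
from typing import Callable

class ToolBudget:
    """Per-run tool call budget (hypothetical helper, one instance per request)."""

    def __init__(self, max_calls: int = 10) -> None:
        self.max_calls = max_calls
        self.used = 0

    def spend(self) -> None:
        """Consume one call, or raise if the budget is exhausted."""
        if self.used >= self.max_calls:
            raise RuntimeError(f"Tool call limit of {self.max_calls} reached")
        self.used += 1

def run_tool(budget: ToolBudget, tool: Callable[[str], str], arg: str) -> str:
    """Charge the budget before executing the tool."""
    budget.spend()
    return tool(arg)

# Usage: a fresh budget for each question keeps runs independent
budget = ToolBudget(max_calls=2)
print(run_tool(budget, str.upper, "first call"))
print(run_tool(budget, str.upper, "second call"))
try:
    run_tool(budget, str.upper, "third call")
except RuntimeError as exc:
    print(exc)
```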

3. Log Every Step for Debugging

import logging

logger = logging.getLogger("react_agent")

def log_step(step: int, thought: str, action: str, observation: str):
    logger.info(
        "react_step",
        extra={
            "step": step,
            "thought": thought[:200],
            "action": action,
            "observation_len": len(observation),
            "observation_preview": observation[:100],
        }
    )
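One caveat: fields passed through `extra` become attributes on the log record, but the standard library's default formatter does not print them. The sketch below assumes a hypothetical `JsonStepFormatter` that serializes those fields; a structured-logging library would be another way to get the same result.

```python
import io
import json
import logging

class JsonStepFormatter(logging.Formatter):
    """Hypothetical formatter that emits selected `extra` fields as JSON."""
    STEP_FIELDS = ("step", "thought", "action", "observation_len", "observation_preview")

    def format(self, record: logging.LogRecord) -> str:
        payload = {"event": record.getMessage(), "level": record.levelname}
        for field in self.STEP_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)

stream = io.StringIO()  # stands in for stderr or a log file
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonStepFormatter())

demo_logger = logging.getLogger("react_agent_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(handler)
demo_logger.propagate = False  # keep the demo output out of the root logger

demo_logger.info(
    "react_step",
    extra={
        "step": 1,
        "thought": "Need Competitor A revenue",
        "action": "search",
        "observation_len": 42,
        "observation_preview": "Competitor A reported...",
    },
)
logged_line = stream.getvalue().strip()
print(logged_line)
```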

4. Sandbox Tool Execution

Especially for code execution tools - always run in an isolated environment:

import requests

# Use a sandboxed environment for code execution
# Never run model-generated code directly in your production environment

# Example: use a containerized execution service
def sandboxed_code_execute(code: str) -> str:
    response = requests.post(
        "http://code-sandbox-service/execute",
        json={"code": code, "timeout": 5, "memory_limit_mb": 128},
        timeout=10,  # HTTP timeout, slightly longer than the sandbox's own limit
    )
    return response.json().get("output", "Execution failed")

Common Mistakes

:::danger Mistake 1: No Tool Call Limits
Without a hard limit on tool calls, a misbehaving agent can call tools indefinitely, running up large costs. Always set and enforce a maximum.
:::

:::danger Mistake 2: Trusting Tool Output Without Validation
The model may misinterpret tool output and continue with wrong assumptions. Add structured output validation for critical tool calls.
:::

:::warning Mistake 3: Using Text Parsing Instead of Native Tool Use
Parsing "Action: tool_name(args)" from text is fragile. Use the model's native function calling/tool use API when available (Claude, GPT-4, Gemini all support it). It's more reliable and type-safe.
:::

:::warning Mistake 4: Giving the Model Too Many Tools
A model with 20 tools spends reasoning tokens deciding which tool to use. Start with 3-5 essential tools and add only when needed. Tool selection is a reasoning burden.
:::

:::warning Mistake 5: No Fallback for Tool Failures
Assume any tool can fail at any time. Design your system prompt to handle tool errors gracefully and either retry, use an alternative tool, or explain to the user why the task couldn't be completed.
:::
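As an illustration of the validation mentioned in Mistake 2, the sketch below checks a tool result before it is handed to the model. The expected shape (`revenue` and `year` keys) mirrors the database observation in the error-recovery example; the function name `validate_revenue_observation` is hypothetical.

```python
import json

def validate_revenue_observation(raw: str) -> dict:
    """Parse a tool result expected to look like {"revenue": <number>, "year": <int>}."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"tool output is not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("tool output must be a JSON object")

    revenue = data.get("revenue")
    year = data.get("year")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(revenue, bool) or not isinstance(revenue, (int, float)) or revenue < 0:
        raise ValueError("revenue must be a non-negative number")
    if isinstance(year, bool) or not isinstance(year, int):
        raise ValueError("year must be an integer")
    return {"revenue": float(revenue), "year": year}

checked = validate_revenue_observation('{"revenue": 142000000, "year": 2025}')
print(checked)

try:
    validate_revenue_observation('{"revenue": "142M", "year": 2025}')
except ValueError as exc:
    print(f"Observation: Error: {exc}")  # feed the problem back to the model
```

Returning the validation error as an Observation lets the model retry or switch tools instead of reasoning from a malformed value.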

Interview Q&A

Q1: What is the ReAct pattern and what problem does it solve?

ReAct (Reasoning + Acting) is a prompting framework that interleaves natural language reasoning (Thought) with tool calls (Action) and tool results (Observation). It solves the grounding problem: pure chain-of-thought reasoning can only use information the model already knows, leading to hallucination when real-time data or private data is needed. ReAct gives the model the ability to take actions in the world - search the web, query databases, call APIs, execute code - and incorporate the results into its reasoning. It's the foundational architecture behind LLM agents.

Q2: What is the Thought-Action-Observation loop?

The ReAct loop consists of three steps that repeat: (1) Thought - the model reasons about the current state, what it knows, and what it needs to do next; (2) Action - the model calls a specific tool with parameters; (3) Observation - the tool's result is injected into the context, which the model reads as input for the next Thought. The loop continues until the model has enough information to produce a final answer or determines the task is impossible. This interleaving is crucial: reasoning guides action selection, and observations update reasoning.

Q3: How does ReAct handle tool call failures?

Because observations (including error messages) are fed back into the context, the model can read an error message in the Observation step and adapt. In a good ReAct implementation, the model sees "Error: search API unavailable," reasons about alternatives ("I'll try the internal database instead"), and takes a different action. This graceful degradation is one of ReAct's practical advantages over fixed workflows. In production, you should also include explicit error-handling instructions in your system prompt: "If a tool returns an error, try an alternative approach or inform the user."

Q4: What is the difference between text-parsing ReAct and native tool use?

Text-parsing ReAct extracts tool calls by parsing text patterns like "Action: search(query)" using regex. This is fragile - the model might format the action slightly differently, breaking your parser. Native tool use (Claude's tool_use API, OpenAI's function calling) uses structured APIs where the model outputs a JSON object specifying the tool name and arguments. The model is trained to use this format reliably. Native tool use is type-safe, more reliable, and provides clear separation between reasoning text and tool calls.

Q5: How would you design a production ReAct agent for a customer service application?

Key design decisions: (1) Tools: limit to essential ones - CRM lookup, order status, knowledge base search, ticket creation - probably 4-6 tools maximum; (2) System prompt: define the agent's persona, scope constraints (what it can and cannot do), escalation rules (when to transfer to human), and error handling; (3) Safety: sandbox all tool execution, validate all tool inputs, rate-limit tool calls, set hard limits on conversation length; (4) Monitoring: log every Thought/Action/Observation for debugging and compliance; (5) Fallback: always have a human handoff path when the agent is stuck or uncertain; (6) Evaluation: measure task completion rate, tool call accuracy, and time-to-resolution against a labeled evaluation set.
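The evaluation point in (6) can be made concrete with a small scoring helper. This is a sketch under stated assumptions: `EpisodeResult` and `summarize` are hypothetical names, and "tool call accuracy" is defined here as an exact match between the expected and actual tool sequences, which is only one of several reasonable definitions.

```python
from dataclasses import dataclass

@dataclass
class EpisodeResult:
    """One labeled evaluation episode (hypothetical schema)."""
    completed: bool
    expected_tools: list[str]
    called_tools: list[str]
    seconds_to_resolution: float

def summarize(results: list[EpisodeResult]) -> dict[str, float]:
    """Aggregate completion rate, tool call accuracy, and mean resolution time."""
    if not results:
        raise ValueError("need at least one episode")
    n = len(results)
    return {
        "task_completion_rate": sum(r.completed for r in results) / n,
        "tool_call_accuracy": sum(r.called_tools == r.expected_tools for r in results) / n,
        "mean_seconds_to_resolution": sum(r.seconds_to_resolution for r in results) / n,
    }

episodes = [
    EpisodeResult(True, ["order_status"], ["order_status"], 12.0),
    EpisodeResult(False, ["crm_lookup", "ticket_create"], ["crm_lookup"], 30.0),
]
summary = summarize(episodes)
print(summary)
```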

Q6: How does ReAct relate to modern AI agent frameworks like LangChain and LlamaIndex?

The agent components of LangChain, LlamaIndex, and similar frameworks are largely ReAct-style loops with batteries included. They provide: pre-built tool integrations (search, databases, APIs), memory management (summarizing long conversations), structured tool schemas, error handling patterns, and monitoring hooks. The underlying loop is the same Thought/Action/Observation cycle - the frameworks handle the boilerplate. Understanding ReAct from scratch lets you debug these frameworks, build custom implementations, and understand why they fail when they do. You can also build more efficient custom agents without framework overhead when performance is critical.


© 2026 EngineersOfAI. All rights reserved.