Learn how to build LLM agents that reason and act by interleaving thoughts and tool calls - the architectural pattern behind most modern AI assistants.

ReAct Pattern

The Research Assistant That Couldn't Research

Your company is building an internal AI research assistant. Users ask questions like: "What is the current quarterly revenue for our top 3 competitors, and how does it compare to ours?"

You build it with chain-of-thought. The model reasons beautifully: "I should look up Competitor A's revenue, then Competitor B's, then Competitor C's, then compare..."

And then it makes up numbers. Confidently, fluently, completely wrong.

The problem is fundamental: the model's training data has a cutoff. It can't access the internet. It can't query your internal database. It can't reliably perform arithmetic it hasn't seen before. It can reason about what to do but has no mechanism to actually do it.

You add a search tool, a database query function, and a calculator. You wire them up using the ReAct pattern. Now the model outputs:

Thought: I need to find Competitor A's most recent quarterly revenue.
Action: search("Competitor A Q4 2025 quarterly revenue")
Observation: According to their Q4 2025 earnings release, Competitor A reported $482M in revenue.

Thought: Now I need Competitor B's revenue.
Action: database_query("SELECT revenue FROM competitor_data WHERE company='B' AND quarter='Q4-2025'")
Observation: $291M

Thought: Now Competitor C.
Action: search("Competitor C Q4 2025 earnings")
Observation: Competitor C reported $167M for Q4 2025.

Thought: I have all three competitor revenues. Now I need our internal Q4 2025 revenue.
Action: database_query("SELECT revenue FROM financials WHERE quarter='Q4-2025'")
Observation: $318M

Thought: I have all the data. Let me compute the comparison.
Action: calculator("482 + 291 + 167")
Observation: 940

Thought: Our revenue is $318M against combined competitor revenue of $940M.
As a percentage: 318/940 = 33.8% of the three competitors' combined revenue.

Final Answer: [complete analysis]

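Every number comes from a tool result rather than from the model's memory, so the analysis is grounded and each figure can be traced to its source. The model can still misread an observation, but it no longer has to invent facts it doesn't have.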
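
The percentage in the final Thought is easy to verify, and doing so shows that the choice of denominator changes the figure. A minimal check, using the four revenue figures from the trace (the variable names are illustrative, not part of any agent code):

```python
# Revenue figures (in $M) taken from the Observations in the trace
competitors = {"A": 482, "B": 291, "C": 167}
ours = 318

competitor_total = sum(competitors.values())    # 482 + 291 + 167
four_company_total = competitor_total + ours    # adds our own revenue

# Our revenue relative to the three competitors combined
ratio_vs_competitors = ours / competitor_total
# Our share of the total across all four companies
share_of_total = ours / four_company_total

print(f"vs. competitors combined: {ratio_vs_competitors:.1%}")    # 33.8%
print(f"share of four-company total: {share_of_total:.1%}")       # 25.3%
```

The 33.8% in the trace is the first ratio. If the question is really about market share among the four companies, the second figure is the one to report.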

This is ReAct. It's not just a prompting technique - it's the architectural foundation of modern LLM agents.

Why This Exists: The Grounding Problem

Pure chain-of-thought reasoning is powerful but fundamentally limited: it can only reason about what the model already knows.

This creates several critical failure modes:

Temporal staleness: Knowledge cutoffs mean the model's facts are outdated
Hallucination under uncertainty: When the model doesn't know something, it often makes it up
No real-world effects: The model can reason about sending an email but can't actually send it
No private data access: The model can't access your internal databases, APIs, or file systems

Tool calling - giving models the ability to invoke functions and receive results - addresses all of these. But tool calling alone isn't enough. You need a reasoning framework that decides which tools to call, when to call them, and how to interpret their results.

That framework is ReAct.

Historical Context: Yao et al. 2022

"ReAct: Synergizing Reasoning and Acting in Language Models" (Yao et al., 2022, Princeton & Google Brain) introduced the pattern.

The key contribution was empirical: interleaving reasoning and acting produced more grounded, more interpretable trajectories than either alone, and clearly outperformed acting without reasoning:

Pure reasoning (CoT): model thinks but can't access external information
Pure acting (tool calls without reasoning): model calls tools but lacks coherent planning
ReAct (interleaved): reasoning guides action selection; observations update reasoning

The results reported in the paper (PaLM-540B, few-shot prompting) are more mixed than "ReAct always wins":

HotpotQA (multi-hop question answering using a Wikipedia search API): ReAct alone scored slightly below CoT on exact match (27.4 vs 29.4); the best result (35.1) came from combining ReAct with CoT self-consistency
FEVER (fact verification): 60.9% accuracy vs 56.3% for CoT
ALFWorld (interactive household tasks): 71% vs 45% success rate for the act-only baseline

The "aha moment" of the paper is in the error analysis: CoT's failures were dominated by hallucinated facts, while ReAct's failures were mostly reasoning errors (such as repeating the same action) or unhelpful search results, all of which are visible in the trace. CoT fails silently. ReAct fails transparently.

The ReAct Loop

ReAct structures model behavior as a repeating cycle:

$\text{Thought} \rightarrow \text{Action} \rightarrow \text{Observation} \rightarrow \text{Thought} \rightarrow \ldots$

Thought: The model reasons about what it knows and what it needs to do next. This is internal reasoning - it doesn't affect the external world.

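Action: The model requests a tool call with specific parameters, and the surrounding code executes it. This is the only step that touches the external world, whether it reads data (a search) or changes something (sending an email).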

Observation: The result from the tool call is injected into the context. The model reads it.

The loop continues until the model either has enough information to produce a final answer or determines it cannot solve the problem.
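
The cycle above can be sketched without any LLM at all. The example below is a minimal illustration, not a production loop: `run_react_loop`, `scripted_model`, and the `Action: tool | argument` line format are all hypothetical names chosen for this sketch, and the "model" is a plain function that stands in for an API call.

```python
from typing import Callable

Tool = Callable[[str], str]


def run_react_loop(model: Callable[[str], str], tools: dict[str, Tool],
                   question: str, max_steps: int = 5) -> str:
    """Generic Thought -> Action -> Observation loop over a text transcript."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        output = model(transcript)              # Thought plus an Action or a Final Answer
        transcript += output + "\n"
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip()
        action_lines = [line for line in output.splitlines() if line.startswith("Action:")]
        if not action_lines:
            return "Stopped: the model produced neither an action nor a final answer."
        # Hypothetical action format for this sketch: "Action: tool_name | argument"
        name, _, argument = action_lines[0][len("Action:"):].partition("|")
        tool = tools.get(name.strip())
        observation = tool(argument.strip()) if tool else f"Error: unknown tool {name.strip()!r}"
        transcript += f"Observation: {observation}\n"   # fed back into the next model call
    return "Stopped: step limit reached."


def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM: acts first, then answers once an Observation exists."""
    if "Observation:" not in transcript:
        return "Thought: I need to look this up.\nAction: lookup | Atlantis"
    observation = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: I have what I need.\nFinal Answer: {observation}"


answer = run_react_loop(
    scripted_model,
    {"lookup": lambda q: f"{q}: 42 (made-up data)"},
    "Population of Atlantis?",
)
print(answer)
```

Both exit conditions from the text appear here: the loop returns on a final answer, and it gives up when the step limit is reached or the model produces nothing actionable.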

ReAct vs. Chain-of-Thought

| Dimension | Chain-of-Thought | ReAct |
| --- | --- | --- |
| External access | No | Yes - tools |
| Fact currency | Limited by training | Real-time via search |
| Failure mode | Silent hallucination | Transparent errors |
| Complexity | Low | Medium |
| Cost | Low | Higher (tool calls + more tokens) |
| When to use | Self-contained reasoning | Tasks requiring external data/actions |

Use CoT when: the question can be answered from training knowledge, the task is computational or logical rather than factual retrieval, or latency and cost are the primary concerns.

Use ReAct when: the answer depends on real-time data, private data, external APIs, or when you need the model to take actions with real-world effects.

Common Tools in ReAct Agents

Real production ReAct agents typically include:

| Tool | What it does | When the model calls it |
| --- | --- | --- |
| `search(query)` | Web search | Real-time facts, news |
| `database_query(sql)` | SQL against internal DB | Private data |
| `calculator(expr)` | Arithmetic evaluation | Avoid hallucinated math |
| `code_interpreter(code)` | Execute Python | Data analysis, complex computation |
| `file_read(path)` | Read a file | Document processing |
| `api_call(endpoint, params)` | External API | CRM, calendar, payment systems |
| `email_send(to, subject, body)` | Send email | Notification, outreach |

The model decides which tools to call based on the reasoning in its Thought step.
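
The model can only choose among tools it has been told about, so it helps to keep each tool's function and its description in one place. The sketch below is one possible way to do that: `ToolSpec` and `render_tool_prompt` are hypothetical names, and the tool bodies are stubs rather than real integrations.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolSpec:
    name: str
    signature: str           # shown to the model, e.g. "(query: str) -> str"
    description: str         # tells the model when the tool is appropriate
    func: Callable[..., str]


def render_tool_prompt(specs: list[ToolSpec]) -> str:
    """Build the tool section of a system prompt from the specs."""
    lines = ["Available tools:"]
    lines += [f"- {spec.name}{spec.signature}: {spec.description}" for spec in specs]
    return "\n".join(lines)


specs = [
    ToolSpec("search", "(query: str) -> str", "Web search for real-time facts",
             lambda query: "stub result"),
    ToolSpec("calculator", "(expr: str) -> str", "Arithmetic evaluation",
             lambda expr: "stub result"),
]
registry = {spec.name: spec for spec in specs}   # lookup used when executing an Action

print(render_tool_prompt(specs))
```

Generating the prompt text from the registry means a tool cannot be described to the model without also being callable, and vice versa.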

Implementing ReAct from Scratch

import anthropic
import json
import re
from typing import Callable

client = anthropic.Anthropic()

# Define tools
def search_web(query: str) -> str:
    """Simulated web search. In production, call Tavily, SerpAPI, etc."""
    # Simulated responses for demonstration
    results = {
        "Python 3.12 release date": "Python 3.12 was released on October 2, 2023.",
        "what is the capital of France": "The capital of France is Paris.",
        "anthropic claude model": "Claude Sonnet 4.6 is Anthropic's latest mid-tier model as of 2025.",
    }
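    # Simulated matching: return an entry only if every word of its key appears
    # in the query (matching on any single word would fire on "is", "of", "the")
    query_lower = query.lower()
    for key, value in results.items():
        if all(word in query_lower for word in key.lower().split()):
            return value
    return f"No results found for: {query}"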


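def calculate(expression: str) -> str:
    """Evaluate a basic arithmetic expression (+, -, *, /, parentheses)."""
    try:
        # Whitelist characters so no names, attributes, or calls can appear
        allowed = set('0123456789+-*/(). ')
        if not all(c in allowed for c in expression):
            return "Error: only basic arithmetic allowed"
        # '**' passes the whitelist but can produce astronomically large numbers
        if "**" in expression:
            return "Error: exponentiation is not allowed"
        # The checks above restrict eval() to plain arithmetic; empty builtins
        # are a second layer of defense, not a substitute for the whitelist
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Error: {str(e)}"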


def get_current_date() -> str:
    """Get the current date."""
    from datetime import datetime
    return datetime.now().strftime("%Y-%m-%d")


# Tool registry
TOOLS: dict[str, Callable] = {
    "search": search_web,
    "calculate": calculate,
    "get_date": get_current_date,
}

TOOL_DESCRIPTIONS = """
Available tools:
- search(query: str) -> str: Search the web for information
- calculate(expression: str) -> str: Evaluate a mathematical expression
- get_date() -> str: Get the current date

To use a tool, output a line in this exact format:
Action: tool_name(arguments)
"""

REACT_SYSTEM_PROMPT = f"""You are a helpful assistant that can use tools to answer questions accurately.

{TOOL_DESCRIPTIONS}

For each step in your reasoning:
1. Write "Thought: [your reasoning about what to do next]"
2. Write "Action: [tool_name]([arguments])" to call a tool
3. The tool result will appear as "Observation: [result]"
4. Repeat until you have enough information
5. Write "Final Answer: [your complete answer]"

Important:
- Always reason before acting
- Use tools when you need real-time or specific information
- Don't make up facts - use the search tool if uncertain
- Stop once you have a definitive answer
"""


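def parse_action(text: str) -> tuple[str, str] | None:
    """Parse the first Action line into (tool_name, arguments)."""
    # Greedy (.*) runs to the last ')' on the line, so arguments that contain
    # parentheses, e.g. "(482 + 318) * 0.23", are captured whole.
    # '.' does not match newlines, so the match stays on one line.
    match = re.search(r"Action:\s*(\w+)\((.*)\)", text)
    if match:
        tool_name = match.group(1)
        args_str = match.group(2).strip().strip('"\'')
        return tool_name, args_str
    return None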


def execute_tool(tool_name: str, args: str) -> str:
    """Execute a tool and return its result."""
    if tool_name not in TOOLS:
        return f"Error: unknown tool '{tool_name}'. Available: {list(TOOLS.keys())}"

    tool = TOOLS[tool_name]
    try:
        if args:
            result = tool(args)
        else:
            result = tool()
        return str(result)
    except Exception as e:
        return f"Tool error: {str(e)}"


def react_agent(question: str, max_steps: int = 10) -> str:
    """
    ReAct agent that answers questions using tools.

    Args:
        question: The user's question
        max_steps: Maximum number of Thought/Action/Observation cycles

    Returns:
        The agent's final answer
    """
    messages = [
        {"role": "user", "content": question}
    ]

    full_response_parts = []

    for step in range(max_steps):
        # Get next model output
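        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=500,
            temperature=0,
            system=REACT_SYSTEM_PROMPT,
            # Stop before the model can write its own (invented) Observation;
            # the real one is appended by the code below
            stop_sequences=["Observation:"],
            messages=messages
        )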

        assistant_text = response.content[0].text
        full_response_parts.append(assistant_text)
        print(f"\n--- Step {step + 1} ---")
        print(assistant_text)

        # Check if agent has a final answer
        if "Final Answer:" in assistant_text:
            # Extract the final answer
            match = re.search(r"Final Answer:\s*(.+)", assistant_text, re.DOTALL)
            if match:
                return match.group(1).strip()
            return assistant_text

        # Parse and execute any action
        action = parse_action(assistant_text)
        if action:
            tool_name, tool_args = action
            observation = execute_tool(tool_name, tool_args)
            observation_text = f"Observation: {observation}"
            print(observation_text)

            # Add to conversation: assistant output + observation
            messages.append({
                "role": "assistant",
                "content": assistant_text
            })
            messages.append({
                "role": "user",
                "content": observation_text
            })
        else:
            # No action found - model may be done or confused
            if "Thought:" in assistant_text:
                # Model is still reasoning, continue
                messages.append({
                    "role": "assistant",
                    "content": assistant_text
                })
                messages.append({
                    "role": "user",
                    "content": "Continue your reasoning."
                })
            else:
                break

    return "Max steps reached. Partial response: " + " ".join(full_response_parts)


# Test the agent
questions = [
    "What day of the week was January 1st, 2025? (First, get today's date for context)",
    "If Python 3.12 was released and the next major version adds 18 months, when is Python 3.13 due?",
    "What is 15% of 847, rounded to the nearest dollar?",
]

for q in questions:
    print(f"\n{'='*60}")
    print(f"Question: {q}")
    print('='*60)
    answer = react_agent(q)
    print(f"\nFinal Answer: {answer}")
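
The regex parser above treats everything inside the parentheses as a single string. For tools that take several arguments, such as `api_call(endpoint, params)`, one option is to parse the Action line with Python's `ast` module. This is a sketch under the assumption that the model writes arguments as Python literals with quoted strings; `parse_action_call` is a hypothetical helper, not part of the agent above.

```python
import ast


def parse_action_call(line: str) -> tuple[str, list, dict]:
    """Parse 'Action: tool(arg, key=value)' into (name, args, kwargs).

    Only literal arguments (strings, numbers, lists, dicts, ...) are accepted,
    so nothing in the line is ever executed.
    """
    prefix = "Action:"
    stripped = line.strip()
    if not stripped.startswith(prefix):
        raise ValueError("not an Action line")
    node = ast.parse(stripped[len(prefix):].strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError("expected tool_name(...)")
    if any(keyword.arg is None for keyword in node.keywords):
        raise ValueError("**kwargs unpacking is not supported")
    # literal_eval raises ValueError for anything that is not a plain literal
    args = [ast.literal_eval(arg) for arg in node.args]
    kwargs = {keyword.arg: ast.literal_eval(keyword.value) for keyword in node.keywords}
    return node.func.id, args, kwargs


print(parse_action_call('Action: search("Competitor A (US) revenue")'))
print(parse_action_call('Action: api_call("orders", params={"id": 7})'))
```

An unquoted argument such as `search(Competitor A revenue)` raises `SyntaxError`, and a nested call such as `search(__import__("os"))` raises `ValueError`. In an agent loop, both would be caught and returned to the model as an error Observation.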

Using Claude's Native Tool Use (Recommended for Production)

Instead of parsing text, use Claude's structured tool calling:

import anthropic

# search_web() and calculate() are the tool functions defined in the previous listing

client = anthropic.Anthropic()

# Define tools in Claude's tool format
tools = [
    {
        "name": "search",
        "description": "Search the web for current information",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query"
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Mathematical expression to evaluate (e.g., '15 * 847 / 100')"
                }
            },
            "required": ["expression"]
        }
    }
]


def process_tool_call(tool_name: str, tool_input: dict) -> str:
    """Execute the appropriate tool based on the model's request."""
    if tool_name == "search":
        return search_web(tool_input["query"])
    elif tool_name == "calculate":
        return calculate(tool_input["expression"])
    return f"Unknown tool: {tool_name}"


def react_agent_native_tools(question: str, max_steps: int = 10) -> str:
    """
    ReAct agent using Claude's native tool_use feature.
    This is more reliable than text parsing.
    """
    messages = [{"role": "user", "content": question}]

    for step in range(max_steps):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )

        # Add assistant's response to history
        messages.append({"role": "assistant", "content": response.content})

        # Done: the model answered without requesting another tool
        if response.stop_reason == "end_turn":
            return "".join(
                block.text for block in response.content if block.type == "text"
            )

        # Anything other than a tool request (e.g. "max_tokens") can't be continued safely
        if response.stop_reason != "tool_use":
            return f"Stopped early: stop_reason={response.stop_reason}"

        # Process all tool calls requested in this turn
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                tool_result = process_tool_call(block.name, block.input)
                print(f"Tool: {block.name}({block.input})")
                print(f"Result: {tool_result}")

                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": tool_result
                })

        # Add tool results to conversation
        messages.append({"role": "user", "content": tool_results})

    return f"Stopped after {max_steps} steps without a final answer."


# Test
result = react_agent_native_tools(
    "What is 23% of the sum of 482 and 318? Show your calculation."
)
print(f"\nFinal: {result}")
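
With native tool use, the conversation history has a strict shape: every `tool_use` block in an assistant turn is expected to be answered by a `tool_result` block with the same id in the next user turn. The sketch below checks that invariant on a history expressed as plain dictionaries. That representation is an assumption made for illustration (the agent above stores SDK objects in assistant turns), and `unmatched_tool_calls` is a hypothetical helper.

```python
def unmatched_tool_calls(messages: list[dict]) -> set[str]:
    """Return tool_use ids that have no tool_result in the following user message."""
    missing: set[str] = set()
    for index, message in enumerate(messages):
        if message["role"] != "assistant" or isinstance(message["content"], str):
            continue
        requested = {b["id"] for b in message["content"] if b["type"] == "tool_use"}
        following = messages[index + 1] if index + 1 < len(messages) else None
        answered: set[str] = set()
        if following and following["role"] == "user" and not isinstance(following["content"], str):
            answered = {b["tool_use_id"] for b in following["content"]
                        if b["type"] == "tool_result"}
        missing |= requested - answered
    return missing


# Illustrative history; the id is a made-up placeholder
history = [
    {"role": "user", "content": "What is 23% of 800?"},
    {"role": "assistant", "content": [
        {"type": "text", "text": "I'll calculate that."},
        {"type": "tool_use", "id": "toolu_example_1", "name": "calculate",
         "input": {"expression": "800 * 0.23"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_example_1", "content": "184.0"},
    ]},
]

print(unmatched_tool_calls(history))        # complete history: nothing missing
print(unmatched_tool_calls(history[:2]))    # result turn dropped: one id missing
```

A check like this is useful before resending a history that has been trimmed or summarized, since dropping one half of a pair is an easy mistake to make.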

Error Recovery in ReAct

Real-world tool calls fail. The ReAct loop naturally handles errors - the model reads the error in the Observation and adapts:

Thought: I'll search for the company's latest revenue figures.
Action: search("Acme Corp 2025 annual revenue")
Observation: Error: Search API rate limit exceeded

Thought: The search failed due to rate limiting. I'll try a different approach
and look for this in our internal database instead.
Action: database_query("SELECT revenue FROM companies WHERE name='Acme Corp' AND year=2025")
Observation: {"revenue": 142000000, "year": 2025}

Thought: Got the data from the database. Revenue is $142M.
Final Answer: Acme Corp's 2025 revenue was $142 million.

This graceful degradation is one of ReAct's practical strengths. Build your agents to expect and handle tool failures.
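
For the model to adapt to a failure, the failure has to reach it as an Observation instead of crashing the loop. A minimal sketch for the native tool-use path: `safe_tool_result` and `flaky_search` are hypothetical names, the id is a placeholder, and `is_error` is the flag the Anthropic Messages API accepts on `tool_result` blocks.

```python
from typing import Callable


def safe_tool_result(tool_use_id: str, tool: Callable[..., str], tool_input: dict) -> dict:
    """Run a tool and always return a tool_result block, even when the tool raises."""
    try:
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": str(tool(**tool_input)),
        }
    except Exception as exc:
        # Report the failure to the model instead of aborting the agent loop
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"{type(exc).__name__}: {exc}",
            "is_error": True,
        }


def flaky_search(query: str) -> str:
    """Simulated tool that always fails, for demonstration only."""
    raise TimeoutError("search API rate limit exceeded")


result = safe_tool_result("toolu_example_1", flaky_search,
                          {"query": "Acme Corp 2025 annual revenue"})
print(result)
```

The error text should be specific enough for the model to choose a sensible next step (retry, switch tools, or give up) but should not leak stack traces or credentials.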

Production Considerations

1. Timeouts Are Non-Negotiable

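import asyncio

async def react_with_timeout(question: str, timeout_seconds: float = 30.0) -> str:
    try:
        # to_thread runs the blocking agent in a worker thread so it can be awaited
        return await asyncio.wait_for(
            asyncio.to_thread(react_agent_native_tools, question),
            timeout=timeout_seconds
        )
    except asyncio.TimeoutError:
        # The caller is unblocked here, but the worker thread is not cancelled and
        # keeps running; also set timeouts on the API client and on each tool
        return "Request timed out. Please try a simpler question or try again."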
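
A whole-request timeout does not say which step was slow. A complementary approach is a per-tool timeout that turns a slow call into an error Observation the model can react to. The sketch below uses only the standard library; `call_with_timeout` and `slow_tool` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_executor = ThreadPoolExecutor(max_workers=4)   # shared pool for tool calls


def call_with_timeout(tool, tool_input: dict, timeout_seconds: float) -> str:
    """Run one tool call, returning an error string if it exceeds the time limit."""
    future = _executor.submit(tool, **tool_input)
    try:
        return str(future.result(timeout=timeout_seconds))
    except FutureTimeout:
        # The worker thread keeps running in the background; Python cannot kill it.
        # Tools that do I/O should therefore also set their own network timeouts.
        return f"Error: tool timed out after {timeout_seconds}s"


def slow_tool(seconds: float) -> str:
    """Simulated tool that takes a configurable amount of time."""
    time.sleep(seconds)
    return "done"


print(call_with_timeout(slow_tool, {"seconds": 0.01}, timeout_seconds=1.0))
print(call_with_timeout(slow_tool, {"seconds": 0.5}, timeout_seconds=0.05))
```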

2. Limit Tool Call Depth

Agents can get into infinite loops. Always enforce a hard maximum:

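MAX_TOOL_CALLS = 10  # Hard limit regardless of max_steps
tool_call_count = 0  # Module-level for brevity; use a per-request counter in a concurrent server

def reset_tool_call_count() -> None:
    """Call at the start of each agent run so the limit applies per request."""
    global tool_call_count
    tool_call_count = 0

def execute_tool_with_limit(tool_name: str, tool_input: dict) -> str:
    global tool_call_count
    tool_call_count += 1
    if tool_call_count > MAX_TOOL_CALLS:
        raise ValueError("Maximum tool call limit reached")
    return process_tool_call(tool_name, tool_input)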
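
A call counter catches runaway agents only after the whole budget is spent. Loops usually show up earlier as the same tool being called with the same input over and over. A small sketch of a detector for that pattern follows; `RepeatDetector` is a hypothetical helper, and the threshold of two repeats is an arbitrary choice.

```python
import json
from collections import Counter


class RepeatDetector:
    """Flags a tool call once the identical call has been seen too many times."""

    def __init__(self, max_repeats: int = 2):
        self.max_repeats = max_repeats
        self.seen: Counter = Counter()

    def should_block(self, tool_name: str, tool_input: dict) -> bool:
        # Canonical JSON makes {"a": 1, "b": 2} and {"b": 2, "a": 1} the same key
        key = json.dumps([tool_name, tool_input], sort_keys=True)
        self.seen[key] += 1
        return self.seen[key] > self.max_repeats


detector = RepeatDetector(max_repeats=2)
for attempt in range(3):
    blocked = detector.should_block("search", {"query": "Acme Corp revenue"})
    print(f"attempt {attempt + 1}: blocked={blocked}")
```

When a call is blocked, returning an Observation such as "You have already tried this exact call; try a different approach" gives the model a chance to recover before the hard limit ends the run.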

3. Log Every Step for Debugging

import logging

logger = logging.getLogger("react_agent")

def log_step(step: int, thought: str, action: str, observation: str):
    logger.info(
        "react_step",
        extra={
            "step": step,
            "thought": thought[:200],
            "action": action,
            "observation_len": len(observation),
            "observation_preview": observation[:100],
        }
    )
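
One detail that is easy to miss: with Python's default formatter, fields passed through `extra` are attached to the log record but not printed. The sketch below shows one way to emit them as JSON lines. `ExtraJsonFormatter` is a hypothetical class, the logger name `react_agent_demo` is chosen only to keep the demo separate, and the in-memory stream stands in for a real log destination.

```python
import io
import json
import logging


class ExtraJsonFormatter(logging.Formatter):
    """Render the message plus selected `extra` fields as one JSON object per line."""

    FIELDS = ("step", "thought", "action", "observation_len", "observation_preview")

    def format(self, record: logging.LogRecord) -> str:
        payload = {"event": record.getMessage(), "level": record.levelname}
        # Fields passed via extra= become attributes on the record
        payload.update(
            {name: getattr(record, name) for name in self.FIELDS if hasattr(record, name)}
        )
        return json.dumps(payload)


stream = io.StringIO()                       # stands in for stderr or a log file
handler = logging.StreamHandler(stream)
handler.setFormatter(ExtraJsonFormatter())

demo_logger = logging.getLogger("react_agent_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(handler)
demo_logger.propagate = False                # keep the demo output out of the root logger

demo_logger.info("react_step", extra={"step": 1, "action": "search", "observation_len": 42})
logged = json.loads(stream.getvalue())
print(logged)
```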

4. Sandbox Tool Execution

Especially for code execution tools - always run in an isolated environment:

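import requests

# Use a sandboxed environment for code execution
# Never run model-generated code directly in your production environment

# Example: use a containerized execution service
# (the URL and JSON fields below are placeholders for your own service)
def sandboxed_code_execute(code: str) -> str:
    try:
        response = requests.post(
            "http://code-sandbox-service/execute",
            json={"code": code, "timeout": 5, "memory_limit_mb": 128},
            timeout=10,  # client-side limit, slightly above the sandbox's own
        )
        response.raise_for_status()
    except requests.RequestException as e:
        return f"Execution failed: {e}"
    return response.json().get("output", "Execution failed")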
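
For local experiments where no sandbox service is available, a separate interpreter process with a wall-clock timeout is a common stopgap. The sketch below gives process separation and a time limit only. It is not a security sandbox: the child process can still read files and use the network, so it is unsuitable for untrusted code in production. `run_in_subprocess` is a hypothetical helper.

```python
import subprocess
import sys


def run_in_subprocess(code: str, timeout_seconds: float = 5.0) -> str:
    """Run Python code in a separate interpreter process with a wall-clock timeout."""
    try:
        completed = subprocess.run(
            # -I: isolated mode, ignores environment variables and user site-packages
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return "Error: execution timed out"
    if completed.returncode != 0:
        # Return only the tail of stderr so a huge traceback can't flood the context
        return f"Error: {completed.stderr.strip()[-500:]}"
    return completed.stdout


print(run_in_subprocess("print(2 + 2)"))
print(run_in_subprocess("raise ValueError('boom')"))
```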

Common Mistakes

:::danger Mistake 1: No Tool Call Limits

Without a hard limit on tool calls, a misbehaving agent can call tools indefinitely, running up large costs. Always set and enforce a maximum.

:::

:::danger Mistake 2: Trusting Tool Output Without Validation

The model may misinterpret tool output and continue with wrong assumptions. Add structured output validation for critical tool calls.

:::

:::warning Mistake 3: Using Text Parsing Instead of Native Tool Use

Parsing "Action: tool_name(args)" from text is fragile. Use the model's native function calling/tool use API when available (Claude, GPT-4, Gemini all support it). It's more reliable and type-safe.

:::

:::warning Mistake 4: Giving the Model Too Many Tools

A model with 20 tools spends reasoning tokens deciding which tool to use. Start with 3-5 essential tools and add only when needed. Tool selection is a reasoning burden.

:::

:::warning Mistake 5: No Fallback for Tool Failures

Assume any tool can fail at any time. Design your system prompt to handle tool errors gracefully and either retry, use an alternative tool, or explain to the user why the task couldn't be completed.

:::
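
As an illustration of the validation suggested in Mistake 2, the sketch below checks a tool result before it is handed to the model as an Observation. It assumes a tool that returns JSON shaped like `{"revenue": ..., "year": ...}`, as in the error recovery trace earlier. `validate_revenue_result` is a hypothetical helper, and the rules it enforces are examples, not a complete schema.

```python
import json


def validate_revenue_result(raw: str) -> dict:
    """Validate a database_query result; raise ValueError if it looks wrong."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"tool output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    revenue, year = data.get("revenue"), data.get("year")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(revenue, bool) or not isinstance(revenue, (int, float)) or revenue < 0:
        raise ValueError(f"revenue must be a non-negative number, got {revenue!r}")
    if isinstance(year, bool) or not isinstance(year, int):
        raise ValueError(f"year must be an integer, got {year!r}")
    return {"revenue": revenue, "year": year}


print(validate_revenue_result('{"revenue": 142000000, "year": 2025}'))
```

When validation fails, the error message can be passed back as the Observation so the model knows the data is unusable and does not reason from a malformed value.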

Interview Q&A

Q1: What is the ReAct pattern and what problem does it solve?

ReAct (Reasoning + Acting) is a prompting framework that interleaves natural language reasoning (Thought) with tool calls (Action) and tool results (Observation). It solves the grounding problem: pure chain-of-thought reasoning can only use information the model already knows, leading to hallucination when real-time data or private data is needed. ReAct gives the model the ability to take actions in the world - search the web, query databases, call APIs, execute code - and incorporate the results into its reasoning. It's the foundational architecture behind LLM agents.

Q2: What is the Thought-Action-Observation loop?

The ReAct loop consists of three steps that repeat: (1) Thought - the model reasons about the current state, what it knows, and what it needs to do next; (2) Action - the model calls a specific tool with parameters; (3) Observation - the tool's result is injected into the context, which the model reads as input for the next Thought. The loop continues until the model has enough information to produce a final answer or determines the task is impossible. This interleaving is crucial: reasoning guides action selection, and observations update reasoning.

Q3: How does ReAct handle tool call failures?

Because observations (including error messages) are fed back into the context, the model can read an error message in the Observation step and adapt. A good ReAct implementation: the model sees "Error: search API unavailable," reasons about alternatives ("I'll try the internal database instead"), and takes a different action. This graceful degradation is one of ReAct's practical advantages over fixed workflows. In production, you should also have explicit error handling in your system prompt: "If a tool returns an error, try an alternative approach or inform the user."

Q4: What is the difference between text-parsing ReAct and native tool use?

Text-parsing ReAct extracts tool calls by parsing text patterns like "Action: search(query)" using regex. This is fragile - the model might format the action slightly differently, breaking your parser. Native tool use (Claude's tool_use API, OpenAI's function calling) uses structured APIs where the model outputs a JSON object specifying the tool name and arguments. The model is trained to use this format reliably. Native tool use is type-safe, more reliable, and provides clear separation between reasoning text and tool calls.

Q5: How would you design a production ReAct agent for a customer service application?

Key design decisions: (1) Tools: limit to essential ones - CRM lookup, order status, knowledge base search, ticket creation - probably 4-6 tools maximum; (2) System prompt: define the agent's persona, scope constraints (what it can and cannot do), escalation rules (when to transfer to human), and error handling; (3) Safety: sandbox all tool execution, validate all tool inputs, rate-limit tool calls, set hard limits on conversation length; (4) Monitoring: log every Thought/Action/Observation for debugging and compliance; (5) Fallback: always have a human handoff path when the agent is stuck or uncertain; (6) Evaluation: measure task completion rate, tool call accuracy, and time-to-resolution against a labeled evaluation set.
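
The evaluation metrics in point (6) can be computed from a labeled set of runs with very little code. The sketch below is one possible formulation: `EvalRun` and `summarize` are hypothetical names, and "tool call accuracy" is defined here as an exact match between the expected and actual tool sequence, which is only one of several reasonable definitions.

```python
from dataclasses import dataclass


@dataclass
class EvalRun:
    completed: bool                 # did the agent resolve the labeled task?
    expected_tools: list[str]       # tool sequence from the labeled evaluation set
    called_tools: list[str]         # tool sequence the agent actually used
    seconds_to_resolution: float


def summarize(runs: list[EvalRun]) -> dict[str, float]:
    """Aggregate task completion, tool call accuracy, and time-to-resolution."""
    if not runs:
        raise ValueError("no runs to summarize")
    n = len(runs)
    return {
        "task_completion_rate": sum(r.completed for r in runs) / n,
        "tool_call_accuracy": sum(r.called_tools == r.expected_tools for r in runs) / n,
        "mean_seconds_to_resolution": sum(r.seconds_to_resolution for r in runs) / n,
    }


# Made-up runs for illustration only
runs = [
    EvalRun(True, ["crm_lookup", "order_status"], ["crm_lookup", "order_status"], 10.0),
    EvalRun(True, ["kb_search"], ["kb_search"], 20.0),
    EvalRun(True, ["order_status"], ["crm_lookup", "order_status"], 30.0),
    EvalRun(False, ["ticket_create"], ["kb_search"], 40.0),
]
print(summarize(runs))
```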

Q6: How does ReAct relate to modern AI agent frameworks like LangChain and LlamaIndex?

LangChain, LlamaIndex, and similar frameworks ship ReAct-style agents with batteries included. They provide: pre-built tool integrations (search, databases, APIs), memory management (summarizing long conversations), structured tool schemas, error handling patterns, and monitoring hooks. The underlying loop is the same Thought/Action/Observation cycle - the frameworks just handle the boilerplate. Understanding ReAct from scratch lets you debug these frameworks, build custom implementations, and understand why they fail when they do. You can also build more efficient custom agents without framework overhead when performance is critical.

