Agent vs Chatbot vs Workflow
The Confusion Is Everywhere, and It Costs You
"We built an AI agent!" - but it is just a chatbot with a system prompt. "We have agentic AI!" - but it is a fixed pipeline with if/else branches. "Our agent handles email triage" - but it always routes emails the same way, with no dynamic decision making. The confusion is pervasive, and it matters because each architecture requires completely different engineering, has different cost profiles, and has different reliability characteristics.
The stakes: a team that builds a "full agent" when they need a workflow has spent 10x more engineering time than necessary, deployed something less reliable, and created operational complexity they cannot manage. A team that builds a chatbot when they need an agent has built something that cannot actually do the task - and will discover this in production.
Getting the architecture right requires precise definitions. Not marketing definitions. Not intuitive definitions. Technical criteria that you can apply to your specific use case and get a clear answer.
Why the Definitions Got Muddled
The confusion has a history. "Agent" became a marketing term in 2023, applied to anything that used an LLM. When AutoGPT went viral after its release in March 2023, it popularized the idea of "autonomous AI agents" - but many of the things people actually built and called agents were glorified chatbots or rigid pipelines.
Part of the problem is that the technology exists on a spectrum. There is no binary switch between "chatbot" and "agent." There is a continuum of autonomy, tool use, and adaptiveness, and the boundaries between categories are fuzzy in practice.
The solution is not to find a perfect boundary - it is to identify the technical properties that drive architectural decisions, and to be precise about which properties your system actually has.
Precise Definitions
Chatbot
A chatbot is a system where:
- Input from the user goes to an LLM
- The LLM generates a text response
- The response is returned to the user
- No tools are called, no external state is modified
- Each turn may or may not include prior conversation history
The test: could you implement this system with a single API call per user message? If yes, it is a chatbot.
Examples: Customer service Q&A, knowledge base assistant, writing helper, code explanation.
What makes it NOT an agent: it cannot take action. It can only say things.
Workflow
A workflow is a system where:
- A sequence of steps is defined at design time
- LLMs appear as one or more steps in the sequence
- The sequence is deterministic (or probabilistic but bounded)
- The set of steps that will execute is known before the run starts
- External systems may be called, but in a predetermined pattern
The test: could you write a flowchart of all possible execution paths before running a single query? If yes, it is a workflow.
Examples: "Summarize email → classify urgency → route to department → generate draft reply." The sequence is fixed. The LLM fills in the content at each step.
What makes it NOT an agent: the structure of execution does not change based on what happens. The workflow does not decide to add a step or skip a step based on an unexpected observation.
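The flowchart test can be made concrete. The sketch below is a minimal illustration, not a prescribed implementation: it represents a workflow as a small graph and enumerates every execution path before any query runs. The step names and the branch on department are assumptions added for the example.

```python
# Hypothetical workflow graph: each step maps to the steps that may follow it.
# The branch on department is an assumption added for illustration.
WORKFLOW_GRAPH = {
    "summarize_email": ["classify_urgency"],
    "classify_urgency": ["route_to_billing", "route_to_technical"],
    "route_to_billing": ["draft_reply"],
    "route_to_technical": ["draft_reply"],
    "draft_reply": [],
}


def enumerate_paths(graph: dict[str, list[str]], start: str) -> list[list[str]]:
    """Return every path from `start` to a terminal step (graph must be acyclic)."""
    next_steps = graph[start]
    if not next_steps:
        return [[start]]
    paths = []
    for step in next_steps:
        for tail in enumerate_paths(graph, step):
            paths.append([start, *tail])
    return paths


all_paths = enumerate_paths(WORKFLOW_GRAPH, "summarize_email")
for path in all_paths:
    print(" -> ".join(path))
```

If a function like this can list every path for your system, the system passes the workflow test. An agent has no such finite graph to enumerate.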
Agent
An agent is a system where:
- The set of steps is determined at runtime, not design time
- Tool selection is dynamic - the agent decides which tool to call based on observations
- The number of iterations is variable and not known in advance
- The agent can deviate from any expected path based on what it encounters
- The agent is goal-directed: its decisions are motivated by achieving a goal
The test: can you describe in advance every tool call the system will make? If no - if the sequence depends on what the system observes during execution - it is an agent.
Examples: A coding agent that reads files, runs tests, discovers failures, backtracks, and makes different edits based on what it learns. The exact sequence of tool calls cannot be known before the run.
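The three tests above can be applied in order, from least to most autonomous. The following is a rough sketch under that assumption. `SystemProfile` and its field names are hypothetical, introduced only to make the tests explicit.

```python
from dataclasses import dataclass


@dataclass
class SystemProfile:
    """Hypothetical description of a system, used only for this illustration."""
    single_llm_call_per_message: bool  # chatbot test: one API call per user message
    calls_tools: bool                  # does it call tools or modify external state?
    path_known_at_design_time: bool    # workflow test: can all paths be drawn up front?


def classify_architecture(profile: SystemProfile) -> str:
    """Apply the chatbot, workflow, and agent tests in that order."""
    if profile.single_llm_call_per_message and not profile.calls_tools:
        return "chatbot"
    if profile.path_known_at_design_time:
        return "workflow"
    # The sequence of tool calls depends on runtime observations.
    return "agent"


faq_bot = SystemProfile(True, False, True)
email_pipeline = SystemProfile(False, True, True)
coding_assistant = SystemProfile(False, True, False)

for name, profile in [("faq_bot", faq_bot),
                      ("email_pipeline", email_pipeline),
                      ("coding_assistant", coding_assistant)]:
    print(f"{name}: {classify_architecture(profile)}")
```

Real systems sit on a spectrum, so treat the returned label as a starting point for discussion rather than a verdict.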
The Technical Decision Matrix
| Criterion | Chatbot | Workflow | Agent |
|---|---|---|---|
| Tools used | None | Fixed set, fixed order | Dynamic set, dynamic order |
| Number of LLM calls | 1 per user turn | Predetermined | Variable, determined at runtime |
| Execution path | Single pass | Predetermined DAG | Dynamically determined |
| State across steps | Optional (memory) | Yes, passed between steps | Yes, accumulated observations |
| Autonomy level | Zero | Low (follows script) | High |
| Handles unexpected | No | No (fails or falls back) | Yes (adapts) |
| Cost predictability | High | High | Low |
| Reliability | Very high | High | Variable |
| Engineering complexity | Low | Medium | High |
| Debugging ease | Easy | Medium | Difficult |
The Spectrum in Practice
Most production systems live somewhere in the middle of this spectrum. A customer service system might be a conditional workflow with one agentic step (dynamic routing). A data analysis pipeline might be a workflow with one agent step (exploratory analysis). A coding assistant might be a chatbot with one constrained tool (code execution).
The goal is not to build the most agentic system possible. The goal is to build the simplest system that satisfies the requirements.
Five Questions to Determine What You Need
Before choosing an architecture, answer these five questions about your task.
Question 1: Is the task sequence predictable? If you can write pseudocode for the entire process before running a single query - it is a workflow, not an agent. Workflows are faster, cheaper, and more reliable.
Question 2: Does the task require adapting to unexpected observations? If the system must handle situations that cannot be anticipated at design time - a CAPTCHA, an unexpected API response format, a file that does not exist - you need agentic behavior. Workflows fail silently or raise errors; agents adapt.
Question 3: How long is the task? Single turn: chatbot. Fixed multi-step: workflow. Variable multi-step determined at runtime: agent.
Question 4: What is the reliability requirement? If you need 99%+ reliability and predictability, use a workflow. Agents have compound error rates that grow with each step. A 10-step agent with 95% per-step reliability has a ~60% overall success rate.
Question 5: What is the cost budget? Chatbots: pennies per query. Workflows: cents per run. Agents: dollars per run (20+ tool calls × API cost per call). At scale, this difference is significant.
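The compound error figure in Question 4 follows from multiplying per-step success probabilities. The sketch below reproduces it, assuming every step must succeed and that step failures are independent. That simplification ignores retries and self-correction, which can push real agents above this estimate.

```python
def end_to_end_success(per_step_reliability: float, steps: int) -> float:
    """Success probability when every step must succeed and failures are independent."""
    return per_step_reliability ** steps


for steps in (1, 5, 10, 20, 50):
    rate = end_to_end_success(0.95, steps)
    print(f"{steps:>2} steps at 95% per step: {rate:.0%} end-to-end")
```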
Code Examples: Email Triage in All Three Architectures
The same task - triage incoming emails - implemented as a chatbot, a workflow, and an agent.
Architecture 1: Chatbot

```python
"""
Email triage as a chatbot.
Simple, fast, cheap. Returns a classification.
Limitation: cannot look up customer history or take action.
"""
import json

import anthropic

client = anthropic.Anthropic()


def triage_email_chatbot(email_content: str) -> dict:
    """
    Classify an email using a single LLM call.
    Returns: priority, category, suggested_action
    """
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=512,
        system="""You are an email triage assistant. Classify the email and return JSON:
{
"priority": "high|medium|low",
"category": "billing|technical|general|complaint|praise",
"suggested_action": "brief description of what to do",
"reason": "brief explanation of classification"
}""",
        messages=[{"role": "user", "content": f"Classify this email:\n\n{email_content}"}]
    )
    text = response.content[0].text
    # Parse the JSON the model was asked to return
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fallback if the model returns prose instead of JSON
        return {
            "priority": "medium",
            "category": "general",
            "suggested_action": "Review manually",
            "raw_response": text
        }


# Usage
result = triage_email_chatbot("""
Subject: URGENT - Cannot access account, business affected
We have been locked out of our enterprise account for 3 hours.
Our entire team cannot access their work. This is severely impacting
our business operations. Customer ID: ENT-2847
""")
print(result)
# {'priority': 'high', 'category': 'technical', 'suggested_action': 'Escalate to support', ...}
```
What this cannot do: It cannot look up whether customer ENT-2847 is actually a paying customer, whether they have open tickets, what their SLA tier is. It classifies based on the email text alone.
Architecture 2: Workflow

```python
"""
Email triage as a workflow.
Fixed sequence: extract → lookup → classify → route → respond.
Can access real data but follows a predetermined path.
"""
import json

import anthropic

client = anthropic.Anthropic()


# Simulated database lookup functions
def get_customer_info(customer_id: str) -> dict | None:
    """Look up customer from database."""
    CUSTOMERS = {
        "ENT-2847": {"name": "Acme Corp", "tier": "enterprise", "sla_hours": 4, "arr": 95000},
        "SMB-1203": {"name": "Startup Inc", "tier": "standard", "sla_hours": 24, "arr": 2400},
    }
    return CUSTOMERS.get(customer_id)


def get_open_tickets(customer_id: str) -> list[dict]:
    """Get open support tickets for a customer."""
    TICKETS = {
        "ENT-2847": [{"id": "T-89234", "created": "2024-01-15", "status": "open"}],
        "SMB-1203": [],
    }
    return TICKETS.get(customer_id, [])


def route_to_team(priority: str, category: str, tier: str) -> str:
    """Determine which team should handle this."""
    if priority == "high" and tier == "enterprise":
        return "enterprise_support_immediate"
    elif category == "billing":
        return "billing_team"
    elif category == "technical":
        return "technical_support"
    else:
        return "general_support"


def triage_email_workflow(email_content: str) -> dict:
    """
    Triage email using a fixed workflow.
    Steps:
    1. Extract metadata (LLM)
    2. Look up customer (DB)
    3. Classify priority (LLM, with context)
    4. Route to team (rule-based)
    5. Generate response (LLM)
    """
    results = {}

    # ── Step 1: Extract metadata ──────────────────────────────────────────────
    extract_response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=256,
        system="Extract structured data from an email. Return JSON only.",
        messages=[{
            "role": "user",
            "content": (
                f"Extract from this email:\n{email_content}\n\n"
                "Return JSON: {\"customer_id\": str|null, \"subject\": str, "
                "\"urgency_words\": [str], \"category_hint\": str}"
            )
        }]
    )
    # Raises json.JSONDecodeError if the model does not return bare JSON
    metadata = json.loads(extract_response.content[0].text)
    results["metadata"] = metadata

    # ── Step 2: Look up customer ──────────────────────────────────────────────
    customer_info = None
    open_tickets = []
    if metadata.get("customer_id"):
        customer_info = get_customer_info(metadata["customer_id"])
        if customer_info:
            open_tickets = get_open_tickets(metadata["customer_id"])
    results["customer"] = customer_info

    # ── Step 3: Classify with context ────────────────────────────────────────
    context = f"Email: {email_content[:500]}\n"
    if customer_info:
        context += f"Customer: {customer_info['name']}, tier: {customer_info['tier']}, SLA: {customer_info['sla_hours']}h\n"
        context += f"Open tickets: {len(open_tickets)}\n"
    classify_response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=256,
        system="Classify email priority and category. Return JSON only.",
        messages=[{
            "role": "user",
            "content": (
                f"{context}\n"
                "Return: {\"priority\": \"high|medium|low\", \"category\": \"billing|technical|general|complaint\"}"
            )
        }]
    )
    classification = json.loads(classify_response.content[0].text)
    results["classification"] = classification

    # ── Step 4: Route (deterministic logic) ──────────────────────────────────
    tier = customer_info["tier"] if customer_info else "unknown"
    team = route_to_team(
        classification["priority"],
        classification["category"],
        tier
    )
    results["assigned_team"] = team

    # ── Step 5: Generate response ─────────────────────────────────────────────
    response_prompt = (
        f"Write a brief acknowledgment email. "
        f"Priority: {classification['priority']}. "
        f"Customer tier: {tier}. "
        f"Keep it professional and brief (2-3 sentences)."
    )
    resp_response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=256,
        messages=[{"role": "user", "content": response_prompt}]
    )
    results["draft_response"] = resp_response.content[0].text
    return results
```
What this can and cannot do: This workflow can look up real customer data and make routing decisions based on it. It always follows the same 5 steps. It cannot handle unexpected situations (what if the customer ID format is wrong? what if the SLA lookup fails?). It also cannot take action - it routes but does not actually update any system.
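A workflow handles the unexpected only through fallbacks scripted in advance. As one possible sketch of that idea, the helper below validates the customer ID and catches lookup failures, then routes anything unusual to manual review. The ID pattern is an assumption inferred from IDs like ENT-2847, and `safe_customer_lookup` and the `manual_review` fallback are hypothetical names.

```python
import re
from typing import Callable, Optional

# Assumed ID format, inferred from examples such as ENT-2847 and SMB-1203.
CUSTOMER_ID_PATTERN = re.compile(r"^[A-Z]{3}-\d{4}$")


def safe_customer_lookup(
    customer_id: Optional[str],
    lookup: Callable[[str], Optional[dict]],
) -> dict:
    """Run a lookup step with predefined fallbacks instead of crashing the workflow."""
    normalized = (customer_id or "").strip().upper()
    if not CUSTOMER_ID_PATTERN.match(normalized):
        return {"customer": None, "fallback": "manual_review",
                "reason": "missing or malformed customer ID"}
    try:
        customer = lookup(normalized)
    except Exception as exc:  # e.g. the database is unreachable
        return {"customer": None, "fallback": "manual_review",
                "reason": f"lookup failed: {exc}"}
    if customer is None:
        return {"customer": None, "fallback": "manual_review",
                "reason": "unknown customer"}
    return {"customer": customer, "fallback": None, "reason": None}


def demo_lookup(customer_id: str) -> Optional[dict]:
    """Stand-in for a real database call."""
    return {"ENT-2847": {"name": "Acme Corp", "tier": "enterprise"}}.get(customer_id)


print(safe_customer_lookup("ent-2847", demo_lookup))
print(safe_customer_lookup("2847", demo_lookup))
```

The fallback is still a fixed branch decided at design time, which is what keeps this a workflow rather than an agent.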
Architecture 3: Agent

```python
"""
Email triage as an agent.
Dynamically decides what to look up, can take actions, handles unexpected situations.
More powerful than the workflow, more expensive, and less predictable.
"""
import json

import anthropic

client = anthropic.Anthropic()

# Tool definitions for the email agent
EMAIL_AGENT_TOOLS = [
    {
        "name": "lookup_customer",
        "description": (
            "Look up customer information by ID or email. "
            "Returns customer tier, SLA, account status, and contact info. "
            "Use this whenever a customer identifier is found in the email."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "identifier": {
                    "type": "string",
                    "description": "Customer ID (e.g., ENT-2847) or email address."
                }
            },
            "required": ["identifier"]
        }
    },
    {
        "name": "get_ticket_history",
        "description": "Retrieve the support ticket history for a customer.",
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string", "description": "Customer ID."},
                "limit": {"type": "integer", "description": "Number of tickets to return.", "default": 5}
            },
            "required": ["customer_id"]
        }
    },
    {
        "name": "create_ticket",
        "description": (
            "Create a new support ticket. Use after triaging an email that requires follow-up. "
            "Returns the new ticket ID."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "subject": {"type": "string"},
                "priority": {
                    "type": "string",
                    "enum": ["critical", "high", "medium", "low"]
                },
                "category": {"type": "string"},
                "summary": {"type": "string", "description": "Brief summary of the issue."}
            },
            "required": ["customer_id", "subject", "priority", "category", "summary"]
        }
    },
    {
        "name": "escalate_to_human",
        "description": (
            "Escalate an email to a human support agent with context. "
            "Use for enterprise customers with critical issues, "
            "ambiguous situations requiring human judgment, or "
            "situations where the standard routing is unclear."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "team": {"type": "string", "description": "Team to escalate to."},
                "urgency": {"type": "string", "enum": ["immediate", "same_day", "next_business_day"]},
                "context": {"type": "string", "description": "Context for the human agent."}
            },
            "required": ["team", "urgency", "context"]
        }
    },
    {
        "name": "send_acknowledgment",
        "description": "Send an automated acknowledgment email to the customer.",
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_email": {"type": "string"},
                "message": {"type": "string", "description": "Acknowledgment message to send."},
                "ticket_id": {"type": "string", "description": "Associated ticket ID, if created."}
            },
            "required": ["customer_email", "message"]
        }
    }
]
```
```python
def execute_email_tool(name: str, args: dict) -> str:
    """Execute email agent tools (simulated)."""
    # Simulated database
    customers = {
        "ENT-2847": {
            "name": "Acme Corp", "tier": "enterprise", "sla_hours": 4,
        },
        "SMB-1203": {
            "name": "Startup Inc", "tier": "standard", "sla_hours": 24,
        }
    }
    if name == "lookup_customer":
        identifier = args.get("identifier", "")
        customer = customers.get(identifier)
        if customer:
            return json.dumps(customer)
        # Try to find by email. The simulated records carry no "email" field,
        # so use .get() to avoid a KeyError; this matches only if one is added.
        for cid, cdata in customers.items():
            if cdata.get("email") == identifier:
                return json.dumps({"customer_id": cid, **cdata})
        return f"Customer not found: {identifier}. They may be a new customer or using an unknown ID."
    elif name == "get_ticket_history":
        customer_id = args.get("customer_id", "")
        tickets_db = {
            "ENT-2847": [
                {"id": "T-89234", "created": "2024-01-15", "status": "open",
                 "subject": "Login issues", "resolved": None},
                {"id": "T-88901", "created": "2024-01-08", "status": "closed",
                 "subject": "Billing discrepancy", "resolved": "2024-01-09"},
            ]
        }
        tickets = tickets_db.get(customer_id, [])
        if not tickets:
            return f"No ticket history found for {customer_id}."
        return json.dumps(tickets)
    elif name == "create_ticket":
        # Simulate ticket creation (str hashes vary between Python processes,
        # so the generated ID is not stable across runs)
        ticket_id = f"T-{hash(json.dumps(args)) % 100000:05d}"
        return f"Ticket created: {ticket_id}. Assigned to support queue."
    elif name == "escalate_to_human":
        team = args.get("team", "unknown")
        urgency = args.get("urgency", "same_day")
        return f"Escalated to {team} with {urgency} urgency. A human agent has been notified."
    elif name == "send_acknowledgment":
        email = args.get("customer_email", "unknown")
        return f"Acknowledgment sent to {email}."
    else:
        return f"Unknown tool: {name}"
```
```python
def triage_email_agent(email_content: str, customer_email: str = "") -> dict:
    """
    Triage email using an agent.
    The agent dynamically decides what to look up, what to do, and how to handle edge cases.
    """
    messages = [{
        "role": "user",
        "content": (
            f"Triage this incoming customer email and take appropriate action.\n\n"
            f"Customer email address (if known): {customer_email or 'unknown'}\n\n"
            f"Email content:\n{email_content}\n\n"
            f"Instructions:\n"
            f"1. Look up the customer if an ID or email is available\n"
            f"2. Check ticket history for context\n"
            f"3. Classify priority and category\n"
            f"4. Create a ticket if needed\n"
            f"5. Escalate to human if it is urgent/complex/enterprise-critical\n"
            f"6. Send an acknowledgment to the customer\n"
            f"7. Summarize what you did and why"
        )
    }]
    system = """You are an expert customer support triage agent. You have tools to look up \
customer information, review ticket history, create tickets, escalate issues, and send acknowledgments.
Use your judgment to determine what needs to be done for each email. Enterprise customers with \
critical issues should be escalated immediately. Standard customers with common issues should get \
a ticket and automated acknowledgment. When in doubt, escalate."""
    actions_taken = []
    max_iterations = 15
    for i in range(max_iterations):
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=2048,
            system=system,
            tools=EMAIL_AGENT_TOOLS,
            messages=messages
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason == "end_turn":
            # The model is done: return its final text as the summary
            final_summary = next(
                (b.text for b in response.content if hasattr(b, "text")),
                "Triage complete."
            )
            return {
                "status": "complete",
                "actions_taken": actions_taken,
                "summary": final_summary,
                "iterations": i + 1
            }
        if response.stop_reason == "tool_use":
            # Execute every requested tool and feed the results back to the model
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = execute_email_tool(block.name, block.input)
                    actions_taken.append({
                        "tool": block.name,
                        "args": block.input,
                        "result": result[:100]
                    })
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })
            messages.append({"role": "user", "content": tool_results})
            continue
        # Any other stop reason (e.g. "max_tokens") leaves no tool call to run,
        # so stop instead of sending a conversation that ends on an assistant turn.
        return {"status": f"stopped: {response.stop_reason}", "actions_taken": actions_taken}
    return {"status": "max_iterations_reached", "actions_taken": actions_taken}
```
```python
# ── Compare all three ─────────────────────────────────────────────────────────
# Assumes triage_email_chatbot, triage_email_workflow, and triage_email_agent
# from the three sections above are defined in the same module.
if __name__ == "__main__":
    test_email = """
    Subject: URGENT - Cannot access account, business affected
    Hi Support,
    We have been locked out of our enterprise account for 3 hours.
    Our entire team of 45 people cannot access their work. This is severely
    impacting our business operations. We have a deadline today.
    Customer ID: ENT-2847
    Contact: Sarah Chen, [email protected]
    Please help ASAP.
    """
    print("=== CHATBOT ===")
    result1 = triage_email_chatbot(test_email)
    print(f"Priority: {result1.get('priority')}, Category: {result1.get('category')}")
    print("Actions: None - chatbot cannot take action")

    print("\n=== WORKFLOW ===")
    result2 = triage_email_workflow(test_email)
    print(f"Priority: {result2['classification']['priority']}")
    print(f"Assigned to: {result2['assigned_team']}")
    print("Steps: Always 5, regardless of email content")

    print("\n=== AGENT ===")
    result3 = triage_email_agent(test_email)
    print(f"Actions taken: {len(result3['actions_taken'])}")
    for action in result3["actions_taken"]:
        print(f"  - {action['tool']}({list(action['args'].keys())})")
    # "summary" is absent if the agent stopped before finishing
    print(f"Summary: {result3.get('summary', '')[:200]}")
```
Cost and Reliability Comparison
The agent is not better than the chatbot and workflow. It is more capable for open-ended tasks and more expensive and less reliable for predictable ones.
Decision Framework
When Agents Beat Workflows
Despite their cost and reliability disadvantages, agents win decisively in certain scenarios.
Open-ended exploration: "Analyze this codebase and find all the security vulnerabilities." A workflow cannot enumerate the steps in advance. An agent explores dynamically.
Unpredictable environments: Tasks where the environment changes during execution (prices change, APIs return unexpected formats, files do not exist). Workflows fail; agents adapt.
Complex reasoning with tool use: Tasks that require multi-step reasoning where each step's direction depends on the previous step's results. An agent naturally handles this; a workflow requires knowing all branches in advance.
Research and synthesis: "Find out everything relevant about competitor X and produce a comparison." The number and type of searches cannot be predetermined.
Production Engineering Notes
:::tip Start with the simplest architecture and add complexity only when needed
Build the chatbot first. Add RAG if it needs external knowledge. Add deterministic tool calls if it needs to take action. Add a workflow if it needs multiple steps. Add an agent only when the task genuinely requires dynamic tool selection and variable execution paths. Every step up the complexity ladder costs more and breaks more.
:::

:::warning Workflows can be mis-identified as agents
If you can enumerate all the conditional branches in a system at design time, it is a workflow - even if those branches are numerous and the LLM makes decisions within each branch. The test is: could you draw the complete execution flowchart before running a query? If yes, it is a workflow.
:::

:::danger Never build a full agent for a task that can be done with a script
The compound error rate of agents makes them unreliable for tasks with clear success criteria and predictable structure. A Python script that calls an API, processes the result, and writes to a database is 100x more reliable than an agent trying to do the same thing. Agents are for tasks that genuinely cannot be scripted.
:::
Interview Questions
Q: What are the precise technical criteria that distinguish an agent from a workflow?
Two criteria define the boundary. First, execution path determinism: in a workflow, the set of steps that will execute is known at design time - you can draw a complete flowchart before running any query. In an agent, the sequence of tool calls is determined at runtime based on observations - you cannot know in advance which tools will be called or how many times. Second, dynamic tool selection: a workflow calls tools in a predetermined order (possibly conditional, but the conditions are scripted). An agent selects tools dynamically based on LLM reasoning about the current state. If you can write pseudocode for the entire process before seeing any input, it is a workflow. If the process depends on what the system discovers during execution, it is an agent.
Q: What is the compound error rate problem and how does it affect architecture decisions?
Each step in an agent loop has some probability of failure. If each step is 95% reliable, then across N steps, the end-to-end success rate is 0.95^N. For 10 steps: 60%. For 20 steps: 36%. For 50 steps: 8%. This means agents are fundamentally less reliable than workflows for the same task, unless the per-step reliability is very high (99%+). This drives architecture decisions: for tasks that require 99%+ reliability, use a workflow. For tasks where 80-90% success is acceptable (exploration, research, first-draft generation), agents are viable. The compound error rate also argues for keeping agent trajectories short - each additional step reduces reliability.
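The same relationship can be inverted to budget trajectory length. The sketch below is a rough aid under the same independence assumption as the 0.95^N formula. It answers two questions: how many steps a target success rate allows, and what per-step reliability a fixed number of steps demands. Both function names are illustrative.

```python
import math


def max_steps_for_target(per_step_reliability: float, target_success: float) -> int:
    """Largest N such that per_step_reliability ** N >= target_success.

    Both arguments are probabilities strictly between 0 and 1.
    """
    return math.floor(math.log(target_success) / math.log(per_step_reliability))


def required_per_step(target_success: float, steps: int) -> float:
    """Per-step reliability needed for `steps` steps to reach `target_success`."""
    return target_success ** (1 / steps)


print(max_steps_for_target(0.95, 0.80))  # steps allowed at 95% per step for 80% overall
print(max_steps_for_target(0.99, 0.80))  # steps allowed at 99% per step for 80% overall
print(f"{required_per_step(0.99, 10):.2%} per step needed for 99% over 10 steps")
```

At 95% per step, only 4 steps fit within an 80% end-to-end target, while 99% per step allows 22. This is why short trajectories and high per-step reliability matter so much.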
Q: A product manager asks you to build an "AI agent" for customer FAQ. What do you build?
A customer FAQ system should almost certainly be a chatbot with RAG - not an agent. The task is well-defined: user asks a question, the system retrieves relevant FAQ content, the LLM generates a response. This does not require dynamic tool selection, variable iteration counts, or adaptation to unexpected observations. An agent would cost 10-50x more per query, be less reliable, and take longer to respond. The right answer: build a retrieval-augmented chatbot. If after deployment you discover edge cases where the chatbot fails (multi-step questions, questions requiring account lookups, questions requiring policy judgment), add those specific capabilities - but only those capabilities, as workflow steps or constrained tools. Do not build an agent for a task that is fundamentally a lookup problem.
Q: When would you choose a constrained agent over a full agent?
A constrained agent has a bounded tool set and/or scope: it can only use specific tools, can only access specific data sources, and/or has explicit limits on what actions it can take. Choose a constrained agent when: the task requires dynamic reasoning (ruling out pure workflows) but operates in a well-defined domain (ruling out the need for unlimited tools). Example: a code review agent that can read files, run linters, and comment on pull requests - but cannot delete files, push to main, or access production systems. The constraint makes the agent more reliable (fewer ways to go wrong) and more auditable (limited blast radius if something fails). Full agents are appropriate only when the task genuinely requires an unbounded tool set.
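One way to enforce such a constraint is an allowlist applied twice: once when deciding which tool definitions the model sees, and again before executing any call. The sketch below is a minimal illustration. The tool names, `restrict_tools`, `guarded_execute`, and `fake_executor` are all hypothetical.

```python
from typing import Callable

# Hypothetical allowlist for a code review agent; tool names are illustrative.
ALLOWED_TOOLS = frozenset({"read_file", "run_linter", "comment_on_pr"})


def restrict_tools(tool_definitions: list[dict], allowed: frozenset[str]) -> list[dict]:
    """Keep only the tool definitions the agent is permitted to see."""
    return [tool for tool in tool_definitions if tool["name"] in allowed]


def guarded_execute(
    name: str,
    args: dict,
    executor: Callable[[str, dict], str],
    allowed: frozenset[str] = ALLOWED_TOOLS,
) -> str:
    """Refuse any call outside the allowlist, even if the model requests it."""
    if name not in allowed:
        return f"Refused: tool '{name}' is not permitted for this agent."
    return executor(name, args)


def fake_executor(name: str, args: dict) -> str:
    """Stand-in executor so the sketch runs without real side effects."""
    return f"ran {name} with {sorted(args)}"


all_tools = [{"name": "read_file"}, {"name": "delete_file"}, {"name": "run_linter"}]
print([tool["name"] for tool in restrict_tools(all_tools, ALLOWED_TOOLS)])
print(guarded_execute("delete_file", {"path": "main.py"}, fake_executor))
print(guarded_execute("read_file", {"path": "main.py"}, fake_executor))
```

Checking at execution time as well as at definition time keeps the blast radius bounded even if a disallowed tool name reaches the loop some other way.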
Q: How do you explain to a non-technical stakeholder why a "simple-sounding" agentic task is actually expensive?
Frame it around the number of operations. A chatbot answering an FAQ question: 1 API call, costs pennies. An agent completing a task like "analyze our competitor's pricing and update our pricing model" may require 20-30 tool calls (search 10 competitors, read pricing pages, extract data, analyze, generate comparison, update spreadsheet). Each tool call triggers an LLM API call. At 25 calls per run and 10,000 runs a month, even at efficient pricing, that is $1,000-5,000/month just for the LLM calls, before infrastructure. Then explain compound errors: at 95% per step, a 25-step agent succeeds ~28% of the time. You are paying for 10,000 runs to get roughly 2,800 successful completions. The cost-per-successful-run is much higher than the cost-per-run.
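The gap between cost-per-run and cost-per-successful-run can be worked through directly. The sketch below uses the figures from the answer above (10,000 runs, 25 steps, 95% per step) and assumes the monthly LLM spend is in dollars and covers all 10,000 runs. Treat it as illustrative arithmetic rather than a pricing model.

```python
def cost_per_successful_run(
    monthly_cost: float, runs: int, per_step_reliability: float, steps: int
) -> float:
    """Spread the whole bill over only the runs expected to succeed."""
    success_rate = per_step_reliability ** steps
    return monthly_cost / (runs * success_rate)


RUNS = 10_000
for monthly_cost in (1_000, 5_000):
    per_run = monthly_cost / RUNS
    per_success = cost_per_successful_run(monthly_cost, RUNS, 0.95, 25)
    print(f"${monthly_cost:,}/month: ${per_run:.2f} per run, "
          f"${per_success:.2f} per successful run")
```

Under these assumptions each successful completion costs roughly 3.6 times the nominal per-run price, which is the number a stakeholder should budget against.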
