CrewAI
The Human Team Metaphor
Every successful human team has structure. A researcher who finds information. An analyst who interprets it. A writer who communicates it. A reviewer who catches errors. Each person has a role, a goal, a set of skills, and a background that shapes how they approach problems.
CrewAI maps this directly to AI agents. Not as a metaphor - as the literal API.
```python
from crewai import Agent

analyst = Agent(
    role="Senior Market Analyst",
    goal="Identify emerging competitive threats in the enterprise software market",
    backstory="You are a 15-year veteran market analyst who has tracked enterprise software...",
)
```
The role shapes what the LLM pretends to be. The goal directs its focus. The backstory provides the character context that subtly shapes every response. Together they tend to produce an agent that consistently behaves like that specialist, often more reliably than a generic agent given the same instructions only at task time.
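To see why the three fields matter, it helps to picture them being folded into one system prompt. The sketch below is a hypothetical illustration: `Persona` and `build_system_prompt` are invented names, and the template wording only loosely mirrors a role-playing prompt; CrewAI's actual template text may differ between versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    """Hypothetical stand-in for the identity fields of a CrewAI Agent."""
    role: str
    goal: str
    backstory: str


def build_system_prompt(persona: Persona) -> str:
    """Compose role, backstory, and goal into a single system prompt.

    Illustrative template only; not CrewAI's exact wording.
    """
    return (
        f"You are {persona.role}. {persona.backstory}\n"
        f"Your personal goal is: {persona.goal}"
    )


analyst = Persona(
    role="Senior Market Analyst",
    goal="Identify emerging competitive threats in the enterprise software market",
    backstory="You are a 15-year veteran market analyst.",
)
print(build_system_prompt(analyst))
```

Because the persona sits in the system prompt, it frames every turn the agent takes, not just the first one.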
CrewAI (Moura, 2024) had become one of the most widely adopted multi-agent frameworks for production use cases by 2025. Not because it is the most powerful (LangGraph offers finer-grained control over state and execution), but because it is intuitive, well documented, and fast to get from idea to working system.
This lesson covers CrewAI v0.80+ in depth: the Crew/Agent/Task architecture, sequential and hierarchical processes, Flows for event-driven workflows, memory integration, tool use, and a complete production-grade implementation.
Why CrewAI Exists
Before CrewAI, building multi-agent systems required either writing custom orchestration code (tedious, error-prone, not reusable) or using AutoGen (powerful but conversation-centric - not ideal for task pipelines).
CrewAI's answer: an opinionated, high-level API designed specifically for structured task delegation to role-specialized agents.
The key insight from founder João Moura: production multi-agent systems almost always follow the same pattern - a set of specialists with defined roles, a set of tasks with defined outputs, and a process for coordinating them. If you standardize this pattern into an API, most use cases become configuration, not code.
CrewAI timeline:
- January 2024: Initial release. Crew, Agent, Task, sequential/hierarchical process.
- Mid 2024: Memory system (short-term, long-term, entity), delegation, custom tools.
- Late 2024 (v0.80+): Flows - event-driven state-managed workflows with Python decorators.
- 2025: CrewAI+ enterprise platform, enhanced observability, training API.
Core Concepts
Agent
An agent is the fundamental unit. It has an identity (role, goal, backstory) that shapes its behavior, tools it can use, and a model that powers it.
```python
from crewai import Agent
from crewai_tools import SerperDevTool, WebsiteSearchTool

web_search = SerperDevTool()       # Web search via the Serper API (reads SERPER_API_KEY)
web_reader = WebsiteSearchTool()   # Semantic (RAG) search over website content

research_agent = Agent(
    role="Senior Research Analyst",
    goal=(
        "Find accurate, current information about {topic} and produce "
        "structured research notes with specific data points and sources."
    ),
    backstory=(
        "You are a meticulous research analyst with 12 years of experience "
        "in technology and business research. You have a reputation for finding "
        "the specific data others miss, and for distinguishing between speculation "
        "and confirmed facts. You always cite sources and flag uncertainty."
    ),
    tools=[web_search, web_reader],
    verbose=True,
    max_iter=5,              # Max reasoning/tool-use iterations before the agent must answer
    memory=True,             # Enable short-term conversation memory
    allow_delegation=False,  # This agent focuses on its own task
    llm="claude-opus-4-6",
)
```
Agent design principles:
| Element | Bad (too vague) | Good (specific) |
|---|---|---|
| Role | "Research helper" | "Senior Competitive Intelligence Analyst" |
| Goal | "Help with research" | "Identify three specific competitive threats with supporting data" |
| Backstory | "You are helpful" | "12-year veteran who distinguishes speculation from confirmed facts" |
The backstory is not just flavor text. LLMs respond to persona framing. An agent told it is a "12-year veteran who distinguishes speculation from confirmed facts" tends to be more careful about epistemic status than one told "you are a research assistant."
Task
A task defines a specific unit of work. It has a description, expected output, and an assigned agent.
```python
from crewai import Task

research_task = Task(
    description=(
        "Research the competitive landscape for {company} in the {market} market.\n\n"
        "Specifically find:\n"
        "1. Top 3-5 direct competitors with their key differentiators\n"
        "2. Recent market trends (last 6 months)\n"
        "3. Pricing models and tiers used by competitors\n"
        "4. Any recent funding rounds, acquisitions, or major product launches\n\n"
        "Focus on specific facts with sources. Flag any information you cannot verify."
    ),
    expected_output=(
        "A structured research report with:\n"
        "- Executive summary (3-5 sentences)\n"
        "- Competitor profiles (one per competitor, with key facts)\n"
        "- Market trends section\n"
        "- Pricing comparison table\n"
        "- Sources list"
    ),
    agent=research_agent,
    output_file="research_report.md",  # Save output to file
)
```
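The `{company}` and `{market}` placeholders are filled from the `inputs` dict passed to `crew.kickoff(inputs=...)`. The following sketch shows the idea with plain `str.format`; the `interpolate` helper is hypothetical, and raising `KeyError` for a missing input is this sketch's choice (CrewAI's own error type and message may differ).

```python
import string


def interpolate(template: str, inputs: dict) -> str:
    """Fill {placeholder} fields from inputs, failing loudly on missing keys."""
    # Formatter.parse yields (literal, field_name, spec, conversion); field_name is None for trailing text
    fields = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    missing = fields - inputs.keys()
    if missing:
        raise KeyError(f"Missing inputs for placeholders: {sorted(missing)}")
    return template.format(**inputs)


description = "Research the competitive landscape for {company} in the {market} market."
print(interpolate(description, {"company": "Acme Analytics", "market": "enterprise data analytics"}))
```

Checking for missing keys up front is useful because a typo in an input name otherwise surfaces only once an agent is already running.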
Task chaining: Tasks can receive output from previous tasks via context:
```python
analysis_task = Task(
    description="Using the research report, identify the top 3 strategic threats...",
    expected_output="Strategic threat analysis with prioritized recommendations",
    agent=analyst_agent,
    context=[research_task],  # This task receives research_task's output
)
```
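Conceptually, `context=[research_task]` means the upstream task's output is appended to the downstream task's prompt. Here is a minimal sketch of that assembly step, assuming a simplified, hypothetical `SimpleTask` record rather than `crewai.Task`; the exact prompt wording CrewAI uses may differ.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleTask:
    """Hypothetical, simplified task record (not crewai.Task)."""
    name: str
    description: str
    context: list = field(default_factory=list)  # upstream SimpleTask objects
    output: str = ""


def build_task_prompt(task: SimpleTask) -> str:
    """Append the outputs of all context tasks to this task's description."""
    if not task.context:
        return task.description
    context_text = "\n\n".join(f"[{t.name}]\n{t.output}" for t in task.context)
    return f"{task.description}\n\nThis is the context you're working with:\n{context_text}"


research = SimpleTask("research_task", "Research the market.", output="Competitor A raised $20M.")
analysis = SimpleTask("analysis_task", "Identify the top 3 strategic threats.", context=[research])
print(build_task_prompt(analysis))
```

Because the whole upstream output is pasted in, long research reports make every downstream prompt longer, which matters for both cost and context-window limits.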
Process: Sequential vs Hierarchical
Sequential: Tasks execute in order. Each task's output is passed as context to the next.
```text
research_task → analysis_task → writing_task → review_task
```
Hierarchical: A manager agent directs other agents, assigning tasks based on its judgment of who should do what. The manager can re-assign, split tasks, or handle coordination.
```text
          manager_agent
         /      |      \
researcher   analyst   writer
```
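The difference between the two processes is easiest to see with plain functions standing in for agents. This is a toy model, not CrewAI code: `AgentFn`, `run_sequential`, `run_hierarchical`, and `keyword_manager` are hypothetical, and the keyword rule stands in for the LLM judgment a real manager agent would apply.

```python
from typing import Callable

AgentFn = Callable[[str], str]  # hypothetical: an "agent" maps a prompt to text


def run_sequential(steps: list, initial_input: str) -> list:
    """Sequential: run (task, agent) pairs in fixed order, each seeing the previous output."""
    outputs = []
    context = initial_input
    for task, agent in steps:
        context = agent(f"{task}\n\nContext:\n{context}")
        outputs.append(context)
    return outputs


def run_hierarchical(tasks: list, team: dict, manager: Callable[[str], str]) -> dict:
    """Hierarchical: a manager decides, per task, which specialist handles it."""
    return {task: team[manager(task)](task) for task in tasks}


def researcher(prompt: str) -> str:
    return "notes"


def analyst(prompt: str) -> str:
    return "analysis of " + prompt.splitlines()[-1]


def keyword_manager(task: str) -> str:
    """Stand-in for an LLM manager: route by keyword."""
    return "researcher" if "find" in task.lower() else "analyst"


print(run_sequential([("Research rivals", researcher), ("Assess threats", analyst)], "Acme"))
print(run_hierarchical(["Find competitors", "Assess threats"],
                       {"researcher": researcher, "analyst": analyst}, keyword_manager))
```

The sequential runner's order is fixed in code; the hierarchical runner's routing is decided at run time, which is exactly what makes it more flexible and harder to debug.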
CrewAI Memory System
CrewAI v0.80+ has a three-tier memory system:
Short-term memory: Recent interactions and outcomes within the current crew execution, stored as embeddings for retrieval (RAG, ChromaDB by default). Agents can refer to what other agents said earlier in the same run.
Long-term memory: Persists across runs (SQLite by default). Stores insights and outcomes from previous executions so later runs can build on them.
Entity memory: Information about entities (people, organizations, concepts) encountered during tasks, also stored as embeddings: who and what is involved, and how things relate.
```python
from crewai import Crew, Process
from crewai.memory import LongTermMemory
from crewai.memory.storage.ltm_sqlite_storage import LTMSQLiteStorage

# Long-term memory persists across runs in a local SQLite database
lt_memory = LongTermMemory(
    storage=LTMSQLiteStorage(db_path="./crew_long_term_memory.db")
)

crew_with_memory = Crew(
    agents=[research_agent, analyst_agent, writer_agent],
    tasks=[research_task, analysis_task, writing_task],
    process=Process.sequential,
    memory=True,                 # Enable short-term, long-term, and entity memory
    long_term_memory=lt_memory,  # Override where long-term memory is stored
    verbose=True,
)
```
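To make the difference between tiers concrete without relying on CrewAI internals, here is a toy two-tier store. It is a simplification under stated assumptions, not CrewAI's implementation: `ToyCrewMemory` is a hypothetical class, short-term memory is a bounded `deque`, and long-term memory is a SQLite table filtered by a quality score (entity memory is not modeled).

```python
import sqlite3
from collections import deque


class ToyCrewMemory:
    """Illustrative two-tier memory; NOT CrewAI's implementation.

    Short-term: a bounded deque of recent messages for the current run.
    Long-term: a SQLite table of task outcomes; pass a file path instead
    of ":memory:" so it survives across runs.
    """

    def __init__(self, db_path: str = ":memory:", short_term_size: int = 5) -> None:
        self.short_term = deque(maxlen=short_term_size)
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outcomes (task TEXT, quality INTEGER, lesson TEXT)"
        )

    def remember(self, agent: str, text: str) -> None:
        """Record a message in short-term memory (oldest entries drop off)."""
        self.short_term.append(f"{agent}: {text}")

    def save_outcome(self, task: str, quality: int, lesson: str) -> None:
        """Persist a lesson learned from a finished task."""
        self.db.execute("INSERT INTO outcomes VALUES (?, ?, ?)", (task, quality, lesson))
        self.db.commit()

    def lessons_for(self, task: str, min_quality: int = 0) -> list:
        """Retrieve stored lessons for a task, filtered by quality score."""
        rows = self.db.execute(
            "SELECT lesson FROM outcomes WHERE task = ? AND quality >= ? ORDER BY rowid",
            (task, min_quality),
        )
        return [lesson for (lesson,) in rows]


memory = ToyCrewMemory(short_term_size=2)
for i in range(3):
    memory.remember("researcher", f"finding {i}")
memory.save_outcome("research", 8, "Cite a source for every pricing claim")
memory.save_outcome("research", 4, "Summary was too vague")
print(list(memory.short_term))
print(memory.lessons_for("research", min_quality=7))
```

The design point is the lifetime: short-term entries are discarded with the run, while long-term rows accumulate and are only useful if something filters them before they reach the prompt.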
CrewAI Flows (v0.80+)
Flows are event-driven, stateful workflows. They use Python decorators to define what happens when certain events occur, making complex conditional logic readable and maintainable.
"""
CrewAI Flow: Content Quality Pipeline
Demonstrates event-driven workflow with state management.
"""
from crewai.flow.flow import Flow, listen, start, router, or_
from pydantic import BaseModel
from typing import Optional
class ContentPipelineState(BaseModel):
"""Typed state shared across all flow methods."""
topic: str = ""
research_notes: str = ""
draft_content: str = ""
quality_score: int = 0
revision_count: int = 0
final_content: str = ""
approved: bool = False
max_revisions: int = 3
class ContentPipelineFlow(Flow[ContentPipelineState]):
"""
Event-driven content pipeline with quality gates and revision loops.
"""
@start()
def initialize(self):
"""Entry point - always called first."""
print(f"[Flow] Starting content pipeline for: {self.state.topic}")
return self.state.topic
@listen(initialize)
def run_research(self, topic: str):
"""Triggered when initialize completes."""
print(f"[Flow] Running research crew for topic: {topic}")
# In production: invoke an actual Crew here
# For demo, simulate research output
self.state.research_notes = f"""
Research Notes: {topic}
- Key finding 1: Growing adoption of {topic} in enterprise contexts
- Key finding 2: Three major vendors dominating the space
- Key finding 3: Cost reductions of 30-40% reported by early adopters
- Data source: Industry reports Q4 2024
"""
return self.state.research_notes
@listen(run_research)
def write_draft(self, research: str):
"""Triggered when research completes."""
print("[Flow] Writing draft based on research")
# In production: invoke writer Crew/Agent here
self.state.draft_content = f"""
# {self.state.topic}: A Strategic Overview
Based on recent research, {self.state.topic} is transforming how enterprises approach
their technical infrastructure. {research.strip()[:200]}
Organizations adopting this approach report significant efficiency gains, with early
adopters citing 30-40% cost reductions in relevant operations.
"""
self.state.revision_count += 1
return self.state.draft_content
@router(write_draft)
def quality_check(self, draft: str):
"""
Router: evaluate quality and decide next step.
Returns the name of the method to call next.
"""
print("[Flow] Running quality check")
# In production: use LLM or Crew to score quality
# Simple heuristic for demo
word_count = len(draft.split())
quality_score = min(100, word_count * 2) # Simple proxy
self.state.quality_score = quality_score
print(f"[Flow] Quality score: {quality_score}/100 (revision #{self.state.revision_count})")
if quality_score >= 70:
return "approve"
elif self.state.revision_count >= self.state.max_revisions:
return "force_approve" # Max revisions reached - take what we have
else:
return "revise"
@listen("approve")
def approve_content(self):
"""Called when quality check approves the draft."""
print("[Flow] Content approved!")
self.state.final_content = self.state.draft_content
self.state.approved = True
return self.state.final_content
@listen("force_approve")
def force_approve_content(self):
"""Called when max revisions reached."""
print(f"[Flow] Max revisions ({self.state.max_revisions}) reached. Approving best draft.")
self.state.final_content = self.state.draft_content
self.state.approved = True
return self.state.final_content
@listen("revise")
def request_revision(self):
"""Called when quality check requests revisions."""
print(f"[Flow] Requesting revision (attempt {self.state.revision_count})")
# Simulate improvement in revision
self.state.draft_content += f"\n\n## Additional Context (Revision {self.state.revision_count})\n"
self.state.draft_content += "Expanded analysis based on quality review feedback...\n"
# Loop back to quality check
return self.state.draft_content
# Wire the revision loop: revise triggers quality_check again
@listen(request_revision)
def re_check_quality(self, revised_draft: str):
"""Re-evaluate quality after revision."""
return self.quality_check(revised_draft)
def run_content_pipeline(topic: str) -> ContentPipelineState:
"""Run the content pipeline flow."""
flow = ContentPipelineFlow()
flow.state.topic = topic
flow.kickoff()
print(f"\n[Flow Complete]")
print(f" Topic: {flow.state.topic}")
print(f" Revisions: {flow.state.revision_count}")
print(f" Quality score: {flow.state.quality_score}")
print(f" Approved: {flow.state.approved}")
return flow.state
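The revision loop above depends on one mechanism: a router's return value is treated as an event name, and any method listening for that name runs next. The toy dispatcher below reproduces that mechanism in plain Python so you can trace it step by step. It is a simplification, not CrewAI's implementation: `MiniFlow`, its `listen(..., router=True)` flag, and the pass-through of the router's input are all assumptions of this sketch.

```python
from collections import deque
from typing import Any, Callable, Optional


class MiniFlow:
    """Toy event dispatcher illustrating start/listen/router semantics (not CrewAI)."""

    def __init__(self) -> None:
        self._start: Optional[Callable[[], Any]] = None
        self._listeners: dict = {}  # event name -> list of (handler, is_router)
        self.trace: list = []

    def start(self, fn):
        self._start = fn
        return fn

    def listen(self, event: str, router: bool = False):
        def register(fn):
            self._listeners.setdefault(event, []).append((fn, router))
            return fn
        return register

    def kickoff(self, max_steps: int = 100) -> Any:
        if self._start is None:
            raise RuntimeError("no start method registered")
        result = self._start()
        self.trace.append(self._start.__name__)
        pending = deque([(self._start.__name__, result)])
        while pending:
            event, payload = pending.popleft()
            for fn, is_router in self._listeners.get(event, []):
                if len(self.trace) >= max_steps:
                    raise RuntimeError("max_steps exceeded: possible infinite loop")
                self.trace.append(fn.__name__)
                out = fn(payload)
                if is_router:
                    pending.append((out, payload))  # the label fires; input passes through
                else:
                    pending.append((fn.__name__, out))  # a normal method fires its own name
                    result = out
        return result


flow = MiniFlow()
state = {"revisions": 0}


@flow.start
def initialize():
    return "AI agents"


@flow.listen("initialize")
def write_draft(topic):
    return f"Draft about {topic}"


@flow.listen("write_draft", router=True)
@flow.listen("request_revision", router=True)  # closes the revision loop
def quality_check(draft):
    if len(draft.split()) >= 8:
        return "approve"
    return "revise" if state["revisions"] < 3 else "force_approve"


@flow.listen("revise")
def request_revision(draft):
    state["revisions"] += 1
    return draft + " with more detail"


@flow.listen("approve")
@flow.listen("force_approve")
def finalize(draft):
    return f"FINAL: {draft}"


print(flow.kickoff())
print(flow.trace)
```

The `max_steps` guard plays the same role as `max_revisions` in the real flow: any loop built from events needs an explicit exit condition.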
Full Production Example: Competitive Intelligence Crew
"""
Production CrewAI: Competitive Intelligence System
Four-agent crew:
1. Researcher: web search + data gathering
2. Analyst: interprets data, identifies patterns
3. Writer: produces polished intelligence report
4. Reviewer: quality gates and final approval
Install: pip install crewai crewai-tools anthropic
"""
from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool, WebsiteSearchTool
from crewai.tasks.task_output import TaskOutput
import anthropic
import os
from typing import Optional
def build_competitive_intelligence_crew(
company: str,
market: str,
output_file: Optional[str] = None,
) -> str:
"""
Build and run a competitive intelligence crew.
Returns the final intelligence report as a string.
"""
# ── Tools ────────────────────────────────────────────────────
web_search = SerperDevTool() # Web search via Serper API
web_reader = WebsiteSearchTool() # Read specific websites
# ── Agents ───────────────────────────────────────────────────
researcher = Agent(
role="Senior Competitive Intelligence Researcher",
goal=(
f"Gather comprehensive, factual intelligence about {company}'s competitors "
f"in the {market} market. Focus on verifiable facts, not speculation."
),
backstory=(
f"You are a 15-year veteran competitive intelligence researcher who has "
f"tracked the {market} market since its early days. You have an exceptional "
f"ability to find specific data points (funding amounts, customer counts, "
f"pricing tiers) that others miss. You always note the source and date of "
f"information and explicitly flag anything you cannot verify."
),
tools=[web_search, web_reader],
verbose=True,
max_iter=6,
allow_delegation=False,
llm="claude-opus-4-6",
)
analyst = Agent(
role="Strategic Market Analyst",
goal=(
f"Analyze competitive intelligence data to identify strategic opportunities "
f"and threats for {company} in the {market} market."
),
backstory=(
"You are a former McKinsey consultant now running your own advisory practice. "
"You specialize in competitive strategy and have advised dozens of technology "
"companies on market positioning. You think in frameworks (Porter's Five Forces, "
"BCG matrix, SWOT) but always ground analysis in specific data. You distinguish "
"clearly between facts (backed by data) and inferences (your interpretation)."
),
tools=[], # Analyst works with information provided, not web searches
verbose=True,
max_iter=3,
allow_delegation=False,
llm="claude-opus-4-6",
)
writer = Agent(
role="Intelligence Report Writer",
goal=(
"Transform complex competitive analysis into a clear, actionable intelligence "
"report that executives can read and act on in 15 minutes."
),
backstory=(
"You spent 8 years as an intelligence analyst writing classified briefs for "
"government clients who had zero time and needed everything crisp and actionable. "
"You learned that every sentence must earn its place. You never use filler phrases "
"or hedge unnecessarily. You structure reports for skim-ability: executive summary "
"first, details second, and every section has a clear takeaway."
),
tools=[],
verbose=True,
max_iter=2,
allow_delegation=False,
llm="claude-opus-4-6",
)
reviewer = Agent(
role="Intelligence Quality Reviewer",
goal=(
"Ensure the intelligence report is accurate, actionable, and free of unsupported "
"claims before it reaches executive leadership."
),
backstory=(
"You are the gatekeeper. Before any intelligence product leaves your team, "
"it passes through you. You have zero tolerance for vague claims, unsupported "
"assertions, or recommendations without clear rationale. You have saved your "
"organization from bad decisions based on poor intelligence multiple times. "
"You approve work only when it meets the highest standards."
),
tools=[],
verbose=True,
max_iter=2,
allow_delegation=False,
llm="claude-opus-4-6",
)
# ── Tasks ────────────────────────────────────────────────────
research_task = Task(
description=(
f"Conduct comprehensive competitive intelligence research on {company}'s "
f"competitive landscape in the {market} market.\n\n"
"Gather the following for each of the top 3-5 competitors:\n"
"1. Company overview: founding year, HQ, team size (approximate)\n"
"2. Core product/service and target customer segments\n"
"3. Pricing model and price points (if publicly available)\n"
"4. Funding history: rounds, amounts, investors\n"
"5. Recent news: product launches, partnerships, leadership changes (last 90 days)\n"
"6. Differentiation claims: how do they position against each other?\n\n"
"Also gather:\n"
"- Overall market size estimates and growth projections\n"
"- Key industry trends affecting competitive dynamics\n"
"- Any customer reviews or sentiment data you can find\n\n"
"Flag anything you cannot find or verify. Note the date of each data point."
),
expected_output=(
"Structured research report with:\n"
"- Market overview (size, growth, key trends)\n"
"- Competitor profiles (one section per competitor, structured consistently)\n"
"- Pricing comparison table\n"
"- Recent news timeline\n"
"- Data confidence notes (what was verified vs inferred)"
),
agent=researcher,
)
analysis_task = Task(
description=(
f"Analyze the competitive intelligence research to identify strategic implications "
f"for {company}.\n\n"
"Provide:\n"
"1. Competitive position assessment: Where does {company} stand relative to competitors?\n"
"2. Key differentiators: What does {company} do better? Worse?\n"
"3. Threat analysis: Which competitors pose the greatest risk and why?\n"
"4. Opportunity identification: What gaps exist that {company} could fill?\n"
"5. Strategic recommendations: 3-5 specific, actionable recommendations\n\n"
"Distinguish clearly between facts (from research) and inferences (your analysis).\n"
"Use specific evidence to support each strategic point."
),
expected_output=(
"Strategic analysis with:\n"
"- Competitive position matrix\n"
"- Threat prioritization (high/medium/low with rationale)\n"
"- Opportunity assessment\n"
"- Strategic recommendations (each with: what, why, expected impact)"
),
agent=analyst,
context=[research_task],
)
writing_task = Task(
description=(
f"Write an executive intelligence brief on {company}'s competitive landscape "
f"in the {market} market.\n\n"
"Requirements:\n"
"- Total length: 600-800 words\n"
"- Executive summary: 5-6 sentences capturing the most critical intelligence\n"
"- Key findings: 3-4 bullet sections, each with 2-3 supporting facts\n"
"- Strategic implications: what this means for {company}\n"
"- Recommended actions: 3 specific actions with expected impact\n\n"
"Style requirements:\n"
"- Every sentence must add information value\n"
"- Use specific numbers, not vague descriptors\n"
"- Avoid: 'it is worth noting', 'importantly', 'it should be mentioned'\n"
"- Write for a C-suite executive with limited time and technical depth"
),
expected_output=(
"A polished intelligence brief in the specified format:\n"
"- EXECUTIVE SUMMARY\n"
"- KEY COMPETITIVE FINDINGS\n"
"- STRATEGIC IMPLICATIONS\n"
"- RECOMMENDED ACTIONS\n"
"- INTELLIGENCE NOTES (sources, confidence, dates)"
),
agent=writer,
context=[research_task, analysis_task],
output_file=output_file or f"intel_brief_{company.lower().replace(' ', '_')}.md",
)
review_task = Task(
description=(
"Review the intelligence brief for accuracy, clarity, and actionability.\n\n"
"Check specifically:\n"
"1. Are all specific claims (numbers, dates, percentages) supported by the research?\n"
"2. Are recommendations specific enough to act on?\n"
"3. Is the executive summary accurate and complete?\n"
"4. Are there any unsupported inferences presented as facts?\n"
"5. Is the writing tight? Any unnecessary padding?\n\n"
"If the brief meets standards: approve it and provide a one-sentence sign-off.\n"
"If it needs revisions: provide a specific list of corrections (no vague feedback)."
),
expected_output=(
"Either:\n"
"- APPROVED: [one-sentence reason why this meets quality standards]\n"
"- REVISIONS REQUIRED: [numbered list of specific corrections needed]"
),
agent=reviewer,
context=[writing_task],
)
# ── Crew ──────────────────────────────────────────────────────
crew = Crew(
agents=[researcher, analyst, writer, reviewer],
tasks=[research_task, analysis_task, writing_task, review_task],
process=Process.sequential,
verbose=True,
memory=True, # Enable cross-task memory
embedder={ # Configure embedding model for memory
"provider": "anthropic",
"config": {
"model": "voyage-2", # Anthropic's embedding model
}
},
max_rpm=10, # Rate limit to avoid API throttling
share_crew=False, # Do not share crew data with CrewAI
)
# ── Run ──────────────────────────────────────────────────────
print(f"\n{'='*60}")
print(f"COMPETITIVE INTELLIGENCE CREW")
print(f"Company: {company} | Market: {market}")
print(f"{'='*60}\n")
result = crew.kickoff(inputs={
"company": company,
"market": market,
})
return str(result)
def demo():
"""Run the competitive intelligence crew on a sample task."""
result = build_competitive_intelligence_crew(
company="Acme Analytics",
market="enterprise data analytics",
output_file="acme_intel_brief.md",
)
print("\n\n=== FINAL INTELLIGENCE BRIEF ===")
print(result[:2000])
if __name__ == "__main__":
demo()
Hierarchical Process
The hierarchical process adds a manager agent that dynamically coordinates the crew - assigning tasks, checking outputs, and re-delegating if needed.
```python
from crewai import Agent, Crew, Process

# Manager agent (no tasks assigned - it directs other agents)
manager = Agent(
    role="Project Manager",
    goal="Coordinate the team to deliver high-quality results efficiently",
    backstory=(
        "You are an experienced project manager who knows when to delegate, "
        "when to course-correct, and when to accept good-enough over perfect."
    ),
    llm="claude-opus-4-6",
    allow_delegation=True,  # Can delegate tasks to other agents
)

# researcher, analyst, writer and their tasks are the ones defined in the
# competitive intelligence example above
hierarchical_crew = Crew(
    agents=[researcher, analyst, writer],  # Specialist agents (the manager is not listed)
    tasks=[research_task, analysis_task, writing_task],
    process=Process.hierarchical,
    manager_agent=manager,  # Or pass manager_llm=... to let CrewAI create a manager
    verbose=True,
)

# Inputs must match the placeholders used in the tasks ({company}, {market})
result = hierarchical_crew.kickoff(
    inputs={"company": "Acme Analytics", "market": "enterprise data analytics"}
)
```
Tool Integration Patterns
CrewAI agents can use three types of tools:
Built-in CrewAI tools:
```python
from crewai_tools import (
    SerperDevTool,           # Web search via Serper
    WebsiteSearchTool,       # Semantic (RAG) search over website content
    FileReadTool,            # Read local files
    DirectoryReadTool,       # List directory contents
    CodeInterpreterTool,     # Execute Python code
    CSVSearchTool,           # Semantic search over CSV files
    PDFSearchTool,           # Semantic search over PDF documents
    YoutubeVideoSearchTool,  # Semantic search over YouTube video content
)
```
LangChain tools (compatible via wrapper):
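```python
from crewai.tools import BaseTool
from langchain_community.tools import DuckDuckGoSearchRun
from pydantic import Field


class DuckDuckGoSearchTool(BaseTool):
    """Wrap a LangChain tool by subclassing CrewAI's BaseTool."""
    name: str = "duckduckgo_search"
    description: str = "Search the web with DuckDuckGo. Input: a search query string."
    search: DuckDuckGoSearchRun = Field(default_factory=DuckDuckGoSearchRun)

    def _run(self, query: str) -> str:
        # Delegate to the wrapped LangChain tool
        return self.search.run(query)


ddg_search = DuckDuckGoSearchTool()
```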
Custom tools:
```python
from crewai import Agent
from crewai.tools import BaseTool
from pydantic import BaseModel, Field


class DatabaseQueryInput(BaseModel):
    """Input schema for the database query tool."""
    query: str = Field(description="SQL query to execute against the analytics database")
    database: str = Field(default="prod", description="Database to query: prod, staging, or dev")


class DatabaseQueryTool(BaseTool):
    """Query the internal analytics database."""
    name: str = "database_query"
    description: str = (
        "Query the internal analytics database using SQL. "
        "Use for: user metrics, revenue data, product usage statistics. "
        "Database contains data updated daily."
    )
    args_schema: type[BaseModel] = DatabaseQueryInput

    def _run(self, query: str, database: str = "prod") -> str:
        """Execute the query and return results."""
        # In production: connect to your actual database
        # This is a mock implementation
        if "monthly_active_users" in query.lower():
            return "Monthly active users: 47,832 (as of 2025-03-01)"
        elif "revenue" in query.lower():
            return "Monthly recurring revenue: $2.4M (as of 2025-03-01)"
        else:
            return f"Query executed against {database}: No results matching '{query}'"


# Use in an agent
data_agent = Agent(
    role="Data Analyst",
    goal="Query the analytics database to support intelligence reports with internal metrics",
    backstory="You are a data analyst who knows the internal database schema and can extract key metrics.",
    tools=[DatabaseQueryTool()],
    llm="claude-opus-4-6",
)
```
Observability and Callbacks
Monitor what your crew is doing in production:
```python
from crewai import Crew, Process
from crewai.tasks.task_output import TaskOutput


def task_callback(task_output: TaskOutput) -> None:
    """Called after each task completes."""
    print("\n[Callback] Task completed:")
    print(f"  Task: {task_output.description[:60]}...")
    print(f"  Agent: {task_output.agent}")
    print(f"  Output preview: {str(task_output.raw)[:200]}...")
    # In production: log to your monitoring system
    # send_to_datadog(task_output)
    # send_to_langsmith(task_output)


monitored_crew = Crew(
    agents=[researcher, analyst, writer],
    tasks=[research_task, analysis_task, writing_task],
    process=Process.sequential,
    task_callback=task_callback,  # Per-task callback
    verbose=True,
)
```
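Printing is a start, but a callback can also accumulate structured metrics. The sketch below assumes only that the callback receives an object with `description`, `agent`, and `raw` attributes (the fields used above); `TaskMetrics` and `FakeTaskOutput` are hypothetical names, and the timing is measured between callback invocations, so create the object right before `kickoff()`.

```python
import time
from dataclasses import dataclass


@dataclass
class FakeTaskOutput:
    """Hypothetical stand-in exposing the fields used by the callback above."""
    description: str
    agent: str
    raw: str


class TaskMetrics:
    """Callable object that could be passed as task_callback; records per-task stats."""

    def __init__(self) -> None:
        self.records: list = []
        self._last = time.perf_counter()

    def __call__(self, task_output) -> None:
        now = time.perf_counter()
        self.records.append({
            "agent": task_output.agent,
            "task": task_output.description[:60],
            "output_chars": len(str(task_output.raw)),
            "seconds_since_previous": round(now - self._last, 3),
        })
        self._last = now


metrics = TaskMetrics()
metrics(FakeTaskOutput("Research the competitive landscape", "Researcher", "abc"))
print(metrics.records)
```

With a real crew you would pass `task_callback=metrics` and ship `metrics.records` to whatever monitoring system you use after the run.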
:::danger Agent Delegation Security
When allow_delegation=True, an agent can instruct other agents to perform tasks. A compromised or manipulated agent could potentially instruct other agents to take harmful actions. In production, use allow_delegation=False for most agents and only enable it for deliberately-designed manager agents. Never allow delegation to agents with broad tool access (file write, code execution, API calls) from agents that handle untrusted user input.
:::
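One way to enforce this policy is a static check over your own agent configuration at startup, before any crew runs. This is a sketch under assumptions, not a CrewAI feature: `AgentSpec`, `RISKY_TOOLS`, and `delegation_violations` are hypothetical, and you would populate them from your own agent definitions.

```python
from dataclasses import dataclass, field

# Hypothetical tool identifiers treated as high-risk; adapt to your own tool names.
RISKY_TOOLS = {"file_write", "code_execution", "http_request"}


@dataclass
class AgentSpec:
    """Hypothetical summary of an agent's configuration for a policy check."""
    name: str
    allow_delegation: bool = False
    tools: set = field(default_factory=set)
    handles_untrusted_input: bool = False
    is_manager: bool = False


def delegation_violations(agents: list) -> list:
    """Return human-readable policy violations (empty list means the config passes)."""
    risky = sorted(a.name for a in agents if a.tools & RISKY_TOOLS)
    problems = []
    for a in agents:
        if not a.allow_delegation:
            continue
        if not a.is_manager:
            problems.append(f"{a.name}: delegation enabled on a non-manager agent")
        if a.handles_untrusted_input and risky:
            problems.append(
                f"{a.name}: handles untrusted input and can delegate to risky agents: {', '.join(risky)}"
            )
    return problems


team = [
    AgentSpec("manager", allow_delegation=True, is_manager=True),
    AgentSpec("intake", allow_delegation=True, handles_untrusted_input=True),
    AgentSpec("coder", tools={"code_execution"}),
]
for problem in delegation_violations(team):
    print(problem)
```

Failing fast on a non-empty result turns the guideline into something a CI job can check.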
:::warning Task Output Chaining Quality Degradation
In a sequential process, each task receives the previous task's output as context. If task 1 produces poor output, tasks 2, 3, and 4 all compound that error. This "garbage in, garbage out" amplification means the quality of your first task (usually research) bounds the quality of the entire pipeline. Invest heavily in the first task's description and expected_output spec. Consider adding a dedicated validation task after the first task that checks whether the research meets quality standards before passing it downstream.
:::
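A validation gate does not have to be an LLM call; cheap deterministic checks catch many bad research outputs. The function below is a hypothetical sketch whose criteria (distinct source URLs, a markdown table for pricing, explicit confidence notes) are derived from the research task's expected_output; run it in a task callback or before kicking off downstream work.

```python
import re


def validate_research(report: str, min_sources: int = 3) -> list:
    """Return a list of problems; an empty list means the report may pass downstream."""
    problems = []
    urls = set(re.findall(r"https?://\S+", report))
    if len(urls) < min_sources:
        problems.append(f"only {len(urls)} distinct source URLs (need {min_sources})")
    if "|" not in report:
        problems.append("no markdown table found (pricing comparison expected)")
    if not re.search(r"(?i)unverified|could not verify|confidence", report):
        problems.append("no data-confidence notes")
    return problems


good = (
    "| Vendor | Price |\n|---|---|\n| A | $10 |\n"
    "Sources: https://a.example https://b.example https://c.example\n"
    "Confidence: A's pricing verified; B's headcount unverified."
)
print(validate_research(good))
print(validate_research("Some notes about the market."))
```

Returning a list of specific problems, rather than a bare boolean, gives you text you can feed back to the research agent as revision instructions.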
Interview Questions and Answers
Q: How does CrewAI's agent backstory affect actual agent behavior, and is it more than a gimmick?
A: It is more than a gimmick, but it is not magic. LLMs are highly sensitive to framing. When you tell an agent it is a "15-year veteran who distinguishes speculation from confirmed facts," several things plausibly happen: (1) The framing makes behaviors associated with that persona (careful analysis, epistemic hedging, source citation) more likely in the model's output. (2) The persona provides a consistent reference frame for every response, reducing variance in output style and approach. (3) The backstory implicitly sets expectations for what "good work" looks like for that role, which influences self-evaluation. Teams building CrewAI systems commonly report that specific, well-crafted backstories produce more role-consistent behavior than generic descriptions, but the effect is hard to quantify precisely and is worth checking with your own evaluations. Best practice: write backstories that are specific, reference relevant expertise, and include traits that shape how the agent handles uncertainty ("I flag what I cannot verify") rather than generic positive descriptors.
Q: When would you choose CrewAI's hierarchical process over sequential process?
A: The sequential process is appropriate when the workflow is predictable and linear: task A must always come before task B, task B before task C. Research → analysis → writing → review is a canonical sequential workflow. The hierarchical process adds a manager agent that dynamically assigns tasks based on context, which is appropriate when: (1) The optimal task ordering is not known in advance and depends on what earlier tasks discover. (2) Some tasks may need to be skipped, repeated, or reassigned based on quality issues. (3) Multiple agents have overlapping capabilities and the manager should select the best one for each sub-task dynamically. (4) The task involves genuinely open-ended exploration where the path through the work cannot be predetermined. In practice, the sequential process covers most use cases and is more predictable and debuggable. Hierarchical is worth the added complexity only when dynamic task routing genuinely benefits the workflow.
Q: How do CrewAI Flows differ from the Crew+Tasks paradigm, and when would you use them?
A: The Crew+Tasks paradigm is designed for structured, linear (or manager-directed) workflows where you know the tasks upfront. Flows are designed for event-driven, conditional workflows where what happens next depends on intermediate results. In Flows, methods are triggered by events (decorators like @listen, @router) rather than sequential task ordering. The @router decorator enables branching: "if quality is high, approve; if quality is low but revisions remain, revise; if max revisions reached, force approve." This revision loop is awkward or impossible to express in the sequential task paradigm. Use Flows when your workflow has: conditional branching (different paths based on output quality or content), loops (retry/revision patterns), event-driven triggers (respond to webhooks, user input, or time events), or complex state that needs to persist across multiple steps.
Q: How would you implement a CrewAI system that handles errors gracefully without crashing?
A: Multiple layers of error handling. (1) Agent-level retry: configure max_iter appropriately. It caps how many reasoning and tool-call iterations an agent may take on a task, so agents can retry tool calls before giving up; a higher max_iter allows recovery from transient failures but increases cost and latency when an agent is stuck in an unproductive loop. (2) Task-level fallbacks: use human_input=True on critical tasks to pause and get human input when the agent is uncertain rather than failing or hallucinating. (3) Crew-level exception handling: wrap crew.kickoff() in try/except and implement fallback logic (partial results, simplified output, graceful degradation). (4) Tool-level resilience: custom tools should catch exceptions and return descriptive error messages rather than raising; the agent needs to see "Database query failed: connection timeout" to handle it gracefully. (5) Output validation: after task completion, validate that the output meets minimum quality criteria before passing it to the next task. If it doesn't, inject a correction task. (6) Monitoring and alerting: instrument your crew with callbacks to detect failure patterns early; recurring task failures, loops, or consistently low-quality outputs are signals of systemic issues.
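Point (4) can be packaged as a reusable decorator. This is a minimal sketch, assuming you apply it to the plain function your tool's `_run` calls; `resilient_tool` is a hypothetical helper (not part of CrewAI), and the broad `except Exception` is deliberate so every failure becomes text the agent can read.

```python
import functools
import time
from typing import Callable


def resilient_tool(retries: int = 2, delay_s: float = 0.0):
    """Retry a tool function, then return an error string instead of raising."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> str:
            last_error = None
            for attempt in range(retries + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:  # broad on purpose: surface every failure as text
                    last_error = exc
                    if attempt < retries:
                        time.sleep(delay_s)
            return f"Tool failed after {retries + 1} attempts: {type(last_error).__name__}: {last_error}"
        return wrapper
    return decorator


calls = {"n": 0}


@resilient_tool(retries=2)
def flaky_query(sql: str) -> str:
    """Hypothetical query that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection timeout")
    return "Monthly active users: 47,832"


@resilient_tool(retries=1)
def broken_query(sql: str) -> str:
    raise ConnectionError("connection timeout")


print(flaky_query("SELECT monthly_active_users"))
print(broken_query("SELECT revenue"))
```

Returning a descriptive string keeps the failure message under your control, so the agent can decide to try another tool or report the gap instead of the task failing outright.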
Q: How do you tune CrewAI for cost efficiency in production?
A: Several strategies. (1) Right-size the model per agent: not every agent needs the most capable (and expensive) model. Researchers and writers benefit from Claude Opus. Simple validators, formatters, or routing agents can use a smaller model such as Claude Haiku at a fraction of the per-token cost (check current pricing; the ratio varies by model generation). (2) Minimize max_iter: the default maximum number of iterations is generous. Well-scoped tasks often complete in 3–5 iterations. Set max_iter=4 for routine agents and increase it only when tasks genuinely require more iterations. (3) Tight expected_output specs: vague expected output leads to verbose responses and retries. Specific expected output specs (length, format, exact structure) reduce token waste. (4) Tool call efficiency: agents with access to many tools often try multiple tools for the same information. Give each agent only the tools it actually needs. (5) Caching: if the same external queries (web searches, database queries) will be repeated across crew runs, add a cache layer to avoid redundant API calls. (6) Sequential over hierarchical where possible: the hierarchical process requires additional manager LLM calls per task. For predictable workflows, sequential is cheaper.
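The effect of right-sizing models is simple arithmetic: cost = input tokens × input price + output tokens × output price, summed over calls. The sketch below uses placeholder per-million-token prices and made-up token counts, not real quotes; substitute your provider's current rates and your own logged usage.

```python
# Hypothetical per-million-token prices in USD; replace with current rates.
PRICES = {
    "large": {"input": 15.00, "output": 75.00},
    "small": {"input": 1.00, "output": 5.00},
}


def run_cost(calls: list) -> float:
    """Sum cost over (model_tier, input_tokens, output_tokens) LLM calls."""
    total = 0.0
    for tier, tokens_in, tokens_out in calls:
        price = PRICES[tier]
        total += tokens_in / 1e6 * price["input"] + tokens_out / 1e6 * price["output"]
    return round(total, 4)


# Four agents, one call each with 20k input and 2k output tokens (illustrative numbers)
all_large = [("large", 20_000, 2_000)] * 4
mixed = [("large", 20_000, 2_000)] * 2 + [("small", 20_000, 2_000)] * 2
print(run_cost(all_large), run_cost(mixed))
```

With these placeholder numbers, moving the two lightweight agents to the small tier roughly halves the run cost; the real saving depends on which agents consume the most tokens.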
