
CrewAI Multi-Agent Systems

The Content Team That Did Not Sleep

The VP of Marketing walks into the engineering standup with a familiar request: "We need to publish twelve detailed technical blog posts next month. We have outlines. We need research, writing, fact-checking, and SEO optimization for each one." The team looks at each other. Twelve posts at four to six hours of senior writer time each is not a one-month project - not with current headcount.

The senior engineer who has been quietly evaluating AI content pipelines speaks up. He has been testing a CrewAI setup for two weeks: a researcher agent that digs deep into each topic using search and synthesis, a writer agent that turns research into structured prose matching the brand voice, a fact-checker that cross-references claims against recent sources, and an SEO specialist that optimizes headlines and metadata. The crew runs sequentially - each agent hands off its work to the next. A human reviews and edits the final draft, which takes thirty minutes instead of four hours.

By the end of the week, he has the crew configured and documented. By the end of the month, twelve posts are published. The senior writers spent their time on the editing and strategy layer - the work that actually required their judgment - while the crew handled the research-and-draft layer.

This is CrewAI's value proposition: not replacing human judgment, but eliminating the mechanical work that precedes it. When you have a repeatable workflow that naturally decomposes into distinct roles - research, analysis, writing, review, validation - CrewAI provides an opinionated, ergonomic framework for implementing that workflow as a team of collaborating AI agents.

This lesson is a production-depth treatment of CrewAI. Not the getting-started tutorial, but the patterns, edge cases, and architectural decisions that matter when you are running crews in production.



Why This Exists

The Coordination Problem

By early 2024, teams building multi-agent systems had two options: raw API code (complete control, a lot of scaffolding) or LangGraph (explicit state graphs, significant learning curve). What neither provided well was an opinionated model for human-like team coordination - agents with roles, goals, and expertise who collaborate on shared tasks.

Joao Moura shipped CrewAI in January 2024 with a different mental model: AI agents as team members. Each agent has a role (the function it plays), a goal (what it is optimizing for), and a backstory (context that shapes its behavior). Tasks have expected outputs, context (previous task results that this task builds on), and optional tool access. The Crew orchestrates execution - sequential by default, hierarchical when you need a manager to assign and route tasks dynamically.

The mental model resonated. Engineers who struggled to explain LangGraph's StateGraph to non-technical stakeholders found that CrewAI's role/goal/backstory model translated naturally into product discussions. "We have a researcher, a writer, and a reviewer" maps directly to three Agent objects. "The writer reads the researcher's output" maps to task context.

CrewAI grew to 25,000+ GitHub stars by mid-2024 and became one of the most-used multi-agent frameworks alongside LangGraph.

What CrewAI Is Not

CrewAI is not a general-purpose agent framework. It is optimized for structured, sequential or hierarchical team workflows. It is not the right choice for:

  • Complex conditional routing based on intermediate state (use LangGraph)
  • Single-agent systems with many tools (use raw API or LangGraph)
  • Real-time conversational agents (use AutoGen)
  • Heavy RAG pipelines (use LlamaIndex)

Use CrewAI when your problem naturally decomposes into roles and the execution flow is relatively linear (sequential) or when a manager can assign tasks dynamically (hierarchical).


Historical Context

| Date | Event |
| --- | --- |
| January 2024 | CrewAI v0.1 - initial release, basic Agent/Task/Crew |
| March 2024 | Memory system added (entity, long-term, short-term) |
| May 2024 | Hierarchical process type with manager LLM |
| August 2024 | CrewAI Flows - structured workflow with event-driven steps |
| October 2024 | CrewAI+ (hosted platform, monitoring, deployment) |
| December 2024 | CrewAI v0.80+ - significant API refinements, async execution |

Breaking change warning: CrewAI has had API changes between minor versions. The v0.28 to v0.30 migration changed the Crew initialization parameters. The v0.50+ versions refined memory API. Always pin your CrewAI version in production (crewai==0.80.0, not crewai>=0.80).


Core Architecture

The Agent

from crewai import Agent
from crewai_tools import SerperDevTool, ScrapeWebsiteTool

search_tool = SerperDevTool()
scrape_tool = ScrapeWebsiteTool()

research_agent = Agent(
role="Senior Research Analyst",
goal="""Produce comprehensive, accurate research on technical topics.
Find primary sources, extract key data points, and identify conflicting claims.""",
backstory="""You are a research analyst with 10 years of experience in
technical journalism. You have a reputation for finding the details others miss,
citing primary sources rather than secondary summaries, and flagging uncertainty
rather than guessing. You are methodical: search first, read deeply, then synthesize.""",
tools=[search_tool, scrape_tool],
llm="claude-opus-4-6", # Model for this agent's reasoning
verbose=True, # Log this agent's reasoning
allow_delegation=False, # This agent cannot delegate to others
max_iter=10, # Max reasoning iterations per task
memory=True # Enable agent-level memory
)

writer_agent = Agent(
role="Technical Content Writer",
goal="""Transform research into engaging, accurate technical blog posts
that match our brand voice: clear, direct, example-driven, never condescending.""",
backstory="""You are a senior technical writer who has worked at leading
developer tools companies. You write for engineers who value precision over
fluff. You never use buzzwords without substance. Every claim you make is
backed by the research provided. You structure content with clear headers,
concrete examples, and code snippets where appropriate.""",
tools=[], # Writer does not need external tools - works from research context
llm="claude-opus-4-6",
verbose=True,
memory=True
)

editor_agent = Agent(
role="Technical Editor and Fact Checker",
goal="""Ensure factual accuracy, structural clarity, and brand voice consistency.
Flag any claims that need additional sourcing. Improve clarity without removing depth.""",
backstory="""You are a technical editor with a background in both engineering
and science journalism. You read everything with skepticism - you verify claims,
question unsupported assertions, and improve structure. You edit to clarify, not
to simplify. You never accept 'roughly' or 'approximately' when the actual number
is findable.""",
tools=[search_tool], # Can verify claims by searching
llm="claude-opus-4-6",
verbose=True,
memory=True
)

What makes a good backstory: the backstory is not decoration. It shapes the agent's behavior more than the role field. A research agent with a backstory about "methodical sourcing and flagging uncertainty" will behave differently than one with a backstory about "finding the most interesting angle." Test your backstories.

The Task

from crewai import Task

research_task = Task(
description="""Research the following topic comprehensively:

Topic: {topic}

Your research must include:
1. A factual overview of the current state (with dates and sources)
2. 3-5 specific technical examples or case studies with real numbers
3. The 2-3 most common misconceptions about this topic
4. Key expert opinions or research papers published in the last 12 months
5. Practical implications for engineers building with this technology

Do not summarize - collect specific details. Include direct quotes where relevant.
Flag any conflicting claims or areas of uncertainty.""",
expected_output="""A structured research report in markdown format containing:
- Executive Overview (2-3 paragraphs)
- Key Facts and Data Points (bulleted, with sources)
- Technical Examples (detailed, not surface-level)
- Misconceptions Section
- Recent Developments (last 12 months)
- Expert Perspectives
- Areas of Uncertainty

The report should be 1000-1500 words of dense, specific content.""",
agent=research_agent,
tools=[search_tool, scrape_tool] # Task-level tool override
)

writing_task = Task(
description="""Write a technical blog post based on the research provided.

The post should:
- Target senior engineers who value depth over brevity
- Open with a concrete, vivid scenario that establishes why this topic matters
- Use specific numbers and examples from the research
- Include at least one code example where appropriate
- Address the common misconceptions identified in the research
- Be 1500-2000 words

Tone: direct, precise, technically credible. No marketing language.""",
expected_output="""A complete technical blog post in markdown format with:
- Compelling title (not clickbait)
- 4-6 paragraph introduction with a concrete opening scenario
- Well-structured body with clear H2/H3 headers
- At least one code example with explanatory comments
- A section addressing common misconceptions
- Strong conclusion with actionable takeaways
- All claims traceable to the research provided""",
agent=writer_agent,
context=[research_task] # Writer sees the researcher's output
)

editing_task = Task(
description="""Edit and fact-check the blog post provided.

Your editing process:
1. Verify every factual claim against the research document
2. Flag any claims made without research backing
3. Improve structural clarity (reorganize sections if needed)
4. Strengthen the opening - it must hook engineers immediately
5. Check code examples for accuracy and completeness
6. Suggest 3 alternative titles
7. Add a meta description (160 chars max) for SEO

Do not simplify - our readers are engineers. Preserve depth, improve clarity.""",
expected_output="""The edited blog post with:
- All factual claims verified (or flagged as [VERIFY NEEDED])
- Improved opening paragraph
- Three alternative title suggestions
- SEO meta description
- Editor's notes in [EDITOR: ...] brackets for significant changes
- Final word count""",
agent=editor_agent,
context=[research_task, writing_task], # Editor sees both research and draft
human_input=False # Set to True to require human approval before continuing
)

Key task design principles:

  1. expected_output is not optional. Vague expected outputs produce vague results. Be specific about format, length, and what "done" means.

  2. context controls what prior task outputs the agent receives. Only include what the agent actually needs - long contexts with irrelevant information degrade quality.

  3. human_input=True asks a human to review the agent's result before the task is finalized; the agent can then revise its answer based on that feedback. Useful for tasks where quality is critical.

The Crew

from crewai import Crew, Process

content_crew = Crew(
agents=[research_agent, writer_agent, editor_agent],
tasks=[research_task, writing_task, editing_task],
process=Process.sequential, # Tasks run in order
verbose=True,
memory=True, # Enable crew-level shared memory
max_rpm=20, # Rate limit: max 20 LLM calls per minute
share_crew=False # Don't share with CrewAI's training data
)

# Run the crew
result = content_crew.kickoff(inputs={"topic": "LLM quantization techniques in 2025"})

print(result.raw) # The final task's output as a string
print(result.pydantic) # If the final task has output_pydantic defined
print(result.json_dict) # Parsed JSON if the output is JSON
print(result.token_usage) # Total token usage across all agents

# Access individual task outputs
for task_output in result.tasks_output:
    print(f"Task: {task_output.name}")
    print(f"Output: {task_output.raw[:200]}")

Process Types

Sequential (Default)

Tasks run in order. Each task's output is available as context to subsequent tasks that declare it in their context list. Simple, predictable, and the right choice for most pipelines.

crew = Crew(
agents=[agent1, agent2, agent3],
tasks=[task1, task2, task3],
process=Process.sequential
)
# Execution: task1 → task2 → task3
# task2 sees task1's output if context=[task1]
# task3 sees task1 and task2 outputs if context=[task1, task2]
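
To make the context rule concrete, here is a toy model in plain Python. It is not CrewAI's implementation: `ToyTask` and `run_sequential` are hypothetical names used only to show that each task runs in list order and sees only the outputs it declared as context.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyTask:
    """Stand-in for a task: a name, a worker function, and explicit context."""
    name: str
    run: Callable[[list[str]], str]
    context: list["ToyTask"] = field(default_factory=list)


def run_sequential(tasks: list[ToyTask]) -> dict[str, str]:
    """Run tasks in order; each one receives only the outputs it declared."""
    outputs: dict[str, str] = {}
    for task in tasks:
        visible = [outputs[dep.name] for dep in task.context]
        outputs[task.name] = task.run(visible)
    return outputs


task1 = ToyTask("research", lambda ctx: "facts")
task2 = ToyTask("write", lambda ctx: f"draft from {ctx}", context=[task1])
task3 = ToyTask("edit", lambda ctx: f"edit saw {len(ctx)} inputs", context=[task1, task2])

results = run_sequential([task1, task2, task3])
print(results["edit"])
```

A task with an empty context list receives nothing from earlier tasks in this model, which is why the context declaration matters even in a strictly ordered pipeline.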

Hierarchical

A manager LLM dynamically assigns tasks to agents and routes work based on task results. Use when the optimal agent assignment depends on the content of intermediate results:

from crewai import Crew, Process, Agent

manager = Agent(
role="Content Production Manager",
goal="Ensure all content meets quality standards and is delivered on time",
backstory="""You manage a team of content specialists. You assign tasks to the
right specialist based on their expertise and the task requirements. You monitor
quality and re-assign if needed.""",
llm="claude-opus-4-6",
allow_delegation=True # Manager must be able to delegate
)

crew = Crew(
agents=[research_agent, writer_agent, editor_agent],
tasks=[research_task, writing_task, editing_task],
process=Process.hierarchical,
manager_agent=manager, # Custom manager agent
# OR: manager_llm="claude-opus-4-6" # Use an LLM directly as manager
verbose=True
)

Hierarchical process is more expensive (extra LLM calls for manager reasoning) and less predictable than sequential. Use it only when dynamic routing genuinely improves outcomes.


Memory System

CrewAI's memory system is one of its strongest differentiators. It provides four distinct memory types that persist information across task executions.

from crewai import Crew, Process
from crewai.memory import ShortTermMemory, LongTermMemory, EntityMemory
# Storage classes live in submodules in the 0.80 line; confirm the paths
# against your pinned CrewAI version.
from crewai.memory.storage.ltm_sqlite_storage import LTMSQLiteStorage
from crewai.memory.storage.rag_storage import RAGStorage

# Setup embeddings for RAG-based memory
embedder_config = {
"provider": "huggingface",
"config": {"model": "BAAI/bge-small-en-v1.5"}
}

crew_with_memory = Crew(
agents=[research_agent, writer_agent],
tasks=[research_task, writing_task],
process=Process.sequential,
memory=True,

# Short-term: recent context within a run
short_term_memory=ShortTermMemory(
storage=RAGStorage(
embedder_config=embedder_config,
type="short_term"
)
),

# Long-term: persistent across runs (what has this crew learned?)
long_term_memory=LongTermMemory(
storage=LTMSQLiteStorage(db_path="./crew_memory.db")
),

# Entity memory: track people, companies, concepts across runs
entity_memory=EntityMemory(
storage=RAGStorage(
embedder_config=embedder_config,
type="entities"
)
),
verbose=True
)

Memory in practice: when a crew with memory runs a second time on a related topic, the agents receive relevant context from previous runs. A research agent that has previously researched "transformer attention mechanisms" will have that context available when researching "FlashAttention optimization" - reducing redundant research and improving coherence across related content pieces.


CrewAI Flows: Structured Pipelines

CrewAI Flows (introduced mid-2024) are a higher-level abstraction that combines Crews with structured Python control flow. A Flow is a class where methods decorated with @start and @listen define the execution DAG.

from crewai import Crew, Process, Task
from crewai.flow.flow import Flow, listen, or_, router, start
from pydantic import BaseModel

class ContentProductionState(BaseModel):
    topic: str = ""
    target_audience: str = "senior engineers"
    research: str = ""
    draft: str = ""
    final_post: str = ""
    quality_score: float = 0.0
    approved: bool = False
    revision_count: int = 0

class ContentProductionFlow(Flow[ContentProductionState]):

    @start()
    def receive_request(self):
        """Entry point - initialize state."""
        print(f"Starting content production for: {self.state.topic}")
        return self.state.topic

    @listen(receive_request)
    def run_research(self, topic):
        """Run the research crew."""
        research_crew = Crew(
            agents=[research_agent],
            tasks=[Task(
                description=f"Research: {topic}",
                expected_output="Comprehensive research report",
                agent=research_agent
            )],
            process=Process.sequential
        )
        result = research_crew.kickoff()
        self.state.research = result.raw
        return result.raw

    @listen(run_research)
    def run_writing(self, research):
        """Run the writing crew with research context."""
        write_crew = Crew(
            agents=[writer_agent],
            tasks=[Task(
                # Only the first 2000 characters of research are passed along
                description=f"Write a blog post based on this research:\n{research[:2000]}",
                expected_output="Complete blog post in markdown",
                agent=writer_agent
            )],
            process=Process.sequential
        )
        result = write_crew.kickoff()
        self.state.draft = result.raw
        return result.raw

    @router(run_writing)
    def quality_check(self, draft):
        """Route based on draft quality."""
        # Simple heuristic - in production, use an LLM judge
        word_count = len(draft.split())
        if word_count < 800:
            return "needs_revision"
        return "ready_for_editing"

    @listen("needs_revision")
    def revise_draft(self):
        """Request revision if quality is insufficient."""
        # Read the draft from state instead of relying on what the router passes on
        draft = self.state.draft
        self.state.revision_count += 1
        if self.state.revision_count >= 2:
            print("Max revisions reached, proceeding anyway")
            return draft

        revise_crew = Crew(
            agents=[writer_agent],
            tasks=[Task(
                description=(
                    f"Expand and improve this draft (current: {len(draft.split())} words, "
                    f"need 1000+):\n{draft}"
                ),
                expected_output="Improved draft with at least 1000 words",
                agent=writer_agent
            )]
        )
        result = revise_crew.kickoff()
        self.state.draft = result.raw
        return result.raw

    # Edit after either path: a draft that passed the check, or a revised draft.
    # Without the or_ condition, a revised draft would never reach the editor.
    @listen(or_("ready_for_editing", revise_draft))
    def run_editing(self):
        """Run the editing crew."""
        draft = self.state.draft
        edit_crew = Crew(
            agents=[editor_agent],
            tasks=[Task(
                description=f"Edit and fact-check:\n{draft}",
                expected_output="Edited post with fact-check notes",
                agent=editor_agent
            )]
        )
        result = edit_crew.kickoff()
        self.state.final_post = result.raw
        return result.raw

# Run the flow
flow = ContentProductionFlow()
flow.state.topic = "The real cost of LLM token usage at scale"
result = flow.kickoff()

print(flow.state.final_post)
print(f"Revisions: {flow.state.revision_count}")

Flows give you Python control flow (conditionals, loops, branching) around Crew executions. They are more explicit than Crews alone and more ergonomic than LangGraph for CrewAI-native workflows.
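
The looping case can be sketched in plain Python, independent of any framework. This is a minimal sketch under stated assumptions: `revise_until`, `score`, and `revise` are hypothetical names; in a Flow, `score` might wrap an LLM judge and `revise` a crew kickoff.

```python
from typing import Callable


def revise_until(
    draft: str,
    score: Callable[[str], float],
    revise: Callable[[str], str],
    threshold: float,
    max_rounds: int = 2,
) -> tuple[str, int]:
    """Revise a draft until it scores at or above threshold, or rounds run out."""
    rounds = 0
    while score(draft) < threshold and rounds < max_rounds:
        draft = revise(draft)
        rounds += 1
    return draft, rounds


def word_count(text: str) -> float:
    """Stand-in quality score: number of words."""
    return float(len(text.split()))


def pad(text: str) -> str:
    """Stand-in revision step: append three filler words."""
    return text + " more detail here"


final, used = revise_until("short draft", word_count, pad, threshold=6, max_rounds=3)
print(final, used)
```

The `max_rounds` cap matters as much as the threshold: without it, a draft that never reaches the target would trigger paid LLM calls indefinitely.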


Full Production Example: Content Research Crew

import os
from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool, ScrapeWebsiteTool, FileWriterTool

# ─── Initialize tools ─────────────────────────────────────────────────────────

os.environ["SERPER_API_KEY"] = "your-serper-key"

search = SerperDevTool(n_results=10)
scrape = ScrapeWebsiteTool()
write_file = FileWriterTool()

# ─── Agents ────────────────────────────────────────────────────────────────────

researcher = Agent(
role="Senior Technical Researcher",
goal="Find accurate, specific, and up-to-date technical information with verified sources",
backstory="""10-year veteran of technical research at top AI labs. Known for
sourcing primary research rather than blog summaries. Always flags uncertainty.
Never presents speculation as fact.""",
tools=[search, scrape],
llm="claude-opus-4-6",
verbose=True,
max_iter=15,
memory=True
)

writer = Agent(
role="Technical Content Strategist",
goal="Transform dense research into compelling, accurate technical content for engineers",
backstory="""Former engineer turned technical writer. Deep understanding of
what engineers care about: correctness, examples, practical implications. Writes
the way a brilliant senior colleague explains things - clear, direct, no fluff.""",
tools=[],
llm="claude-opus-4-6",
verbose=True,
memory=True
)

seo_specialist = Agent(
role="Technical SEO Specialist",
goal="Maximize content discoverability without compromising technical accuracy",
backstory="""Specializes in technical content SEO. Knows that engineers
use different search terms than general audiences. Focuses on long-tail technical
queries, proper heading structure, and metadata that matches real search intent.""",
tools=[search], # Can verify search volume for terms
llm="claude-opus-4-6",
verbose=True
)

# ─── Tasks ─────────────────────────────────────────────────────────────────────

research_task = Task(
description="""Research '{topic}' for a technical blog post targeting senior software engineers.

Find and compile:
1. Current state-of-the-art as of {current_date}
2. Key papers, benchmarks, or case studies with real numbers
3. Common implementation pitfalls (not theory - actual mistakes people make)
4. 2-3 production examples from real companies (named if publicly known)
5. The 3 questions every engineer asks about this topic

Search multiple sources. Read the actual content, not just headlines.
Prioritize sources from the last 6 months.""",
expected_output="""Research report (1000-1500 words) with:
- Factual overview with timeline
- Specific data points with source URLs
- Implementation pitfalls section
- Production examples section
- Engineer FAQ section
- Source bibliography""",
agent=researcher
)

writing_task = Task(
description="""Write a technical blog post based on the research.

Requirements:
- 1500-2000 words
- Open with a concrete production scenario (not "In recent years...")
- Use specific numbers from the research
- Include a practical code example (Python, well-commented)
- Address the common pitfalls directly
- End with 3 actionable takeaways

Target reader: senior engineer who is evaluating whether to use this technology.
They are skeptical and have seen hype before. Earn their trust with specifics.""",
expected_output="""Complete blog post in markdown:
- Title (hook, not clickbait)
- Opening scenario (3-4 paragraphs)
- 4-6 main sections with H2 headers
- 1+ code example
- Pitfalls section
- Actionable takeaways
- Word count: 1500-2000""",
agent=writer,
context=[research_task]
)

seo_task = Task(
description="""Optimize the blog post for search discovery without compromising accuracy.

Deliverables:
1. Primary keyword (high-intent, not overly competitive)
2. 5 secondary keywords to work in naturally
3. Optimized H1 title (target keyword near the front)
4. Meta description (155 chars max, includes primary keyword)
5. 5 suggested internal link anchor texts
6. One structural improvement for SEO (e.g., add a FAQ section)

Do not suggest changes that reduce technical accuracy. SEO serves the content; it
does not override it.""",
expected_output="""SEO analysis document:
- Primary keyword (with estimated intent)
- Secondary keywords list
- Optimized title (2 options)
- Meta description
- Internal link suggestions
- One structural recommendation""",
agent=seo_specialist,
context=[writing_task]
)

save_task = Task(
description="""Compile the final deliverable package.

Combine the written post with the SEO recommendations into a single document.
Format:
1. SEO metadata block at the top
2. The full blog post
3. Editor notes at the bottom (from the SEO analysis)

Save to: ./output/{topic_slug}_post.md""",
expected_output="Confirmation that the file was saved with the final word count",
agent=writer, # Writer compiles the final package
tools=[write_file], # Task-level tool so the writer can actually save the file
context=[writing_task, seo_task]
)

# ─── Crew ──────────────────────────────────────────────────────────────────────

content_crew = Crew(
agents=[researcher, writer, seo_specialist],
tasks=[research_task, writing_task, seo_task, save_task],
process=Process.sequential,
verbose=True,
memory=True,
max_rpm=25
)

# ─── Run ───────────────────────────────────────────────────────────────────────

if __name__ == "__main__":
    import datetime

    result = content_crew.kickoff(inputs={
        "topic": "LLM function calling in production",
        "current_date": datetime.date.today().isoformat(),
        "topic_slug": "llm-function-calling-production"
    })

    print("\n" + "="*60)
    print("CREW COMPLETED")
    print("="*60)
    print(f"Token usage: {result.token_usage}")
    print(f"\nFinal output preview:")
    print(result.raw[:500])
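
The run above passes `topic_slug` by hand. When topics come from a list, a small helper can derive the slug instead. `slugify` is a hypothetical helper, not part of CrewAI, and this sketch keeps every word, so its output differs slightly from the hand-written slug above.

```python
import re


def slugify(topic: str) -> str:
    """Lowercase the topic and join its alphanumeric runs with hyphens."""
    words = re.findall(r"[a-z0-9]+", topic.lower())
    return "-".join(words)


slug = slugify("LLM function calling in production")
print(slug)  # llm-function-calling-in-production
```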

Testing CrewAI Crews

import pytest
from unittest.mock import patch, MagicMock
from crewai import Crew, Agent, Task, Process

# Mock agent execution so unit tests never call a real LLM.
# Assumption: Agent.execute_task is the method tasks call to run an agent;
# confirm the patch target against your pinned CrewAI version.
MOCK_OUTPUT = "Mocked agent output for testing. " * 5

@pytest.fixture
def mock_llm_response():
    with patch("crewai.agent.Agent.execute_task") as mock:
        mock.return_value = MOCK_OUTPUT
        yield mock

def test_research_task_structure():
    """Verify task configuration is correct."""
    agent = Agent(
        role="Researcher",
        goal="Research topics",
        backstory="A researcher",
        llm="claude-opus-4-6"
    )
    task = Task(
        description="Research {topic}",
        expected_output="A research report",
        agent=agent
    )
    assert "{topic}" in task.description
    assert task.expected_output is not None
    assert task.agent == agent

def test_crew_configuration():
    """Verify crew is configured with the right process."""
    # ... crew setup ...
    crew = Crew(
        agents=[researcher, writer],
        tasks=[research_task, writing_task],
        process=Process.sequential
    )
    assert crew.process == Process.sequential
    assert len(crew.agents) == 2
    assert len(crew.tasks) == 2

# Integration test with output validation
def test_crew_output_format(mock_llm_response):
    """Verify the crew produces output in the expected format."""
    # Use the production crew with mocked agent execution
    crew = content_crew
    # Supply every placeholder the task descriptions use, including topic_slug
    result = crew.kickoff(inputs={
        "topic": "test topic",
        "current_date": "2025-01-01",
        "topic_slug": "test-topic"
    })
    assert result.raw is not None
    assert len(result.raw) > 100 # Non-trivial output

Production Engineering Notes

Cost Management

Crews can be expensive. Multiple agents × multiple LLM calls × long contexts add up quickly:

# Track cost per crew run
crew = Crew(
agents=[researcher, writer, editor_agent],
tasks=[research_task, writing_task, editing_task],
process=Process.sequential,
# Cost control
max_rpm=10, # Rate limit API calls
)

result = crew.kickoff(inputs={"topic": "..."})

# Token usage breakdown
print(f"Total tokens: {result.token_usage.total_tokens}")
print(f"Prompt tokens: {result.token_usage.prompt_tokens}")
print(f"Completion tokens: {result.token_usage.completion_tokens}")
# At Claude Opus pricing (~$0.015/1K input, ~$0.075/1K output):
cost = (result.token_usage.prompt_tokens / 1000 * 0.015 +
result.token_usage.completion_tokens / 1000 * 0.075)
print(f"Estimated cost: ${cost:.3f}")
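
The same arithmetic can be pulled into a helper that works without a crew run, which is handy for budgeting before launch. A minimal sketch: the rates are the approximate figures quoted above, while the token counts and the `estimate_cost` name are hypothetical, chosen only for illustration.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_per_1k: float,
    output_per_1k: float,
) -> float:
    """Estimate the dollar cost of one run from token counts and per-1K rates."""
    return (prompt_tokens / 1000 * input_per_1k
            + completion_tokens / 1000 * output_per_1k)


# Hypothetical run: 120K prompt tokens, 15K completion tokens.
per_post = estimate_cost(120_000, 15_000, input_per_1k=0.015, output_per_1k=0.075)

# Twelve posts a month, as in the opening scenario.
monthly = per_post * 12

print(f"Per post: ${per_post:.3f}")
print(f"Twelve posts: ${monthly:.2f}")
```

Prompt tokens usually dominate in multi-agent pipelines because every downstream task re-reads upstream output, so trimming context tends to cut cost more than shortening outputs does.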

Async Execution

For running multiple crews concurrently:

import asyncio

async def run_crews_async(topics: list[str]) -> list:
    """Run multiple content crews in parallel."""
    async def run_one(topic: str):
        # Caveat: every crew built here shares the same Agent and Task objects.
        # If your CrewAI version interpolates inputs into tasks in place,
        # concurrent runs can overwrite each other; build fresh tasks per run
        # in that case.
        crew = Crew(
            agents=[researcher, writer],
            tasks=[research_task, writing_task],
            process=Process.sequential
        )
        return await crew.kickoff_async(inputs={"topic": topic})

    results = await asyncio.gather(*[run_one(topic) for topic in topics])
    return results

topics = ["LLM quantization", "RAG evaluation", "Agent safety"]
results = asyncio.run(run_crews_async(topics))
for topic, result in zip(topics, results):
    print(f"{topic}: {len(result.raw)} chars")
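
Unbounded `gather` starts every crew at once, which multiplies LLM calls against the same rate limit. One way to cap concurrency is a semaphore. This is a sketch, not CrewAI API: `run_bounded` is a hypothetical helper and `fake_kickoff` stands in for a real `kickoff_async` call.

```python
import asyncio


async def run_bounded(topics, kickoff, limit=2):
    """Run kickoff(topic) for every topic, with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(topic):
        async with semaphore:
            return await kickoff(topic)

    # gather returns results in the same order as the input topics
    return await asyncio.gather(*(run_one(topic) for topic in topics))


async def fake_kickoff(topic: str) -> str:
    """Stand-in for a crew run: wait briefly, then return a string."""
    await asyncio.sleep(0.01)
    return f"post about {topic}"


outputs = asyncio.run(
    run_bounded(["LLM quantization", "RAG evaluation", "Agent safety"], fake_kickoff)
)
print(outputs)
```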

:::danger Task Context Explosion

When you add many tasks to a task's context list, the context passed to the agent grows proportionally. Ten tasks with 1000-word outputs each = 10,000 words of context per agent call - plus the agent's own task description. This creates two problems: cost (long contexts cost more) and quality degradation (models perform worse with very long, dense contexts).

Be selective with context. Only include the tasks whose output the current task actually needs. A writer needs the researcher's output. The editor needs both. The SEO specialist only needs the final draft - not the intermediate research.

:::
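
The arithmetic in the warning above is easy to check before a run. A minimal sketch, assuming you have (or can estimate) each task's output as a string; `context_words` is a hypothetical helper and the task names are placeholders.

```python
def context_words(outputs: dict[str, str], context: list[str]) -> int:
    """Total words an agent would receive from the named upstream tasks."""
    return sum(len(outputs[name].split()) for name in context)


# Ten upstream tasks with 1000-word outputs each, as in the example above.
outputs = {f"task{i}": "word " * 1000 for i in range(1, 11)}

everything = context_words(outputs, list(outputs))  # all ten tasks as context
draft_only = context_words(outputs, ["task10"])     # only the final draft

print(everything, draft_only)
```

Running a check like this in CI against recorded outputs is one way to notice when a context list quietly grows past what the agent needs.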

:::warning CrewAI Version Pinning Is Not Optional

CrewAI has had breaking API changes between minor versions. max_iter was renamed. Memory configuration changed. The Crew constructor parameters shifted. Running pip install crewai --upgrade on a production system without testing is risky.

Pin your version: crewai==0.80.0. Add CrewAI to your CI test suite. Before upgrading, read the full changelog and run your integration tests against the new version in staging.

:::
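
A pin check can run in CI so a loosened requirement fails fast. This is a simplified sketch: the regex only recognizes `name==version` lines (optionally with extras) and is not a full requirement parser; `is_exact_pin` and `unpinned` are hypothetical helper names.

```python
import re

# name, optional [extras], then == and a dotted numeric version
EXACT_PIN = re.compile(r"^[A-Za-z0-9_.\-]+(\[[A-Za-z0-9_,\-]+\])?==\d+(\.\d+)*$")


def is_exact_pin(requirement: str) -> bool:
    """True if the requirement line pins one exact version."""
    return bool(EXACT_PIN.match(requirement.strip()))


def unpinned(requirements: list[str]) -> list[str]:
    """Return the requirement lines that are not exact pins."""
    return [line for line in requirements if not is_exact_pin(line)]


loose = unpinned(["crewai==0.80.0", "crewai>=0.80", "crewai"])
print(loose)  # ['crewai>=0.80', 'crewai']
```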


Interview Questions and Answers

Q1: What is the difference between CrewAI's sequential and hierarchical process types? When would you choose each?

Sequential process runs tasks in the order they are defined. Task 1 completes, its output is available to subsequent tasks that declare it in their context list, then Task 2 runs, and so on. This is predictable, debuggable, and the right choice when the task order is known in advance and does not depend on intermediate results.

Hierarchical process adds a manager layer: a manager LLM reads each task description and dynamically assigns it to the most appropriate agent. The manager can also re-assign tasks if the first attempt is unsatisfactory. Use hierarchical when: you cannot know in advance which agent should handle a task (because it depends on the content of prior results), or when you want the manager to quality-check each step and potentially request revisions before moving forward.

Hierarchical is more expensive - extra LLM calls for manager reasoning - and less predictable. Sequential is usually the right default. Upgrade to hierarchical only when dynamic routing provides measurable value.

Q2: How does CrewAI's memory system work, and what is the practical impact on agent behavior?

CrewAI provides four memory types. Short-term memory holds recent context within a single crew run - it is a RAG-indexed store of what has happened in the current execution. Long-term memory persists across runs using SQLite, accumulating knowledge the crew has built up over time. Entity memory specifically tracks people, organizations, products, and concepts mentioned across runs, providing entity-specific context when those entities appear again. User memory (via Mem0 integration) tracks user-specific preferences and history.

Practical impact: a research crew that has been running for two weeks on related AI topics will have accumulated long-term memory about key researchers, recent papers, and recurring themes. When a new task arrives about a related topic, agents receive this accumulated context, making their research more connected and avoiding redundant discovery. The risk is that incorrect information in long-term memory persists and influences future runs. Audit your crew's memory periodically, especially after correcting factual errors.

Q3: What is the role of expected_output in CrewAI tasks, and how does its quality affect agent behavior?

expected_output tells the agent what "done" looks like. It is not just documentation - the LLM reads it and uses it to determine when to stop and what format to produce. A vague expected output ("a good research report") produces vague outputs. A specific expected output ("A 1000-word report with these five sections, each containing at least three specific data points with source URLs") produces structured, verifiable output.

The expected output also serves as the quality criterion: if you compare task output against expected output as part of your evaluation pipeline, specific expected outputs make evaluation tractable. "Does this contain a bibliography?" is evaluable. "Is this good?" is not.

Common mistake: using expected_output as a summary of the task description rather than a spec for the output format. The task description says what to do. The expected output describes what the result should look like.
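
A specific expected output can be turned into an automated check. The sketch below assumes the task output is markdown and that the required sections appear as headings; `missing_sections` is a hypothetical helper, not a CrewAI feature.

```python
import re


def missing_sections(markdown: str, required: list[str]) -> list[str]:
    """Return required section titles that have no matching markdown heading."""
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^#{1,6}[ \t]+(.+?)[ \t]*$", markdown, flags=re.MULTILINE)
    }
    return [title for title in required if title.lower() not in headings]


report = """# Research Report
## Key Facts and Data Points
- fact one (source)
## Areas of Uncertainty
- open question
"""

gaps = missing_sections(
    report,
    ["Key Facts and Data Points", "Source Bibliography", "Areas of Uncertainty"],
)
print(gaps)  # ['Source Bibliography']
```

A check like this only verifies structure. Whether the content under each heading is any good still needs a human or an LLM judge.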

Q4: How would you debug a CrewAI crew where one agent consistently produces low-quality output?

First, isolate the failing agent. Run it on a simple task outside the crew context to test its base behavior. Is the output quality bad in isolation, or only when it runs in the crew? If bad in isolation, the problem is the agent's role, goal, backstory, or the task description and expected_output.

Second, check the context. If the agent runs fine in isolation but fails in the crew, the problem is often the context it receives. Add verbose=True to see exactly what messages the agent receives. A writer receiving 10,000 words of dense research context may produce worse output than one receiving a focused 2,000-word summary.

Third, check the expected output. Is it specific enough? If the expected output says "a good draft," the agent has no clear quality target.

Fourth, verify tool configuration. If the agent has tools it should not use, or is missing tools it needs, its behavior will be wrong.

Finally, add callbacks to log every LLM call with its full prompt. The full prompt the agent receives (system prompt + task description + context + tool definitions) is often much larger and less coherent than you expect when you read the agent definition in isolation.
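
As a coarser first step than logging full prompts, a callback can record what each task produced. This sketch assumes the callback receives an object with `raw` and `agent` attributes and that your pinned CrewAI version accepts a `task_callback` argument on `Crew`; verify both before relying on it. `make_task_logger` is a hypothetical helper, and a `SimpleNamespace` stands in for a real task output here.

```python
from types import SimpleNamespace


def make_task_logger(log: list[dict], preview_chars: int = 200):
    """Build a callback that records a short summary of each task output."""
    def on_task_done(output) -> None:
        raw = str(getattr(output, "raw", output))
        log.append({
            "agent": getattr(output, "agent", "unknown"),
            "chars": len(raw),
            "preview": raw[:preview_chars],
        })
    return on_task_done


task_log: list[dict] = []
callback = make_task_logger(task_log)

# Stand-in call; in a real crew the framework would invoke the callback,
# e.g. Crew(..., task_callback=callback) if your version supports it.
callback(SimpleNamespace(raw="Draft text " * 50, agent="Technical Content Writer"))

print(task_log[0]["agent"], task_log[0]["chars"])
```

Comparing output sizes across tasks often shows where quality drops: an agent whose input is many times larger than its output is a candidate for tighter context.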

Q5: When should you use CrewAI Flows instead of a raw Crew?

Use Flows when: you need Python control flow (if-statements, loops) around crew executions. A Flow can conditionally run different crews based on intermediate results, loop a crew until quality meets a threshold, or fan out to parallel crew executions and aggregate results.

The key scenarios: (1) quality-gated revision loops - run a writing crew, evaluate quality, re-run if below threshold (up to N times); (2) conditional routing - run a classification crew and then route to a specialized crew based on the classification; (3) parallel crews - run research crews on multiple topics simultaneously and synthesize results.

Flows are essentially "CrewAI Orchestration" - they manage the meta-level of which crews to run and when, using Python instead of another LLM layer. They are simpler than LangGraph for CrewAI-native use cases and more powerful than raw Crew composition. Use them when your workflow logic is complex enough that a single Crew with sequential tasks does not capture it cleanly.

© 2026 EngineersOfAI. All rights reserved.