When to Use a Framework
Reading time: 24 min | Relevance: ML Engineers, Architects, Senior Developers building agent systems
The Framework Paradox
LangChain has 90,000 GitHub stars and a reputation for being over-engineered and impossible to debug. Both things are true simultaneously.
LangChain's star count reflects genuine utility: for a certain class of problems - ones with many integrations, complex chains, and teams that need to move fast - it accelerates development meaningfully. The debugging reputation also reflects genuine experience: when it breaks (and it will break), the error messages point into framework internals rather than your code, and the fix requires understanding abstractions you did not write.
This is the framework paradox: the abstraction that saves you time on day one costs you time on day thirty. The question is not whether frameworks are good or bad - it is whether the abstraction fits your problem well enough that the day-thirty cost stays below the day-one benefit.
Most teams make the framework decision emotionally (everyone uses LangChain) or by social proof (it has 90k stars). This lesson provides a decision framework based on your actual requirements. Sometimes raw API is the right answer. Sometimes LangGraph or CrewAI is obviously correct. The skill is knowing which situation you are in.
Why This Exists
Before LangChain (2022), building agents meant writing all the scaffolding yourself: the messages array management, the tool call loop, the memory system, the LLM abstraction layer. Teams at different companies were all writing the same code. LangChain's initial proposition was correct: there is enough common infrastructure here to warrant a shared library.
The problem emerged as the framework grew. LangChain evolved from a simple utility library into a full abstraction platform with its own expression language (LCEL), its own agent abstraction, its own memory system, and its own integration ecosystem. Each new abstraction added power but also opacity. By 2024, teams were spending more time understanding LangChain internals than building their actual applications.
The ecosystem responded with alternatives: LangGraph (from the same team, but explicit state graphs), LlamaIndex (data-focused), CrewAI (multi-agent teams), AutoGen (conversational agents). Each solves a real problem. Each also adds complexity.
The mature answer in 2025 is: start with raw API, add framework abstractions only when you have identified a specific problem they solve, and always maintain the ability to drop back to raw API when the abstraction fails.
What Frameworks Provide
A framework can provide some or all of:
| Benefit | What it means | Which frameworks |
|---|---|---|
| LLM abstraction | Same code for OpenAI, Anthropic, Cohere | LangChain, LlamaIndex |
| Tool/integration library | Pre-built connectors to 100+ APIs | LangChain |
| Observability/tracing | Built-in execution visibility | LangSmith, LlamaTrace |
| State management | Persistent agent state across steps | LangGraph |
| Graph-based control flow | Conditional routing, cycles, checkpoints | LangGraph |
| Multi-agent orchestration | Agents that coordinate with each other | CrewAI, AutoGen, LangGraph |
| Data ingestion pipeline | Document loading, indexing, retrieval | LlamaIndex |
| Async/streaming support | First-class async and token streaming | LangGraph, raw API |
| Community & support | Stack Overflow answers, GitHub issues | LangChain (largest) |
What Frameworks Cost
Each benefit comes with a cost that is often invisible until you are in production:
| Cost | What it means | Severity |
|---|---|---|
| Complexity overhead | More code paths, more things that can go wrong | Medium |
| Abstraction leakage | When it breaks, you debug the framework not your code | High |
| Version churn | LangChain had 5+ breaking API changes in 2023 | High |
| Debugging difficulty | Stack traces are 20 frames deep into framework code | High |
| Performance overhead | Every abstraction layer adds latency | Low–Medium |
| Lock-in | Migration to a different framework or raw API is costly | Medium |
| Cognitive overhead | Your team must understand the framework in addition to the problem | Medium |
| Undocumented behavior | Framework behavior that differs from documentation | Medium |
The Raw API Approach
Building directly on the Anthropic API gives you maximum control with minimal dependencies:
import anthropic
from typing import Callable


class RawAPIAgent:
    """
    A complete, production-capable agent built directly on the Anthropic API.
    No framework dependencies. Full control over every aspect of behavior.
    Easy to debug: every step is explicit Python code you wrote.
    """

    def __init__(self, system_prompt: str, tools: list[dict]):
        self.client = anthropic.Anthropic()
        self.system_prompt = system_prompt
        self.tools = tools
        self.tool_implementations: dict[str, Callable[..., object]] = {}
        self.messages: list[dict] = []

    def register_tool(self, name: str, fn: Callable[..., object]):
        """Register a Python function as an available tool."""
        self.tool_implementations[name] = fn

    def run(self, user_message: str, max_steps: int = 20) -> str:
        """Run the agent until completion or max_steps."""
        self.messages.append({"role": "user", "content": user_message})
        for step in range(max_steps):
            response = self.client.messages.create(
                model="claude-opus-4-6",
                max_tokens=4096,
                system=self.system_prompt,
                tools=self.tools,
                messages=self.messages,
            )
            self.messages.append({"role": "assistant", "content": response.content})
            if response.stop_reason == "end_turn":
                return self._extract_final_text(response)
            if response.stop_reason != "tool_use":
                # Any other stop reason: report it instead of claiming max steps were hit
                return f"Stopped early: stop_reason={response.stop_reason!r}"
            tool_results = self._execute_tools(response.content)
            self.messages.append({"role": "user", "content": tool_results})
        return "Max steps reached without completion."

    def _execute_tools(self, content_blocks: list) -> list[dict]:
        """Run every tool_use block and return matching tool_result blocks."""
        results = []
        for block in content_blocks:
            if block.type != "tool_use":
                continue
            tool_fn = self.tool_implementations.get(block.name)
            if not tool_fn:
                result_content = f"Tool '{block.name}' not found"
            else:
                try:
                    result_content = str(tool_fn(**block.input))
                except Exception as e:
                    # Report the failure to the model rather than crashing the loop
                    result_content = f"Tool error: {type(e).__name__}: {e}"
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result_content,
            })
        return results

    def _extract_final_text(self, response) -> str:
        """Return the first text block of the response, or an empty string."""
        for block in response.content:
            if hasattr(block, "text"):
                return block.text
        return ""
That is the entire agentic loop: about 60 lines of code that you wrote and understand completely. Every step is transparent. When something goes wrong, the error points to your code.
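Because the loop is plain Python, its tool-dispatch step can be exercised without any network call. The sketch below is an illustration, not part of the agent above: `execute_tools` is a hypothetical standalone mirror of `_execute_tools`, `add` is a made-up tool, and the `SimpleNamespace` objects stand in for the content blocks the API would return.

```python
from types import SimpleNamespace
from typing import Callable


def execute_tools(content_blocks: list, implementations: dict[str, Callable[..., object]]) -> list[dict]:
    """Run each tool_use block and collect tool_result dicts (mirrors the agent's dispatch step)."""
    results = []
    for block in content_blocks:
        if block.type != "tool_use":
            continue  # text blocks need no tool result
        fn = implementations.get(block.name)
        if fn is None:
            content = f"Tool '{block.name}' not found"
        else:
            try:
                content = str(fn(**block.input))
            except Exception as e:
                content = f"Tool error: {type(e).__name__}: {e}"
        results.append({"type": "tool_result", "tool_use_id": block.id, "content": content})
    return results


def add(a: int, b: int) -> int:
    """Hypothetical tool used only for this illustration."""
    return a + b


# Stand-ins for API content blocks: one text block, one valid call,
# one unknown tool, and one call with a missing argument.
fake_blocks = [
    SimpleNamespace(type="text", text="Let me add those."),
    SimpleNamespace(type="tool_use", id="t1", name="add", input={"a": 2, "b": 3}),
    SimpleNamespace(type="tool_use", id="t2", name="missing", input={}),
    SimpleNamespace(type="tool_use", id="t3", name="add", input={"a": 1}),
]
results = execute_tools(fake_blocks, {"add": add})
for r in results:
    print(r["tool_use_id"], "->", r["content"])
```

The unknown tool and the bad arguments both come back as ordinary tool results, which is the behavior the model sees in the real loop.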
Decision Criteria: The 5 Questions
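The framework decision reduces to five questions asked in order, where the first "yes" decides: (1) does the problem require graph-based control flow with conditional routing and explicit state? Use LangGraph; (2) does it require multi-agent coordination? Use CrewAI or AutoGen; (3) does it require complex RAG with many data sources? Use LlamaIndex; (4) does it require many pre-built external integrations (Slack, Notion, databases)? Use LangChain; (5) did every earlier question get a "no"? Use the raw API, which is the default.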
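The ordering can be sketched as a small function. This is a minimal illustration that follows the question order given in this lesson's interview answer; the `Requirements` dataclass and its field names are hypothetical, introduced only for the example.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical summary of what the agent system needs."""
    needs_graph_control_flow: bool = False   # conditional routing, explicit state
    needs_multi_agent: bool = False          # agents coordinating with each other
    needs_complex_rag: bool = False          # many data sources to index and retrieve
    needs_many_integrations: bool = False    # Slack, Notion, databases, ...


def recommend_approach(req: Requirements) -> str:
    """Return the first matching recommendation; raw API is the default."""
    if req.needs_graph_control_flow:
        return "LangGraph"
    if req.needs_multi_agent:
        return "CrewAI or AutoGen"
    if req.needs_complex_rag:
        return "LlamaIndex"
    if req.needs_many_integrations:
        return "LangChain"
    return "Raw API"


print(recommend_approach(Requirements()))
print(recommend_approach(Requirements(needs_complex_rag=True, needs_many_integrations=True)))
```

Because the checks run in order, a system that needs both RAG and many integrations is routed to the earlier match; a real decision would weigh such overlaps rather than stop at the first hit.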
The Same Agent Built Three Ways
The best way to understand the framework tradeoffs is to build the same thing three ways and compare:
# ==========================================================
# APPROACH 1: Raw Anthropic API - full control, no magic
# ==========================================================
import anthropic
import requests
from bs4 import BeautifulSoup


def search_web(query: str, max_results: int = 3) -> str:
    """Simulated web search (replace with real search API)."""
    return f"Search results for '{query}': [result 1], [result 2], [result 3]"


def fetch_page(url: str) -> str:
    """Fetch and parse a web page."""
    try:
        response = requests.get(url, timeout=10)
        soup = BeautifulSoup(response.text, "html.parser")
        return soup.get_text()[:3000]
    except Exception as e:
        return f"Error fetching {url}: {e}"


def summarize(text: str) -> str:
    """Summarize text (simplified)."""
    return (text[:500] + "...") if len(text) > 500 else text


RAW_TOOLS = [
    {
        "name": "search_web",
        "description": "Search the web for information on a query",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
                "max_results": {"type": "integer", "description": "Number of results", "default": 3},
            },
            "required": ["query"],
        },
    },
    {
        "name": "fetch_page",
        "description": "Fetch and read the content of a web page",
        "input_schema": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "URL to fetch"},
            },
            "required": ["url"],
        },
    },
    {
        "name": "summarize",
        "description": "Summarize a long piece of text",
        "input_schema": {
            "type": "object",
            "properties": {
                "text": {"type": "string", "description": "Text to summarize"},
            },
            "required": ["text"],
        },
    },
]


def run_raw_api_agent(task: str) -> str:
    """Research agent: raw API, no framework dependencies."""
    client = anthropic.Anthropic()
    tool_fns = {"search_web": search_web, "fetch_page": fetch_page, "summarize": summarize}
    messages = [{"role": "user", "content": task}]
    for _ in range(15):
        resp = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=4096,
            system="You are a research assistant. Search, fetch, and summarize information.",
            tools=RAW_TOOLS,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason == "end_turn":
            for block in resp.content:
                if hasattr(block, "text"):
                    return block.text
            return ""
        if resp.stop_reason != "tool_use":
            # No tool calls to answer: stop rather than send an empty user turn
            return f"Stopped early: stop_reason={resp.stop_reason!r}"
        tool_results = []
        for block in resp.content:
            if block.type == "tool_use":
                fn = tool_fns.get(block.name, lambda **kw: "Tool not found")
                try:
                    result = fn(**block.input)
                except Exception as e:
                    result = f"Error: {e}"
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result),
                })
        messages.append({"role": "user", "content": tool_results})
    return "Max steps reached"
# ==========================================================
# APPROACH 2: LangChain - integrations, LCEL, tracing
# Requires: pip install langchain langchain-anthropic
# ==========================================================
from langchain_anthropic import ChatAnthropic
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.tools import tool as lc_tool
from langchain_core.prompts import ChatPromptTemplate


@lc_tool
def lc_search_web(query: str) -> str:
    """Search the web for information."""
    return search_web(query)


@lc_tool
def lc_fetch_page(url: str) -> str:
    """Fetch the content of a web page."""
    return fetch_page(url)


@lc_tool
def lc_summarize(text: str) -> str:
    """Summarize a piece of text."""
    return summarize(text)


def run_langchain_agent(task: str) -> str:
    """Research agent: LangChain with LCEL and LangSmith tracing."""
    llm = ChatAnthropic(model="claude-opus-4-6")
    tools = [lc_search_web, lc_fetch_page, lc_summarize]
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a research assistant. Search, fetch, and summarize information."),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ])
    agent = create_tool_calling_agent(llm, tools, prompt)
    executor = AgentExecutor(agent=agent, tools=tools, verbose=True, max_iterations=15)
    result = executor.invoke({"input": task})
    return result["output"]

# Benefit: verbose=True shows every step; LangSmith traces automatically if configured
# Cost: hidden AgentExecutor logic; harder to customize the loop
# ==========================================================
# APPROACH 3: LangGraph - explicit state, better debugging
# Requires: pip install langgraph langchain-anthropic
# ==========================================================
from langgraph.graph import StateGraph, END
from langgraph.prebuilt import ToolNode
from langchain_core.messages import HumanMessage
from typing import TypedDict, Annotated
import operator


class ResearchState(TypedDict):
    messages: Annotated[list, operator.add]  # Append-only message list


def run_langgraph_agent(task: str) -> str:
    """Research agent: LangGraph with explicit state and conditional edges."""
    llm = ChatAnthropic(model="claude-opus-4-6")
    tools = [lc_search_web, lc_fetch_page, lc_summarize]
    llm_with_tools = llm.bind_tools(tools)

    def agent_node(state: ResearchState):
        response = llm_with_tools.invoke(state["messages"])
        return {"messages": [response]}

    def should_continue(state: ResearchState) -> str:
        last_message = state["messages"][-1]
        if hasattr(last_message, "tool_calls") and last_message.tool_calls:
            return "tools"
        return END

    graph = StateGraph(ResearchState)
    graph.add_node("agent", agent_node)
    graph.add_node("tools", ToolNode(tools))
    graph.set_entry_point("agent")
    graph.add_conditional_edges("agent", should_continue, {"tools": "tools", END: END})
    graph.add_edge("tools", "agent")
    compiled = graph.compile()
    result = compiled.invoke({"messages": [HumanMessage(content=task)]})
    final = result["messages"][-1]
    return final.content if hasattr(final, "content") else str(final)

# Benefit: graph is explicit and visualizable; state is typed; easy to add checkpointing
# Cost: more boilerplate than raw API for simple agents; requires LangGraph understanding
What the comparison reveals
| Dimension | Raw API | LangChain | LangGraph |
|---|---|---|---|
| Lines of code | ~60 | ~30 | ~45 |
| Debugging | Your code | Framework internals | Graph is explicit |
| Customization | Complete | Moderate | High |
| Streaming | Full control | Requires LCEL | Built-in |
| Checkpointing | Manual | Not built-in | First-class |
| Integrations | Write your own | 100+ available | Uses LangChain tools |
| Learning curve | Low | Medium | Medium-high |
Abstraction Leakage: When Frameworks Break
Every abstraction leaks eventually. The question is how badly and how often.
# Common LangChain leak: AgentExecutor's parsing of structured outputs
# When the LLM output doesn't match the expected format exactly:
#
# langchain.schema.output_parser.OutputParserException:
# Could not parse LLM output: `I need to think about this first...
# Action: search_web
# Action Input: {query: "AI safety"}`
#
# The fix requires understanding LangChain's output parsing internals:
# - You need to know about ReActOutputParser
# - You need to understand the exact format LangChain expects
# - You need to modify either the prompt or the parser
# - None of this is in your code; all of it is in framework code
# The same situation in raw API: there is no text format to parse, so nothing fails.
# The LLM says "I need to think" and then calls a tool?
# The text and the tool call arrive as separate content blocks, and your loop
# handles both because you wrote the loop.
# Example of debugging friction:
# LangChain error message:
# "AgentExecutor.run() got an unexpected keyword argument 'return_intermediate_steps'"
# Translation: API changed between versions. Which version? Check CHANGELOG...
# Raw API "error message": "response.stop_reason not in ['end_turn', 'tool_use']"
# Translation: exactly what it says. Handle it.
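To make "handle it" concrete, here is one hedged way to treat stop reasons explicitly in a raw loop. `"end_turn"` and `"tool_use"` come from the code in this lesson, and `"max_tokens"` is another stop reason the Anthropic Messages API reports. The function name and the action labels are hypothetical, chosen only for this sketch.

```python
def next_action(stop_reason: str) -> str:
    """Map an API stop reason to what the agent loop should do next."""
    actions = {
        "end_turn": "return_final_text",    # the model finished its answer
        "tool_use": "execute_tools",        # the model requested tool calls
        "max_tokens": "report_truncation",  # output was cut off by the token limit
    }
    if stop_reason not in actions:
        # Fail loudly on anything unexpected instead of looping silently
        raise RuntimeError(f"Unhandled stop_reason: {stop_reason!r}")
    return actions[stop_reason]


print(next_action("tool_use"))
```

The point is not the mapping itself but where it lives: the branch is a few lines in your own loop, so an unexpected value surfaces as an error in code you can read.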
The Framework Tax
How much complexity does each framework add before you get any value?
# Framework tax = lines of framework-specific code / lines of your application code
# Raw API tax: 0 (everything is your code)
# Raw API setup: pip install anthropic (1 dependency)
# LangChain tax: ~30% of your code is framework plumbing
# LangChain setup: pip install langchain langchain-anthropic langchain-community
# → 50+ transitive dependencies
# → First breaking change expected in: 3 months
# LangGraph tax: ~25% of your code is graph definitions
# LangGraph benefit: the graph makes control flow explicit and testable
# LangGraph setup: pip install langgraph langchain-anthropic
# CrewAI tax: ~15% of your code is agent/task definitions
# CrewAI benefit: natural language role/goal definitions produce better behavior
# CrewAI setup: pip install crewai → 30+ dependencies
# The tax is worth paying when the framework-specific code replaces plumbing
# that you would otherwise have to write and maintain yourself.
# LangGraph's StateGraph replaces a custom state management system you would build.
# CrewAI's role/goal system replaces a custom prompt engineering system you would build.
# LangChain's 100+ integrations replace connections you would build one by one.
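A rough way to estimate the tax for your own module is to count how many code lines mention framework names. The sketch below is a crude heuristic, not a standard metric: it assumes the tax is the share of non-blank, non-comment lines that contain a framework marker, and both `framework_tax` and the marker list are hypothetical.

```python
def framework_tax(source: str, framework_markers: tuple[str, ...]) -> float:
    """Fraction of non-blank, non-comment lines that mention a framework marker."""
    code_lines = [
        line for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    if not code_lines:
        return 0.0
    framework_lines = [
        line for line in code_lines
        if any(marker in line for marker in framework_markers)
    ]
    return len(framework_lines) / len(code_lines)


# Tiny sample: two framework lines, two application lines, one comment
sample = """
from langgraph.graph import StateGraph, END
# build the graph
graph = StateGraph(dict)
def fetch(url):
    return url.upper()
"""
tax = framework_tax(sample, ("langgraph", "StateGraph"))
print(f"{tax:.0%}")
```

A substring match will miss framework code hidden behind aliases and will over-count names that merely look similar, so treat the number as a conversation starter rather than a measurement.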
Framework Churn Mitigation
The biggest risk with frameworks is API instability. LangChain had breaking changes in: 0.0.x → 0.1.x, 0.1.x → 0.2.x, the LCEL migration, the hub migration, and the agent API restructuring - all within 18 months.
# The thin adapter pattern: shield your application code from framework churn
from abc import ABC, abstractmethod
from typing import Optional


class AgentBackend(ABC):
    """Abstract interface that your application code depends on."""

    @abstractmethod
    def run(self, task: str, context: Optional[dict] = None) -> str:
        pass

    @abstractmethod
    def run_with_tools(self, task: str, tools: list, context: Optional[dict] = None) -> str:
        pass


class AnthropicBackend(AgentBackend):
    """Raw API implementation - zero framework dependencies."""

    def __init__(self):
        import anthropic
        self.client = anthropic.Anthropic()

    def run(self, task: str, context: Optional[dict] = None) -> str:
        resp = self.client.messages.create(
            model="claude-opus-4-6",
            max_tokens=2048,
            messages=[{"role": "user", "content": task}],
        )
        return resp.content[0].text

    def run_with_tools(self, task: str, tools: list, context: Optional[dict] = None) -> str:
        # Full raw API agent loop here.
        # Note: tool functions must also be registered via register_tool(),
        # otherwise every tool call returns "not found".
        return RawAPIAgent("You are a helpful assistant.", tools).run(task)


class LangGraphBackend(AgentBackend):
    """LangGraph implementation - swap in when you need explicit state graphs."""

    def run(self, task: str, context: Optional[dict] = None) -> str:
        return run_langgraph_agent(task)

    def run_with_tools(self, task: str, tools: list, context: Optional[dict] = None) -> str:
        # Note: run_langgraph_agent uses its own fixed tool list; `tools` is ignored here
        return run_langgraph_agent(task)


# Your application always calls AgentBackend, never the framework directly
# Swapping frameworks = swapping one line in your dependency injection setup
class ResearchApplication:
    def __init__(self, backend: AgentBackend):
        self.backend = backend

    def research(self, topic: str) -> str:
        return self.backend.run(f"Research the following topic: {topic}")


# Start with raw API
app = ResearchApplication(backend=AnthropicBackend())
# Migrate to LangGraph when you need state graphs - no application code changes
# app = ResearchApplication(backend=LangGraphBackend())
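A complementary guard against churn is to fail fast when the installed framework version drifts from the one you tested against. The sketch below uses the standard library's `importlib.metadata.version`. The `check_pins` helper, the injectable `get_version` parameter, and the version numbers are assumptions made for the example.

```python
from importlib import metadata


def check_pins(pins: dict[str, str], get_version=metadata.version) -> list[str]:
    """Return human-readable mismatches between pinned and installed versions."""
    problems = []
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package}: not installed (expected {expected})")
            continue
        if installed != expected:
            problems.append(f"{package}: installed {installed}, expected {expected}")
    return problems


def fake_version(name: str) -> str:
    """Stand-in for metadata.version so the example runs without any framework installed."""
    table = {"langgraph": "0.2.0"}  # hypothetical installed version
    if name not in table:
        raise metadata.PackageNotFoundError(name)
    return table[name]


print(check_pins({"langgraph": "0.2.0"}, fake_version))
print(check_pins({"langgraph": "0.3.0", "crewai": "1.0.0"}, fake_version))
```

Calling `check_pins` at adapter import time or in a CI step, with the real `metadata.version`, keeps a silent upgrade from reaching application code unnoticed.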
When to Use Each Option
| Situation | Recommended approach | Why |
|---|---|---|
| Simple agent, 3–5 tools | Raw API | No overhead, full control, easy to debug |
| Complex stateful workflow | LangGraph | Explicit state, checkpointing, conditional routing |
| Multi-agent teams | CrewAI or LangGraph | Purpose-built for coordination |
| Data-heavy RAG | LlamaIndex | 160+ data connectors, production retrieval |
| Many external integrations | LangChain | Largest integration library |
| Quick prototype | LangChain or CrewAI | Fastest to working demo |
| Production reliability critical | Raw API or LangGraph | Predictable, debuggable behavior |
| Learning agents | OpenAI Swarm or raw API | Simplest possible foundation |
| Enterprise/regulated environment | Raw API with thin adapters | Maximum control, minimum dependencies |
Production Notes
:::warning Never Couple Core Logic to Framework Internals
The most expensive migration cost comes from application logic embedded in framework abstractions. If your business logic lives inside LangChain's AgentExecutor.run() or a LangGraph node function, migration requires restructuring your application, not just changing a dependency. Keep business logic in plain Python functions and pass them to the framework as callable tools or nodes.
:::
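A minimal sketch of that separation, under stated assumptions: the refund rule is an invented business rule, and `REFUND_TOOL_SCHEMA` and `TOOL_FUNCTIONS` are hypothetical names. Only the tool schema shape follows the raw API examples earlier in this lesson.

```python
def refund_amount(order_total: float, days_since_purchase: int) -> float:
    """Business rule (hypothetical): full refund within 30 days, half within 60, else none."""
    if days_since_purchase <= 30:
        return order_total
    if days_since_purchase <= 60:
        return order_total / 2
    return 0.0


# Framework-facing glue stays thin: a schema plus a name -> function mapping.
REFUND_TOOL_SCHEMA = {
    "name": "refund_amount",
    "description": "Compute the refund owed for an order",
    "input_schema": {
        "type": "object",
        "properties": {
            "order_total": {"type": "number"},
            "days_since_purchase": {"type": "integer"},
        },
        "required": ["order_total", "days_since_purchase"],
    },
}
TOOL_FUNCTIONS = {"refund_amount": refund_amount}

# The rule is testable with no framework and no API key
print(refund_amount(100.0, 45))
```

The same plain function could instead be wrapped by a framework's tool decorator, as the `lc_tool` wrappers earlier in this lesson do, without the rule itself changing.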
:::tip Prototype Fast, Refactor Before Launch
Using LangChain for a week-long prototype is a legitimate strategy - the integration ecosystem is genuinely useful and the speed is real. The mistake is launching with the prototype code. Plan a refactoring sprint before production deployment: identify which framework abstractions you actually use, implement only those in your own code or in a thinner framework, and remove unused abstractions.
:::
:::danger Beware the 50-Dependency Install
pip install langchain-community currently pulls in 50+ dependencies including database drivers, cloud SDKs, and web frameworks - many of which you will never use. In production environments with security scanning, each dependency is an attack surface. Consider installing only the specific langchain integration packages you need rather than the community bundle.
:::
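One way to see the dependency footprint for yourself is to walk the declared requirements of installed packages. The sketch below relies on the standard library's `importlib.metadata.requires`. The `transitive_dependencies` helper is hypothetical, it skips optional extras, and it does not evaluate environment markers, so it may overstate what is installed on a given platform.

```python
import re
from importlib import metadata


def _normalize(name: str) -> str:
    """Normalize a distribution name (lowercase, runs of -, _ and . become -)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def transitive_dependencies(root: str, requires=metadata.requires) -> set[str]:
    """Walk declared requirements breadth-first and return every dependency name found."""
    root_name = _normalize(root)
    seen: set[str] = set()
    queue = [root]
    while queue:
        name = queue.pop(0)
        try:
            requirements = requires(name) or []
        except metadata.PackageNotFoundError:
            continue  # declared but not installed: nothing further to walk
        for req in requirements:
            if re.search(r"extra\s*==", req):
                continue  # optional extras are not installed by default
            match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req)
            if not match:
                continue
            dep = _normalize(match.group(0))
            if dep != root_name and dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Fake metadata so the example runs without installing anything
FAKE = {
    "app": ["a>=1.0", "b; python_version >= '3.8'", "c; extra == 'dev'"],
    "a": ["b", "d (>=2)"],
    "b": None,
    "d": [],
}


def fake_requires(name: str):
    if name not in FAKE:
        raise metadata.PackageNotFoundError(name)
    return FAKE[name]


deps = transitive_dependencies("app", fake_requires)
print(sorted(deps))
```

Running `len(transitive_dependencies("langchain-community"))` in an environment where that package is installed gives a number to compare against the figure quoted above.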
Interview Questions
Q: How do you decide whether to use a framework or build on the raw API for an agent system?
A: I ask five questions in order: Does the problem require graph-based control flow with conditional routing and explicit state? If yes, LangGraph. Does it require multi-agent coordination? CrewAI or AutoGen. Does it require complex RAG with many data sources? LlamaIndex. Does it require many pre-built external integrations (Slack, Notion, databases)? LangChain. Otherwise - raw API. The default should be raw API, not a framework, because it gives full control, is easier to debug, has no version churn risk, and has zero additional dependencies. Frameworks earn their place only when they solve a problem you would otherwise have to build yourself.
Q: What does "abstraction leakage" mean in the context of LangChain?
A: Abstraction leakage is when a framework's internal implementation details bleed through its API surface, forcing you to understand the framework's internals to debug your application code. In LangChain, the most common leak point is the output parser: when the LLM produces output that is slightly off the expected format, LangChain throws an OutputParserException with an error message that requires understanding LangChain's internal parsing pipeline to fix. The same situation in raw API just works: you read the LLM response directly and handle it however your application logic requires. Every abstraction leaks eventually; the question is how frequently and how severely.
Q: How do you protect against framework version churn when building production agents?
A: The thin adapter pattern: define an abstract interface your application code depends on, and implement it with the framework in a separate module. Your business logic never imports from LangChain or LangGraph directly - it only imports from your interface. When the framework breaks its API, you fix the adapter, not your application. Additionally: pin framework versions in requirements.txt, run integration tests against the pinned version in CI, and allocate explicit time for framework version upgrades rather than treating them as incidental updates.
Q: When is LangGraph strictly better than LangChain's AgentExecutor?
A: LangGraph is better in four situations: (1) the agent needs to route back to an arbitrary earlier step (cycles) - AgentExecutor runs one fixed LLM → tool loop and cannot express other cycles; (2) you need typed state that persists across steps - AgentExecutor has no first-class state schema; (3) you need checkpointing and resume, for example a task that pauses while waiting for human approval and then resumes; (4) you need human-in-the-loop interruption - interrupt_before=["node_name"] gives you a suspension point AgentExecutor cannot replicate. For a simple linear chain (system prompt → LLM → tool → LLM → output), AgentExecutor and raw API are both fine; LangGraph adds overhead without benefit.
Q: A team wants to use LangChain for their production agent because everyone else does. What would you tell them?
A: I would acknowledge that LangChain's adoption is genuine - the integration ecosystem and community are real advantages. Then I would ask: What specific LangChain features will you actually use? For most production agents, the answer is: the LLM abstraction and maybe 2–3 integrations. That is not enough to justify 50+ transitive dependencies and the API instability risk. I would propose building on raw API with a thin abstraction layer that makes it easy to add LangChain integrations selectively if needed. The social proof argument ("everyone uses LangChain") is not a technical reason - it is a risk avoidance signal that should be examined, not followed automatically.
