
When to Use a Framework

Reading time: 24 min | Relevance: ML Engineers, Architects, Senior Developers building agent systems


The Framework Paradox

LangChain has 90,000 GitHub stars and a reputation for being over-engineered and impossible to debug. Both things are true simultaneously.

LangChain's star count reflects genuine utility: for a certain class of problems - ones with many integrations, complex chains, and teams that need to move fast - it accelerates development meaningfully. The debugging reputation also reflects genuine experience: when it breaks (and it will break), the error messages point into framework internals rather than your code, and the fix requires understanding abstractions you did not write.

This is the framework paradox: the abstraction that saves you time on day one costs you time on day thirty. The question is not whether frameworks are good or bad - it is whether the abstraction fits your problem well enough that the day-thirty cost stays below the day-one benefit.

Most teams make the framework decision emotionally ("everyone uses LangChain") or by social proof ("it has 90k stars"). This lesson provides a decision framework based on your actual requirements. Sometimes the raw API is the right answer. Sometimes LangGraph or CrewAI is obviously correct. The skill is knowing which situation you are in.



Why This Exists

Before LangChain (2022), building agents meant writing all the scaffolding yourself: the messages array management, the tool call loop, the memory system, the LLM abstraction layer. Teams at different companies were all writing the same code. LangChain's initial proposition was correct: there is enough common infrastructure here to warrant a shared library.

The problem emerged as the framework grew. LangChain evolved from a simple utility library into a full abstraction platform with its own expression language (LCEL), its own agent abstraction, its own memory system, and its own integration ecosystem. Each new abstraction added power but also opacity. By 2024, many teams reported spending more time understanding LangChain internals than building their actual applications.

The ecosystem responded with alternatives: LangGraph (from the same team, but explicit state graphs), LlamaIndex (data-focused), CrewAI (multi-agent teams), AutoGen (conversational agents). Each solves a real problem. Each also adds complexity.

The mature answer in 2025 is: start with raw API, add framework abstractions only when you have identified a specific problem they solve, and always maintain the ability to drop back to raw API when the abstraction fails.


What Frameworks Provide

A framework can provide some or all of:

| Benefit | What it means | Which frameworks |
|---|---|---|
| LLM abstraction | Same code for OpenAI, Anthropic, Cohere | LangChain, LlamaIndex |
| Tool/integration library | Pre-built connectors to 100+ APIs | LangChain |
| Observability/tracing | Built-in execution visibility | LangSmith, LlamaTrace |
| State management | Persistent agent state across steps | LangGraph |
| Graph-based control flow | Conditional routing, cycles, checkpoints | LangGraph |
| Multi-agent orchestration | Agents that coordinate with each other | CrewAI, AutoGen, LangGraph |
| Data ingestion pipeline | Document loading, indexing, retrieval | LlamaIndex |
| Async/streaming support | First-class async and token streaming | LangGraph, raw API |
| Community & support | Stack Overflow answers, GitHub issues | LangChain (largest) |

What Frameworks Cost

Each benefit comes with a cost that is often invisible until you are in production:

| Cost | What it means | Severity |
|---|---|---|
| Complexity overhead | More code paths, more things that can go wrong | Medium |
| Abstraction leakage | When it breaks, you debug the framework, not your code | High |
| Version churn | LangChain had 5+ breaking API changes in 2023 | High |
| Debugging difficulty | Stack traces are 20 frames deep into framework code | High |
| Performance overhead | Every abstraction layer adds latency | Low–Medium |
| Lock-in | Migration to a different framework or raw API is costly | Medium |
| Cognitive overhead | Your team must understand the framework in addition to the problem | Medium |
| Undocumented behavior | Framework behavior that differs from documentation | Medium |

The Raw API Approach

Building directly on the Anthropic API gives you maximum control with minimal dependencies:

import anthropic
from typing import Any, Callable


class RawAPIAgent:
    """
    A complete, production-capable agent built directly on the Anthropic API.
    No framework dependencies. Full control over every aspect of behavior.
    Easy to debug: every step is explicit Python code you wrote.
    """

    def __init__(self, system_prompt: str, tools: list[dict]):
        self.client = anthropic.Anthropic()
        self.system_prompt = system_prompt
        self.tools = tools  # Tool schemas sent to the model
        self.tool_implementations: dict[str, Callable[..., Any]] = {}
        self.messages: list[dict] = []

    def register_tool(self, name: str, fn: Callable[..., Any]) -> None:
        """Register a Python function as an available tool."""
        self.tool_implementations[name] = fn

    def run(self, user_message: str, max_steps: int = 20) -> str:
        """Run the agent until completion or max_steps."""
        self.messages.append({"role": "user", "content": user_message})

        for _ in range(max_steps):
            response = self.client.messages.create(
                model="claude-opus-4-6",
                max_tokens=4096,
                system=self.system_prompt,
                tools=self.tools,
                messages=self.messages,
            )

            self.messages.append({"role": "assistant", "content": response.content})

            if response.stop_reason == "end_turn":
                return self._extract_final_text(response)

            if response.stop_reason != "tool_use":
                # For example "max_tokens": report it rather than claiming max steps
                return f"Stopped early: stop_reason={response.stop_reason!r}"

            tool_results = self._execute_tools(response.content)
            self.messages.append({"role": "user", "content": tool_results})

        return "Max steps reached without completion."

    def _execute_tools(self, content_blocks: list) -> list[dict]:
        """Run every tool_use block and return matching tool_result blocks."""
        results = []
        for block in content_blocks:
            if block.type != "tool_use":
                continue
            tool_fn = self.tool_implementations.get(block.name)
            if not tool_fn:
                result_content = f"Tool '{block.name}' not found"
            else:
                try:
                    result_content = str(tool_fn(**block.input))
                except Exception as e:
                    # Return the error to the model so it can recover or retry
                    result_content = f"Tool error: {type(e).__name__}: {e}"
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result_content,
            })
        return results

    def _extract_final_text(self, response) -> str:
        """Return the first text block of the response, or an empty string."""
        for block in response.content:
            if hasattr(block, "text"):
                return block.text
        return ""

That is the entire agentic loop: roughly 60 lines of code that you wrote and understand completely. Every step is transparent, and when something goes wrong, the error points to your code.
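One practical consequence of owning the loop is that it can be exercised without network access. The following is a minimal sketch under stated assumptions, not part of the agent above: `run_loop`, `FakeClient`, and the `SimpleNamespace` responses are hypothetical stand-ins that mimic only the fields the loop reads (`stop_reason`, `content`, and each block's `type`, `name`, `input`, `id`, `text`).

```python
from types import SimpleNamespace
from typing import Any, Callable


def run_loop(create: Callable[..., Any], tools: dict[str, Callable[..., Any]],
             user_message: str, max_steps: int = 5) -> str:
    """Same control flow as the agent loop, with the API call injected."""
    messages: list[dict] = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        response = create(messages=messages)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason == "end_turn":
            return next((b.text for b in response.content if b.type == "text"), "")
        if response.stop_reason != "tool_use":
            return f"Stopped early: {response.stop_reason}"
        results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            fn = tools.get(block.name)
            output = str(fn(**block.input)) if fn else f"Tool '{block.name}' not found"
            results.append({"type": "tool_result", "tool_use_id": block.id, "content": output})
        messages.append({"role": "user", "content": results})
    return "Max steps reached without completion."


class FakeClient:
    """Hypothetical test double: replays scripted responses in order."""

    def __init__(self, scripted: list):
        self.scripted = list(scripted)
        self.calls = 0

    def create(self, **kwargs: Any):
        self.calls += 1
        return self.scripted.pop(0)


# Script one tool call followed by a final answer.
scripted = [
    SimpleNamespace(stop_reason="tool_use", content=[
        SimpleNamespace(type="tool_use", name="add", input={"a": 2, "b": 3}, id="t1")]),
    SimpleNamespace(stop_reason="end_turn", content=[
        SimpleNamespace(type="text", text="The sum is 5.")]),
]
fake = FakeClient(scripted)
answer = run_loop(fake.create, {"add": lambda a, b: a + b}, "What is 2 + 3?")
```

Injecting the `create` callable is a design choice made for this sketch; the same effect could be had by passing a client object into the agent's constructor.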


Decision Criteria: The 5 Questions

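The framework decision reduces to five questions asked in order, and the first "yes" decides: (1) Does the problem require graph-based control flow with conditional routing and explicit state? Use LangGraph. (2) Does it require multi-agent coordination? Use CrewAI or AutoGen. (3) Does it require complex RAG with many data sources? Use LlamaIndex. (4) Does it require many pre-built external integrations (Slack, Notion, databases)? Use LangChain. (5) Is the answer to all of the above no? Use the raw API.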
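The ordering can also be written down as a small routing function. This is a minimal sketch that follows the order given in the interview answer at the end of this lesson; the `Requirements` dataclass, its field names, and `recommend` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical checklist; field names are illustrative, not from any library."""
    needs_graph_control_flow: bool = False
    needs_multi_agent: bool = False
    needs_complex_rag: bool = False
    needs_many_integrations: bool = False


def recommend(req: Requirements) -> str:
    """Return the first matching recommendation, checking the questions in order."""
    if req.needs_graph_control_flow:
        return "LangGraph"
    if req.needs_multi_agent:
        return "CrewAI or AutoGen"
    if req.needs_complex_rag:
        return "LlamaIndex"
    if req.needs_many_integrations:
        return "LangChain"
    # Default: no framework until a specific need appears
    return "raw API"


default_choice = recommend(Requirements())
mixed_choice = recommend(Requirements(needs_multi_agent=True, needs_many_integrations=True))
```

Because the checks run in order, a project that needs both multi-agent coordination and many integrations is routed by the earlier question.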


The Same Agent Built Three Ways

The best way to understand the framework tradeoffs is to build the same thing three ways and compare:

# ==========================================================
# APPROACH 1: Raw Anthropic API - full control, no magic
# ==========================================================

import anthropic
import requests
from bs4 import BeautifulSoup


def search_web(query: str, max_results: int = 3) -> str:
    """Simulated web search (replace with real search API)."""
    return f"Search results for '{query}': [result 1], [result 2], [result 3]"


def fetch_page(url: str) -> str:
    """Fetch a web page and return the first 3000 characters of its text."""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # Surface HTTP errors instead of parsing error pages
        soup = BeautifulSoup(response.text, "html.parser")
        return soup.get_text()[:3000]
    except Exception as e:
        return f"Error fetching {url}: {e}"


def summarize(text: str) -> str:
    """Summarize text (simplified: truncate to 500 characters)."""
    return (text[:500] + "...") if len(text) > 500 else text


RAW_TOOLS = [
    {
        "name": "search_web",
        "description": "Search the web for information on a query",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
                "max_results": {"type": "integer", "description": "Number of results", "default": 3},
            },
            "required": ["query"],
        },
    },
    {
        "name": "fetch_page",
        "description": "Fetch and read the content of a web page",
        "input_schema": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "URL to fetch"},
            },
            "required": ["url"],
        },
    },
    {
        "name": "summarize",
        "description": "Summarize a long piece of text",
        "input_schema": {
            "type": "object",
            "properties": {
                "text": {"type": "string", "description": "Text to summarize"},
            },
            "required": ["text"],
        },
    },
]


def run_raw_api_agent(task: str) -> str:
    """Research agent: raw API, no framework dependencies."""
    client = anthropic.Anthropic()
    tool_fns = {"search_web": search_web, "fetch_page": fetch_page, "summarize": summarize}
    messages = [{"role": "user", "content": task}]

    for _ in range(15):
        resp = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=4096,
            system="You are a research assistant. Search, fetch, and summarize information.",
            tools=RAW_TOOLS,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": resp.content})

        if resp.stop_reason == "end_turn":
            for block in resp.content:
                if hasattr(block, "text"):
                    return block.text
            return ""

        if resp.stop_reason != "tool_use":
            # For example "max_tokens": there are no tool calls to answer, so stop here
            return f"Stopped early: stop_reason={resp.stop_reason!r}"

        tool_results = []
        for block in resp.content:
            if block.type == "tool_use":
                fn = tool_fns.get(block.name, lambda **kw: "Tool not found")
                try:
                    result = fn(**block.input)
                except Exception as e:
                    result = f"Error: {e}"
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result),
                })
        messages.append({"role": "user", "content": tool_results})

    return "Max steps reached"


# ==========================================================
# APPROACH 2: LangChain - integrations, LCEL, tracing
# Requires: pip install langchain langchain-anthropic
# Note: written against the LangChain 0.x agent API; newer releases may have
# moved AgentExecutor, so pin the version you test with.
# ==========================================================

from langchain_anthropic import ChatAnthropic
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.tools import tool as lc_tool
from langchain_core.prompts import ChatPromptTemplate


@lc_tool
def lc_search_web(query: str) -> str:
    """Search the web for information."""
    return search_web(query)


@lc_tool
def lc_fetch_page(url: str) -> str:
    """Fetch the content of a web page."""
    return fetch_page(url)


@lc_tool
def lc_summarize(text: str) -> str:
    """Summarize a piece of text."""
    return summarize(text)


def run_langchain_agent(task: str) -> str:
    """Research agent: LangChain with LCEL and LangSmith tracing."""
    llm = ChatAnthropic(model="claude-opus-4-6")
    tools = [lc_search_web, lc_fetch_page, lc_summarize]

    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a research assistant. Search, fetch, and summarize information."),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ])

    agent = create_tool_calling_agent(llm, tools, prompt)
    executor = AgentExecutor(agent=agent, tools=tools, verbose=True, max_iterations=15)

    result = executor.invoke({"input": task})
    return result["output"]

# Benefit: verbose=True shows every step; LangSmith traces automatically if configured
# Cost: hidden AgentExecutor logic; harder to customize the loop


# ==========================================================
# APPROACH 3: LangGraph - explicit state, better debugging
# Requires: pip install langgraph langchain-anthropic
# ==========================================================

from langgraph.graph import StateGraph, END
from langgraph.prebuilt import ToolNode
from langchain_core.messages import HumanMessage
from typing import TypedDict, Annotated
import operator


class ResearchState(TypedDict):
    messages: Annotated[list, operator.add]  # Append-only message list


def run_langgraph_agent(task: str) -> str:
    """Research agent: LangGraph with explicit state and conditional edges."""
    llm = ChatAnthropic(model="claude-opus-4-6")
    tools = [lc_search_web, lc_fetch_page, lc_summarize]
    llm_with_tools = llm.bind_tools(tools)

    def agent_node(state: ResearchState):
        response = llm_with_tools.invoke(state["messages"])
        return {"messages": [response]}

    def should_continue(state: ResearchState) -> str:
        last_message = state["messages"][-1]
        if hasattr(last_message, "tool_calls") and last_message.tool_calls:
            return "tools"
        return END

    graph = StateGraph(ResearchState)
    graph.add_node("agent", agent_node)
    graph.add_node("tools", ToolNode(tools))
    graph.set_entry_point("agent")
    graph.add_conditional_edges("agent", should_continue, {"tools": "tools", END: END})
    graph.add_edge("tools", "agent")
    compiled = graph.compile()

    result = compiled.invoke({"messages": [HumanMessage(content=task)]})
    final = result["messages"][-1]
    return final.content if hasattr(final, "content") else str(final)

# Benefit: graph is explicit and visualizable; state is typed; easy to add checkpointing
# Cost: more boilerplate than raw API for simple agents; requires LangGraph understanding

What the comparison reveals

| Dimension | Raw API | LangChain | LangGraph |
|---|---|---|---|
| Lines of code | ~60 | ~30 | ~45 |
| Debugging | Your code | Framework internals | Graph is explicit |
| Customization | Complete | Moderate | High |
| Streaming | Full control | Requires LCEL | Built-in |
| Checkpointing | Manual | Not built-in | First-class |
| Integrations | Write your own | 100+ available | Uses LangChain tools |
| Learning curve | Low | Medium | Medium-high |

Abstraction Leakage: When Frameworks Break

Every abstraction leaks eventually. The question is how badly and how often.

# Common LangChain leak: AgentExecutor's parsing of structured outputs
# When the LLM output doesn't match the expected format exactly:
#
# langchain.schema.output_parser.OutputParserException:
# Could not parse LLM output: `I need to think about this first...
# Action: search_web
# Action Input: {query: "AI safety"}`
#
# The fix requires understanding LangChain's output parsing internals:
# - You need to know about ReActOutputParser
# - You need to understand the exact format LangChain expects
# - You need to modify either the prompt or the parser
# - None of this is in your code; all of it is in framework code

# The same situation in raw API: it just works
# The LLM says "I need to think" and then calls a tool?
# Your loop handles it naturally because you wrote the loop.

# Example of debugging friction:
# LangChain error message:
# "AgentExecutor.run() got an unexpected keyword argument 'return_intermediate_steps'"
# Translation: API changed between versions. Which version? Check CHANGELOG...
# Raw API "error message": "response.stop_reason not in ['end_turn', 'tool_use']"
# Translation: exactly what it says. Handle it.
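In a raw loop, "handle it" can be as small as one explicit branch per stop reason. The sketch below assumes the stop reasons `end_turn`, `tool_use`, `max_tokens`, and `stop_sequence`; the `next_action` helper and its return labels are hypothetical and exist only to show the shape of the branch.

```python
def next_action(stop_reason: str) -> str:
    """Map a stop_reason to what the loop should do next (labels are illustrative)."""
    if stop_reason == "end_turn":
        return "return_final_text"
    if stop_reason == "tool_use":
        return "execute_tools"
    if stop_reason == "max_tokens":
        # Output was truncated; the caller may retry with a larger max_tokens.
        return "retry_with_more_tokens"
    if stop_reason == "stop_sequence":
        return "return_final_text"
    # Anything unrecognised fails loudly, in your own code, with the value attached.
    raise ValueError(f"Unhandled stop_reason: {stop_reason!r}")
```

The failure mode here is a `ValueError` raised from a function you wrote, carrying the exact value that was not handled.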

The Framework Tax

How much complexity does each framework add before you get any value?

# Framework tax = lines of framework-specific code / lines of your application code

# Raw API tax: 0 (everything is your code)
# Raw API setup: pip install anthropic (1 dependency)

# LangChain tax: ~30% of your code is framework plumbing
# LangChain setup: pip install langchain langchain-anthropic langchain-community
# → 50+ transitive dependencies
# → First breaking change expected in: 3 months

# LangGraph tax: ~25% of your code is graph definitions
# LangGraph benefit: the graph makes control flow explicit and testable
# LangGraph setup: pip install langgraph langchain-anthropic

# CrewAI tax: ~15% of your code is agent/task definitions
# CrewAI benefit: natural language role/goal definitions produce better behavior
# CrewAI setup: pip install crewai → 30+ dependencies

# The tax is worth paying when the framework-specific code replaces custom
# infrastructure that you would otherwise write yourself.
# LangGraph's StateGraph replaces a custom state management system you would build.
# CrewAI's role/goal system replaces a custom prompt engineering system you would build.
# LangChain's 100+ integrations replace connections you would build one by one.
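The tax can be estimated roughly from source text. The sketch below is a crude heuristic, not a standard metric: it assumes the tax is the share of non-blank, non-comment lines that mention a framework marker, and the `framework_tax` function, the `markers` tuple, and the sample snippet are all hypothetical.

```python
def framework_tax(source: str, markers: tuple[str, ...]) -> float:
    """Fraction of non-blank, non-comment lines that mention a framework marker."""
    code_lines = [
        line.strip() for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    if not code_lines:
        return 0.0
    # Substring matching is deliberately simple and will miscount in edge cases.
    framework_lines = [line for line in code_lines if any(m in line for m in markers)]
    return len(framework_lines) / len(code_lines)


# Illustrative snippet: 3 of its 5 code lines touch the framework.
sample = """
from langgraph.graph import StateGraph, END

def score(order):
    return order["total"] * 0.1

graph = StateGraph(dict)
graph.add_node("score", score)
"""
tax = framework_tax(sample, markers=("langgraph", "StateGraph", "graph."))
```

A number like this is only useful for comparing versions of the same codebase over time; it says nothing about whether the framework lines are earning their keep.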

Framework Churn Mitigation

The biggest risk with frameworks is API instability. LangChain had breaking changes in: 0.0.x → 0.1.x, 0.1.x → 0.2.x, the LCEL migration, the hub migration, and the agent API restructuring - all within 18 months.

# The thin adapter pattern: shield your application code from framework churn

from abc import ABC, abstractmethod
from typing import Optional


class AgentBackend(ABC):
    """Abstract interface that your application code depends on."""

    @abstractmethod
    def run(self, task: str, context: Optional[dict] = None) -> str:
        pass

    @abstractmethod
    def run_with_tools(self, task: str, tools: list, context: Optional[dict] = None) -> str:
        pass


class AnthropicBackend(AgentBackend):
    """Raw API implementation - zero framework dependencies."""

    def __init__(self):
        import anthropic
        self.client = anthropic.Anthropic()

    def run(self, task: str, context: Optional[dict] = None) -> str:
        resp = self.client.messages.create(
            model="claude-opus-4-6",
            max_tokens=2048,
            messages=[{"role": "user", "content": task}],
        )
        return resp.content[0].text

    def run_with_tools(self, task: str, tools: list, context: Optional[dict] = None) -> str:
        # Delegates to the RawAPIAgent loop defined earlier. Tool implementations
        # still need agent.register_tool(...) calls before the model can use them.
        agent = RawAPIAgent("You are a helpful assistant.", tools)
        return agent.run(task)


class LangGraphBackend(AgentBackend):
    """LangGraph implementation - swap in when you need explicit state graphs."""

    def run(self, task: str, context: Optional[dict] = None) -> str:
        return run_langgraph_agent(task)

    def run_with_tools(self, task: str, tools: list, context: Optional[dict] = None) -> str:
        # run_langgraph_agent builds its own tool list, so `tools` is ignored here
        return run_langgraph_agent(task)


# Your application always calls AgentBackend, never the framework directly
# Swapping frameworks = swapping one line in your dependency injection setup
class ResearchApplication:
    def __init__(self, backend: AgentBackend):
        self.backend = backend

    def research(self, topic: str) -> str:
        return self.backend.run(f"Research the following topic: {topic}")


# Start with raw API
app = ResearchApplication(backend=AnthropicBackend())

# Migrate to LangGraph when you need state graphs - no application code changes
# app = ResearchApplication(backend=LangGraphBackend())
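A side benefit of the adapter is that application code can be tested against a stand-in backend with no API key and no framework installed. This is a minimal sketch: it redefines a trimmed `AgentBackend` (only `run`) and `ResearchApplication` so the block runs on its own, and `FakeBackend` is a hypothetical test double, not part of any library.

```python
from abc import ABC, abstractmethod
from typing import Optional


class AgentBackend(ABC):
    """Trimmed copy of the interface, kept to one method for this sketch."""

    @abstractmethod
    def run(self, task: str, context: Optional[dict] = None) -> str:
        ...


class FakeBackend(AgentBackend):
    """Hypothetical test double: records tasks and returns a canned answer."""

    def __init__(self, answer: str):
        self.answer = answer
        self.tasks: list[str] = []

    def run(self, task: str, context: Optional[dict] = None) -> str:
        self.tasks.append(task)
        return self.answer


class ResearchApplication:
    def __init__(self, backend: AgentBackend):
        self.backend = backend

    def research(self, topic: str) -> str:
        return self.backend.run(f"Research the following topic: {topic}")


fake_backend = FakeBackend("canned summary")
test_app = ResearchApplication(backend=fake_backend)
result = test_app.research("vector databases")
```

The application class never learns which backend it was given, which is the property the adapter is meant to guarantee.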

When to Use Each Option

| Situation | Recommended approach | Why |
|---|---|---|
| Simple agent, 3–5 tools | Raw API | No overhead, full control, easy to debug |
| Complex stateful workflow | LangGraph | Explicit state, checkpointing, conditional routing |
| Multi-agent teams | CrewAI or LangGraph | Purpose-built for coordination |
| Data-heavy RAG | LlamaIndex | 160+ data connectors, production retrieval |
| Many external integrations | LangChain | Largest integration library |
| Quick prototype | LangChain or CrewAI | Fastest to working demo |
| Production reliability critical | Raw API or LangGraph | Predictable, debuggable behavior |
| Learning agents | OpenAI Swarm or raw API | Simplest possible foundation |
| Enterprise/regulated environment | Raw API with thin adapters | Maximum control, minimum dependencies |

Production Notes

:::warning Never Couple Core Logic to Framework Internals

The most expensive migration cost comes from application logic embedded in framework abstractions. If your business logic lives inside LangChain's AgentExecutor.run() or a LangGraph node function, migration requires restructuring your application, not just changing a dependency. Keep business logic in plain Python functions and pass them to the framework as callable tools or nodes.

:::
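The separation can be sketched as a plain function plus a thin wiring layer. This is an illustration under assumptions: `order_status`, its hard-coded data, and the `as_tool_schema` helper are hypothetical, and the schema shape simply mirrors the tool definitions used earlier in this lesson.

```python
from typing import Any, Callable


# Business logic: plain Python, no framework imports, unit-testable on its own.
def order_status(order_id: str) -> str:
    """Hypothetical lookup; a real version would query your order system."""
    known = {"A100": "shipped", "A200": "processing"}
    return known.get(order_id, "unknown")


# Thin wiring layer: the only part that knows about tool schemas.
def as_tool_schema(fn: Callable[..., Any], description: str, params: dict[str, str]) -> dict:
    """Build a tool definition for a function whose parameters are all strings."""
    return {
        "name": fn.__name__,
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": {
                name: {"type": "string", "description": desc}
                for name, desc in params.items()
            },
            "required": list(params),
        },
    }


schema = as_tool_schema(order_status, "Look up an order's status", {"order_id": "Order ID"})
registry = {schema["name"]: order_status}
```

Moving to a framework then means rewriting only the wiring layer; `order_status` and its tests stay untouched.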

:::tip Prototype Fast, Refactor Before Launch

Using LangChain for a week-long prototype is a legitimate strategy - the integration ecosystem is genuinely useful and the speed is real. The mistake is launching with the prototype code. Plan a refactoring sprint before production deployment: identify which framework abstractions you actually use, implement only those in your own code or in a thinner framework, and remove unused abstractions.

:::

:::danger Beware the 50-Dependency Install

pip install langchain-community currently pulls in 50+ dependencies including database drivers, cloud SDKs, and web frameworks - many of which you will never use. In production environments with security scanning, each dependency adds attack surface. Consider installing only the specific langchain integration packages you need rather than the community bundle.

:::


Interview Questions

Q: How do you decide whether to use a framework or build on the raw API for an agent system?

A: I ask five questions in order: Does the problem require graph-based control flow with conditional routing and explicit state? If yes, LangGraph. Does it require multi-agent coordination? CrewAI or AutoGen. Does it require complex RAG with many data sources? LlamaIndex. Does it require many pre-built external integrations (Slack, Notion, databases)? LangChain. Otherwise - raw API. The default should be raw API, not a framework, because it gives full control, is easier to debug, has no version churn risk, and has zero additional dependencies. Frameworks earn their place only when they solve a problem you would otherwise have to build yourself.

Q: What does "abstraction leakage" mean in the context of LangChain?

A: Abstraction leakage is when a framework's internal implementation details bleed through its API surface, forcing you to understand the framework's internals to debug your application code. In LangChain, the most common leak point is the output parser: when the LLM produces output that is slightly off the expected format, LangChain throws an OutputParserException with an error message that requires understanding LangChain's internal parsing pipeline to fix. The same situation in raw API just works: you read the LLM response directly and handle it however your application logic requires. Every abstraction leaks eventually; the question is how frequently and how severely.

Q: How do you protect against framework version churn when building production agents?

A: The thin adapter pattern: define an abstract interface your application code depends on, and implement it with the framework in a separate module. Your business logic never imports from LangChain or LangGraph directly - it only imports from your interface. When the framework breaks its API, you fix the adapter, not your application. Additionally: pin framework versions in requirements.txt, run integration tests against the pinned version in CI, and allocate explicit time for framework version upgrades rather than treating them as incidental updates.
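The pinning advice can be backed by a small CI check. The sketch below uses the standard library's `importlib.metadata`; the `pin_violations` helper, the injectable `lookup` parameter, and the version strings in the usage lines are hypothetical placeholders, not recommended versions.

```python
from importlib import metadata
from typing import Callable, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pin_violations(
    pins: dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> list[str]:
    """List every package whose installed version differs from its pin."""
    problems = []
    for package, expected in pins.items():
        actual = lookup(package)
        if actual != expected:
            problems.append(f"{package}: expected {expected}, found {actual}")
    return problems


# Usage with an injected lookup so the check itself can be tested offline.
fake_lookup = {"langgraph": "0.2.0"}.get
ok = pin_violations({"langgraph": "0.2.0"}, fake_lookup)
bad = pin_violations({"langchain": "0.2.1"}, fake_lookup)
```

In CI, calling `pin_violations` with the default lookup and failing the build on a non-empty result would catch an environment that has drifted from the pinned versions.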

Q: When is LangGraph strictly better than LangChain's AgentExecutor?

A: LangGraph is better in four situations: (1) the agent needs to revisit earlier steps (cycles) - AgentExecutor is strictly sequential; (2) you need typed state that persists across steps - AgentExecutor has no first-class state schema; (3) you need checkpointing and resume - for example, a task that pauses while waiting for human approval and then resumes; (4) you need human-in-the-loop interruption - interrupt_before=["node_name"] gives you a suspension point AgentExecutor cannot replicate. For a simple linear chain (system prompt → LLM → tool → LLM → output), AgentExecutor and raw API are both fine; LangGraph adds overhead without benefit.

Q: A team wants to use LangChain for their production agent because everyone else does. What would you tell them?

A: I would acknowledge that LangChain's adoption is genuine - the integration ecosystem and community are real advantages. Then I would ask: What specific LangChain features will you actually use? For most production agents, the answer is: the LLM abstraction and maybe 2–3 integrations. That is not enough to justify 50+ transitive dependencies and the API instability risk. I would propose building on raw API with a thin abstraction layer that makes it easy to add LangChain integrations selectively if needed. The social proof argument ("everyone uses LangChain") is not a technical reason - it is a risk avoidance signal that should be examined, not followed automatically.

© 2026 EngineersOfAI. All rights reserved.