LangChain Deep Dive

A Production Scenario

It is Q4 and your team has been asked to build a customer-facing document QA system. Users upload contracts, research papers, and financial reports, then ask questions about them. The timeline is six weeks. Your senior engineer proposes using direct API calls with a custom retrieval layer. Your junior engineer pulls up the LangChain documentation and within 20 minutes has a prototype working.

The junior engineer's prototype uses RetrievalQA, Chroma, OpenAIEmbeddings, and ChatOpenAI - four imports, about 30 lines of code, and it handles chunking, embedding, indexing, retrieval, and question answering. It is impressive. You ship a demo with it.

Then you hit production. Users upload 500-page PDFs. The chunking strategy produces terrible chunks that split mid-sentence. Retrieval is returning wrong documents. The prompts are hidden inside LangChain's source code and you cannot see what the model is actually receiving. Debugging requires setting verbose=True and reading through walls of formatted console output. A model API changes its response format and LangChain's internal parsing breaks silently.

Your senior engineer rebuilds the retrieval pipeline in direct API calls in two days. It is 150 lines instead of 30, but every line is visible, every prompt is under your control, and debugging is straightforward.

This is the LangChain experience: exceptional for prototyping and standard patterns, frustrating when you need visibility, control, or something the framework did not anticipate. This lesson gives you a complete mental model of what LangChain actually does, when to use it, and when to reach past it.


Why This Exists

The Pre-LangChain World

Before LangChain (2022), building LLM applications meant writing a lot of boilerplate: managing conversation history manually, writing custom retrieval code, building prompt templates by hand, implementing retry logic, handling API rate limits. Every team built the same components from scratch.

LangChain (Harrison Chase, late 2022) proposed a set of standard abstractions for these common patterns. The idea: compose LLM applications from reusable components (models, prompts, retrievers, chains) rather than writing everything from scratch.

This was the right abstraction at the right moment. LangChain exploded in popularity and became the default starting point for LLM application development in 2023. Its GitHub repository grew to 80,000+ stars in under a year.

What Problem It Actually Solves

LangChain solves the integration problem: connecting LLMs to the ecosystem. Out of the box, it provides:

  • Model integrations: 50+ LLM providers behind a common interface
  • Retrieval integrations: 50+ vector databases behind a common retriever interface
  • Document loaders: PDFs, web pages, databases, Notion, GitHub, etc.
  • Text splitters: intelligent chunking strategies
  • Prompt templates: reusable, parameterized prompt construction
  • Output parsers: structured output extraction from LLM responses
  • Agent executors: the ReAct loop with configurable tools
  • Memory: conversation history management

The value is the integrations and the standard interface. You can swap OpenAI for Anthropic with one line change. You can swap Chroma for Pinecone with one line change.


Core Abstractions

The Runnable Interface

Everything in LangChain is a Runnable. The interface is:

class Runnable:
    def invoke(self, input): ...         # Single sync call -> output
    def stream(self, input): ...         # Streaming output -> Iterator
    def batch(self, inputs): ...         # Multiple inputs in parallel -> list[output]
    async def ainvoke(self, input): ...  # Async single call -> output
    async def astream(self, input): ...  # Async streaming -> AsyncIterator
    async def abatch(self, inputs): ...  # Async batch -> list

Every LangChain component - LLMs, prompts, retrievers, parsers, chains - implements this interface. This is what makes LCEL (LangChain Expression Language) work: any two Runnables can be composed with |.
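To make the composition idea concrete, here is a minimal sketch of how a `|` operator can be built on top of a shared `invoke` method. This is a toy illustration, not LangChain's actual implementation: `MiniRunnable`, `fake_model`, and the three stages are hypothetical names invented for this example.

```python
from typing import Any, Callable, Iterable


class MiniRunnable:
    """Toy stand-in for a Runnable; not LangChain's actual implementation."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def batch(self, values: Iterable[Any]) -> list[Any]:
        # Sequential here for simplicity; a real implementation may run in parallel.
        return [self.invoke(v) for v in values]

    def __or__(self, other: "MiniRunnable") -> "MiniRunnable":
        # a | b builds a new runnable that feeds a's output into b.
        return MiniRunnable(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stages standing in for prompt -> model -> parser.
prompt = MiniRunnable(lambda d: f"Explain {d['concept']} briefly.")
fake_model = MiniRunnable(lambda text: {"content": text.upper()})
parser = MiniRunnable(lambda msg: msg["content"])

pipeline = prompt | fake_model | parser
result = pipeline.invoke({"concept": "rag"})
batch_results = pipeline.batch([{"concept": "a"}, {"concept": "b"}])
print(result)
```

Because the composed object is itself a `MiniRunnable`, it inherits `invoke` and `batch` without extra code, which is the same property that lets a composed chain expose the full interface.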

Models

from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage

# Chat models
claude = ChatAnthropic(model="claude-opus-4-6", temperature=0)
gpt4 = ChatOpenAI(model="gpt-4o", temperature=0)

# Basic invocation
response = claude.invoke([
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="What is RLHF?")
])
print(response.content)  # response is an AIMessage; .content holds the text

# Streaming
for chunk in claude.stream([HumanMessage(content="Count to 5 slowly")]):
    print(chunk.content, end="", flush=True)

# Batch (parallel)
responses = claude.batch([
    [HumanMessage(content="What is RAG?")],
    [HumanMessage(content="What is a transformer?")],
])

Prompt Templates

from langchain_core.prompts import (
    ChatPromptTemplate,
    PromptTemplate,
    MessagesPlaceholder
)

# Simple prompt template
summarize_prompt = PromptTemplate.from_template(
    "Summarize the following text in {num_sentences} sentences:\n\n{text}"
)

# Fill in variables
filled = summarize_prompt.format(
    num_sentences=3,
    text="Long document goes here..."
)

# Chat prompt template with history placeholder
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant specializing in {domain}."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{question}")
])

# Invoke with variables
messages = chat_prompt.format_messages(
    domain="machine learning",
    chat_history=[
        HumanMessage(content="What is gradient descent?"),
        AIMessage(content="Gradient descent is an optimization algorithm...")
    ],
    question="How does learning rate affect it?"
)

Output Parsers

from langchain_core.output_parsers import (
    StrOutputParser,
    JsonOutputParser,
    PydanticOutputParser
)
from pydantic import BaseModel, Field

# Simple string extraction
str_parser = StrOutputParser()

# JSON parsing
json_parser = JsonOutputParser()

# Pydantic parsing with validation
class MovieReview(BaseModel):
    title: str = Field(description="Movie title")
    score: float = Field(description="Score from 1-10", ge=1, le=10)
    pros: list[str] = Field(description="Positive aspects")
    cons: list[str] = Field(description="Negative aspects")

pydantic_parser = PydanticOutputParser(pydantic_object=MovieReview)

# The parser can inject format instructions into your prompt
format_instructions = pydantic_parser.get_format_instructions()
print(format_instructions) # JSON schema instructions the model should follow
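Conceptually, a structured output parser does two things: it pulls the structured payload out of the model's text, then validates it against a schema. The sketch below shows that idea with the standard library only. It is an assumption-laden illustration rather than how the parsers above are implemented: `Review`, `parse_review`, and the sample output are hypothetical.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class Review:
    title: str
    score: float


def parse_review(raw: str) -> Review:
    """Extract the first-to-last-brace JSON object from model text and validate it."""
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    data = json.loads(match.group(0))
    review = Review(title=str(data["title"]), score=float(data["score"]))
    if not 1 <= review.score <= 10:
        raise ValueError(f"score out of range: {review.score}")
    return review


# Hypothetical model output with chatter around the JSON payload.
raw_output = 'Sure! Here is the review:\n{"title": "Heat", "score": 8.5}\nHope that helps.'
review = parse_review(raw_output)
print(review)
```

The failure paths matter as much as the happy path: a response with no JSON, or a score outside the allowed range, raises instead of returning a half-valid object.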

LangChain Expression Language (LCEL)

LCEL is the composable pipeline syntax. The | operator connects Runnables into a chain.

from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatAnthropic(model="claude-opus-4-6")

# Build a simple chain
chain = (
    ChatPromptTemplate.from_messages([
        ("system", "You are an expert technical writer."),
        ("human", "Write a one-paragraph explanation of {concept} for {audience}.")
    ])
    | llm
    | StrOutputParser()
)

# Invoke
result = chain.invoke({
    "concept": "transformer attention",
    "audience": "junior engineers"
})
print(result)

# Stream
for token in chain.stream({"concept": "RLHF", "audience": "product managers"}):
    print(token, end="", flush=True)

# Batch (parallel)
results = chain.batch([
    {"concept": "RAG", "audience": "executives"},
    {"concept": "vector databases", "audience": "data scientists"},
    {"concept": "fine-tuning", "audience": "ML engineers"},
])

Branching with RunnableBranch

from langchain_core.runnables import RunnableBranch, RunnableLambda

def classify_question(input_dict: dict) -> str:
    """Classify the type of question."""
    question = input_dict["question"].lower()
    if any(word in question for word in ["code", "function", "debug", "error"]):
        return "technical"
    elif any(word in question for word in ["price", "cost", "buy"]):
        return "commercial"
    return "general"


technical_chain = (
    ChatPromptTemplate.from_messages([
        ("system", "You are a senior software engineer. Answer technical questions precisely."),
        ("human", "{question}")
    ])
    | llm | StrOutputParser()
)

commercial_chain = (
    ChatPromptTemplate.from_messages([
        ("system", "You are a sales engineer. Be helpful and accurate about pricing."),
        ("human", "{question}")
    ])
    | llm | StrOutputParser()
)

general_chain = (
    ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant."),
        ("human", "{question}")
    ])
    | llm | StrOutputParser()
)

routed_chain = RunnableBranch(
    (lambda x: classify_question(x) == "technical", technical_chain),
    (lambda x: classify_question(x) == "commercial", commercial_chain),
    general_chain  # Default
)

answer = routed_chain.invoke({"question": "How do I debug a memory leak in Python?"})

Building a Production RAG Pipeline with LCEL

from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain_openai import OpenAIEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableParallel
from langchain_text_splitters import RecursiveCharacterTextSplitter
import os

# ── Document Processing ───────────────────────────────────────────────────────

def load_and_index_documents(file_paths: list[str]) -> Chroma:
    """Load documents, chunk them, embed, and index."""
    all_docs = []
    for path in file_paths:
        # Pick a loader by file extension; anything that is not a PDF is read as plain text
        if path.endswith(".pdf"):
            loader = PyPDFLoader(path)
        else:
            loader = TextLoader(path)
        all_docs.extend(loader.load())

    # Split into chunks
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        separators=["\n\n", "\n", ". ", " ", ""]
    )
    chunks = splitter.split_documents(all_docs)
    print(f"Created {len(chunks)} chunks from {len(all_docs)} documents")

    # Index in vector store
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    vectorstore = Chroma.from_documents(
        documents=chunks,
        embedding=embeddings,
        persist_directory="./chroma_db"
    )
    return vectorstore


# ── RAG Chain ─────────────────────────────────────────────────────────────────

def build_rag_chain(vectorstore: Chroma):
    """Build a production-grade RAG chain with LCEL."""

    retriever = vectorstore.as_retriever(
        search_type="mmr",  # Maximal marginal relevance: diverse results
        search_kwargs={
            "k": 5,         # Return 5 chunks
            "fetch_k": 20,  # Fetch 20, then pick 5 diverse ones
        }
    )

    rag_prompt = ChatPromptTemplate.from_messages([
        ("system", """You are a helpful assistant that answers questions based on provided documents.

Rules:
- Only answer based on the provided context
- If the context does not contain enough information, say so clearly
- Cite the source document when making claims
- Be precise and accurate"""),
        ("human", """Context:
{context}

Question: {question}

Answer based on the context above:""")
    ])

    def format_docs(docs) -> str:
        """Format retrieved documents into a context string."""
        formatted = []
        for i, doc in enumerate(docs, 1):
            source = doc.metadata.get("source", "Unknown")
            page = doc.metadata.get("page", "")
            source_info = f"{source}, page {page}" if page else source
            formatted.append(f"[Document {i} - {source_info}]\n{doc.page_content}")
        return "\n\n".join(formatted)

    # Build the chain (llm is the chat model defined in the LCEL section above)
    rag_chain = (
        RunnableParallel({
            "context": retriever | format_docs,
            "question": RunnablePassthrough()
        })
        | rag_prompt
        | llm
        | StrOutputParser()
    )

    return rag_chain


# ── Usage ─────────────────────────────────────────────────────────────────────

# vectorstore = load_and_index_documents(["contract.pdf", "terms.pdf"])
# rag_chain = build_rag_chain(vectorstore)
# answer = rag_chain.invoke("What are the payment terms?")
# print(answer)
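The retriever above uses maximal marginal relevance (MMR): fetch a larger candidate set, then pick a smaller set that balances relevance to the query against redundancy with what has already been picked. The sketch below is the textbook greedy formulation in plain Python, offered as an illustration only. A vector store's own implementation may differ in details, and the vectors, `mmr_select`, and the `lambda_mult` weighting shown here are assumptions made for the example.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query, candidates, k, lambda_mult=0.5):
    """Return indices of up to k candidates, balancing relevance and diversity."""
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Redundancy is the similarity to the closest already-selected item.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [
    [1.0, 0.1],   # 0: most relevant
    [1.0, 0.12],  # 1: near-duplicate of 0
    [1.0, -1.0],  # 2: less relevant, but points in a different direction
]
picked = mmr_select(query, candidates, k=2)
relevance_only = mmr_select(query, candidates, k=2, lambda_mult=1.0)
print(picked, relevance_only)
```

With equal weighting the near-duplicate is skipped in favour of the more distinct candidate; with `lambda_mult=1.0` the selection collapses to plain top-k by similarity.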

LangGraph: Stateful Cyclical Workflows

LangGraph is LangChain's answer to the question: "What about workflows that loop, have conditional branches, or maintain complex state?" LCEL chains are acyclic: they can route with RunnableBranch, but they cannot loop back to an earlier step or share evolving state between steps. LangGraph can.

LangGraph represents workflows as directed graphs. Nodes are Python functions. Edges define transitions. State is a typed dict that persists across node executions.

from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from typing import TypedDict, Annotated
import operator


class ChatState(TypedDict):
    messages: Annotated[list, operator.add]  # Messages accumulate
    context: str                             # Retrieved context
    requires_retrieval: bool                 # Routing decision
    final_answer: str


# ── Define Nodes ──────────────────────────────────────────────────────────────

def intent_classifier(state: ChatState) -> dict:
    """Classify whether the question needs retrieval."""
    last_message = state["messages"][-1]
    response = llm.invoke([
        SystemMessage(content=(
            "Classify if this question requires document retrieval. "
            "Output exactly: RETRIEVAL_NEEDED or NO_RETRIEVAL"
        )),
        HumanMessage(content=last_message.content)
    ])
    needs_retrieval = "RETRIEVAL_NEEDED" in response.content
    return {"requires_retrieval": needs_retrieval}


def retrieve_context(state: ChatState) -> dict:
    """Retrieve relevant context for the question."""
    # In production, use your vectorstore here
    last_message = state["messages"][-1]
    context = f"[Retrieved context for: {last_message.content[:50]}...]"
    return {"context": context}


def generate_with_context(state: ChatState) -> dict:
    """Generate answer using retrieved context."""
    response = llm.invoke([
        SystemMessage(content=f"Use this context to answer:\n{state['context']}"),
        *state["messages"]
    ])
    return {
        "messages": [response],
        "final_answer": response.content
    }


def generate_direct(state: ChatState) -> dict:
    """Generate answer directly from conversation."""
    response = llm.invoke(state["messages"])
    return {
        "messages": [response],
        "final_answer": response.content
    }


# ── Routing ───────────────────────────────────────────────────────────────────

def route_by_intent(state: ChatState) -> str:
    return "retrieve" if state["requires_retrieval"] else "direct"


# ── Build the Graph ───────────────────────────────────────────────────────────

workflow = StateGraph(ChatState)
workflow.add_node("classify", intent_classifier)
workflow.add_node("retrieve", retrieve_context)
workflow.add_node("answer_with_context", generate_with_context)
workflow.add_node("answer_direct", generate_direct)

workflow.set_entry_point("classify")
workflow.add_conditional_edges(
"classify",
route_by_intent,
{"retrieve": "retrieve", "direct": "answer_direct"}
)
workflow.add_edge("retrieve", "answer_with_context")
workflow.add_edge("answer_with_context", END)
workflow.add_edge("answer_direct", END)

# Compile with checkpointing for session persistence
checkpointer = MemorySaver()
app = workflow.compile(checkpointer=checkpointer)

# Run with session state
config = {"configurable": {"thread_id": "user-123"}}
result = app.invoke(
    {"messages": [HumanMessage(content="What are the refund terms?")],
     "context": "", "requires_retrieval": False, "final_answer": ""},
    config=config
)
print(result["final_answer"])
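The key mechanic in the graph above is that each node returns a partial update, and the framework merges it into shared state: most keys are overwritten, while keys annotated with a reducer such as `operator.add` accumulate. The following stdlib-only sketch imitates that merge behaviour to make it visible. It is not LangGraph's implementation: `REDUCERS`, `apply_update`, `run_graph`, and the stub nodes are hypothetical.

```python
import operator
from typing import Any, Callable

# Hypothetical reducer table: keys listed here are merged with the reducer,
# all other keys are overwritten by the node's update.
REDUCERS: dict[str, Callable[[Any, Any], Any]] = {"messages": operator.add}


def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into a copy of the current state."""
    merged = dict(state)
    for key, value in update.items():
        if key in REDUCERS and key in merged:
            merged[key] = REDUCERS[key](merged[key], value)
        else:
            merged[key] = value
    return merged


def classify(state: dict) -> dict:
    # Stub classifier: a keyword check stands in for the LLM call.
    return {"requires_retrieval": "refund" in state["messages"][-1]}


def retrieve(state: dict) -> dict:
    return {"context": "[stub context]"}


def answer(state: dict) -> dict:
    reply = f"answer using {state['context'] or 'no context'}"
    return {"messages": [reply], "final_answer": reply}


def run_graph(state: dict) -> dict:
    """Walk classify -> (retrieve) -> answer, merging updates after each node."""
    state = apply_update(state, classify(state))
    if state["requires_retrieval"]:
        state = apply_update(state, retrieve(state))
    return apply_update(state, answer(state))


initial = {"messages": ["What are the refund terms?"], "context": "",
           "requires_retrieval": False, "final_answer": ""}
final = run_graph(initial)
print(final["messages"])
```

Note that `answer` returns only the new message, yet the final state holds both the question and the reply. That accumulation is what the `Annotated[list, operator.add]` annotation expresses in the typed state.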

LangSmith: Observability

LangSmith is LangChain's tracing and evaluation platform. It captures every LLM call, retrieval, chain step, and tool use in a visual trace.

import os

# Enable LangSmith tracing
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-api-key"
os.environ["LANGCHAIN_PROJECT"] = "my-agent-project"

# With tracing enabled, every chain invocation is automatically logged
# You can view traces at smith.langchain.com

# Manual run logging
from langsmith import traceable

@traceable(name="custom-rag-call", tags=["production", "rag"])
def my_rag_function(question: str) -> str:
    """This function's inputs/outputs will be traced."""
    # Your logic here
    return f"Answer to: {question}"

# Run evaluation from LangSmith datasets
from langsmith import Client

smith_client = Client()

# Create a dataset
dataset = smith_client.create_dataset("rag-eval-v1")
smith_client.create_examples(
    inputs=[
        {"question": "What are the payment terms?"},
        {"question": "How do I cancel my subscription?"},
    ],
    outputs=[
        {"answer": "Payment is due within 30 days."},
        {"answer": "Cancel via account settings."},
    ],
    dataset_id=dataset.id
)

Callbacks: Monitoring Every Step

from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult
import time


class ProductionCallbackHandler(BaseCallbackHandler):
    """Log and monitor every LangChain operation."""

    def __init__(self):
        self._start_times = {}
        self.total_tokens = 0

    def on_llm_start(self, serialized, prompts, **kwargs):
        run_id = kwargs.get("run_id")
        self._start_times[str(run_id)] = time.time()
        print(f"[LLM START] Prompt length: {sum(len(p) for p in prompts)} chars")

    def on_llm_end(self, response: LLMResult, **kwargs):
        run_id = kwargs.get("run_id")
        elapsed = time.time() - self._start_times.get(str(run_id), time.time())

        # llm_output keys are provider-specific; adjust "usage" and
        # "total_tokens" to whatever your provider actually reports.
        usage = response.llm_output.get("usage", {}) if response.llm_output else {}
        tokens = usage.get("total_tokens", 0)
        self.total_tokens += tokens

        print(f"[LLM END] Tokens: {tokens}, Latency: {elapsed:.2f}s")

    def on_retriever_end(self, documents, **kwargs):
        print(f"[RETRIEVER] Retrieved {len(documents)} documents")

    def on_chain_error(self, error, **kwargs):
        print(f"[ERROR] Chain error: {error}")


# Use the callback
callback = ProductionCallbackHandler()
result = chain.invoke(
    {"concept": "attention", "audience": "engineers"},
    config={"callbacks": [callback]}
)
print(f"Total tokens used: {callback.total_tokens}")

AgentExecutor: Running Tool-Using Agents

from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.tools import tool
from langchain_core.prompts import ChatPromptTemplate


@tool
def search_knowledge_base(query: str) -> str:
    """Search the internal knowledge base for information."""
    return f"Knowledge base results for '{query}': [relevant information]"


@tool
def calculate_roi(revenue: float, cost: float) -> str:
    """Calculate return on investment given revenue and cost."""
    if cost == 0:
        return "Error: cost cannot be zero"
    roi = ((revenue - cost) / cost) * 100
    return f"ROI: {roi:.1f}% (Revenue: ${revenue:,.2f}, Cost: ${cost:,.2f})"


@tool
def format_report(content: str, format_type: str = "markdown") -> str:
    """Format content as a structured report."""
    if format_type == "markdown":
        return f"# Report\n\n{content}"
    return content


tools = [search_knowledge_base, calculate_roi, format_report]

agent_prompt = ChatPromptTemplate.from_messages([
    ("system", (
        "You are a helpful business analyst. Use available tools to "
        "answer questions accurately. Think step by step."
    )),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(llm, tools, agent_prompt)

executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=10,
    max_execution_time=60.0,
    return_intermediate_steps=True,
    handle_parsing_errors="Check your output format and try again",
)

result = executor.invoke({
    "input": "Search for our Q4 revenue data and calculate the ROI if costs were $1.2M and revenue was $3.4M"
})

print(f"Answer: {result['output']}")
print(f"Steps: {len(result['intermediate_steps'])}")
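Under the hood, an agent executor is a bounded loop: ask the model what to do, run the requested tool, record the step, and repeat until the model gives a final answer or the iteration cap is hit. The sketch below shows that loop with a scripted stand-in for the model. It is a simplified illustration under stated assumptions, not the executor's real code: `scripted_model`, `TOOLS`, and `run_agent` are hypothetical.

```python
from typing import Callable

# Hypothetical tool registry keyed by tool name.
TOOLS: dict[str, Callable[..., str]] = {
    "calculate_roi": lambda revenue, cost: f"{(revenue - cost) / cost * 100:.1f}%",
}


def scripted_model(question: str, steps: list[tuple[dict, str]]) -> dict:
    """Stand-in for an LLM: request one tool call, then give a final answer."""
    if not steps:
        return {"tool": "calculate_roi", "args": {"revenue": 3.4, "cost": 1.2}}
    return {"final": f"ROI is {steps[-1][1]}"}


def run_agent(question: str, max_iterations: int = 10) -> dict:
    steps: list[tuple[dict, str]] = []
    for _ in range(max_iterations):
        decision = scripted_model(question, steps)
        if "final" in decision:
            return {"output": decision["final"], "intermediate_steps": steps}
        # Run the requested tool and record (decision, observation) for the next turn.
        observation = TOOLS[decision["tool"]](**decision["args"])
        steps.append((decision, observation))
    return {"output": "Stopped: iteration limit reached", "intermediate_steps": steps}


result = run_agent("What is the ROI?")
capped = run_agent("What is the ROI?", max_iterations=1)
print(result["output"], len(result["intermediate_steps"]))
```

The iteration cap is what keeps a confused model from looping forever, and the recorded steps are the same kind of trajectory that `return_intermediate_steps=True` exposes.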

When NOT to Use LangChain

LangChain's abstractions add value for certain patterns. They add complexity and opacity for others.

Use LangChain When:

  • You need to swap LLM providers or vector databases regularly
  • You are prototyping and want to go fast
  • Your use case matches a built-in chain (RetrievalQA, ConversationalRetrievalChain)
  • You want LangSmith tracing for free
  • You are building a standard RAG or agent pattern

Skip LangChain When:

  • You need full visibility into what prompts the model receives - LangChain's prompt templates can be hard to inspect in production
  • Your use case requires non-standard behavior that fights the framework's assumptions
  • You have strict latency requirements - LCEL adds overhead
  • You are building something simple - a 5-line direct API call is better than 50 lines of LangChain boilerplate for a one-shot prompt
  • Your team finds the abstractions confusing - LangChain's learning curve is steep

# The LangChain way (for a simple Q&A):
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

chain = (
    ChatPromptTemplate.from_messages([
        ("system", "You are helpful."),
        ("human", "{question}")
    ])
    | ChatAnthropic(model="claude-opus-4-6")
    | StrOutputParser()
)
result = chain.invoke({"question": "What is RLHF?"})

# The direct way (equivalent, often clearer):
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    system="You are helpful.",
    messages=[{"role": "user", "content": "What is RLHF?"}]
)
result = response.content[0].text

For simple cases, the direct API call is more readable. Reach for LangChain when the integrations and composability genuinely save you work.


Common Mistakes

:::danger Hidden Prompts in Built-in Chains

RetrievalQA and other built-in chains use internal prompt templates that you cannot see without reading the source code. These prompts may not match your use case. Always inspect what prompts your chain is sending - use chain.get_prompts() or enable verbose mode.

:::

:::danger Not Pinning LangChain Versions

LangChain has broken backward compatibility multiple times (the migration from langchain to langchain-core, langchain-community etc.). Always pin exact versions in requirements.txt. An automatic dependency update has a real chance of breaking your application.

:::
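One way to enforce exact pinning is a small check that flags any requirement line without an exact `==` version, for example as a CI step. The sketch below is a minimal illustration: `unpinned_requirements` is a hypothetical helper, the regular expression covers only simple requirement lines, and the version numbers in the sample are placeholders rather than recommendations.

```python
import re

# Matches simple "name==version" lines (optionally with extras like name[extra]).
PINNED = re.compile(r"^[A-Za-z0-9_.\-\[\]]+==[^\s=]+$")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned with an exact == version."""
    loose = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line and not PINNED.match(line):
            loose.append(line)
    return loose


# Version numbers below are placeholders, not recommendations.
sample = """
langchain-core==0.0.0
langchain-community>=0.0.0
langsmith
"""
loose = unpinned_requirements(sample)
print(loose)
```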

:::warning Over-Chaining

It is tempting to chain everything with |. Deeply nested chains are harder to debug than sequential function calls. If your chain is more than 5-6 steps, consider whether the abstraction is adding value or just making the code harder to read.

:::

:::warning Ignoring the Runnable Interface for Custom Components

If you write custom logic that does not implement the Runnable interface, it cannot participate in LCEL composition. Either wrap it with RunnableLambda or implement invoke/ainvoke.

:::


Interview Q&A

Q: What is LCEL and what problem does it solve?

LangChain Expression Language is a composable pipeline syntax using the | operator to chain Runnables. It solves the wiring problem: connecting LLMs, prompts, retrievers, and parsers into pipelines without writing explicit for loops and function calls for every step. LCEL also gives you streaming, batching, and async for free - any LCEL chain automatically supports all Runnable interface methods.

Q: How does LangGraph differ from LangChain chains?

LangChain chains are acyclic pipelines - data flows in one direction. LangGraph supports cycles (loops back to previous nodes), conditional branching based on state, and persistent state across node executions. This makes LangGraph suitable for agent workflows that iterate, self-correct, or require complex routing logic. LangGraph also supports checkpointing - saving graph state to resume a workflow from a specific point.

Q: When would you use return_intermediate_steps=True in AgentExecutor?

For debugging and evaluation. When set to True, the executor returns the full trajectory of tool calls and their results alongside the final answer. This lets you: inspect which tools were called and in what order, verify that the agent correctly used tool results, build golden trajectories for regression testing, and compute efficiency metrics (how many tool calls were needed). In production, set it to False to save the overhead of capturing intermediate state.

Q: What is the difference between ConversationBufferMemory and ConversationSummaryMemory?

ConversationBufferMemory keeps all messages verbatim. It is simple and lossless but grows without bound. ConversationSummaryMemory periodically summarizes old messages into a compact text representation. It keeps the context window manageable but is lossy - specific details from older messages may be dropped. ConversationSummaryBufferMemory combines both: keep recent messages verbatim, summarize older ones. This hybrid is the most practical for production.
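The hybrid strategy can be sketched in a few lines: keep a bounded window of recent messages verbatim and fold anything that falls out of the window into a running summary. The example below is a toy version written for illustration, not LangChain's class. `SummaryBufferMemory` and `naive_summarize` are hypothetical, the summarizer is a placeholder where a real system would call an LLM, and it trims by message count for simplicity where a production version would more likely budget by tokens.

```python
from typing import Callable


class SummaryBufferMemory:
    """Keep the last `max_recent` messages verbatim; fold older ones into a summary."""

    def __init__(self, summarize: Callable[[str, list[str]], str], max_recent: int = 4):
        self.summarize = summarize
        self.max_recent = max_recent
        self.summary = ""
        self.recent: list[str] = []

    def add(self, message: str) -> None:
        self.recent.append(message)
        overflow = len(self.recent) - self.max_recent
        if overflow > 0:
            # Evict the oldest messages and merge them into the running summary.
            evicted, self.recent = self.recent[:overflow], self.recent[overflow:]
            self.summary = self.summarize(self.summary, evicted)

    def context(self) -> list[str]:
        header = [f"Summary so far: {self.summary}"] if self.summary else []
        return header + self.recent


def naive_summarize(previous: str, evicted: list[str]) -> str:
    # Placeholder: a real system would call an LLM here. Keeps 20 chars per message.
    parts = ([previous] if previous else []) + [m[:20] for m in evicted]
    return " | ".join(parts)


memory = SummaryBufferMemory(naive_summarize, max_recent=2)
for msg in ["user: hi", "ai: hello", "user: what is RAG?", "ai: retrieval plus generation"]:
    memory.add(msg)
print(memory.context())
```

The lossy part is isolated in the `summarize` callable, so the trade-off described above (bounded context versus dropped detail) comes down to how good that one function is.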

Q: How do you debug a LangChain application that produces wrong answers?

Start with verbose=True on your chain or agent to see every step printed to console. Set LANGCHAIN_TRACING_V2=true for LangSmith to get visual traces. For retrieval issues, directly inspect retrieved documents before they go into the prompt: retriever.invoke("your query"). For prompt issues, call chain.get_prompts() or format the prompt manually: prompt.format_messages(**inputs). The most common issues: retrieval is returning wrong chunks (chunking strategy problem), the prompt template is not interpolating variables correctly, or the output parser is failing on edge case responses.

© 2026 EngineersOfAI. All rights reserved.