
I Built a Lightweight LLM Framework Because LangChain Frustrated Me - Here's What I Learned

· 15 min read
EngineersOfAI
AI Engineering Education

There's a moment every LLM developer knows. You've got a working prototype. It's elegant, fast, and does exactly what you need. Then you try to deploy it. And suddenly you're debugging a chain inside a runnable inside a callback inside an abstraction that didn't exist six months ago.

That moment happened one too many times. So I built something else.

This is the story of SynapseKit - why it exists, what it does differently, and what 18 (and counting) objective benchmarks against LangChain and LlamaIndex actually revealed.

The Problem With "The Standard"

Every developer building LLM-powered applications today reaches for the same toolkit: LangChain or LlamaIndex. They're powerful, well-documented, and have massive communities. They're also, frankly, a pain to work with day-to-day.

Not bad. Just built for different goals.

LangChain's philosophy is maximum flexibility: there's an abstraction for everything, a chain for every use case, and 87 packages you can bolt on. It's impressive engineering. It's also a framework that treats simple tasks like they're distributed systems problems.

LlamaIndex's philosophy is data ingestion depth: best-in-class chunking, indexing, and retrieval. If your application lives and dies by retrieval precision, LlamaIndex is serious software. But you pay for that depth in complexity.

Both are solving real problems. But neither optimises for the thing that matters most when building production LLM systems:

How fast can I go from idea to working code, and how readable is that code six months later?

After the fifth time debugging a LangChain stack trace that pointed three abstraction layers away from the actual code, I started writing SynapseKit.

What Is SynapseKit?

SynapseKit is an async-first Python framework for building RAG pipelines, LLM agents, and multi-agent systems. It ships with:

  • 31 LLM providers - OpenAI, Anthropic, Groq, Mistral, Gemini, Ollama, LMStudio, xAI, Novita, Writer, and 21 more
  • 48 built-in tools - search, math, file I/O, HTTP, code execution, NLP, data analysis, and more
  • 43 document loaders - PDF, EPUB, LaTeX, RTF, TSV, S3, Azure Blob, MongoDB, Dropbox, OneDrive, and more
  • MCP server support - SSE transport with Bearer auth for Model Context Protocol
  • Multi-agent primitives - ReActAgent, Crew/CrewAgent/Task, graph-based workflows, recursive subgraphs

Install it with:

pip install "synapsekit[semantic]"

The base install has 2 dependencies. The full semantic install - vector search, all loaders, all tools - pulls in 14 packages. LangChain installs 67. That's not a rounding error; it's a design philosophy.

synapsekit → 2 deps | ~48 MB RAM | ~80ms startup
synapsekit[semantic] → 14 deps |
langchain → 67 deps | ~189 MB RAM | ~2.4s startup
llama-index-core → 43 deps | ~112 MB RAM | ~1.1s startup

The 30-Benchmark Series

Rather than write a marketing post, I ran a 30-notebook benchmark series on Kaggle comparing SynapseKit to LangChain 0.3 and LlamaIndex Core 0.12. One measurable dimension per notebook. Every notebook runs end-to-end on Kaggle free CPU. Results are reported honestly - including when SynapseKit loses.

Follow the full series: kaggle.com/discussions/general/688339

Here's everything I've found so far.


Week 1: Developer Experience

#1 - Cold Start: SynapseKit wins by 30×

The first thing you notice when you import a framework is the wait. For Lambda functions, FastAPI startup, or any process that imports on every cold start, this compounds fast.

import time

t = time.perf_counter()
import synapsekit
print(f"SynapseKit: {time.perf_counter() - t:.3f}s") # 0.082s

t = time.perf_counter()
import langchain
print(f"LangChain: {time.perf_counter() - t:.3f}s") # 2.41s

t = time.perf_counter()
import llama_index
print(f"LlamaIndex: {time.perf_counter() - t:.3f}s") # 1.08s

SynapseKit: ~80ms. LangChain: ~2.4s. LlamaIndex: ~1.1s.

At 1,000 cold starts per day - realistic for a mid-traffic serverless API - LangChain burns 40 minutes of pure overhead. SynapseKit burns 1.3 minutes. In AWS Lambda terms, that's real money.
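That arithmetic is worth making explicit. A quick sanity check using the rounded import times above:

```python
# Daily cold-start overhead = import time × number of cold starts.
COLD_STARTS_PER_DAY = 1_000

def daily_overhead_minutes(import_seconds: float) -> float:
    """Accumulated framework import time, in minutes per day."""
    return import_seconds * COLD_STARTS_PER_DAY / 60

langchain_overhead = daily_overhead_minutes(2.4)    # 40.0 minutes/day
synapsekit_overhead = daily_overhead_minutes(0.08)  # ~1.3 minutes/day
```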

#2 - Dependency Count: SynapseKit wins by 33×

Framework       | Base install | Full install
SynapseKit      | 2 packages   | 14 packages
LlamaIndex Core | 43 packages  | 70+ packages
LangChain       | 67 packages  | 100+ packages

Fewer dependencies mean faster installs, smaller container images, a smaller CVE surface, and less pip-freeze archaeology when something breaks.

#3 - Hello RAG: SynapseKit wins (fewest lines)

The same RAG pipeline - load documents, embed, retrieve, answer - across three frameworks:

# SynapseKit: 7 functional lines
from synapsekit import RAGPipeline, LLMConfig
from synapsekit.llm.openai import OpenAILLM

llm = OpenAILLM(LLMConfig(model="gpt-4o-mini", api_key=KEY))
pipeline = RAGPipeline(llm=llm)
pipeline.add_documents(docs)
answer = await pipeline.query("What is RAG?")
# LangChain: 14 functional lines
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain import hub

llm = ChatOpenAI(model="gpt-4o-mini")
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(docs, embeddings)
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")
chain = ({"context": retriever, "question": RunnablePassthrough()}
         | prompt | llm | StrOutputParser())
answer = chain.invoke("What is RAG?")

SynapseKit: 7 lines. LangChain: 14 lines. LlamaIndex: 11 lines.

This isn't code golf. Fewer lines means fewer places for bugs to hide, fewer things for a new team member to learn, and faster iteration. The LangChain version requires knowing what a runnable is, what hub.pull does, and why RunnablePassthrough is needed. The SynapseKit version is self-explanatory.

#4 - Memory Footprint: SynapseKit wins by 4×

Framework  | RSS at import
SynapseKit | 48 MB
LlamaIndex | 112 MB
LangChain  | 189 MB

At 10 replicas, LangChain costs ~1.9 GB just in framework overhead. SynapseKit costs ~480 MB. For containerised deployments where you're paying per GB of memory, that difference compounds fast.

#5 - Provider Switching: SynapseKit wins (2 lines changed)

One of the most common tasks in LLM development is experimenting across providers. How many lines change when you swap from OpenAI to Groq to Ollama?

# SynapseKit - change 1 import + 1 config line
from synapsekit.llm.openai import OpenAILLM
llm = OpenAILLM(LLMConfig(model="gpt-4o-mini", api_key=OPENAI_KEY))

from synapsekit.llm.groq import GroqLLM
llm = GroqLLM(LLMConfig(model="llama-3-8b-8192", api_key=GROQ_KEY))

from synapsekit.llm.ollama import OllamaLLM
llm = OllamaLLM(LLMConfig(model="llama3"))
# Everything downstream: unchanged.

SynapseKit: 2 lines. LangChain: 4–6 lines. LlamaIndex: 3–4 lines.

31 providers, all following the same LLMConfig pattern. Switching from a paid API to a local model for development takes 10 seconds.


Week 2: RAG Pipelines

#8 - PDF Ingestion: All close

All three frameworks can index a PDF in under 10 lines. This one's effectively a draw - SynapseKit is slightly more concise, but the gap is small.

#9 - Chunking Strategies: LlamaIndex wins

This is where LlamaIndex genuinely excels.

LlamaIndex ships 9+ built-in splitters including SentenceWindowNodeParser (adds surrounding context sentences to each chunk) and HierarchicalNodeParser (creates parent-child chunk trees for better retrieval). These are sophisticated, research-backed strategies that meaningfully improve retrieval quality.
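The sentence-window idea is easy to illustrate outside any framework: match on a single sentence, but hand the LLM that sentence plus its neighbours. A minimal sketch of the concept in plain Python (this is the idea, not the LlamaIndex API):

```python
import re

def sentence_window_chunks(text: str, window: int = 1) -> list[dict]:
    """Split text into sentences; each chunk keeps `window` neighbouring
    sentences on each side as extra context for generation."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks = []
    for i, sent in enumerate(sentences):
        lo, hi = max(0, i - window), min(len(sentences), i + window + 1)
        chunks.append({
            "sentence": sent,                      # what gets embedded/matched
            "window": " ".join(sentences[lo:hi]),  # what the LLM actually sees
        })
    return chunks

chunks = sentence_window_chunks(
    "RAG retrieves context. The context is chunked. Chunk size matters.")
```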

SynapseKit and LangChain both offer token-based and sentence-based splitting - adequate for most use cases, but not at LlamaIndex's depth.

If your application's quality depends on smart chunking, LlamaIndex is the right choice for the retrieval layer.

#10 - Built-in BM25: SynapseKit wins

BM25 is the backbone of lexical search and an essential half of any hybrid retrieval system. In SynapseKit, it's a core dependency - no extra install.

# SynapseKit - BM25 built in, zero extra pip
from synapsekit.retrievers import BM25Retriever

retriever = BM25Retriever(documents)
results = retriever.retrieve("machine learning transformers", k=5)

LangChain requires pip install rank-bm25 and additional wiring. LlamaIndex similarly requires an extra install. For a technique this fundamental to production RAG, burying it behind an extra install is a friction tax.
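For intuition about what the retriever is actually ranking on, here's the Okapi BM25 scoring formula itself in plain Python - the textbook algorithm, not SynapseKit's internal implementation:

```python
import math
from collections import Counter

def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25: score pre-tokenised docs against a tokenised query."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for doc in docs:
        tf = Counter(doc)  # term frequency within this document
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

docs = [["machine", "learning", "basics"],
        ["deep", "learning", "with", "transformers"],
        ["cooking", "pasta", "at", "home"]]
scores = bm25_scores(["learning", "transformers"], docs)
# The second doc mentions both query terms, so it scores highest.
```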

#11 - Hybrid Search (RRF Fusion): LangChain wins

Reciprocal Rank Fusion blends BM25 lexical scores and semantic embedding scores into a single ranked list - typically outperforming either alone by 5–15% on BEIR benchmarks.

LangChain's EnsembleRetriever is the cleanest API for this. SynapseKit supports hybrid retrieval but requires more manual wiring at present. Honest finding: LangChain wins this one.
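RRF itself is only a few lines: each document's fused score is the sum of 1/(k + rank) over every ranked list it appears in, with k = 60 as the conventional constant. If you do end up wiring it manually, a sketch:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge ranked lists of doc IDs into one ranking.
    score(d) = sum over lists of 1 / (k + rank_of_d_in_that_list)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["doc3", "doc1", "doc7"]    # lexical results
vector_ranking = ["doc1", "doc4", "doc3"]  # semantic results
fused = rrf_fuse([bm25_ranking, vector_ranking])
# doc1 and doc3 appear in both lists, so they rise to the top.
```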

#12 - Streaming RAG: Effectively a draw (async ergonomics: SynapseKit)

All three frameworks achieve sub-millisecond TTFT in a mock environment. The real differences are at the API layer, not the framework layer. But the streaming API ergonomics differ:

# SynapseKit - stream tokens as they arrive
async for token in llm.stream("Explain transformers in simple terms"):
    print(token, end="", flush=True)

LangChain requires astream() on runnables. LlamaIndex requires a StreamingResponse wrapper. Small differences, but they accumulate across a codebase.

#13 - Conversation Memory: SynapseKit wins (clarity)

Framework  | API                                             | Trimming strategy
SynapseKit | ConversationMemory(window=3)                    | Turn-count sliding window
LangChain  | InMemoryChatMessageHistory                      | Manual - stores everything, you trim
LlamaIndex | ChatMemoryBuffer.from_defaults(token_limit=500) | Token-budget trimming

SynapseKit's window= parameter is the most beginner-friendly. LlamaIndex's token-budget approach is the most robust for production - especially when dealing with long tool outputs that blow up turn-count estimates.
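The turn-count sliding window is simple enough to sketch in a few lines - this is the idea behind the parameter, not SynapseKit's actual class:

```python
from collections import deque

class SlidingWindowMemory:
    """Keep only the last `window` user/assistant turns."""
    def __init__(self, window: int = 3):
        self.turns = deque(maxlen=window)  # each turn = (user_msg, assistant_msg)

    def add_turn(self, user_msg: str, assistant_msg: str) -> None:
        self.turns.append((user_msg, assistant_msg))  # oldest turn auto-evicted

    def as_messages(self) -> list[dict]:
        """Flatten retained turns into chat-completion message dicts."""
        messages = []
        for user_msg, assistant_msg in self.turns:
            messages.append({"role": "user", "content": user_msg})
            messages.append({"role": "assistant", "content": assistant_msg})
        return messages

memory = SlidingWindowMemory(window=3)
for i in range(5):
    memory.add_turn(f"question {i}", f"answer {i}")
# Only turns 2, 3, 4 survive; turns 0 and 1 were evicted.
```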


Week 3: Agents & Tools

#15 - ReAct Agents: SynapseKit wins (3 lines vs 11)

# SynapseKit: 3 lines to a working ReAct agent
from synapsekit import ReActAgent
from synapsekit.tools import CalculatorTool, DateTimeTool

agent = ReActAgent(llm=llm, tools=[CalculatorTool(), DateTimeTool()], max_iterations=10)
result = await agent.run("What is 847 × 23, and what day is it today?")

SynapseKit: 3 lines. LangChain: 11 lines (requires create_react_agent + AgentExecutor + a prompt template from LangSmith hub). LlamaIndex: 9 lines.

#16 - Function Calling: SynapseKit wins (multi-provider schemas)

SynapseKit's BaseTool generates both OpenAI-format and Anthropic-format schemas from a single tool definition. Write a tool once, use it with any provider:

class WeatherTool(BaseTool):
    name = "get_weather"
    description = "Get the current weather for a city."
    parameters = {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    }

    async def run(self, city: str) -> str:
        return f"Sunny, 22°C in {city}"

tool = WeatherTool()
tool.schema() # → OpenAI tools format
tool.anthropic_schema() # → Anthropic tool_use format

One tool definition. Zero vendor lock-in. Switch your LLM provider and your tools come with you.

#17 - Built-in Tool Libraries: SynapseKit wins by a wide margin

Framework  | Built-in tools         | Zero-config (no API key needed)
SynapseKit | 48 across 9 categories | 12
LangChain  | ~17 core + community   | Most need extra installs
LlamaIndex | 3 core wrappers        | 3

SynapseKit's 9 tool categories - 48 tools ready to drop into any agent:

Category        | Tools
Search          | WebSearchTool, WikipediaTool, NewsSearchTool
Math            | CalculatorTool, StatisticsCalculatorTool, UnitConverterTool
Date/Time       | DateTimeTool, TimezoneConverterTool, CalendarTool
Text Processing | TextSummarizerTool, TextTranslatorTool, KeywordExtractorTool
File I/O        | FileReaderTool, FileWriterTool, CSVReaderTool, JSONParserTool
HTTP            | HTTPRequestTool, APIClientTool
Code Execution  | PythonREPLTool, ShellCommandTool
Data Analysis   | DataFrameAnalyzerTool, ChartGeneratorTool
NLP             | SentimentAnalysisTool, NamedEntityRecognitionTool

With LangChain, getting a working tool usually means installing a community package, finding an API key, and reading a separate doc page. With SynapseKit, 12 tools work with zero configuration.

#18 - Multi-Agent Orchestration: SynapseKit wins (fewest lines + most patterns)

from synapsekit import Crew, CrewAgent, Task

researcher = CrewAgent(
    name="researcher", role="Research Analyst",
    goal="Produce structured bullet points.", llm=llm
)
writer = CrewAgent(
    name="writer", role="Content Writer",
    goal="Turn bullet points into a polished paragraph.", llm=llm
)

tasks = [
    Task(description=f"Research: {TOPIC}", agent="researcher",
         expected_output="3–5 bullet points"),
    Task(description="Write a paragraph from the research.", agent="writer",
         context_from=["researcher"], expected_output="One paragraph"),
]

crew = Crew(agents=[researcher, writer], tasks=tasks, process="sequential")
result = await crew.run()

The context_from= parameter is the key insight: tasks declare their data dependencies declaratively. The framework handles execution order and context passing.
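Under a declarative model like this, execution order falls out of a topological sort over the declared dependencies. A sketch of that idea (illustrative plain dicts, not the framework's implementation):

```python
def execution_order(tasks: dict[str, list[str]]) -> list[str]:
    """Topologically sort tasks; tasks[name] lists the task names whose
    output it consumes (its context_from). Raises on dependency cycles."""
    order: list[str] = []
    done: set[str] = set()
    visiting: set[str] = set()

    def visit(name: str) -> None:
        if name in done:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name!r}")
        visiting.add(name)
        for dep in tasks[name]:
            visit(dep)  # dependencies run before their consumers
        visiting.remove(name)
        done.add(name)
        order.append(name)

    for name in tasks:
        visit(name)
    return order

# writer consumes researcher's output; editor consumes writer's
order = execution_order({"researcher": [], "writer": ["researcher"],
                         "editor": ["writer"]})
# → ["researcher", "writer", "editor"]
```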

Orchestration pattern support:

Pattern       | SynapseKit | LangChain      | LlamaIndex
Sequential    | ✅         | ✅             | ✅
Parallel      | ✅         | ✅             | ✅
Supervisor    | ✅         | ✅             | ✅
Handoff chain | ✅         | ❌ (manual)    | ❌
Graph / DAG   | ✅         | ✅ (LangGraph) | ❌
Shared state  | ✅         | ✅             | ❌
Score         | 6/6        | 5/6            | 3/6

LangChain's LangGraph is genuinely excellent for complex conditional workflows - if you need a state machine with branching logic, it's the right tool. SynapseKit's graph support handles the majority of production patterns with less ceremony.


Cumulative Scorecard (18 notebooks in)

Framework  | Points | Category wins
SynapseKit | 38     | 12 - cold start, dependencies, LoC, memory, provider switching, BM25, streaming ergonomics, memory clarity, ReAct agents, function calling, tools, multi-agent
LangChain  | 22     | 3 - hybrid search RRF, LangGraph flexibility, error UX
LlamaIndex | 18     | 2 - chunking depth, token-budget memory

SynapseKit leads on developer ergonomics and batteries-included tooling. LangChain leads on complex graph orchestration. LlamaIndex leads on retrieval precision.


Architecture: What Makes SynapseKit Different

1. Async by default - not retrofitted

SynapseKit was designed async from the ground up. Every run(), every query(), every tool call returns a coroutine.

import asyncio

# Concurrent queries - not sequential
results = await asyncio.gather(
    pipeline.query("What is the capital of France?"),
    pipeline.query("Explain backpropagation in 2 sentences."),
    pipeline.query("Summarise the attached PDF."),
)

In LangChain, async is available but not the default. Many features exist only in sync form and async was added later. The difference is subtle in a tutorial, significant in a production API.

2. Shallow call stack - your errors, not ours

When pipeline.query() breaks in LangChain, your traceback travels through Runnable, RunnableSequence, CallbackManager, BaseChain, and surfaces somewhere deep in the framework. You spend 10 minutes decoding the stack trace before you can begin debugging.

In SynapseKit, the call path is intentionally shallow. When something breaks, the traceback points at your code. No hidden middleware, no callback chains, no runnable wrappers unless you explicitly add them.

3. Unified tool interface - one definition, every provider

class BaseTool:
    name: str
    description: str
    parameters: dict  # JSON Schema

    async def run(self, **kwargs) -> str: ...
    def schema(self) -> dict: ...            # OpenAI tools format
    def anthropic_schema(self) -> dict: ...  # Anthropic tool_use format

Write a tool once. It works with GPT-4o, Claude 3.5, Llama 3 on Groq, Gemini - any of the 31 supported providers. No adapter layer, no per-provider tool registration.
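The reason one definition can serve every provider is that the public formats differ mostly in envelope: OpenAI wraps name, description, and JSON Schema inside a "function" object, while Anthropic uses the same three pieces flat, with the schema under input_schema. A sketch of the mapping (the public wire formats, not SynapseKit's code):

```python
def to_openai_schema(name: str, description: str, parameters: dict) -> dict:
    """OpenAI chat-completions `tools` entry."""
    return {
        "type": "function",
        "function": {"name": name, "description": description,
                     "parameters": parameters},
    }

def to_anthropic_schema(name: str, description: str, parameters: dict) -> dict:
    """Anthropic Messages API tool entry - same JSON Schema, different key."""
    return {"name": name, "description": description,
            "input_schema": parameters}

params = {
    "type": "object",
    "properties": {"city": {"type": "string", "description": "City name"}},
    "required": ["city"],
}
openai_tool = to_openai_schema("get_weather", "Get the current weather.", params)
anthropic_tool = to_anthropic_schema("get_weather", "Get the current weather.", params)
```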

4. Task-centric multi-agent - separate what from who

SynapseKit's Crew model separates what to do (Task) from who does it (Agent). Tasks declare their dependencies via context_from. The framework handles execution order, context accumulation, and result passing.

Wiring data flow manually between agents is the source of most multi-agent bugs. When Agent B needs Agent A's output, you shouldn't write the plumbing; you should declare the dependency.

5. 43 loaders - data ingestion without hunting for packages

Production RAG applications ingest data from everywhere. SynapseKit ships 43 loaders:

  • Documents: PDF, EPUB, LaTeX, RTF, DOCX, Markdown, HTML
  • Data: CSV, TSV, JSON, XML, SQLite
  • Cloud: S3, Azure Blob, OneDrive, Dropbox
  • Databases: MongoDB, PostgreSQL
  • Config: .env, YAML, TOML
  • Web: sitemap crawlers, URL loaders, RSS feeds
  • Code: Python, JavaScript, TypeScript source files

One consistent Loader.load() → List[Document] interface. Every loader returns the same type. Your downstream pipeline code never changes regardless of where the data comes from.
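The payoff of a single return type is that loaders are interchangeable. A sketch of what such an interface looks like (illustrative names, not SynapseKit's actual classes):

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str
    metadata: dict = field(default_factory=dict)

class TextFileLoader:
    """Example loader: every loader exposes load() -> list[Document]."""
    def __init__(self, path: str):
        self.path = path

    def load(self) -> list[Document]:
        with open(self.path, encoding="utf-8") as f:
            return [Document(text=f.read(), metadata={"source": self.path})]

def ingest(loaders) -> list[Document]:
    """Downstream code only ever sees Documents - never the source format."""
    return [doc for loader in loaders for doc in loader.load()]
```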

6. MCP Server support - Model Context Protocol built in

from synapsekit.mcp import MCPServer

server = MCPServer(name="my-tools", tools=[WeatherTool(), CalculatorTool()])
await server.run_sse(host="0.0.0.0", port=8080, bearer_token="secret")

Expose any tool as a production MCP endpoint in 3 lines. Compatible with any MCP-compliant client.


The Honest Take: When to Use Each

SynapseKit was built for a specific set of problems. It's not the right choice for every use case.

Use SynapseKit when:

  • You're building a greenfield LLM app and want the fastest path to production
  • Your app is async-first - APIs, webhooks, real-time applications, serverless
  • You need a small footprint - containers, Lambda, edge runtimes
  • You want batteries included without hunting for extra packages
  • Your pipeline uses standard patterns: ReAct agents, Crew orchestration, RAG, streaming
  • You're experimenting across providers and need painless switching
  • You want readable code that a new team member can understand without framework training

Use LangChain when:

  • You need complex conditional graph workflows - LangGraph is genuinely excellent at stateful, branching agentic pipelines
  • You need a specific integration from LangChain's 150+ partner ecosystem
  • Your team already knows LangChain deeply and migration cost outweighs gains
  • You need LangSmith observability deeply integrated into your debugging workflow

Use LlamaIndex when:

  • Advanced chunking is central to your application quality (SentenceWindow, Hierarchical - there's nothing equivalent in SynapseKit today)
  • You're building a knowledge-intensive system where retrieval precision is the primary metric
  • You want LLM-native evaluation metrics (faithfulness, relevance, groundedness) built into the framework

What's Coming in the Benchmark Series

The series continues through Notebooks #19–#30:

  • #19 - Observability & Tracing: What can you actually see when your agent runs?
  • #20 - Agent Error Handling: What happens when a tool throws an exception mid-loop?
  • #21 - Week 3 Scorecard: Agents & tools final rankings
  • #22 - Async Throughput: Requests/second under real concurrency
  • #23 - Graph Workflows: DAG pipelines for complex conditional flows
  • #24 - LLM Evaluation: Built-in faithfulness and relevance metrics
  • #25 - Cost Tracking: Token counting and spend visibility
  • #26 - Guardrails: Content filtering and output validation
  • #27 - MCP Support: Model Context Protocol in practice
  • #28 - Week 4 Scorecard
  • #29–#30 - Final Verdict: Which framework wins, for whom, and why

Follow the series on Kaggle


Quick Start

# Minimal install - 2 dependencies
pip install synapsekit

# Full install - vector search, all loaders, all tools
pip install "synapsekit[semantic]"

# Your first RAG pipeline in 7 lines
from synapsekit import RAGPipeline, LLMConfig
from synapsekit.llm.openai import OpenAILLM
from synapsekit.loaders import PDFLoader

llm = OpenAILLM(LLMConfig(model="gpt-4o-mini", api_key="sk-..."))
docs = PDFLoader("research.pdf").load()
pipeline = RAGPipeline(llm=llm)
pipeline.add_documents(docs)

answer = await pipeline.query("What are the main findings?")
print(answer)

# Your first multi-agent crew in 10 lines
from synapsekit import Crew, CrewAgent, Task
from synapsekit.llm.groq import GroqLLM

llm = GroqLLM(LLMConfig(model="llama-3-8b-8192", api_key="gsk-..."))
researcher = CrewAgent(name="researcher", role="Research Analyst", llm=llm)
writer = CrewAgent(name="writer", role="Writer", llm=llm)
tasks = [
    Task(description="Research quantum computing trends", agent="researcher"),
    Task(description="Write a blog intro", agent="writer", context_from=["researcher"]),
]
result = await Crew(agents=[researcher, writer], tasks=tasks).run()

Links:

Every benchmark is reproducible. Fork any notebook and run it on Kaggle free CPU. If the results differ in your environment, open an issue.

Engineers of AI

Read more: www.engineersofai.com

Want to Think Like an AI Architect?

Join engineers receiving weekly breakdowns of AI systems, production failures, and architectural decisions.