
I Built a Lightweight LLM Framework Because LangChain Frustrated Me - Here's What I Learned

· 15 min read
EngineersOfAI
AI Engineering Education

There's a moment every LLM developer knows. You've got a working prototype. It's elegant, fast, and does exactly what you need. Then you try to deploy it. And suddenly you're debugging a chain inside a runnable inside a callback inside an abstraction that didn't exist six months ago.

That moment happened one too many times. So I built something else.

This is the story of SynapseKit - why it exists, what it does differently, and what 18 (and counting) objective benchmarks against LangChain and LlamaIndex actually revealed.

The Problem With "The Standard"

Every developer building LLM-powered applications today reaches for the same toolkit: LangChain or LlamaIndex. They're powerful, well-documented, and have massive communities. They're also, frankly, a pain to work with day-to-day.

Not bad. Just built for different goals.

LangChain's philosophy is maximum flexibility: there's an abstraction for everything, a chain for every use case, and 87 packages you can bolt on. It's impressive engineering. It's also a framework that treats simple tasks like they're distributed systems problems.

LlamaIndex's philosophy is data ingestion depth: best-in-class chunking, indexing, and retrieval. If your application lives and dies by retrieval precision, LlamaIndex is serious software. But you pay for that depth in complexity.

Both are solving real problems. But neither optimises for the thing that matters most when building production LLM systems:

How fast can I go from idea to working code, and how readable is that code six months later?

After the fifth time debugging a LangChain stack trace that pointed three abstraction layers away from the actual code, I started writing SynapseKit.

What Is SynapseKit?

SynapseKit is an async-first Python framework for building RAG pipelines, LLM agents, and multi-agent systems. It ships with:

  • 31 LLM providers - OpenAI, Anthropic, Groq, Mistral, Gemini, Ollama, LMStudio, xAI, Novita, Writer, and 21 more
  • 48 built-in tools - search, math, file I/O, HTTP, code execution, NLP, data analysis, and more
  • 43 document loaders - PDF, EPUB, LaTeX, RTF, TSV, S3, Azure Blob, MongoDB, Dropbox, OneDrive, and more
  • MCP server support - SSE transport with Bearer auth for Model Context Protocol
  • Multi-agent primitives - ReActAgent, Crew/CrewAgent/Task, graph-based workflows, recursive subgraphs

Install it with:

pip install "synapsekit[semantic]"

The base install has 2 dependencies. The full semantic install - vector search, all loaders, all tools - pulls in 14 packages. LangChain installs 67. That's not a rounding error; it's a design philosophy.

synapsekit → 2 deps | ~48 MB RAM | ~80ms startup
synapsekit[semantic] → 14 deps |
langchain → 67 deps | ~189 MB RAM | ~2.4s startup
llama-index-core → 43 deps | ~112 MB RAM | ~1.1s startup

The 30-Benchmark Series

Rather than write a marketing post, I ran a 30-notebook benchmark series on Kaggle comparing SynapseKit to LangChain 0.3 and LlamaIndex Core 0.12. One measurable dimension per notebook. Every notebook runs end-to-end on Kaggle free CPU. Results are reported honestly - including when SynapseKit loses.

Follow the full series: kaggle.com/discussions/general/688339

Here's everything I've found so far.


Week 1: Developer Experience

#1 - Cold Start: SynapseKit wins by 30×

The first thing you notice when you import a framework is the wait. For Lambda functions, FastAPI startup, or any process that imports on every cold start, this compounds fast.

import time

t = time.perf_counter()
import synapsekit
print(f"SynapseKit: {time.perf_counter() - t:.3f}s") # 0.082s

t = time.perf_counter()
import langchain
print(f"LangChain: {time.perf_counter() - t:.3f}s") # 2.41s

t = time.perf_counter()
import llama_index
print(f"LlamaIndex: {time.perf_counter() - t:.3f}s") # 1.08s

SynapseKit: ~80ms. LangChain: ~2.4s. LlamaIndex: ~1.1s.

At 1,000 cold starts per day - realistic for a mid-traffic serverless API - LangChain burns 40 minutes of pure overhead. SynapseKit burns 1.3 minutes. In AWS Lambda terms, that's real money.
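That arithmetic is worth making explicit. A quick sanity check using the rounded import times above:

```python
# Daily cold-start overhead = import time × number of cold starts.
COLD_STARTS_PER_DAY = 1_000

def daily_overhead_minutes(import_seconds: float) -> float:
    """Accumulated framework import time, in minutes per day."""
    return import_seconds * COLD_STARTS_PER_DAY / 60

langchain_overhead = daily_overhead_minutes(2.4)    # 40.0 minutes/day
synapsekit_overhead = daily_overhead_minutes(0.08)  # ~1.3 minutes/day
```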

#2 - Dependency Count: SynapseKit wins by 33×

Framework       | Base install | Full install
SynapseKit      | 2 packages   | 14 packages
LlamaIndex Core | 43 packages  | 70+ packages
LangChain       | 67 packages  | 100+ packages

Fewer dependencies mean faster installs, smaller container images, a smaller CVE surface, and less pip-freeze archaeology when something breaks.

#3 - Hello RAG: SynapseKit wins (fewest lines)

The same RAG pipeline - load documents, embed, retrieve, answer - across three frameworks:

# SynapseKit: 7 functional lines
from synapsekit import RAGPipeline, LLMConfig
from synapsekit.llm.openai import OpenAILLM

llm = OpenAILLM(LLMConfig(model="gpt-4o-mini", api_key=KEY))
pipeline = RAGPipeline(llm=llm)
pipeline.add_documents(docs)
answer = await pipeline.query("What is RAG?")
# LangChain: 14 functional lines
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain import hub

llm = ChatOpenAI(model="gpt-4o-mini")
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(docs, embeddings)
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")
chain = ({"context": retriever, "question": RunnablePassthrough()}
         | prompt | llm | StrOutputParser())
answer = chain.invoke("What is RAG?")

SynapseKit: 7 lines. LangChain: 14 lines. LlamaIndex: 11 lines.

This isn't code golf. Fewer lines means fewer places for bugs to hide, fewer things for a new team member to learn, and faster iteration. The LangChain version requires knowing what a runnable is, what hub.pull does, and why RunnablePassthrough is needed. The SynapseKit version is self-explanatory.

#4 - Memory Footprint: SynapseKit wins by 4×

Framework  | RSS at import
SynapseKit | 48 MB
LlamaIndex | 112 MB
LangChain  | 189 MB

At 10 replicas, LangChain costs ~1.9 GB just in framework overhead. SynapseKit costs ~480 MB. For containerised deployments where you're paying per GB of memory, that difference compounds fast.

#5 - Provider Switching: SynapseKit wins (2 lines changed)

One of the most common tasks in LLM development is experimenting across providers. How many lines change when you swap from OpenAI to Groq to Ollama?

# SynapseKit - change 1 import + 1 config line
from synapsekit.llm.openai import OpenAILLM
llm = OpenAILLM(LLMConfig(model="gpt-4o-mini", api_key=OPENAI_KEY))

from synapsekit.llm.groq import GroqLLM
llm = GroqLLM(LLMConfig(model="llama-3-8b-8192", api_key=GROQ_KEY))

from synapsekit.llm.ollama import OllamaLLM
llm = OllamaLLM(LLMConfig(model="llama3"))
# Everything downstream: unchanged.

SynapseKit: 2 lines. LangChain: 4–6 lines. LlamaIndex: 3–4 lines.

31 providers, all following the same LLMConfig pattern. Switching from a paid API to a local model for development takes 10 seconds.


Week 2: RAG Pipelines

#8 - PDF Ingestion: All close

All three frameworks can index a PDF in under 10 lines. This one's effectively a draw - SynapseKit is slightly more concise, but the gap is small.

#9 - Chunking Strategies: LlamaIndex wins

This is where LlamaIndex genuinely excels.

LlamaIndex ships 9+ built-in splitters including SentenceWindowNodeParser (adds surrounding context sentences to each chunk) and HierarchicalNodeParser (creates parent-child chunk trees for better retrieval). These are sophisticated, research-backed strategies that meaningfully improve retrieval quality.
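The sentence-window idea is easy to illustrate outside any framework: match on a single sentence, but hand the LLM that sentence plus its neighbours. A minimal sketch of the concept in plain Python (this is the idea, not the LlamaIndex API):

```python
import re

def sentence_window_chunks(text: str, window: int = 1) -> list[dict]:
    """Split text into sentences; each chunk keeps `window` neighbouring
    sentences on each side as extra context for generation."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks = []
    for i, sent in enumerate(sentences):
        lo, hi = max(0, i - window), min(len(sentences), i + window + 1)
        chunks.append({
            "sentence": sent,                      # what gets embedded/matched
            "window": " ".join(sentences[lo:hi]),  # what the LLM actually sees
        })
    return chunks

chunks = sentence_window_chunks(
    "RAG retrieves context. The context is chunked. Chunk size matters.")
```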

SynapseKit and LangChain both offer token-based and sentence-based splitting - adequate for most use cases, but not at LlamaIndex's depth.

If your application's quality depends on smart chunking, LlamaIndex is the right choice for the retrieval layer.

#10 - Built-in BM25: SynapseKit wins

BM25 is the backbone of lexical search and an essential half of any hybrid retrieval system. In SynapseKit, it's a core dependency - no extra install.

# SynapseKit - BM25 built in, zero extra pip
from synapsekit.retrievers import BM25Retriever

retriever = BM25Retriever(documents)
results = retriever.retrieve("machine learning transformers", k=5)

LangChain requires pip install rank-bm25 and additional wiring. LlamaIndex similarly requires an extra install. For a technique this fundamental to production RAG, burying it behind an extra install is a friction tax.
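For intuition about what the retriever is actually ranking on, here's the Okapi BM25 scoring formula itself in plain Python - the textbook algorithm, not SynapseKit's internal implementation:

```python
import math
from collections import Counter

def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25: score pre-tokenised docs against a tokenised query."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for doc in docs:
        tf = Counter(doc)  # term frequency within this document
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

docs = [["machine", "learning", "basics"],
        ["deep", "learning", "with", "transformers"],
        ["cooking", "pasta", "at", "home"]]
scores = bm25_scores(["learning", "transformers"], docs)
# The second doc mentions both query terms, so it scores highest.
```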

#11 - Hybrid Search (RRF Fusion): LangChain wins

Reciprocal Rank Fusion blends BM25 lexical scores and semantic embedding scores into a single ranked list - typically outperforming either alone by 5–15% on BEIR benchmarks.

LangChain's EnsembleRetriever is the cleanest API for this. SynapseKit supports hybrid retrieval but requires more manual wiring at present. Honest finding: LangChain wins this one.
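RRF itself is only a few lines: each document's fused score is the sum of 1/(k + rank) over every ranked list it appears in, with k = 60 as the conventional constant. If you do end up wiring it manually, a sketch:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge ranked lists of doc IDs into one ranking.
    score(d) = sum over lists of 1 / (k + rank_of_d_in_that_list)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["doc3", "doc1", "doc7"]    # lexical results
vector_ranking = ["doc1", "doc4", "doc3"]  # semantic results
fused = rrf_fuse([bm25_ranking, vector_ranking])
# doc1 and doc3 appear in both lists, so they rise to the top.
```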

#12 - Streaming RAG: Effectively a draw (async ergonomics: SynapseKit)

All three frameworks achieve sub-millisecond TTFT in a mock environment. The real differences are at the API layer, not the framework layer. But the streaming API ergonomics differ:

# SynapseKit - stream tokens as they arrive
async for token in llm.stream("Explain transformers in simple terms"):
    print(token, end="", flush=True)

LangChain requires astream() on runnables. LlamaIndex requires a StreamingResponse wrapper. Small differences, but they accumulate across a codebase.

#13 - Conversation Memory: SynapseKit wins (clarity)

Framework  | API                                             | Trimming strategy
SynapseKit | ConversationMemory(window=3)                    | Turn-count sliding window
LangChain  | InMemoryChatMessageHistory                      | Manual - stores everything, you trim
LlamaIndex | ChatMemoryBuffer.from_defaults(token_limit=500) | Token-budget trimming

SynapseKit's window= parameter is the most beginner-friendly. LlamaIndex's token-budget approach is the most robust for production - especially when dealing with long tool outputs that blow up turn-count estimates.
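The turn-count sliding window is simple enough to sketch in a few lines - this is the idea behind the parameter, not SynapseKit's actual class:

```python
from collections import deque

class SlidingWindowMemory:
    """Keep only the last `window` user/assistant turns."""
    def __init__(self, window: int = 3):
        self.turns = deque(maxlen=window)  # each turn = (user_msg, assistant_msg)

    def add_turn(self, user_msg: str, assistant_msg: str) -> None:
        self.turns.append((user_msg, assistant_msg))  # oldest turn auto-evicted

    def as_messages(self) -> list[dict]:
        """Flatten retained turns into chat-completion message dicts."""
        messages = []
        for user_msg, assistant_msg in self.turns:
            messages.append({"role": "user", "content": user_msg})
            messages.append({"role": "assistant", "content": assistant_msg})
        return messages

memory = SlidingWindowMemory(window=3)
for i in range(5):
    memory.add_turn(f"question {i}", f"answer {i}")
# Only turns 2, 3, 4 survive; turns 0 and 1 were evicted.
```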


Week 3: Agents & Tools

#15 - ReAct Agents: SynapseKit wins (3 lines vs 11)

# SynapseKit: 3 lines to a working ReAct agent
from synapsekit import ReActAgent
from synapsekit.tools import CalculatorTool, DateTimeTool

agent = ReActAgent(llm=llm, tools=[CalculatorTool(), DateTimeTool()], max_iterations=10)
result = await agent.run("What is 847 × 23, and what day is it today?")

SynapseKit: 3 lines. LangChain: 11 lines (requires create_react_agent + AgentExecutor + a prompt template from LangSmith hub). LlamaIndex: 9 lines.

#16 - Function Calling: SynapseKit wins (multi-provider schemas)

SynapseKit's BaseTool generates both OpenAI-format and Anthropic-format schemas from a single tool definition. Write a tool once, use it with any provider:

class WeatherTool(BaseTool):
    name = "get_weather"
    description = "Get the current weather for a city."
    parameters = {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    }

    async def run(self, city: str) -> str:
        return f"Sunny, 22°C in {city}"

tool = WeatherTool()
tool.schema() # → OpenAI tools format
tool.anthropic_schema() # → Anthropic tool_use format

One tool definition. Zero vendor lock-in. Switch your LLM provider and your tools come with you.

#17 - Built-in Tool Libraries: SynapseKit wins by a wide margin

Framework  | Built-in tools         | Zero-config (no API key needed)
SynapseKit | 48 across 9 categories | 12
LangChain  | ~17 core + community   | Most need extra installs
LlamaIndex | 3 core wrappers        | 3

SynapseKit's 9 tool categories - 48 tools ready to drop into any agent:

Category        | Tools
Search          | WebSearchTool, WikipediaTool, NewsSearchTool
Math            | CalculatorTool, StatisticsCalculatorTool, UnitConverterTool
Date/Time       | DateTimeTool, TimezoneConverterTool, CalendarTool
Text Processing | TextSummarizerTool, TextTranslatorTool, KeywordExtractorTool
File I/O        | FileReaderTool, FileWriterTool, CSVReaderTool, JSONParserTool
HTTP            | HTTPRequestTool, APIClientTool
Code Execution  | PythonREPLTool, ShellCommandTool
Data Analysis   | DataFrameAnalyzerTool, ChartGeneratorTool
NLP             | SentimentAnalysisTool, NamedEntityRecognitionTool

With LangChain, getting a working tool usually means installing a community package, finding an API key, and reading a separate doc page. With SynapseKit, 12 tools work with zero configuration.

#18 - Multi-Agent Orchestration: SynapseKit wins (fewest lines + most patterns)

from synapsekit import Crew, CrewAgent, Task

researcher = CrewAgent(
    name="researcher", role="Research Analyst",
    goal="Produce structured bullet points.", llm=llm
)
writer = CrewAgent(
    name="writer", role="Content Writer",
    goal="Turn bullet points into a polished paragraph.", llm=llm
)

tasks = [
    Task(description=f"Research: {TOPIC}", agent="researcher",
         expected_output="3–5 bullet points"),
    Task(description="Write a paragraph from the research.", agent="writer",
         context_from=["researcher"], expected_output="One paragraph"),
]

crew = Crew(agents=[researcher, writer], tasks=tasks, process="sequential")
result = await crew.run()

The context_from= parameter is the key insight: tasks declare their data dependencies declaratively. The framework handles execution order and context passing.
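Under a declarative model like this, execution order falls out of a topological sort over the declared dependencies. A sketch of that idea (illustrative plain dicts, not the framework's implementation):

```python
def execution_order(tasks: dict[str, list[str]]) -> list[str]:
    """Topologically sort tasks; tasks[name] lists the task names whose
    output it consumes (its context_from). Raises on dependency cycles."""
    order: list[str] = []
    done: set[str] = set()
    visiting: set[str] = set()

    def visit(name: str) -> None:
        if name in done:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name!r}")
        visiting.add(name)
        for dep in tasks[name]:
            visit(dep)  # dependencies run before their consumers
        visiting.remove(name)
        done.add(name)
        order.append(name)

    for name in tasks:
        visit(name)
    return order

# writer consumes researcher's output; editor consumes writer's
order = execution_order({"researcher": [], "writer": ["researcher"],
                         "editor": ["writer"]})
# → ["researcher", "writer", "editor"]
```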

Orchestration pattern support:

Pattern       | SynapseKit | LangChain      | LlamaIndex
Sequential    | ✅         | ✅             | ✅
Parallel      | ✅         | ✅             | ✅
Supervisor    | ✅         | ✅             | ✅
Handoff chain | ✅         | ❌ (manual)    | ❌
Graph / DAG   | ✅         | ✅ (LangGraph) | ❌
Shared state  | ✅         | ✅             | ❌
Score         | 6/6        | 5/6            | 3/6

LangChain's LangGraph is genuinely excellent for complex conditional workflows - if you need a state machine with branching logic, it's the right tool. SynapseKit's graph support handles the majority of production patterns with less ceremony.


Cumulative Scorecard (18 notebooks in)

Framework  | Points | Category wins
SynapseKit | 38     | 12 - cold start, dependencies, LoC, memory, provider switching, BM25, streaming ergonomics, memory clarity, ReAct agents, function calling, tools, multi-agent
LangChain  | 22     | 3 - hybrid search RRF, LangGraph flexibility, error UX
LlamaIndex | 18     | 2 - chunking depth, token-budget memory

SynapseKit leads on developer ergonomics and batteries-included tooling. LangChain leads on complex graph orchestration. LlamaIndex leads on retrieval precision.


Architecture: What Makes SynapseKit Different

1. Async by default - not retrofitted

SynapseKit was designed async from the ground up. Every run(), every query(), every tool call returns a coroutine.

import asyncio

# Concurrent queries - not sequential
results = await asyncio.gather(
    pipeline.query("What is the capital of France?"),
    pipeline.query("Explain backpropagation in 2 sentences."),
    pipeline.query("Summarise the attached PDF."),
)

In LangChain, async is available but not the default. Many features exist only in sync form and async was added later. The difference is subtle in a tutorial, significant in a production API.

2. Shallow call stack - your errors, not ours

When pipeline.query() breaks in LangChain, your traceback travels through Runnable, RunnableSequence, CallbackManager, BaseChain, and surfaces somewhere deep in the framework. You spend 10 minutes decoding the stack trace before you can begin debugging.

In SynapseKit, the call path is intentionally shallow. When something breaks, the traceback points at your code. No hidden middleware, no callback chains, no runnable wrappers unless you explicitly add them.

3. Unified tool interface - one definition, every provider

class BaseTool:
    name: str
    description: str
    parameters: dict  # JSON Schema

    async def run(self, **kwargs) -> str: ...
    def schema(self) -> dict: ...            # OpenAI tools format
    def anthropic_schema(self) -> dict: ...  # Anthropic tool_use format

Write a tool once. It works with GPT-4o, Claude 3.5, Llama 3 on Groq, Gemini - any of the 31 supported providers. No adapter layer, no per-provider tool registration.
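The reason one definition can serve every provider is that the public formats differ mostly in envelope: OpenAI wraps name, description, and JSON Schema inside a "function" object, while Anthropic uses the same three pieces flat, with the schema under input_schema. A sketch of the mapping (the public wire formats, not SynapseKit's code):

```python
def to_openai_schema(name: str, description: str, parameters: dict) -> dict:
    """OpenAI chat-completions `tools` entry."""
    return {
        "type": "function",
        "function": {"name": name, "description": description,
                     "parameters": parameters},
    }

def to_anthropic_schema(name: str, description: str, parameters: dict) -> dict:
    """Anthropic Messages API tool entry - same JSON Schema, different key."""
    return {"name": name, "description": description,
            "input_schema": parameters}

params = {
    "type": "object",
    "properties": {"city": {"type": "string", "description": "City name"}},
    "required": ["city"],
}
openai_tool = to_openai_schema("get_weather", "Get the current weather.", params)
anthropic_tool = to_anthropic_schema("get_weather", "Get the current weather.", params)
```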

4. Task-centric multi-agent - separate what from who

SynapseKit's Crew model separates what to do (Task) from who does it (Agent). Tasks declare their dependencies via context_from. The framework handles execution order, context accumulation, and result passing.

Wiring data flow manually between agents is the source of most multi-agent bugs. When Agent B needs Agent A's output, you shouldn't write the plumbing; you should declare the dependency.

5. 43 loaders - data ingestion without hunting for packages

Production RAG applications ingest data from everywhere. SynapseKit ships 43 loaders:

  • Documents: PDF, EPUB, LaTeX, RTF, DOCX, Markdown, HTML
  • Data: CSV, TSV, JSON, XML, SQLite
  • Cloud: S3, Azure Blob, OneDrive, Dropbox
  • Databases: MongoDB, PostgreSQL
  • Config: .env, YAML, TOML
  • Web: sitemap crawlers, URL loaders, RSS feeds
  • Code: Python, JavaScript, TypeScript source files

One consistent Loader.load() → List[Document] interface. Every loader returns the same type. Your downstream pipeline code never changes regardless of where the data comes from.
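The payoff of a single return type is that loaders are interchangeable. A sketch of what such an interface looks like (illustrative names, not SynapseKit's actual classes):

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str
    metadata: dict = field(default_factory=dict)

class TextFileLoader:
    """Example loader: every loader exposes load() -> list[Document]."""
    def __init__(self, path: str):
        self.path = path

    def load(self) -> list[Document]:
        with open(self.path, encoding="utf-8") as f:
            return [Document(text=f.read(), metadata={"source": self.path})]

def ingest(loaders) -> list[Document]:
    """Downstream code only ever sees Documents - never the source format."""
    return [doc for loader in loaders for doc in loader.load()]
```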

6. MCP Server support - Model Context Protocol built in

from synapsekit.mcp import MCPServer

server = MCPServer(name="my-tools", tools=[WeatherTool(), CalculatorTool()])
await server.run_sse(host="0.0.0.0", port=8080, bearer_token="secret")

Expose any tool as a production MCP endpoint in 3 lines. Compatible with any MCP-compliant client.


The Honest Take: When to Use Each

SynapseKit was built for a specific set of problems. It's not the right choice for every use case.

Use SynapseKit when:

  • You're building a greenfield LLM app and want the fastest path to production
  • Your app is async-first - APIs, webhooks, real-time applications, serverless
  • You need a small footprint - containers, Lambda, edge runtimes
  • You want batteries included without hunting for extra packages
  • Your pipeline uses standard patterns: ReAct agents, Crew orchestration, RAG, streaming
  • You're experimenting across providers and need painless switching
  • You want readable code that a new team member can understand without framework training

Use LangChain when:

  • You need complex conditional graph workflows - LangGraph is genuinely excellent at stateful, branching agentic pipelines
  • You need a specific integration from LangChain's 150+ partner ecosystem
  • Your team already knows LangChain deeply and migration cost outweighs gains
  • You need LangSmith observability deeply integrated into your debugging workflow

Use LlamaIndex when:

  • Advanced chunking is central to your application quality (SentenceWindow, Hierarchical - there's nothing equivalent in SynapseKit today)
  • You're building a knowledge-intensive system where retrieval precision is the primary metric
  • You want LLM-native evaluation metrics (faithfulness, relevance, groundedness) built into the framework

What's Coming in the Benchmark Series

The series continues through Notebooks #19–#30:

  • #19 - Observability & Tracing: What can you actually see when your agent runs?
  • #20 - Agent Error Handling: What happens when a tool throws an exception mid-loop?
  • #21 - Week 3 Scorecard: Agents & tools final rankings
  • #22 - Async Throughput: Requests/second under real concurrency
  • #23 - Graph Workflows: DAG pipelines for complex conditional flows
  • #24 - LLM Evaluation: Built-in faithfulness and relevance metrics
  • #25 - Cost Tracking: Token counting and spend visibility
  • #26 - Guardrails: Content filtering and output validation
  • #27 - MCP Support: Model Context Protocol in practice
  • #28 - Week 4 Scorecard
  • #29–#30 - Final Verdict: Which framework wins, for whom, and why

Follow the series on Kaggle


Quick Start

# Minimal install - 2 dependencies
pip install synapsekit

# Full install - vector search, all loaders, all tools
pip install "synapsekit[semantic]"

# Your first RAG pipeline in 7 lines
from synapsekit import RAGPipeline, LLMConfig
from synapsekit.llm.openai import OpenAILLM
from synapsekit.loaders import PDFLoader

llm = OpenAILLM(LLMConfig(model="gpt-4o-mini", api_key="sk-..."))
docs = PDFLoader("research.pdf").load()
pipeline = RAGPipeline(llm=llm)
pipeline.add_documents(docs)

answer = await pipeline.query("What are the main findings?")
print(answer)

# Your first multi-agent crew in 10 lines
from synapsekit import Crew, CrewAgent, Task
from synapsekit.llm.groq import GroqLLM

llm = GroqLLM(LLMConfig(model="llama-3-8b-8192", api_key="gsk-..."))
researcher = CrewAgent(name="researcher", role="Research Analyst", llm=llm)
writer = CrewAgent(name="writer", role="Writer", llm=llm)
tasks = [
    Task(description="Research quantum computing trends", agent="researcher"),
    Task(description="Write a blog intro", agent="writer", context_from=["researcher"]),
]
result = await Crew(agents=[researcher, writer], tasks=tasks).run()

Links:

Every benchmark is reproducible. Fork any notebook and run it on Kaggle free CPU. If the results differ in your environment, open an issue.

Engineers of AI

Read more: www.engineersofai.com

Want to Think Like an AI Architect?

Join engineers receiving weekly breakdowns of AI systems, production failures, and architectural decisions.