I Built a Lightweight LLM Framework Because LangChain Frustrated Me - Here's What I Learned
There's a moment every LLM developer knows. You've got a working prototype. It's elegant, fast, and does exactly what you need. Then you try to deploy it. And suddenly you're debugging a chain inside a runnable inside a callback inside an abstraction that didn't exist six months ago.
That moment happened one too many times. So I built something else.
This is the story of SynapseKit - why it exists, what it does differently, and what 18 (and counting) objective benchmarks against LangChain and LlamaIndex actually revealed.
The Problem With "The Standard"
Every developer building LLM-powered applications today reaches for the same toolkit: LangChain or LlamaIndex. They're powerful, well-documented, and have massive communities. They're also, frankly, a pain to work with day-to-day.
Not bad. Just built for different goals.
LangChain's philosophy is maximum flexibility: there's an abstraction for everything, a chain for every use case, and 87 packages you can bolt on. It's impressive engineering. It's also a framework that treats simple tasks like they're distributed systems problems.
LlamaIndex's philosophy is data ingestion depth: best-in-class chunking, indexing, and retrieval. If your application lives and dies by retrieval precision, LlamaIndex is serious software. But you pay for that depth in complexity.
Both are solving real problems. But neither optimises for the thing that matters most when building production LLM systems:
How fast can I go from idea to working code, and how readable is that code six months later?
After the fifth time debugging a LangChain stack trace that pointed three abstraction layers away from the actual code, I started writing SynapseKit.
What Is SynapseKit?
SynapseKit is an async-first Python framework for building RAG pipelines, LLM agents, and multi-agent systems. It ships with:
- 31 LLM providers - OpenAI, Anthropic, Groq, Mistral, Gemini, Ollama, LMStudio, xAI, Novita, Writer, and 21 more
- 48 built-in tools - search, math, file I/O, HTTP, code execution, NLP, data analysis, and more
- 43 document loaders - PDF, EPUB, LaTeX, RTF, TSV, S3, Azure Blob, MongoDB, Dropbox, OneDrive, and more
- MCP server support - SSE transport with Bearer auth for Model Context Protocol
- Multi-agent primitives - ReActAgent, Crew/CrewAgent/Task, graph-based workflows, recursive subgraphs
pip install "synapsekit[semantic]"
The base install has 2 dependencies. The full semantic install - vector search, all loaders, all tools - pulls in 14 packages. LangChain installs 67. That's not a rounding error; it's a design philosophy.
synapsekit → 2 deps | ~48 MB RAM | ~80ms startup
synapsekit[semantic] → 14 deps
langchain → 67 deps | ~189 MB RAM | ~2.4s startup
llama-index-core → 43 deps | ~112 MB RAM | ~1.1s startup
The 30-Benchmark Series
Rather than write a marketing post, I ran a 30-notebook benchmark series on Kaggle comparing SynapseKit to LangChain 0.3 and LlamaIndex Core 0.12 - one measurable dimension per notebook. Every notebook runs end-to-end on Kaggle's free CPU tier. Results are reported honestly, including when SynapseKit loses.
Follow the full series: kaggle.com/discussions/general/688339
Here's everything I've found so far.
Week 1: Developer Experience
#1 - Cold Start: SynapseKit wins by 30×
The first thing you notice when you import a framework is the wait. For Lambda functions, FastAPI startup, or any process that imports on every cold start, this compounds fast.
import time
t = time.perf_counter()
import synapsekit
print(f"SynapseKit: {time.perf_counter() - t:.3f}s") # 0.082s
t = time.perf_counter()
import langchain
print(f"LangChain: {time.perf_counter() - t:.3f}s") # 2.41s
t = time.perf_counter()
import llama_index
print(f"LlamaIndex: {time.perf_counter() - t:.3f}s") # 1.08s
SynapseKit: ~80ms. LangChain: ~2.4s. LlamaIndex: ~1.1s.
At 1,000 cold starts per day - realistic for a mid-traffic serverless API - LangChain burns 40 minutes of pure overhead. SynapseKit burns 1.3 minutes. In AWS Lambda terms, that's real money.
#2 - Dependency Count: SynapseKit wins by 33×
| Framework | Base install | Full install |
|---|---|---|
| SynapseKit | 2 packages | 14 packages |
| LlamaIndex Core | 43 packages | 70+ packages |
| LangChain | 67 packages | 100+ packages |
Fewer dependencies means faster installs, smaller container images, a smaller CVE surface, and less pip freeze archaeology when something breaks.
#3 - Hello RAG: SynapseKit wins (fewest lines)
The same RAG pipeline - load documents, embed, retrieve, answer - across three frameworks:
# SynapseKit: 7 functional lines
from synapsekit import RAGPipeline, LLMConfig
from synapsekit.llm.openai import OpenAILLM
llm = OpenAILLM(LLMConfig(model="gpt-4o-mini", api_key=KEY))
pipeline = RAGPipeline(llm=llm)
pipeline.add_documents(docs)
answer = await pipeline.query("What is RAG?")
# LangChain: 14 functional lines
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain import hub
llm = ChatOpenAI(model="gpt-4o-mini")
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(docs, embeddings)
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")
chain = ({"context": retriever, "question": RunnablePassthrough()}
| prompt | llm | StrOutputParser())
answer = chain.invoke("What is RAG?")
SynapseKit: 7 lines. LangChain: 14 lines. LlamaIndex: 11 lines.
This isn't code golf. Fewer lines means fewer places for bugs to hide, fewer things for a new team member to learn, and faster iteration. The LangChain version requires knowing what a runnable is, what hub.pull does, and why RunnablePassthrough is needed. The SynapseKit version is self-explanatory.
#4 - Memory Footprint: SynapseKit wins by 4×
| Framework | RSS at import |
|---|---|
| SynapseKit | 48 MB |
| LlamaIndex | 112 MB |
| LangChain | 189 MB |
At 10 replicas, LangChain costs ~1.9 GB just in framework overhead. SynapseKit costs ~480 MB. For containerised deployments where you're paying per GB of memory, that difference compounds fast.
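Want to sanity-check the RSS numbers in your own environment? A minimal measurement sketch - psutil is assumed to be installed, and this is not one of the benchmark notebooks:

# Measure resident memory before and after importing a framework
import os
import psutil
proc = psutil.Process(os.getpid())
baseline = proc.memory_info().rss
import synapsekit  # swap in langchain or llama_index to compare
print(f"Framework overhead: {(proc.memory_info().rss - baseline) / 1e6:.0f} MB")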
#5 - Provider Switching: SynapseKit wins (2 lines changed)
One of the most common tasks in LLM development is experimenting across providers. How many lines change when you swap from OpenAI to Groq to Ollama?
# SynapseKit - change 1 import + 1 config line
from synapsekit.llm.openai import OpenAILLM
llm = OpenAILLM(LLMConfig(model="gpt-4o-mini", api_key=OPENAI_KEY))
from synapsekit.llm.groq import GroqLLM
llm = GroqLLM(LLMConfig(model="llama-3-8b-8192", api_key=GROQ_KEY))
from synapsekit.llm.ollama import OllamaLLM
llm = OllamaLLM(LLMConfig(model="llama3"))
# Everything downstream: unchanged.
SynapseKit: 2 lines. LangChain: 4–6 lines. LlamaIndex: 3–4 lines.
31 providers, all following the same LLMConfig pattern. Switching from a paid API to a local model for development takes 10 seconds.
Week 2: RAG Pipelines
#8 - PDF Ingestion: All close
All three frameworks can index a PDF in under 10 lines. This one's effectively a draw - SynapseKit is slightly more concise, but the gap is small.
#9 - Chunking Strategies: LlamaIndex wins
This is where LlamaIndex genuinely excels.
LlamaIndex ships 9+ built-in splitters including SentenceWindowNodeParser (adds surrounding context sentences to each chunk) and HierarchicalNodeParser (creates parent-child chunk trees for better retrieval). These are sophisticated, research-backed strategies that meaningfully improve retrieval quality.
SynapseKit and LangChain both offer token-based and sentence-based splitting - adequate for most use cases, but not at LlamaIndex's depth.
If your application's quality depends on smart chunking, LlamaIndex is the right choice for the retrieval layer.
#10 - Built-in BM25: SynapseKit wins
BM25 is the backbone of lexical search and an essential half of any hybrid retrieval system. In SynapseKit, it's a core dependency - no extra install.
# SynapseKit - BM25 built in, zero extra pip
from synapsekit.retrievers import BM25Retriever
retriever = BM25Retriever(documents)
results = retriever.retrieve("machine learning transformers", k=5)
LangChain requires pip install rank-bm25 and additional wiring. LlamaIndex similarly requires an extra install. For a technique this fundamental to production RAG, burying it behind an extra install is a friction tax.
#11 - Hybrid Search (RRF Fusion): LangChain wins
Reciprocal Rank Fusion blends BM25 lexical scores and semantic embedding scores into a single ranked list - typically outperforming either alone by 5–15% on BEIR benchmarks.
LangChain's EnsembleRetriever is the cleanest API for this. SynapseKit supports hybrid retrieval but requires more manual wiring at present. Honest finding: LangChain wins this one.
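For readers unfamiliar with the algorithm itself, RRF is small enough to sketch framework-free. Each retriever contributes 1/(k + rank) per document, and the constant k (60 in the original paper) damps the influence of top ranks. A minimal sketch, not any framework's implementation:

# Framework-free Reciprocal Rank Fusion over ranked lists of document IDs
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]      # lexical ranking
semantic_hits = ["doc1", "doc9", "doc3"]  # embedding ranking
print(rrf_fuse([bm25_hits, semantic_hits]))  # doc1 and doc3, present in both, rise to the top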
#12 - Streaming RAG: Effectively a draw (async ergonomics: SynapseKit)
All three frameworks achieve sub-millisecond time-to-first-token (TTFT) in a mock environment - the real latency differences live at the model API layer, not the framework layer. But the streaming API ergonomics differ:
# SynapseKit - stream tokens as they arrive
async for token in llm.stream("Explain transformers in simple terms"):
    print(token, end="", flush=True)
LangChain requires astream() on runnables. LlamaIndex requires a StreamingResponse wrapper. Small differences, but they accumulate across a codebase.
#13 - Conversation Memory: SynapseKit wins (clarity)
| Framework | API | Trimming strategy |
|---|---|---|
| SynapseKit | ConversationMemory(window=3) | Turn-count sliding window |
| LangChain | InMemoryChatMessageHistory | Manual - stores everything, you trim |
| LlamaIndex | ChatMemoryBuffer.from_defaults(token_limit=500) | Token-budget trimming |
SynapseKit's window= parameter is the most beginner-friendly. LlamaIndex's token-budget approach is the most robust for production - especially when dealing with long tool outputs that blow up turn-count estimates.
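The trade-off is easy to see in plain Python. A sketch of the two trimming strategies - illustration only, not either framework's actual code:

# Turn-count window: keep the last N turns, blind to message size
def trim_by_turns(messages: list[str], window: int = 3) -> list[str]:
    return messages[-window * 2:]  # one turn = a user + assistant message pair

# Token budget: drop the oldest messages until the estimate fits
def trim_by_tokens(messages: list[str], token_limit: int = 500) -> list[str]:
    estimate = lambda m: len(m) // 4  # crude chars-to-tokens heuristic
    while messages and sum(estimate(m) for m in messages) > token_limit:
        messages = messages[1:]
    return messages

A 10,000-character tool output counts as one message under a turn window but blows straight through a token budget - which is exactly when you want trimming to kick in.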
Week 3: Agents & Tools
#15 - ReAct Agents: SynapseKit wins (3 lines vs 11)
# SynapseKit: 3 lines to a working ReAct agent
from synapsekit import ReActAgent
from synapsekit.tools import CalculatorTool, DateTimeTool
agent = ReActAgent(llm=llm, tools=[CalculatorTool(), DateTimeTool()], max_iterations=10)
result = await agent.run("What is 847 × 23, and what day is it today?")
SynapseKit: 3 lines. LangChain: 11 lines (requires create_react_agent + AgentExecutor + a prompt template from the LangChain hub). LlamaIndex: 9 lines.
#16 - Function Calling: SynapseKit wins (multi-provider schemas)
SynapseKit's BaseTool generates both OpenAI-format and Anthropic-format schemas from a single tool definition. Write a tool once, use it with any provider:
from synapsekit.tools import BaseTool  # assumed path, alongside the built-in tools

class WeatherTool(BaseTool):
    name = "get_weather"
    description = "Get the current weather for a city."
    parameters = {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    }

    async def run(self, city: str) -> str:
        return f"Sunny, 22°C in {city}"
tool = WeatherTool()
tool.schema() # → OpenAI tools format
tool.anthropic_schema() # → Anthropic tool_use format
One tool definition. Zero vendor lock-in. Switch your LLM provider and your tools come with you.
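In practice the agent wiring never names a provider. A sketch - the AnthropicLLM import path is assumed here, by analogy with the OpenAI and Groq imports shown earlier:

from synapsekit import LLMConfig, ReActAgent
from synapsekit.llm.anthropic import AnthropicLLM  # assumed path, by analogy
from synapsekit.llm.openai import OpenAILLM

# Same WeatherTool instance, two providers - schema translation is automatic
agent = ReActAgent(llm=OpenAILLM(LLMConfig(model="gpt-4o-mini", api_key=KEY)),
                   tools=[WeatherTool()])
agent = ReActAgent(llm=AnthropicLLM(LLMConfig(model="claude-3-5-sonnet-latest", api_key=KEY)),
                   tools=[WeatherTool()])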
#17 - Built-in Tool Libraries: SynapseKit wins by a wide margin
| Framework | Built-in tools | Zero-config (no API key needed) |
|---|---|---|
| SynapseKit | 48 across 9 categories | 12 |
| LangChain | ~17 core + community | Most need extra installs |
| LlamaIndex | 3 core wrappers | 3 |
SynapseKit's 9 tool categories - 48 tools ready to drop into any agent:
| Category | Tools |
|---|---|
| Search | WebSearchTool, WikipediaTool, NewsSearchTool |
| Math | CalculatorTool, StatisticsCalculatorTool, UnitConverterTool |
| Date/Time | DateTimeTool, TimezoneConverterTool, CalendarTool |
| Text Processing | TextSummarizerTool, TextTranslatorTool, KeywordExtractorTool |
| File I/O | FileReaderTool, FileWriterTool, CSVReaderTool, JSONParserTool |
| HTTP | HTTPRequestTool, APIClientTool |
| Code Execution | PythonREPLTool, ShellCommandTool |
| Data Analysis | DataFrameAnalyzerTool, ChartGeneratorTool |
| NLP | SentimentAnalysisTool, NamedEntityRecognitionTool |
With LangChain, getting a working tool usually means installing a community package, finding an API key, and reading a separate doc page. With SynapseKit, 12 tools work with zero configuration.
#18 - Multi-Agent Orchestration: SynapseKit wins (fewest lines + most patterns)
from synapsekit import Crew, CrewAgent, Task
researcher = CrewAgent(
    name="researcher", role="Research Analyst",
    goal="Produce structured bullet points.", llm=llm
)
writer = CrewAgent(
    name="writer", role="Content Writer",
    goal="Turn bullet points into a polished paragraph.", llm=llm
)
tasks = [
    Task(description=f"Research: {TOPIC}", agent="researcher",
         expected_output="3–5 bullet points"),
    Task(description="Write a paragraph from the research.", agent="writer",
         context_from=["researcher"], expected_output="One paragraph"),
]
crew = Crew(agents=[researcher, writer], tasks=tasks, process="sequential")
result = await crew.run()
The context_from= parameter is the key insight: tasks declare their data dependencies declaratively. The framework handles execution order and context passing.
Orchestration pattern support:
| Pattern | SynapseKit | LangChain | LlamaIndex |
|---|---|---|---|
| Sequential | ✅ | ✅ | ✅ |
| Parallel | ✅ | ✅ | ❌ |
| Supervisor | ✅ | ✅ | ❌ |
| Handoff chain | ✅ | ❌ (manual) | ✅ |
| Graph / DAG | ✅ | ✅ (LangGraph) | ❌ |
| Shared state | ✅ | ✅ | ✅ |
| Score | 6/6 | 5/6 | 3/6 |
LangChain's LangGraph is genuinely excellent for complex conditional workflows - if you need a state machine with branching logic, it's the right tool. SynapseKit's graph support handles the majority of production patterns with less ceremony.
Cumulative Scorecard (18 notebooks in)
| Framework | Points | Category wins |
|---|---|---|
| SynapseKit | 38 | 12 - cold start, dependencies, LoC, memory, provider switching, BM25, streaming ergonomics, memory clarity, ReAct agents, function calling, tools, multi-agent |
| LangChain | 22 | 3 - hybrid search RRF, LangGraph flexibility, error UX |
| LlamaIndex | 18 | 2 - chunking depth, token-budget memory |
SynapseKit leads on developer ergonomics and batteries-included tooling. LangChain leads on complex graph orchestration. LlamaIndex leads on retrieval precision.
Architecture: What Makes SynapseKit Different
1. Async by default - not retrofitted
SynapseKit was designed async from the ground up. Every run(), every query(), every tool call returns a coroutine.
# Concurrent queries - not sequential
import asyncio

results = await asyncio.gather(
    pipeline.query("What is the capital of France?"),
    pipeline.query("Explain backpropagation in 2 sentences."),
    pipeline.query("Summarise the attached PDF."),
)
In LangChain, async is available but not the default: support was retrofitted, and some features still exist only in sync form. The difference is subtle in a tutorial and significant in a production API.
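The practical payoff is concurrency. With a network-bound stand-in for a real model call, three one-second queries finish in about one second when gathered, not three:

import asyncio
import time

async def fake_llm_call(prompt: str) -> str:
    await asyncio.sleep(1.0)  # stand-in for network-bound model latency
    return f"answer to {prompt!r}"

async def main() -> None:
    t = time.perf_counter()
    await asyncio.gather(*(fake_llm_call(p) for p in ("a", "b", "c")))
    print(f"3 concurrent calls: {time.perf_counter() - t:.1f}s")  # ~1.0s, not ~3.0s

asyncio.run(main())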
2. Shallow call stack - your errors, not ours
When pipeline.query() breaks in LangChain, your traceback travels through Runnable, RunnableSequence, CallbackManager, BaseChain, and surfaces somewhere deep in the framework. You spend 10 minutes decoding the stack trace before you can begin debugging.
In SynapseKit, the call path is intentionally shallow. When something breaks, the traceback points at your code. No hidden middleware, no callback chains, no runnable wrappers unless you explicitly add them.
3. Unified tool interface - one definition, every provider
class BaseTool:
    name: str
    description: str
    parameters: dict  # JSON Schema

    async def run(self, **kwargs) -> str: ...
    def schema(self) -> dict: ...            # OpenAI tools format
    def anthropic_schema(self) -> dict: ...  # Anthropic tool_use format
Write a tool once. It works with GPT-4o, Claude 3.5, Llama 3 on Groq, Gemini - any of the 31 supported providers. No adapter layer, no per-provider tool registration.
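The two formats differ mostly in field names. A simplified view of what the two schema methods produce - the dict shapes below follow the public OpenAI and Anthropic tool formats, not SynapseKit source:

# OpenAI "tools" format: a function wrapper around the JSON Schema
def to_openai(tool: BaseTool) -> dict:
    return {"type": "function",
            "function": {"name": tool.name,
                         "description": tool.description,
                         "parameters": tool.parameters}}

# Anthropic "tool_use" format: flat, with input_schema instead of parameters
def to_anthropic(tool: BaseTool) -> dict:
    return {"name": tool.name,
            "description": tool.description,
            "input_schema": tool.parameters}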
4. Task-centric multi-agent - separate what from who
SynapseKit's Crew model separates what to do (Task) from who does it (Agent). Tasks declare their dependencies via context_from. The framework handles execution order, context accumulation, and result passing.
Wiring data flow manually between agents is the source of most multi-agent bugs. When Agent B needs Agent A's output, you shouldn't write the plumbing; you should declare the dependency.
5. 43 loaders - data ingestion without hunting for packages
Production RAG applications ingest data from everywhere. SynapseKit ships 43 loaders:
- Documents: PDF, EPUB, LaTeX, RTF, DOCX, Markdown, HTML
- Data: CSV, TSV, JSON, XML, SQLite
- Cloud: S3, Azure Blob, OneDrive, Dropbox
- Databases: MongoDB, PostgreSQL
- Config: .env, YAML, TOML
- Web: sitemap crawlers, URL loaders, RSS feeds
- Code: Python, JavaScript, TypeScript source files
One consistent Loader.load() → List[Document] interface. Every loader returns the same type. Your downstream pipeline code never changes regardless of where the data comes from.
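Concretely, swapping sources is a one-line change. A sketch - CSVLoader is named here by analogy with the PDFLoader used in the quick start below:

from synapsekit.loaders import CSVLoader, PDFLoader  # CSVLoader assumed by analogy

# The loader is the only thing that changes; downstream code never does
for loader in (PDFLoader("report.pdf"), CSVLoader("data.csv")):
    docs = loader.load()  # always List[Document], whatever the source
    print(type(docs[0]).__name__, len(docs))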
6. MCP Server support - Model Context Protocol built in
from synapsekit.mcp import MCPServer
server = MCPServer(name="my-tools", tools=[WeatherTool(), CalculatorTool()])
await server.run_sse(host="0.0.0.0", port=8080, bearer_token="secret")
Expose any tool as a production MCP endpoint in 3 lines. Compatible with any MCP-compliant client.
The Honest Take: When to Use Each
SynapseKit was built for a specific set of problems. It's not the right choice for every use case.
Use SynapseKit when:
- You're building a greenfield LLM app and want the fastest path to production
- Your app is async-first - APIs, webhooks, real-time applications, serverless
- You need a small footprint - containers, Lambda, edge runtimes
- You want batteries included without hunting for extra packages
- Your pipeline uses standard patterns: ReAct agents, Crew orchestration, RAG, streaming
- You're experimenting across providers and need painless switching
- You want readable code that a new team member can understand without framework training
Use LangChain when:
- You need complex conditional graph workflows - LangGraph is genuinely excellent at stateful, branching agentic pipelines
- You need a specific integration from LangChain's ecosystem of 150+ partner packages
- Your team already knows LangChain deeply and migration cost outweighs gains
- You need LangSmith observability deeply integrated into your debugging workflow
Use LlamaIndex when:
- Advanced chunking is central to your application quality (SentenceWindow, Hierarchical - there's nothing equivalent in SynapseKit today)
- You're building a knowledge-intensive system where retrieval precision is the primary metric
- You want LLM-native evaluation metrics (faithfulness, relevance, groundedness) built into the framework
What's Coming in the Benchmark Series
The series continues through Notebooks #19–#30:
- #19 - Observability & Tracing: What can you actually see when your agent runs?
- #20 - Agent Error Handling: What happens when a tool throws an exception mid-loop?
- #21 - Week 3 Scorecard: Agents & tools final rankings
- #22 - Async Throughput: Requests/second under real concurrency
- #23 - Graph Workflows: DAG pipelines for complex conditional flows
- #24 - LLM Evaluation: Built-in faithfulness and relevance metrics
- #25 - Cost Tracking: Token counting and spend visibility
- #26 - Guardrails: Content filtering and output validation
- #27 - MCP Support: Model Context Protocol in practice
- #28 - Week 4 Scorecard
- #29–#30 - Final Verdict: Which framework wins, for whom, and why
Quick Start
# Minimal install - 2 dependencies
pip install synapsekit
# Full install - vector search, all loaders, all tools
pip install "synapsekit[semantic]"
# Your first RAG pipeline in 7 lines
from synapsekit import RAGPipeline, LLMConfig
from synapsekit.llm.openai import OpenAILLM
from synapsekit.loaders import PDFLoader
llm = OpenAILLM(LLMConfig(model="gpt-4o-mini", api_key="sk-..."))
docs = PDFLoader("research.pdf").load()
pipeline = RAGPipeline(llm=llm)
pipeline.add_documents(docs)
answer = await pipeline.query("What are the main findings?")
print(answer)
# Your first multi-agent crew in 10 lines
from synapsekit import Crew, CrewAgent, Task, LLMConfig
from synapsekit.llm.groq import GroqLLM
llm = GroqLLM(LLMConfig(model="llama-3-8b-8192", api_key="gsk-..."))
researcher = CrewAgent(name="researcher", role="Research Analyst", llm=llm)
writer = CrewAgent(name="writer", role="Writer", llm=llm)
tasks = [
    Task(description="Research quantum computing trends", agent="researcher"),
    Task(description="Write a blog intro", agent="writer", context_from=["researcher"]),
]
result = await Crew(agents=[researcher, writer], tasks=tasks).run()
Links:
- GitHub: github.com/SynapseKit/SynapseKit
- Docs: synapsekit.github.io/synapsekit-docs
- Kaggle benchmark series: kaggle.com/discussions/general/688339
Every benchmark is reproducible. Fork any notebook and run it on Kaggle free CPU. If the results differ in your environment, open an issue.
Engineers of AI
Read more: www.engineersofai.com
