Streaming RAG — TTFT Distribution and API Surface

50 runs per framework with a mock LLM. Framework overhead is sub-millisecond across the board. The meaningful differences live in the API surface, not the latency number.
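For context, here is a minimal sketch of what such a TTFT harness could look like. It is not the article's actual benchmark; measure_ttft and mock_stream are illustrative names. The core idea: time from stream start to the first yielded chunk, over 50 runs.

```python
# Minimal TTFT harness sketch (not the article's exact benchmark).
# measure_ttft and mock_stream are illustrative names, not framework APIs.
import time
import statistics

def measure_ttft(stream_fn, n_runs=50):
    """Time from stream start to the first yielded chunk, in milliseconds."""
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()
        next(iter(stream_fn()))  # block until the first token arrives
        samples.append((time.perf_counter() - start) * 1_000)
    return statistics.median(samples), statistics.quantiles(samples, n=100)[98]

def mock_stream():
    # Mock LLM: yields instantly, so only harness/framework overhead shows.
    yield from ["first", " token"]

median_ms, p99_ms = measure_ttft(mock_stream)
print(f"median={median_ms:.4f} ms  p99={p99_ms:.4f} ms")
```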

[Figure: TTFT Distribution — 50 runs, mock LLM. Per-framework panels: SynapseKit, LangChain, LlamaIndex. Companion chart: Median vs p99 TTFT (ms).]

All three frameworks live below 0.3 ms on a mock LLM. SynapseKit has the tightest distribution; LlamaIndex has the widest p99 tail. None of this matters in production: network latency dwarfs it all by 1,000x.
Streaming API Surface Matrix
| Feature | SynapseKit | LangChain | LlamaIndex |
| --- | --- | --- | --- |
| Primary API | async gen | sync + async gen | sync gen |
| Sync .stream() | No | Yes | Yes |
| Async .astream() | Yes | Yes | Partial |
| Stream on RAG object | Yes | Yes (LCEL) | Yes (flag) |
| Callback handlers | No | Yes | Manager |
| Works in sync runtimes | No | Yes | Yes |
| Works in async runtimes | Yes | Yes | Partial |
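The LangChain column's sync + async pairing is the Runnable .stream() / .astream() pair. A sketch, assuming langchain-openai is installed and an OPENAI_API_KEY is in the environment; the chain contents are illustrative, but the two methods are LangChain's real streaming API.

```python
# The same LCEL chain exposes both sync .stream() and async .astream().
import asyncio
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

chain = ChatPromptTemplate.from_template("Answer briefly: {question}") | ChatOpenAI()

# Sync path: fine for a CLI or a Streamlit script.
for chunk in chain.stream({"question": "What is TTFT?"}):
    print(chunk.content, end="", flush=True)

# Async path: the same chain object, usable from a FastAPI endpoint.
async def main() -> None:
    async for chunk in chain.astream({"question": "What is TTFT?"}):
        print(chunk.content, end="", flush=True)

asyncio.run(main())
```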
LangChain is the most flexible at runtime: sync + async + callbacks let you build a Streamlit app, a CLI, and a FastAPI endpoint from the same chain. SynapseKit is async-only. LlamaIndex is sync-first.
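What async-only means in practice, as a hypothetical stub: SynapseRAG and its astream method are stand-in names, not SynapseKit's documented API. The point is structural, showing why a single async-generator entry point rules sync runtimes out.

```python
# Hypothetical stub of an async-only surface like the one the matrix
# describes for SynapseKit. These names are stand-ins, not real API.
import asyncio
from typing import AsyncIterator

class SynapseRAG:
    async def astream(self, query: str) -> AsyncIterator[str]:
        # A real client would retrieve context, then stream the LLM answer.
        for token in ["retrieved", " context", " -> ", "answer"]:
            await asyncio.sleep(0)  # cooperative yield, like real I/O
            yield token

async def main() -> None:
    rag = SynapseRAG()
    # The only entry point is an async generator; plain sync code cannot
    # consume it without an event loop, hence "works in sync runtimes: No".
    async for token in rag.astream("What is TTFT?"):
        print(token, end="", flush=True)

asyncio.run(main())
```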
- SynapseKit (async only): single-method RAG stream, no sync path.
- LangChain (sync + async): most flexible, both modes plus callbacks.
- LlamaIndex (sync first): clean sync API, weak async story (sketched below).
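And the LlamaIndex side of that story, using its documented flag-based streaming: streaming is a flag on the query engine, and the response exposes a sync token generator. The index construction and ./data folder are illustrative, and a configured LLM (e.g., an OpenAI key) is assumed.

```python
# "Yes (flag)" in the matrix: no separate .stream() method on the engine.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)

query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("What is TTFT?")

for token in response.response_gen:  # sync generator of text deltas
    print(token, end="", flush=True)
```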
www.engineersofai.com · AI Letters #21 · LLM Showdown #12