Streaming RAG — TTFT Distribution and API Surface

50 runs per framework with a mock LLM. Framework overhead is sub-millisecond across the board. The meaningful differences live in the API surface, not the latency number.
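For context, here is a minimal sketch of what such a TTFT harness could look like. It is not the article's actual benchmark; measure_ttft and mock_stream are illustrative names. The core idea: time from stream start to the first yielded chunk, over 50 runs.

```python
# Minimal TTFT harness sketch (not the article's exact benchmark).
# measure_ttft and mock_stream are illustrative names, not framework APIs.
import time
import statistics

def measure_ttft(stream_fn, n_runs=50):
    """Time from stream start to the first yielded chunk, in milliseconds."""
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()
        next(iter(stream_fn()))  # block until the first token arrives
        samples.append((time.perf_counter() - start) * 1_000)
    return statistics.median(samples), statistics.quantiles(samples, n=100)[98]

def mock_stream():
    # Mock LLM: yields instantly, so only harness/framework overhead shows.
    yield from ["first", " token"]

median_ms, p99_ms = measure_ttft(mock_stream)
print(f"median={median_ms:.4f} ms  p99={p99_ms:.4f} ms")
```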

[Figure: TTFT Distribution — 50 runs, mock LLM. Per-framework panels: SynapseKit, LangChain, LlamaIndex. Companion chart: Median vs p99 TTFT (ms).]

All three frameworks live below 0.3 ms on a mock LLM. SynapseKit has the tightest distribution; LlamaIndex has the widest p99 tail. None of this matters in production: network latency dwarfs it all by 1,000x.
Streaming API Surface Matrix
| Feature | SynapseKit | LangChain | LlamaIndex |
| --- | --- | --- | --- |
| Primary API | async gen | sync + async gen | sync gen |
| Sync .stream() | No | Yes | Yes |
| Async .astream() | Yes | Yes | Partial |
| Stream on RAG object | Yes | Yes (LCEL) | Yes (flag) |
| Callback handlers | No | Yes | Manager |
| Works in sync runtimes | No | Yes | Yes |
| Works in async runtimes | Yes | Yes | Partial |
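The LangChain column's sync + async pairing is the Runnable .stream() / .astream() pair. A sketch, assuming langchain-openai is installed and an OPENAI_API_KEY is in the environment; the chain contents are illustrative, but the two methods are LangChain's real streaming API.

```python
# The same LCEL chain exposes both sync .stream() and async .astream().
import asyncio
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

chain = ChatPromptTemplate.from_template("Answer briefly: {question}") | ChatOpenAI()

# Sync path: fine for a CLI or a Streamlit script.
for chunk in chain.stream({"question": "What is TTFT?"}):
    print(chunk.content, end="", flush=True)

# Async path: the same chain object, usable from a FastAPI endpoint.
async def main() -> None:
    async for chunk in chain.astream({"question": "What is TTFT?"}):
        print(chunk.content, end="", flush=True)

asyncio.run(main())
```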
LangChain is the most flexible at runtime: sync + async + callbacks let you build a Streamlit app, a CLI, and a FastAPI endpoint from the same chain. SynapseKit is async-only. LlamaIndex is sync-first.
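What async-only means in practice, as a hypothetical stub: SynapseRAG and its astream method are stand-in names, not SynapseKit's documented API. The point is structural, showing why a single async-generator entry point rules sync runtimes out.

```python
# Hypothetical stub of an async-only surface like the one the matrix
# describes for SynapseKit. These names are stand-ins, not real API.
import asyncio
from typing import AsyncIterator

class SynapseRAG:
    async def astream(self, query: str) -> AsyncIterator[str]:
        # A real client would retrieve context, then stream the LLM answer.
        for token in ["retrieved", " context", " -> ", "answer"]:
            await asyncio.sleep(0)  # cooperative yield, like real I/O
            yield token

async def main() -> None:
    rag = SynapseRAG()
    # The only entry point is an async generator; plain sync code cannot
    # consume it without an event loop, hence "works in sync runtimes: No".
    async for token in rag.astream("What is TTFT?"):
        print(token, end="", flush=True)

asyncio.run(main())
```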
- SynapseKit (async only): single-method RAG stream, no sync path.
- LangChain (sync + async): most flexible, both modes plus callbacks.
- LlamaIndex (sync first): clean sync API, weak async story (sketched below).
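And the LlamaIndex side of that story, using its documented flag-based streaming: streaming is a flag on the query engine, and the response exposes a sync token generator. The index construction and ./data folder are illustrative, and a configured LLM (e.g., an OpenAI key) is assumed.

```python
# "Yes (flag)" in the matrix: no separate .stream() method on the engine.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)

query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("What is TTFT?")

for token in response.response_gen:  # sync generator of text deltas
    print(token, end="", flush=True)
```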
www.engineersofai.com · AI Letters #21 · LLM Showdown #12