Streaming RAG — TTFT Distribution and API Surface
50 runs per framework against a mock LLM. Framework overhead is sub-millisecond across the board; the meaningful differences live in the API surface, not in the latency numbers.
TTFT Distribution — 50 runs, mock LLM
All three frameworks come in below 0.3 ms on a mock LLM. SynapseKit has the tightest distribution; LlamaIndex has the widest p99 tail. None of this matters in production, where network latency dwarfs framework overhead by roughly 1,000x.
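For reference, here is a minimal stdlib-only sketch of how such a TTFT harness can be built. Every name in it (`mock_llm_stream`, `measure_ttft`) is illustrative and does not come from any of the three frameworks under test:

```python
import asyncio
import statistics
import time

async def mock_llm_stream(prompt: str):
    """Mock LLM: yields canned tokens with no artificial delay, so the
    measured TTFT reflects harness/framework overhead only."""
    for token in ["Paris", " is", " the", " capital", "."]:
        yield token

async def measure_ttft(runs: int = 50) -> list[float]:
    """Time from call to first yielded token, in milliseconds."""
    samples: list[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        async for _ in mock_llm_stream("What is the capital of France?"):
            samples.append((time.perf_counter() - start) * 1000)
            break  # only the first token matters for TTFT
    return samples

async def main() -> None:
    samples = await measure_ttft()
    cuts = statistics.quantiles(samples, n=100)
    print(f"p50={cuts[49]:.4f} ms  p99={cuts[98]:.4f} ms")

asyncio.run(main())
```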
Streaming API Surface Matrix
| Feature | SynapseKit | LangChain | LlamaIndex |
| --- | --- | --- | --- |
| Primary API | async gen | sync + async gen | sync gen |
| Sync `.stream()` | No | Yes | Yes |
| Async `.astream()` | Yes | Yes | Partial |
| Stream on RAG object | Yes | Yes (LCEL) | Yes (flag) |
| Callback handlers | No | Yes | Via CallbackManager |
| Works in sync runtimes | No | Yes | Yes |
| Works in async runtimes | Yes | Yes | Partial |
LangChain is the most flexible across runtimes: sync plus async plus callbacks means you can build a Streamlit app, a CLI, and a FastAPI endpoint from the same chain. SynapseKit is async-only. LlamaIndex is sync-first.
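To make the "Primary API" row concrete, here is a stdlib-only sketch of the two call shapes the matrix contrasts. `MockRAG` and its methods are illustrative stand-ins, not any framework's actual classes:

```python
import asyncio
from collections.abc import AsyncIterator, Iterator

class MockRAG:
    """Illustrative stand-in for a RAG pipeline, not a real framework API."""

    TOKENS = ["Retrieval", " augments", " generation", "."]

    def stream(self, query: str) -> Iterator[str]:
        # Sync generator: the .stream() shape.
        yield from self.TOKENS

    async def astream(self, query: str) -> AsyncIterator[str]:
        # Async generator: the async-first .astream() shape.
        for token in self.TOKENS:
            yield token

rag = MockRAG()

# Sync consumption: fits plain scripts, CLIs, Streamlit.
for chunk in rag.stream("what is RAG?"):
    print(chunk, end="")
print()

# Async consumption: fits asyncio apps and FastAPI endpoints.
async def main() -> None:
    async for chunk in rag.astream("what is RAG?"):
        print(chunk, end="")
    print()

asyncio.run(main())
```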
- **SynapseKit** (async only): single-method RAG stream, no sync path (see the bridge sketch below).
- **LangChain** (sync + async): the most flexible, with both modes plus callbacks.
- **LlamaIndex** (sync first): clean sync API, weak async story.
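The practical consequence of the SynapseKit column: an async-only stream cannot be consumed directly from a sync runtime like a plain script or Streamlit. One workable bridge is draining the async generator on a private event loop. A stdlib-only sketch, assuming the framework hands you an ordinary async generator (`astream` here is hypothetical, not a real API):

```python
import asyncio
from collections.abc import AsyncGenerator, Iterator

def sync_stream(agen: AsyncGenerator[str, None]) -> Iterator[str]:
    """Drain an async generator from sync code on a private event loop.
    Do not call this from inside an already-running loop (e.g. a FastAPI
    handler); there, just use `async for` directly."""
    loop = asyncio.new_event_loop()
    try:
        while True:
            try:
                yield loop.run_until_complete(agen.__anext__())
            except StopAsyncIteration:
                break
    finally:
        loop.run_until_complete(agen.aclose())
        loop.close()

# Hypothetical async-only framework stream.
async def astream(query: str) -> AsyncGenerator[str, None]:
    for token in ["bridged", " into", " sync", "."]:
        yield token

for chunk in sync_stream(astream("demo")):
    print(chunk, end="")
print()
```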
www.engineersofai.com · AI Letters #21 · LLM Showdown #12