Streaming RAG TTFT — Framework Overhead vs Reality

Pure framework overhead is sub-millisecond. Real TTFT is dominated by the network round-trip and the provider's first-token time. The charts below show why tuning the framework is wasted effort.

[Chart: TTFT Breakdown — Framework vs Everything Else (log scale). Series: SynapseKit, LangChain, LlamaIndex, Network + Provider.]
Framework overhead is a rounding error. Even the "slowest" framework (LlamaIndex at 0.14ms) is 1,000x faster than a typical LLM provider first-token time (150–600ms). If you're profiling TTFT and looking at your framework, you're looking in the wrong place.
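To see where the time actually goes, here is a minimal sketch of how TTFT is typically measured: time from the call until the first streamed chunk arrives. The 200 ms delay below is an assumed stand-in for a real provider's first-token time, not a measurement from any specific API.

```python
import time

def mock_stream(first_token_delay_s, n_tokens=5):
    """Simulates a streaming LLM: sleeps once, then yields tokens."""
    time.sleep(first_token_delay_s)
    for i in range(n_tokens):
        yield f"token{i}"

def measure_ttft_ms(stream):
    """Milliseconds from now until the stream yields its first chunk."""
    start = time.perf_counter()
    next(stream)
    return (time.perf_counter() - start) * 1000.0

# Framework-style path (no network, no provider delay): microseconds.
ttft_mock = measure_ttft_ms(mock_stream(0.0))

# Provider-style path (assumed 200 ms first-token delay): dominates TTFT.
ttft_provider = measure_ttft_ms(mock_stream(0.2))

print(f"mock: {ttft_mock:.3f} ms, provider-like: {ttft_provider:.1f} ms")
```

Swapping the mock for a real streaming client changes nothing about the method: the first `next()` on the response iterator is where essentially all of the wait lives.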
[Chart: Framework Overhead — Sub-millisecond Zoom (mock LLM only)]
This chart is the one nobody should optimise. These are microsecond differences on a mock LLM with zero network latency. In production, jitter on a single network packet is larger than the gap between the fastest and slowest framework here.
Median TTFT (mock LLM): SynapseKit 0.08 ms · LangChain 0.12 ms · LlamaIndex 0.14 ms
www.engineersofai.com · AI Letters #21 · LLM Showdown #12