Pure vector search misses exact matches. Pure BM25 misses semantics. Hybrid search almost always wins - the question is how much control you get over the fusion.
Every production RAG system eventually hits the same wall. Vector search retrieves semantically similar documents, but it fails on exact-match queries: model names, version numbers, function names, error codes. A query for "GPT-4o" and a document that mentions "GPT-4o" don't reliably produce close vectors, because the document's embedding reflects the whole passage rather than that one rare token. BM25 doesn't have this problem. It matches terms, weights them by rarity, and returns the right document.
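To make "matches terms, weights them by rarity" concrete, here is a minimal BM25 sketch in plain Python. It is not the code from any of the benchmarked frameworks. The function name `bm25_scores`, the toy corpus, the whitespace tokenisation, the Lucene-style IDF variant, and the parameters `k1=1.5` and `b=0.75` are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenised document against the query with Okapi BM25.

    k1 and b are assumed, commonly used values, not tuned settings.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms get a high IDF, common terms a low one
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalised by document length
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy corpus, lowercased and split on whitespace
docs = [
    "gpt-4o supports text and image input".split(),
    "the model supports long text input".split(),
    "pricing for text models".split(),
]
scores = bm25_scores(["gpt-4o", "text"], docs)
```

In this toy corpus "text" appears in every document and "gpt-4o" in only one, so the rare token dominates the score and the first document ranks highest.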
Reciprocal Rank Fusion (RRF) is the standard way to combine both. It takes two ranked lists, assigns each document a score of 1 / (k + rank) in each list, sums the scores per document, and re-ranks by the total. The parameter k controls how much the top ranks dominate: a smaller k widens the gap between rank 1 and rank 2, a larger k flattens it, and 60 is the usual default. RRF requires no score normalisation, works across retrieval algorithms with incompatible score scales, and runs in microseconds.
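The fusion step is small enough to sketch in a few lines of Python. This is a generic illustration of RRF, not the implementation used by any of the three frameworks. The function name `rrf_fuse`, the document IDs, and the two input rankings are hypothetical, and `k=60` is assumed as the conventional default.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60, top_n=3):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) per document, with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_n]

# Hypothetical outputs of the two retrievers for the same query
bm25_ranking = ["doc_gpt4o", "doc_pricing", "doc_models"]
vector_ranking = ["doc_models", "doc_gpt4o", "doc_overview"]

top3 = rrf_fuse([bm25_ranking, vector_ranking])
```

Only ranks enter the calculation, which is why the raw BM25 scores and cosine similarities never need to be put on a common scale. A document that appears in both lists, such as `doc_gpt4o` here, outscores one that ranks first in a single list only.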
We built identical hybrid pipelines across SynapseKit 1.4, LangChain 1.2, and LlamaIndex Core 0.14. Same corpus, same query, same task: BM25 + vector, top-3 via RRF. The LoC gap is smaller than in the BM25-only benchmark. The configurability gap is not.