Conversation Memory in RAG — Code Side by Side

Below: the full multi-turn memory pipeline for each framework, side by side. Same task: a five-turn conversation with memory injected into every RAG query (snippets are abridged to the first turns; KEY and DOCS, your API key and source documents, are assumed defined).

SynapseKit 1.4
6 lines · 1 import · turn-count window · in-memory only
from synapsekit import RAG

rag = RAG(model="gpt-4o-mini", api_key=KEY, memory_window=5)
await rag.add_documents(DOCS)
r1 = await rag.ask("What is RAG?")
r2 = await rag.ask("How does it improve accuracy?")
r3 = await rag.ask("Which retrieval method is fastest?")
Key move: memory_window=5 is a single constructor argument. Every subsequent .ask() automatically prepends the last 5 turns to the retrieved context. Zero additional setup. Limitation: in-memory only — no persistence across sessions or restarts. Best for single-user, single-session chatbots.
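Those .ask() calls are coroutines, so they need an event loop to run. A minimal driver, assuming the SynapseKit API behaves exactly as the snippet above shows (everything beyond asyncio here is hypothetical):

import asyncio
from synapsekit import RAG  # hypothetical API, as shown above

async def main():
    rag = RAG(model="gpt-4o-mini", api_key=KEY, memory_window=5)
    await rag.add_documents(DOCS)
    r1 = await rag.ask("What is RAG?")
    # Turn 2 resolves "it" from turn 1 via the injected 5-turn window
    r2 = await rag.ask("How does it improve accuracy?")
    print(r2)

asyncio.run(main())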
LlamaIndex Core 0.14
9 lines · 3 imports · token-budget window · JSON persist
from llama_index.core import Document, VectorStoreIndex, Settings
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.llms.openai import OpenAI

Settings.llm = OpenAI(model="gpt-4o-mini")
index  = VectorStoreIndex.from_documents([Document(text=d) for d in DOCS])
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)
engine = index.as_chat_engine(memory=memory, chat_mode="context")
r1 = engine.chat("What is RAG?")
r2 = engine.chat("How does it improve accuracy?")
r3 = engine.chat("Which retrieval method is fastest?")
Key move: ChatMemoryBuffer.from_defaults(token_limit=1500) creates a token-budget buffer. The engine drops old messages when it exceeds the limit — more predictable prompt sizes than turn-count windows. Persistence: can serialize to SimpleChatStore (JSON). No Redis/Postgres built-in.
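A sketch of that JSON persistence path, using llama_index's SimpleChatStore with its persist / from_persist_path methods (the file name and store key are our choices):

from llama_index.core.storage.chat_store import SimpleChatStore

# Back the buffer with a serializable store, keyed per user
chat_store = SimpleChatStore()
memory = ChatMemoryBuffer.from_defaults(
    token_limit=1500, chat_store=chat_store, chat_store_key="user-1"
)
# ... run the engine.chat(...) turns as above ...
chat_store.persist(persist_path="chat_history.json")  # survives restarts

# Next session: reload the store and rebuild the buffer on top of it
restored = SimpleChatStore.from_persist_path("chat_history.json")
memory = ChatMemoryBuffer.from_defaults(
    token_limit=1500, chat_store=restored, chat_store_key="user-1"
)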
LangChain 1.2
17 lines · 5 imports · turn-count window · Redis/DB backends
from langchain_community.retrievers import BM25Retriever
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnablePassthrough, RunnableWithMessageHistory
from langchain_core.chat_history import InMemoryChatMessageHistory

store = {}
def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

retriever = BM25Retriever.from_texts(DOCS, k=3)
prompt = ChatPromptTemplate.from_messages([
    ("system", "Context: {ctx}"),
    MessagesPlaceholder("history"),
    ("human", "{question}"),
])
chain = ({"ctx": retriever, "question": RunnablePassthrough()} | prompt | ChatOpenAI())
chain_with_history = RunnableWithMessageHistory(
    chain, get_session_history,
    input_messages_key="question", history_messages_key="history"
)
cfg = {"configurable": {"session_id": "s1"}}
r1 = chain_with_history.invoke({"question": "What is RAG?"}, config=cfg)
r2 = chain_with_history.invoke({"question": "How does it improve accuracy?"}, config=cfg)
Key move: RunnableWithMessageHistory wraps any LCEL chain with pluggable session storage. The payoff: swap InMemoryChatMessageHistory for RedisChatMessageHistory and you get production-grade multi-user persistence with a one-line change. Worth the 17-line setup cost if you're building for real users.
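The swap itself, assuming the langchain_community Redis integration and a Redis instance at the default local port:

from langchain_community.chat_message_histories import RedisChatMessageHistory

# Same signature as before; sessions now live in Redis and survive restarts
def get_session_history(session_id: str) -> RedisChatMessageHistory:
    return RedisChatMessageHistory(session_id, url="redis://localhost:6379/0")

Every other line of the chain stays untouched, which is exactly the point of the abstraction.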
www.engineersofai.com · AI Letters #22 · LLM Showdown #13