What It Stores
Task records: the original goal, the sequence of steps taken, tool calls made, observations from each step, final conclusion, outcome (success/partial/failed), duration, timestamp. Think of it as a structured log of the agent's autobiographical history — what it experienced, not just what it knows.
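The fields above can be sketched as a record type. This is a minimal, hypothetical shape — field names like `duration_s` and the `to_metadata` helper are illustrative, not a fixed schema:

```python
from dataclasses import dataclass, field
import json
import time

@dataclass
class Episode:
    """One entry in the agent's autobiographical log (illustrative schema)."""
    episode_id: str
    goal: str               # the original task
    steps: list             # ordered descriptions of steps taken
    tool_calls: list        # tool name + arguments for each call
    observations: list      # what each step returned
    conclusion: str         # final conclusion
    outcome: str            # "success" | "partial" | "failed"
    duration_s: float
    timestamp: float = field(default_factory=time.time)

    def to_metadata(self) -> dict:
        # Flatten for vector-store metadata: most stores only accept
        # scalars, so structured fields are serialized to JSON strings.
        return {
            "steps_json": json.dumps(self.steps),
            "outcome": self.outcome,
            "timestamp": self.timestamp,
        }
```

The flattening in `to_metadata` matters because outcome and timestamp must stay as top-level scalar fields to support the filtered queries described below.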
Read / Write Pattern
| Operation | How |
| --- | --- |
| Write | At task completion — store full episode record as vector + JSON metadata |
| Read | At task start — query by task similarity, inject "past experience" into system prompt |
| Filter | By outcome (only successful), by date (recency), by user (multi-tenant) |
Latency Profile
Episodic memory is queried once per task (at start), not in the agent's inner loop, so retrieval latency is not the bottleneck here.
Code Pattern
import json

# Read at task start: retrieve similar past episodes, skipping known failures
past = episode_memory.query(
    query_texts=[current_task],
    n_results=3,
    where={"outcome": {"$in": ["success", "partial"]}},
)

# Write at task completion: store the full episode record
episode_memory.upsert(
    documents=[f"Task: {task}\nConclusion: {conclusion}"],
    metadatas=[{"steps_json": json.dumps(steps),
                "outcome": outcome}],
    ids=[episode_id],
)
Best Use Case
Long-running agent systems that handle recurring task types. Research agents that should not re-analyze documents they've already processed. Customer support agents that should remember past interactions. Any system where "I've done something like this before" is a useful signal for strategy selection.
Failure Mode
Stale episodes contaminate strategy. An agent that failed at a task six months ago retrieves that failure context and applies the same failed approach again. Or worse: a successful episode from when an API behaved differently is retrieved and followed, causing silent errors. Expire episodes aggressively, and weight recency heavily in retrieval scoring.
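Recency weighting can be as simple as multiplying similarity by an exponential decay. This is a sketch, not a tuned formula — the `half_life_days` value is an assumption to adjust per domain:

```python
import time

def recency_score(similarity: float, timestamp: float,
                  half_life_days: float = 30.0) -> float:
    """Re-rank a retrieval hit by age: full weight now, half after one half-life."""
    age_days = (time.time() - timestamp) / 86400
    decay = 0.5 ** (age_days / half_life_days)
    return similarity * decay
```

Applied after the vector query, this pushes a month-old success below a fresh, slightly-less-similar one — which is usually the right trade when APIs and tools drift.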