# Personalisation and Memory

## The Amnesia Problem
A senior DevOps engineer at a fintech startup used an AI assistant every day for three months. In that time, she had told the AI that her team used AWS (not GCP), that they deployed with Kubernetes, that she preferred code examples in Python, that she wanted concise answers without lengthy preambles, and that her company had strict SOC2 compliance requirements. She said each of these things exactly once, because the AI asked or because context made it relevant.
Three months and hundreds of conversations later, the AI still knew nothing about her. Every new session began from scratch. The AI would suggest GCP solutions. It would write code in JavaScript. It would open with three-paragraph introductions. It would recommend deployment patterns that violated SOC2. Every correction was repeated. Every preference re-explained. The time cost was substantial. The emotional cost - the feeling of repeatedly training an intern who keeps forgetting everything between shifts - was worse.
When the product finally shipped a memory feature, her reaction was not delight. It was relief. The relief of not having to repeat herself. Of having the AI reference her actual environment without prompting. Of getting code examples in the right language without specifying it. Of feeling like a collaborator rather than a configuration manager. That reaction - relief rather than delight - tells you how table-stakes memory has become. Users no longer see it as a feature. They see its absence as a product defect.
This lesson covers the full memory architecture: what to remember, how to store it efficiently, how to retrieve it, and how to give users control over their AI's memory while maintaining privacy and compliance.
## Memory Taxonomy
Not all memory is the same. Different types serve different purposes, have different storage requirements, and are retrieved differently.
| Memory Type | Examples | Storage | Retrieval Strategy | TTL |
|---|---|---|---|---|
| In-context | Current conversation | Token window (ephemeral) | Direct - it's in the prompt | Session |
| Semantic | "Prefers Python," "uses AWS," "role: DevOps" | Relational DB | Load all on session start | 90-180 days for volatile facts; stable facts permanent |
| Episodic | "We debugged a Kubernetes pod restart loop" | Vector DB | Similarity search on query | 12+ months |
| Procedural | "Always wants code first, explanation second" | Relational DB | Load all on session start | Permanent until changed |
## Semantic Memory: User Preferences and Facts
Semantic memory is the highest-value memory type. It stores structured facts about the user that should influence every response. Loading it takes milliseconds (simple key-value lookup), and its impact is immediate - the AI behaves differently from the first message.
```python
# memory/semantic_memory.py
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class MemoryCategory(str, Enum):
    ROLE = "role"              # Job title, company, team
    TECHNICAL = "technical"    # Language, tools, cloud, frameworks
    PREFERENCE = "preference"  # Communication style, format, detail level
    CONTEXT = "context"        # Current project, immediate situation
    PERSONAL = "personal"      # Timezone, location (only if relevant)
    CONSTRAINT = "constraint"  # Compliance, access limits, policies


class MemorySource(str, Enum):
    EXPLICIT = "explicit"    # User directly stated: "I use Python"
    INFERRED = "inferred"    # Detected from behavior: asked Python questions 5 times
    CORRECTED = "corrected"  # User corrected the AI's wrong assumption


@dataclass
class MemoryFact:
    fact_id: str
    user_id: str
    category: MemoryCategory
    key: str           # Machine-readable: "preferred_language", "cloud_provider"
    value: str         # Human-readable: "Python 3.11", "AWS"
    display_name: str  # UI label: "Preferred language", "Cloud provider"
    confidence: float  # 0.0-1.0 (explicit = 1.0; inferred is lower, reflecting uncertainty)
    source: MemorySource
    created_at: float = field(default_factory=time.time)
    updated_at: float = field(default_factory=time.time)
    expires_at: Optional[float] = None  # Epoch seconds; None = permanent. Set for volatile facts
    usage_count: int = 0  # How many times this fact was injected into context


@dataclass
class UserMemoryProfile:
    user_id: str
    facts: list[MemoryFact] = field(default_factory=list)
    last_loaded_at: Optional[float] = None

    def to_context_string(self) -> str:
        """Format facts as a context block for the system prompt."""
        if not self.facts:
            return ""
        by_category: dict[str, list[MemoryFact]] = {}
        for fact in self.facts:
            by_category.setdefault(fact.category.value, []).append(fact)
        sections = ["## About This User\n"]
        sections.append(
            "Use this context to personalize responses. Don't repeat these facts "
            "back to the user - just use them naturally.\n"
        )
        category_labels = {
            "role": "Role and context",
            "technical": "Technical environment",
            "preference": "Communication preferences",
            "constraint": "Important constraints",
            "context": "Current situation",
            "personal": "Personal context",  # Without this entry, PERSONAL facts are never rendered
        }
        for cat, label in category_labels.items():
            if cat in by_category:
                sections.append(f"\n**{label}:**")
                for fact in by_category[cat]:
                    # Flag only inferred facts; explicit and corrected facts came from the user
                    prefix = "(inferred) " if fact.source == MemorySource.INFERRED else ""
                    sections.append(f"- {fact.display_name}: {prefix}{fact.value}")
        return "\n".join(sections)

    def get_fact(self, key: str) -> Optional[MemoryFact]:
        return next((f for f in self.facts if f.key == key), None)

    def upsert_fact(self, fact: MemoryFact) -> None:
        """Replace any existing fact with the same key (one value per key)."""
        existing = self.get_fact(fact.key)
        if existing:
            self.facts.remove(existing)
        self.facts.append(fact)


class SemanticMemoryStore:
    """
    Persistent storage for user semantic memory.
    Backed by a relational database (PostgreSQL in production).

    Performance characteristics:
    - Load: ~10-50ms (simple query by user_id)
    - Write: ~5-20ms (single row upsert)
    - Cache: in-process, per store instance; create one store per request
      so the cache lives only as long as the request
    """

    def __init__(self, db=None):
        self.db = db
        self._request_cache: dict[str, UserMemoryProfile] = {}

    async def load_profile(self, user_id: str) -> UserMemoryProfile:
        """Load all active (non-expired) facts for a user."""
        if user_id in self._request_cache:
            return self._request_cache[user_id]
        # In production (assuming expires_at is stored as epoch seconds):
        # rows = await self.db.fetch(
        #     """SELECT * FROM memory_facts
        #        WHERE user_id = $1
        #          AND (expires_at IS NULL OR expires_at > EXTRACT(EPOCH FROM NOW()))
        #        ORDER BY updated_at DESC""",
        #     user_id,
        # )
        # facts = [MemoryFact(**row) for row in rows]
        facts: list[MemoryFact] = []  # Demo: empty profile
        profile = UserMemoryProfile(user_id=user_id, facts=facts, last_loaded_at=time.time())
        self._request_cache[user_id] = profile
        return profile

    async def save_fact(self, fact: MemoryFact) -> None:
        """Persist or update a single fact."""
        profile = await self.load_profile(fact.user_id)
        profile.upsert_fact(fact)
        # In production: INSERT ... ON CONFLICT (user_id, key) DO UPDATE
        print(f"Saved memory: [{fact.category.value}] {fact.key} = {fact.value} "
              f"(confidence={fact.confidence:.2f})")

    async def delete_fact(self, user_id: str, fact_id: str) -> None:
        """User explicitly removes a memory fact (GDPR right to erasure)."""
        profile = await self.load_profile(user_id)
        profile.facts = [f for f in profile.facts if f.fact_id != fact_id]
        # In production: DELETE FROM memory_facts WHERE fact_id = $1 AND user_id = $2

    async def clear_all(self, user_id: str) -> None:
        """Complete memory wipe. Required for GDPR/CCPA compliance."""
        self._request_cache.pop(user_id, None)
        # In production: DELETE FROM memory_facts WHERE user_id = $1
        print(f"All memory cleared for user {user_id}")
```
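The production comments above describe two queries: a PostgreSQL upsert keyed on `(user_id, key)`, and an expiry filter on load. The sketch below reproduces both, using Python's built-in `sqlite3` module as a stand-in for PostgreSQL. That substitution is an assumption for illustration. SQLite 3.24+ accepts the same `ON CONFLICT ... DO UPDATE` upsert syntax. The trimmed table layout and the helper names `save_fact` and `load_active_facts` are hypothetical.

```python
import sqlite3
import time
import uuid

# Hypothetical, trimmed-down schema; expires_at is epoch seconds, NULL = permanent
SCHEMA = """
CREATE TABLE IF NOT EXISTS memory_facts (
    fact_id    TEXT PRIMARY KEY,
    user_id    TEXT NOT NULL,
    key        TEXT NOT NULL,
    value      TEXT NOT NULL,
    confidence REAL NOT NULL,
    updated_at REAL NOT NULL,
    expires_at REAL,
    UNIQUE (user_id, key)  -- one value per key per user; the upsert target
)
"""


def save_fact(conn, user_id, key, value, confidence, expires_at=None, now=None):
    """Insert a fact, or overwrite the stored value for (user_id, key)."""
    now = time.time() if now is None else now
    conn.execute(
        """
        INSERT INTO memory_facts
            (fact_id, user_id, key, value, confidence, updated_at, expires_at)
        VALUES (?, ?, ?, ?, ?, ?, ?)
        ON CONFLICT (user_id, key) DO UPDATE SET
            value = excluded.value,
            confidence = excluded.confidence,
            updated_at = excluded.updated_at,
            expires_at = excluded.expires_at
        """,
        (str(uuid.uuid4()), user_id, key, value, confidence, now, expires_at),
    )


def load_active_facts(conn, user_id, now=None):
    """Return {key: value} for this user's facts that have not expired."""
    now = time.time() if now is None else now
    rows = conn.execute(
        """
        SELECT key, value FROM memory_facts
        WHERE user_id = ? AND (expires_at IS NULL OR expires_at > ?)
        ORDER BY updated_at DESC
        """,
        (user_id, now),
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
save_fact(conn, "u1", "cloud_provider", "GCP", 0.6, now=100.0)
save_fact(conn, "u1", "cloud_provider", "AWS", 1.0, now=200.0)  # Overwrites GCP
save_fact(conn, "u1", "current_project", "billing API", 0.9, expires_at=300.0, now=200.0)
print(load_active_facts(conn, "u1", now=250.0))  # Both facts still active
print(load_active_facts(conn, "u1", now=400.0))  # Project fact has expired
```

The `UNIQUE (user_id, key)` constraint is what makes the upsert possible. Without a matching unique index, the `ON CONFLICT` target is rejected.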
## Automatic Memory Extraction

Users should not have to declare facts explicitly, so an extractor pulls them out of ordinary conversation instead. It runs asynchronously after each conversation turn, so it never adds latency to the user-facing response.
```python
# memory/extractor.py
import asyncio
import json
import logging
import uuid

import anthropic

from memory.semantic_memory import (
    MemoryCategory,
    MemoryFact,
    MemorySource,
    SemanticMemoryStore,
    UserMemoryProfile,
)

logger = logging.getLogger(__name__)

EXTRACTION_SYSTEM_PROMPT = """You extract memorable user facts from conversations for an AI assistant's long-term memory.
Your job: identify facts that would meaningfully change how the assistant responds in future conversations.
HIGH VALUE facts (extract these):
- Technical environment: programming language, cloud provider, framework, OS, database
- Role and expertise: job title, experience level, company size/type, domain
- Communication preferences: wants code vs explanation, brief vs detailed, formal vs casual
- Constraints: compliance requirements, team standards, hardware limits
- Current project context: what they're building, timeline, goals
LOW VALUE facts (skip these):
- Things that change daily (what mood they're in today)
- Single-use context (specific error message they got once)
- Information already well-represented in existing facts
- Vague impressions without specific values
For each extracted fact, output JSON array with objects:
{
"category": "role|technical|preference|constraint|context",
"key": "snake_case_identifier",
"value": "specific value string",
"display_name": "Human readable label",
"confidence": 0.0-1.0,
"source": "explicit|inferred"
}
Rules:
- confidence = 1.0: user stated this directly ("I use Python")
- confidence = 0.7-0.9: strongly implied ("I've been using React for 5 years")
- confidence = 0.5-0.7: inferred from behavior (asked 3 Python questions)
- source "explicit" if user stated it, "inferred" if you detected it
Output ONLY a valid JSON array. Empty array [] if no facts found.
Do not re-extract facts already in the existing profile."""


class MemoryExtractor:
    """
    Extracts memorable user facts from conversations.
    Runs asynchronously - never blocks user-facing responses.

    Architecture:
    - Called after each assistant response is delivered
    - Uses Claude Haiku for speed and cost efficiency
    - Deduplicates against the existing profile before saving
    - Only saves facts with confidence >= min_confidence (default 0.5)
    """

    def __init__(self, memory_store: SemanticMemoryStore, min_confidence: float = 0.5):
        self.memory_store = memory_store
        self.min_confidence = min_confidence
        self._client = anthropic.Anthropic()

    def extract_from_conversation(
        self,
        conversation: list[dict],
        existing_profile: UserMemoryProfile,
    ) -> list[dict]:
        """
        Synchronously extract facts from recent conversation turns.
        Returns raw fact dicts before validation.
        """
        # Only look at recent turns for efficiency
        recent = conversation[-8:]  # Last 8 messages = 4 user/assistant exchanges
        conv_text = "\n".join(
            f"{msg['role'].upper()}: {str(msg['content'])[:400]}"
            for msg in recent
        )
        # Summarize the existing profile so the model doesn't re-extract it
        existing_summary = ", ".join(
            f"{f.key}={f.value}" for f in existing_profile.facts[:30]
        ) or "None"
        response = self._client.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=600,
            system=EXTRACTION_SYSTEM_PROMPT,
            messages=[{
                "role": "user",
                "content": (
                    f"Conversation:\n{conv_text}\n\n"
                    f"Existing profile (don't re-extract): {existing_summary}"
                ),
            }],
        )
        raw = response.content[0].text.strip()
        # Handle the model wrapping JSON in a code fence
        if "```" in raw:
            raw = raw.split("```")[1]
            if raw.startswith("json"):
                raw = raw[4:]
            raw = raw.strip()
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            logger.warning(f"Failed to parse extraction JSON: {raw[:100]}")
            return []
        return parsed if isinstance(parsed, list) else []

    async def process_async(self, user_id: str, conversation: list[dict]) -> list[MemoryFact]:
        """
        Full async extraction pipeline:
        1. Load existing profile
        2. Extract new facts (blocking SDK call, run in a worker thread)
        3. Validate and deduplicate
        4. Save to persistent store
        Called via asyncio.create_task() - never blocks responses.
        """
        try:
            profile = await self.memory_store.load_profile(user_id)
            # Run the blocking extraction call in a thread so the event loop stays free
            raw_facts = await asyncio.to_thread(
                self.extract_from_conversation, conversation, profile
            )
            saved: list[MemoryFact] = []
            for raw in raw_facts:
                if not isinstance(raw, dict):
                    continue
                try:
                    confidence = float(raw.get("confidence", 0.5))
                except (TypeError, ValueError):
                    continue  # Skip facts with a non-numeric confidence
                if confidence < self.min_confidence:
                    continue
                key = str(raw.get("key", "")).strip()
                value = str(raw.get("value", "")).strip()
                if not key or not value:
                    continue
                # Parse category and source independently so one bad field doesn't reset the other
                try:
                    category = MemoryCategory(raw.get("category", "context"))
                except ValueError:
                    category = MemoryCategory.CONTEXT
                try:
                    source = MemorySource(raw.get("source", "inferred"))
                except ValueError:
                    source = MemorySource.INFERRED
                existing = profile.get_fact(key)
                # Skip if the same key already holds the same value
                if existing and existing.value.lower() == value.lower():
                    continue
                # Never let an inference overwrite something the user stated or corrected
                if existing and existing.source != MemorySource.INFERRED and source == MemorySource.INFERRED:
                    continue
                fact = MemoryFact(
                    fact_id=str(uuid.uuid4()),
                    user_id=user_id,
                    category=category,
                    key=key,
                    value=value,
                    display_name=raw.get("display_name", key.replace("_", " ").title()),
                    confidence=confidence,
                    source=source,
                )
                await self.memory_store.save_fact(fact)
                saved.append(fact)
                logger.info(f"Memory saved | user={user_id[:8]} | {key}={value} | confidence={confidence:.2f}")
            return saved
        except Exception as e:
            logger.error(f"Memory extraction failed for {user_id}: {e}")
            return []
```
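The fence-stripping in `extract_from_conversation` assumes the JSON sits inside the first fenced block. The sketch below is a slightly more forgiving parser. It is an alternative approach, not part of the original design, and the name `parse_fact_array` is hypothetical. It first tries to parse the whole reply, then falls back to the outermost `[` ... `]` span, and it always returns a list of dicts.

```python
import json


def parse_fact_array(text: str) -> list[dict]:
    """Extract a JSON array of fact objects from a model reply.

    Handles bare JSON, JSON wrapped in a fenced block, and JSON surrounded
    by stray prose. Returns [] when nothing usable parses.
    """
    candidates = [text.strip()]
    start, end = text.find("["), text.rfind("]")
    if start != -1 and end > start:
        # Outermost bracket span: skips fences and surrounding prose
        candidates.append(text[start:end + 1])
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, list):
            # Drop non-object entries such as stray strings or numbers
            return [item for item in parsed if isinstance(item, dict)]
    return []


fence = "`" * 3  # Built at runtime so this example contains no literal fence
reply = f'{fence}json\n[{{"key": "cloud_provider", "value": "AWS"}}, "noise"]\n{fence}'
print(parse_fact_array(reply))        # [{'key': 'cloud_provider', 'value': 'AWS'}]
print(parse_fact_array("No facts."))  # []
```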
## Episodic Memory: Past Conversation Summaries
Episodic memory stores what was discussed in past sessions. It answers "we talked about X last week - use that context."
```python
# memory/episodic_memory.py
import asyncio
import json
import logging
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional

import anthropic

logger = logging.getLogger(__name__)


@dataclass
class ConversationEpisode:
    episode_id: str
    user_id: str
    conversation_id: str
    summary: str               # Compact summary of what was discussed
    topics: list[str]          # Key topics for filtering
    decisions_made: list[str]  # Decisions or conclusions reached
    code_written: list[str]    # Languages or tools code was written for
    embedding: Optional[list[float]] = None  # For semantic search
    created_at: float = field(default_factory=time.time)
    message_count: int = 0


class EpisodicMemoryStore:
    """
    Stores and retrieves summaries of past conversations.
    Enables "remember when we discussed X" capability.

    Storage: PostgreSQL with the pgvector extension for semantic search.
    Alternatives: Pinecone, Weaviate, or Qdrant for vector operations.

    Architecture:
    - Write: when a session ends (async, doesn't block the session)
    - Read: before each request, retrieve relevant past episodes
    - Retention: 12+ months; summary storage is cheap
    """

    def __init__(self, db=None, vector_store=None):
        self.db = db
        self.vector_store = vector_store
        self._anthropic = anthropic.Anthropic()

    def summarize_conversation(self, conversation: list[dict]) -> dict:
        """
        Create a compact, structured summary of a conversation.
        Captures: topics discussed, decisions made, code written, key context.
        """
        empty = {"summary": "", "topics": [], "decisions": [], "code_refs": []}
        if len(conversation) < 4:
            return empty
        conv_text = "\n".join(
            f"{msg['role'].upper()}: {str(msg['content'])[:500]}"
            for msg in conversation
        )
        response = self._anthropic.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=400,
            messages=[{
                "role": "user",
                "content": (
                    "Summarize this conversation for long-term memory. "
                    "Output JSON with fields:\n"
                    "- summary: 3-4 sentences covering main topics and outcomes\n"
                    "- topics: list of 3-8 topic strings (tech terms, concepts, tools discussed)\n"
                    "- decisions: list of decisions or conclusions reached (empty if none)\n"
                    "- code_refs: list of languages or tools they wrote code for (empty if none)\n\n"
                    f"Conversation:\n{conv_text}"
                ),
            }],
        )
        text = response.content[0].text
        raw = text
        try:
            # Handle the model wrapping JSON in a code fence
            if "```" in raw:
                raw = raw.split("```")[1]
                if raw.startswith("json"):
                    raw = raw[4:]
            data = json.loads(raw.strip())
            if isinstance(data, dict):
                return {**empty, **data}
        except (json.JSONDecodeError, IndexError):
            pass
        # Fallback: keep the raw text as the summary so the episode isn't lost
        return {**empty, "summary": text[:400]}

    async def save_episode(
        self,
        user_id: str,
        conversation_id: str,
        conversation: list[dict],
    ) -> Optional[ConversationEpisode]:
        """Summarize and persist a conversation (one episode per conversation_id)."""
        if len(conversation) < 4:
            return None
        # Run the blocking summarization call in a worker thread
        summary_data = await asyncio.to_thread(self.summarize_conversation, conversation)
        if not summary_data.get("summary"):
            return None
        episode = ConversationEpisode(
            episode_id=str(uuid.uuid4()),
            user_id=user_id,
            conversation_id=conversation_id,
            summary=summary_data["summary"],
            topics=summary_data.get("topics", []),
            decisions_made=summary_data.get("decisions", []),
            code_written=summary_data.get("code_refs", []),
            message_count=len(conversation),
        )
        # Generate embedding for semantic retrieval
        # In production: use a dedicated embedding model
        # episode.embedding = await embed(episode.summary)
        # Store in vector DB, keyed by conversation_id so re-summarizing a longer
        # version of the same conversation overwrites the earlier episode:
        # await self.vector_store.upsert({
        #     "id": conversation_id,
        #     "content": episode.summary + " " + " ".join(episode.topics),
        #     "embedding": episode.embedding,
        #     "metadata": {"user_id": user_id, "created_at": episode.created_at},
        # })
        logger.info(
            f"Episode saved | user={user_id[:8]} | "
            f"topics={episode.topics[:3]} | "
            f"messages={episode.message_count}"
        )
        return episode

    async def recall(
        self,
        user_id: str,
        current_query: str,
        max_results: int = 3,
        min_similarity: float = 0.75,
    ) -> list[ConversationEpisode]:
        """
        Retrieve relevant past conversation summaries.
        Uses semantic search over episode embeddings.
        Returns an empty list if no relevant episodes are found.
        Only inject episodes with high similarity - low-relevance
        episodes add noise without value.
        """
        # In production:
        # query_embedding = await embed(current_query)
        # results = await self.vector_store.search(
        #     filter={"user_id": user_id},
        #     embedding=query_embedding,
        #     top_k=max_results * 2,  # Over-fetch, then filter by similarity
        # )
        # return [r for r in results if r.similarity >= min_similarity][:max_results]
        return []  # Demo: no episodes
```
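The `recall` method leaves vector search in production comments. The core of what it describes is a similarity ranking with a relevance floor. That can be sketched in plain Python with cosine similarity over in-memory embeddings. The tiny 3-dimensional vectors below are made-up stand-ins for real embedding-model output. The `Episode` type and `recall` function are hypothetical simplifications, and unlike an approximate vector index this sketch scores every episode exactly.

```python
import math
from dataclasses import dataclass


@dataclass
class Episode:
    user_id: str
    summary: str
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # A zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


def recall(episodes, user_id, query_embedding, max_results=3, min_similarity=0.75):
    """Return up to max_results of this user's episodes above the similarity floor."""
    # Filter by user before scoring: episodes must never leak across users
    scored = [
        (cosine_similarity(query_embedding, ep.embedding), ep)
        for ep in episodes
        if ep.user_id == user_id
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ep for score, ep in scored if score >= min_similarity][:max_results]


episodes = [
    Episode("u1", "Debugged a Kubernetes pod restart loop", [0.9, 0.1, 0.0]),
    Episode("u1", "Chose Terraform for AWS infra", [0.1, 0.9, 0.1]),
    Episode("u2", "Another user's Kubernetes chat", [0.9, 0.1, 0.0]),
]
hits = recall(episodes, "u1", query_embedding=[1.0, 0.0, 0.0])
print([ep.summary for ep in hits])  # ['Debugged a Kubernetes pod restart loop']
```

The similarity floor matters as much as the ranking. Without it, the top-k results always fill up, even when nothing in the user's history is relevant to the current message.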
## The Memory Service: Orchestration Layer
```python
# memory/memory_service.py
import asyncio
import logging
import time

import anthropic

from memory.episodic_memory import EpisodicMemoryStore
from memory.extractor import MemoryExtractor
from memory.semantic_memory import SemanticMemoryStore

logger = logging.getLogger(__name__)


class MemoryService:
    """
    Orchestrates all memory types for a complete personalized AI experience.

    Request flow:
    1. load_memory_context() - called before the LLM request.
       Runs semantic memory load + episodic recall in parallel (~50-150ms).
    2. The context is injected into the system prompt.
    3. schedule_post_response_update() - called after the response is delivered.
       Runs extraction + episode save in the background (fire-and-forget).
    """

    def __init__(
        self,
        semantic_store: SemanticMemoryStore,
        episodic_store: EpisodicMemoryStore,
        extractor: MemoryExtractor,
    ):
        self.semantic = semantic_store
        self.episodic = episodic_store
        self.extractor = extractor
        # Strong references to background tasks so they aren't garbage-collected mid-run
        self._background_tasks: set[asyncio.Task] = set()

    async def load_memory_context(
        self,
        user_id: str,
        current_message: str,
        max_context_tokens: int = 1500,
    ) -> str:
        """
        Build memory context for injection into the system prompt.
        Parallel: load semantic profile + recall relevant episodes.
        Token budget: semantic memory ~500 tokens, episodic ~1000 tokens.
        The total budget is enforced below with a rough character cap.
        """
        # Parallel fetch - semantic and episodic are independent
        profile, episodes = await asyncio.gather(
            self.semantic.load_profile(user_id),
            self.episodic.recall(
                user_id=user_id,
                current_query=current_message,
                max_results=2,
            ),
        )
        sections = []
        # Semantic memory (always include if any facts exist)
        semantic_ctx = profile.to_context_string()
        if semantic_ctx:
            sections.append(semantic_ctx)
        # Episodic memory (only include if relevant episodes were found)
        if episodes:
            episode_ctx = "\n## Relevant Past Conversations\n"
            for ep in episodes:
                age_days = int((time.time() - ep.created_at) / 86400)
                age_str = f"{age_days}d ago" if age_days > 0 else "today"
                episode_ctx += f"- [{age_str}] {ep.summary}\n"
                if ep.decisions_made:
                    episode_ctx += f"  Decisions: {'; '.join(ep.decisions_made[:2])}\n"
            sections.append(episode_ctx)
        context = "\n\n".join(sections)
        # Rough heuristic: ~4 characters per token for English text.
        # Semantic memory comes first, so episodic context is trimmed first.
        max_chars = max_context_tokens * 4
        return context[:max_chars]

    async def build_personalized_system_prompt(
        self,
        user_id: str,
        current_message: str,
        base_system_prompt: str,
    ) -> str:
        """Build a personalized system prompt with memory context injected."""
        memory_ctx = await self.load_memory_context(user_id, current_message)
        if not memory_ctx:
            return base_system_prompt
        return base_system_prompt + "\n\n" + memory_ctx

    def schedule_post_response_update(
        self,
        user_id: str,
        conversation_id: str,
        conversation: list[dict],
    ) -> None:
        """
        Fire-and-forget memory update after the response is delivered.
        Never blocks the user-facing response.
        Runs: fact extraction + episode save in the background.
        """
        async def _update():
            try:
                jobs = [self.extractor.process_async(user_id, conversation)]
                if len(conversation) >= 6:
                    # Re-summarizes the whole conversation; save_episode is keyed by
                    # conversation_id, so this overwrites rather than duplicates.
                    # Calling it once at session end is cheaper where session end is known.
                    jobs.append(self.episodic.save_episode(user_id, conversation_id, conversation))
                await asyncio.gather(*jobs)
            except Exception as e:
                logger.error(f"Post-response memory update failed: {e}")

        task = asyncio.create_task(_update())
        self._background_tasks.add(task)
        task.add_done_callback(self._background_tasks.discard)


# Usage from a request handler (e.g. inside a FastAPI route)
async def personalized_chat(
    user_id: str,
    conversation_id: str,
    user_message: str,
    conversation_history: list[dict],
    memory_service: MemoryService,
    base_system: str = "You are a helpful AI assistant.",
) -> str:
    """Complete personalized chat call with memory."""
    client = anthropic.AsyncAnthropic()
    # Build personalized system prompt (adds ~50-150ms)
    system = await memory_service.build_personalized_system_prompt(
        user_id=user_id,
        current_message=user_message,
        base_system_prompt=base_system,
    )
    messages = conversation_history + [{"role": "user", "content": user_message}]
    response = await client.messages.create(
        model="claude-opus-4-6",
        max_tokens=2048,
        system=system,
        messages=messages,
    )
    response_text = response.content[0].text
    # Update memory after the response (background task - adds no latency)
    full_conversation = messages + [{"role": "assistant", "content": response_text}]
    memory_service.schedule_post_response_update(
        user_id=user_id,
        conversation_id=conversation_id,
        conversation=full_conversation,
    )
    return response_text
```
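The `load_memory_context` docstring budgets roughly 500 tokens for semantic memory and 1,000 for episodic memory. Another way to enforce those budgets is to give each section its own budget and add whole items until it is spent, so no fact or summary is cut mid-sentence. This sketch approximates tokens as characters divided by 4, a common rule of thumb for English text rather than a real tokenizer. `approx_tokens`, `fit_to_budget` and `build_memory_context` are hypothetical helpers, and section headers are not counted against the budgets.

```python
def approx_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text (rounded up)."""
    return (len(text) + 3) // 4


def fit_to_budget(items: list[str], budget_tokens: int) -> list[str]:
    """Keep items in priority order, skipping any that would overflow the budget."""
    kept, used = [], 0
    for item in items:
        cost = approx_tokens(item)
        if used + cost > budget_tokens:
            continue  # Too big for what's left; a later, shorter item may still fit
        kept.append(item)
        used += cost
    return kept


def build_memory_context(facts, episodes, semantic_budget=500, episodic_budget=1000):
    """Assemble both sections, each trimmed to its own budget."""
    sections = []
    fact_lines = fit_to_budget(facts, semantic_budget)
    if fact_lines:
        sections.append("## About This User\n" + "\n".join(fact_lines))
    episode_lines = fit_to_budget(episodes, episodic_budget)
    if episode_lines:
        sections.append("## Relevant Past Conversations\n" + "\n".join(episode_lines))
    return "\n\n".join(sections)


facts = ["- Cloud provider: AWS", "- Preferred language: Python"]
episodes = ["- [3d ago] " + "x" * 4000, "- [9d ago] Chose Terraform for infra"]
ctx = build_memory_context(facts, episodes)
print(ctx)  # The oversized episode is dropped; the short one survives
```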
## User-Controlled Memory: The Privacy Layer
Users must be able to see, edit, and delete their AI's memory. This is both ethically required and legally mandated under GDPR and CCPA.
```python
# api/memory_api.py
import asyncio
import time
from typing import Optional

from fastapi import Depends, FastAPI, HTTPException
from pydantic import BaseModel

from memory.episodic_memory import EpisodicMemoryStore
from memory.semantic_memory import MemorySource, SemanticMemoryStore

app = FastAPI()


# Dependency providers must be defined before the routes that use them:
# the Depends(...) defaults are evaluated when each route function is defined.
def get_current_user_id() -> str:
    return "user_123"  # Production: extract from a verified JWT


def get_memory_store() -> SemanticMemoryStore:
    return SemanticMemoryStore()  # Production: pass a database connection or pool


def get_episodic_store() -> EpisodicMemoryStore:
    return EpisodicMemoryStore()


class FactUpdateRequest(BaseModel):
    new_value: str
    display_name: Optional[str] = None


@app.get("/api/memory/profile")
async def get_memory_profile(
    user_id: str = Depends(get_current_user_id),
    memory_store: SemanticMemoryStore = Depends(get_memory_store),
) -> dict:
    """
    Show the user their complete memory profile.
    Privacy principle: users can always see exactly what the AI "knows" about them.
    """
    profile = await memory_store.load_profile(user_id)
    return {
        "user_id": user_id,
        "fact_count": len(profile.facts),
        "facts": [
            {
                "fact_id": f.fact_id,
                "category": f.category.value,
                "key": f.key,
                "display_name": f.display_name,
                "value": f.value,
                "source": f.source.value,
                "confidence": round(f.confidence, 2),
                "last_updated": f.updated_at,
            }
            for f in sorted(profile.facts, key=lambda x: x.category.value)
        ],
    }


@app.put("/api/memory/facts/{fact_id}")
async def update_fact(
    fact_id: str,
    body: FactUpdateRequest,
    user_id: str = Depends(get_current_user_id),
    memory_store: SemanticMemoryStore = Depends(get_memory_store),
) -> dict:
    """
    User corrects a fact the AI got wrong.
    User corrections are marked with source 'corrected' and confidence=1.0.
    """
    profile = await memory_store.load_profile(user_id)
    fact = next((f for f in profile.facts if f.fact_id == fact_id), None)
    if not fact or fact.user_id != user_id:
        raise HTTPException(status_code=404, detail="Fact not found")
    fact.value = body.new_value
    if body.display_name:
        fact.display_name = body.display_name
    fact.source = MemorySource.CORRECTED
    fact.confidence = 1.0
    fact.updated_at = time.time()
    await memory_store.save_fact(fact)
    return {"updated": fact_id, "new_value": body.new_value}


@app.delete("/api/memory/facts/{fact_id}")
async def delete_fact(
    fact_id: str,
    user_id: str = Depends(get_current_user_id),
    memory_store: SemanticMemoryStore = Depends(get_memory_store),
) -> dict:
    """
    User removes a specific memory fact.
    GDPR: right to deletion of personal data.
    """
    await memory_store.delete_fact(user_id, fact_id)
    return {"deleted": fact_id}


@app.delete("/api/memory/all")
async def clear_all_memory(
    user_id: str = Depends(get_current_user_id),
    memory_store: SemanticMemoryStore = Depends(get_memory_store),
    episodic_store: EpisodicMemoryStore = Depends(get_episodic_store),
) -> dict:
    """
    Complete memory wipe - all facts and episode summaries deleted.
    Required for GDPR right to erasure (Article 17).
    """
    await asyncio.gather(
        memory_store.clear_all(user_id),
        # Episodes must be deleted too before the message below is true; this
        # method is not defined on EpisodicMemoryStore above:
        # episodic_store.delete_all_episodes(user_id),
    )
    return {
        "cleared": True,
        "message": "All AI memory has been permanently deleted.",
    }


@app.post("/api/memory/toggle")
async def toggle_memory(
    enabled: bool,
    user_id: str = Depends(get_current_user_id),
) -> dict:
    """
    User can disable memory collection entirely.
    When disabled: the AI doesn't extract or use personal facts.
    """
    # Store preference
    # await db.execute("UPDATE user_settings SET memory_enabled = $1 WHERE user_id = $2", enabled, user_id)
    return {
        "memory_enabled": enabled,
        "message": (
            "Memory enabled - the AI will remember your preferences across conversations."
            if enabled else
            "Memory disabled - the AI will start fresh in each conversation."
        ),
    }
```
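The toggle endpoint stores a preference, but nothing above consults it. The sketch below enforces it at both ends of the pipeline: it skips memory injection before the LLM call and skips extraction after it. The in-memory `MEMORY_ENABLED` dict stands in for the `user_settings` table. It is hypothetical, as are `build_system_prompt` and `should_extract`. Unknown users default to disabled, in line with the disclosure-first guidance later in this lesson: memory stays off until the user opts in.

```python
MEMORY_ENABLED: dict[str, bool] = {}  # Hypothetical stand-in for user_settings.memory_enabled


def set_memory_enabled(user_id: str, enabled: bool) -> None:
    MEMORY_ENABLED[user_id] = enabled


def memory_enabled(user_id: str) -> bool:
    # Unknown users default to off: memory activates only after explicit opt-in
    return MEMORY_ENABLED.get(user_id, False)


def build_system_prompt(user_id: str, base_prompt: str, memory_context: str) -> str:
    """Inject memory only when the user has memory turned on."""
    if not memory_enabled(user_id) or not memory_context:
        return base_prompt
    return base_prompt + "\n\n" + memory_context


def should_extract(user_id: str) -> bool:
    """Gate the post-response extraction job on the same setting."""
    return memory_enabled(user_id)


base = "You are a helpful AI assistant."
ctx = "## About This User\n- Cloud provider: AWS"
print(build_system_prompt("u1", base, ctx) == base)  # True: u1 has not opted in
set_memory_enabled("u1", True)
print(should_extract("u1"))  # True
```

Checking the flag on both paths matters. Gating only the prompt would keep collecting facts silently, and gating only extraction would keep using facts the user asked the AI to stop using.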
## Memory UI: The Transparency Panel
```tsx
// components/MemoryPanel.tsx
import React, { useState, useEffect } from "react";

interface MemoryFact {
  fact_id: string;
  category: string;
  display_name: string;
  value: string;
  source: "explicit" | "inferred" | "corrected";
  confidence: number;
}

const CATEGORY_ICONS: Record<string, string> = {
  role: "👤",
  technical: "💻",
  preference: "⚙️",
  constraint: "🔒",
  context: "📋",
};

function SourceBadge({ source, confidence }: { source: string; confidence: number }) {
  if (source === "explicit" || source === "corrected") {
    return (
      <span className="text-xs bg-green-100 text-green-700 px-1.5 py-0.5 rounded">
        You told me
      </span>
    );
  }
  return (
    <span className="text-xs bg-blue-50 text-blue-600 px-1.5 py-0.5 rounded">
      Inferred ({Math.round(confidence * 100)}%)
    </span>
  );
}

// userId is not sent to the API: the server resolves the user from the session
export function MemoryPanel({ userId }: { userId: string }) {
  const [facts, setFacts] = useState<MemoryFact[]>([]);
  const [loading, setLoading] = useState(true);
  const [editingId, setEditingId] = useState<string | null>(null);
  const [editValue, setEditValue] = useState("");

  useEffect(() => {
    fetch("/api/memory/profile")
      .then((r) => {
        if (!r.ok) throw new Error(`Failed to load memory: ${r.status}`);
        return r.json();
      })
      .then((data) => setFacts(data.facts))
      .catch((err) => console.error(err))
      // Always leave the loading state, even if the request failed
      .finally(() => setLoading(false));
  }, []);

  const deleteFact = async (factId: string) => {
    const res = await fetch(`/api/memory/facts/${factId}`, { method: "DELETE" });
    // Only update the UI once the server confirms the deletion
    if (res.ok) setFacts((prev) => prev.filter((f) => f.fact_id !== factId));
  };

  const saveEdit = async (factId: string) => {
    const res = await fetch(`/api/memory/facts/${factId}`, {
      method: "PUT",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ new_value: editValue }),
    });
    if (res.ok) {
      // Mirror the server: corrections become source "corrected" with confidence 1.0
      setFacts((prev) =>
        prev.map((f) =>
          f.fact_id === factId
            ? { ...f, value: editValue, source: "corrected" as const, confidence: 1 }
            : f
        )
      );
      setEditingId(null);
    }
  };

  const clearAll = async () => {
    if (!confirm("Delete all AI memory? This cannot be undone.")) return;
    const res = await fetch("/api/memory/all", { method: "DELETE" });
    if (res.ok) setFacts([]);
  };

  const grouped = facts.reduce<Record<string, MemoryFact[]>>((acc, f) => {
    acc[f.category] = acc[f.category] ?? [];
    acc[f.category].push(f);
    return acc;
  }, {});

  if (loading) return <div className="p-4 text-gray-500 text-sm">Loading memory...</div>;

  return (
    <div className="bg-white rounded-2xl border border-gray-200 p-4 max-w-md">
      <div className="flex items-center justify-between mb-4">
        <h3 className="font-semibold text-gray-800">What I Remember About You</h3>
        {facts.length > 0 && (
          <button onClick={clearAll} className="text-xs text-red-500 hover:text-red-700">
            Clear all
          </button>
        )}
      </div>
      {facts.length === 0 ? (
        <div className="text-center py-8 text-gray-400">
          <div className="text-3xl mb-2">🧠</div>
          <div className="text-sm">No memories yet</div>
          <div className="text-xs mt-1">I'll remember preferences and context as we talk</div>
        </div>
      ) : (
        <div className="space-y-4">
          {Object.entries(grouped).map(([category, categoryFacts]) => (
            <div key={category}>
              <div className="flex items-center gap-1.5 mb-2">
                <span>{CATEGORY_ICONS[category] || "•"}</span>
                <span className="text-xs font-medium text-gray-500 uppercase tracking-wide">
                  {category}
                </span>
              </div>
              <div className="space-y-1.5">
                {categoryFacts.map((fact) => (
                  <div
                    key={fact.fact_id}
                    className="flex items-center justify-between bg-gray-50 rounded-lg px-3 py-2 group"
                  >
                    <div className="flex-1 min-w-0">
                      <div className="text-xs text-gray-500 mb-0.5">{fact.display_name}</div>
                      {editingId === fact.fact_id ? (
                        <div className="flex gap-2">
                          <input
                            value={editValue}
                            onChange={(e) => setEditValue(e.target.value)}
                            className="flex-1 text-sm border border-blue-300 rounded px-2 py-0.5 focus:outline-none"
                            autoFocus
                          />
                          <button
                            onClick={() => saveEdit(fact.fact_id)}
                            className="text-xs text-blue-600 font-medium"
                          >
                            Save
                          </button>
                          <button onClick={() => setEditingId(null)} className="text-xs text-gray-400">
                            Cancel
                          </button>
                        </div>
                      ) : (
                        <div className="flex items-center gap-2">
                          <span className="text-sm font-medium text-gray-800 truncate">
                            {fact.value}
                          </span>
                          <SourceBadge source={fact.source} confidence={fact.confidence} />
                        </div>
                      )}
                    </div>
                    {editingId !== fact.fact_id && (
                      <div className="flex gap-1 ml-2 opacity-0 group-hover:opacity-100 transition-opacity">
                        <button
                          onClick={() => {
                            setEditingId(fact.fact_id);
                            setEditValue(fact.value);
                          }}
                          className="text-xs text-gray-400 hover:text-blue-600 px-1"
                          title="Edit"
                        >
                          ✏️
                        </button>
                        <button
                          onClick={() => deleteFact(fact.fact_id)}
                          className="text-xs text-gray-400 hover:text-red-600 px-1"
                          title="Delete"
                        >
                          ✕
                        </button>
                      </div>
                    )}
                  </div>
                ))}
              </div>
            </div>
          ))}
        </div>
      )}
      <div className="mt-4 pt-4 border-t border-gray-100 text-xs text-gray-400">
        I use this context to personalize responses. You can edit or delete any item.
      </div>
    </div>
  );
}
```
## Production Engineering Notes
**Memory extraction is always async.** Never block the response. Run extraction as a background task after the response is delivered, so it adds nothing to user-facing latency. Use `asyncio.create_task()` in Python, keeping a reference to the task, or use a job queue such as Celery when you need retries and durability.
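`asyncio.create_task()` has one sharp edge. The event loop keeps only a weak reference to tasks, so a task with no other reference can be garbage-collected before it finishes. The Python documentation recommends saving a reference. The sketch below shows the pattern. `fake_llm_reply` and `extract_memory` are hypothetical stand-ins for the real LLM and extraction calls.

```python
import asyncio

background_tasks: set[asyncio.Task] = set()
extracted: list[str] = []


async def fake_llm_reply(message: str) -> str:
    await asyncio.sleep(0)  # Stand-in for the user-facing LLM call
    return f"echo: {message}"


async def extract_memory(message: str) -> None:
    await asyncio.sleep(0.01)  # Stand-in for a slower extraction call
    extracted.append(message)


def fire_and_forget(coro) -> asyncio.Task:
    """Schedule coro in the background, holding a strong reference until it finishes."""
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return task


async def handle_turn(message: str) -> str:
    reply = await fake_llm_reply(message)
    fire_and_forget(extract_memory(message))  # Does not delay the reply
    return reply


async def main() -> None:
    reply = await handle_turn("I deploy with Kubernetes")
    print(reply, extracted)  # echo: I deploy with Kubernetes []  (extraction not yet run)
    await asyncio.gather(*background_tasks)  # Demo only: wait so the result is visible
    print(extracted)  # ['I deploy with Kubernetes']


asyncio.run(main())
```

In a real server the event loop keeps running between requests, so nothing needs to await the background set. The `gather` in `main` exists only so this script can show the result before exiting.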
**Memory freshness and TTLs.** Volatile facts (company, role, current project) should expire after 90-180 days and be re-confirmed. Stable facts (programming language preference, communication style) can be permanent. Give inferred facts shorter TTLs than explicit ones, since they are less certain to begin with.
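The TTL guidance above can be expressed as a small policy function. The split used here is an assumption, not a fixed rule: role, context, and personal facts are treated as volatile, and technical, preference, and constraint facts as stable. The exact durations are chosen to sit inside the 90-180 day range given above. `expires_at_for` is a hypothetical name.

```python
import time
from typing import Optional

DAY = 86_400  # Seconds per day

# Assumed classification; adjust to your own data
VOLATILE_CATEGORIES = {"role", "context", "personal"}


def expires_at_for(category: str, source: str, now: Optional[float] = None) -> Optional[float]:
    """Return an expiry timestamp in epoch seconds, or None for a permanent fact."""
    now = time.time() if now is None else now
    user_stated = source in ("explicit", "corrected")
    if category in VOLATILE_CATEGORIES:
        # Volatile facts always expire; user-stated ones last longer than inferred ones
        return now + (180 if user_stated else 90) * DAY
    if user_stated:
        return None  # Stable and user-stated: keep until the user changes it
    return now + 180 * DAY  # Stable but inferred: re-confirm eventually


print(expires_at_for("technical", "explicit"))  # None
```

The result maps directly onto `MemoryFact.expires_at`. Once it lapses, the expiry filter in `load_profile` drops the fact, which is the cue to re-confirm it with the user.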
**Disclosure first.** Before activating memory for a user, show a clear modal: "I can remember details from our conversations to personalize future responses. [What I'll remember] [Turn off] [OK]." Never silently activate memory. Users who consent to memory and understand it tend to be more forgiving of extraction errors, and more likely to use the "edit memory" feature when the AI gets something wrong.
:::tip Start with Semantic Memory Only
Episodic memory (conversation summaries + vector search) adds significant complexity: a vector database, embedding infrastructure, and retrieval logic. Start with semantic memory (user preferences as structured key-value facts), which delivers most of the value for a fraction of the implementation effort. Add episodic memory only when users explicitly ask for "do you remember when we..." capability.
:::
:::warning Memory Can Encode Stale or Wrong Assumptions
If a user says "I hate verbose responses" in session 1, the AI might apply this aggressively in session 100, when they're asking a complex question that genuinely warrants a detailed answer. Two mitigations:

1. Always let in-session context override stored memory: if a user asks for detail, give it regardless of the stored preference.
2. Prompt the AI to acknowledge stored preferences explicitly when relevant: "I'll keep this concise since you prefer brief responses - let me know if you want more detail."
:::
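The first mitigation above amounts to a precedence rule: stored preferences are defaults, and anything the user asks for in the current session overrides them. The sketch below assumes preferences are simple key/value pairs and that session overrides have already been detected upstream. How to detect them is out of scope here. `effective_preferences` and `preference_note` are hypothetical names.

```python
def effective_preferences(stored: dict[str, str], session_overrides: dict[str, str]) -> dict[str, str]:
    """Stored memory supplies defaults; in-session requests take precedence."""
    return {**stored, **session_overrides}


def preference_note(stored: dict[str, str], session_overrides: dict[str, str]) -> str:
    """Render preferences for the system prompt, flagging overrides explicitly."""
    effective = effective_preferences(stored, session_overrides)
    lines = []
    for key, value in sorted(effective.items()):
        if key in session_overrides and stored.get(key) not in (None, value):
            lines.append(f"- {key}: {value} (user asked for this now; overrides stored '{stored[key]}')")
        else:
            lines.append(f"- {key}: {value}")
    # Standing instruction so the model also honors overrides we failed to detect
    lines.append("If the user's current request conflicts with these, follow the request.")
    return "\n".join(lines)


stored = {"response_length": "brief", "language": "Python"}
session = {"response_length": "detailed"}
print(effective_preferences(stored, session))  # {'response_length': 'detailed', 'language': 'Python'}
print(preference_note(stored, session))
```

Flagging the override in the prompt also supports the second mitigation: the model can see both the stored preference and the fact that the user departed from it.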
Interview Q&A
Q1: Describe the different types of AI memory and when to use each.
Four types: (1) In-context (token window) - ephemeral, lost on session end, used for current conversation. No special infrastructure needed. (2) Semantic memory - structured facts about users (preferences, role, tech stack), stored in relational DB, loaded at session start as system prompt additions. Highest value, simplest to implement. (3) Episodic memory - summaries of past conversations, stored in vector DB, retrieved via semantic similarity when relevant. Enables "remember last week's discussion." Adds complexity - only worth building when users need cross-session recall. (4) Procedural memory - behavioral patterns (prefers code-first, wants minimal intro). Stored as semantic facts, applied on every request. Start with semantic only; add episodic when evidence shows users need it.
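To make episodic retrieval concrete, here is a toy sketch of "retrieved via semantic similarity": rank stored conversation summaries by cosine similarity to the query embedding. The 3-dimensional vectors are made up for illustration, since a real system would get embeddings from an embedding model and query a vector DB. The `min_score` cutoff is also a hypothetical tuning knob.

```python
import math
from dataclasses import dataclass

@dataclass
class Episode:
    summary: str
    embedding: list[float]  # produced by an embedding model in a real system

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat zero vectors as unrelated

def recall(query_embedding: list[float], episodes: list[Episode],
           top_k: int = 2, min_score: float = 0.75) -> list[str]:
    # Score every episode, keep only clearly relevant ones, return the best few.
    scored = [(cosine(query_embedding, e.embedding), e) for e in episodes]
    relevant = [(s, e) for s, e in scored if s >= min_score]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [e.summary for _, e in relevant[:top_k]]
```

The threshold means unrelated past conversations are not injected at all, which keeps episodic memory from adding noise when nothing relevant exists.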
Q2: How do you extract user facts from conversations without requiring explicit input?
Run an LLM (Haiku for speed and cost) after each conversation turn, asynchronously. Send the last 4-6 exchanges plus the existing memory profile summary. Ask the model to identify new memorable facts in JSON format. Validate: require confidence >= 0.5, skip facts with no specific value, skip if the same key with the same value already exists. Mark explicit statements as confidence 1.0, inferred facts as 0.5-0.9. Save to persistent store. Total cost per extraction call: $0.0001-0.0003 (Haiku). Never run this synchronously - fire-and-forget after response delivery.
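The validation rules in this answer can be sketched as a filter over the model's JSON output. It assumes, hypothetically, that the extraction prompt asks for a JSON list of objects with `key`, `value`, `source` ("explicit" or "inferred") and `confidence` fields. Capping inferred confidence at 0.9 follows the 0.5-0.9 range above.

```python
import json

def validate_extracted(raw_json: str, existing: dict[str, str]) -> list[dict]:
    """Filter model-extracted facts before saving them (schema is an assumption)."""
    try:
        candidates = json.loads(raw_json)
    except json.JSONDecodeError:
        return []  # malformed model output: save nothing
    if not isinstance(candidates, list):
        return []
    accepted = []
    for fact in candidates:
        if not isinstance(fact, dict):
            continue
        key, value = fact.get("key"), fact.get("value")
        if not key or not isinstance(value, str) or not value.strip():
            continue  # skip facts with no specific value
        value = value.strip()
        if existing.get(key) == value:
            continue  # same key with the same value is already stored
        source = "explicit" if fact.get("source") == "explicit" else "inferred"
        if source == "explicit":
            confidence = 1.0
        else:
            raw_conf = fact.get("confidence", 0.0)
            confidence = min(float(raw_conf), 0.9) if isinstance(raw_conf, (int, float)) else 0.0
        if confidence < 0.5:
            continue  # too uncertain to keep
        accepted.append({"key": key, "value": value, "source": source, "confidence": confidence})
    return accepted
```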
Q3: How do you give users control over their AI's memory?
Three required controls: (1) View - show all stored facts with display names, source (you told me vs inferred), and confidence score. No raw keys or internal fields. (2) Edit - let users correct facts the AI got wrong. Mark corrected facts as source="corrected" with confidence=1.0. (3) Delete - individual fact deletion and full memory wipe (GDPR right to erasure). Also: a memory toggle (on/off per user), visible in the main settings. Show memory usage count - "Used in 47 conversations" - so users understand the value of memory before considering deletion. Never hide that memory is active.
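The view, edit, and delete controls can be sketched against a simple list of stored facts. The display-name and source-label tables and the `StoredFact` shape are hypothetical. The points being illustrated are that the view exposes no raw keys, and that a correction becomes `source="corrected"` with `confidence=1.0`.

```python
from dataclasses import dataclass

SOURCE_LABELS = {"explicit": "You told me", "inferred": "Inferred", "corrected": "You corrected this"}
DISPLAY_NAMES = {"cloud_provider": "Cloud provider", "language": "Preferred language"}  # hypothetical

@dataclass
class StoredFact:
    key: str
    value: str
    source: str
    confidence: float

def view(facts: list[StoredFact]) -> list[dict]:
    # User-facing view: display names and plain-language sources, no raw keys or internal fields.
    return [
        {
            "name": DISPLAY_NAMES.get(f.key, f.key.replace("_", " ").capitalize()),
            "value": f.value,
            "source": SOURCE_LABELS.get(f.source, f.source),
            "confidence": f"{round(f.confidence * 100)}%",
        }
        for f in facts
    ]

def correct(facts: list[StoredFact], key: str, new_value: str) -> None:
    # A user correction is authoritative.
    for f in facts:
        if f.key == key:
            f.value, f.source, f.confidence = new_value, "corrected", 1.0
            return
    raise KeyError(key)

def delete(facts: list[StoredFact], key: str) -> list[StoredFact]:
    # Individual fact deletion; a full wipe is simply an empty list.
    return [f for f in facts if f.key != key]
```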
Q4: How do you handle stale or conflicting memories?
Use TTLs for volatile facts: role/company expire after 120 days, current project after 30 days, tech preferences are permanent. When the user says something that contradicts stored memory in the current session, always prioritize the current context - "I'm actually using GCP now" overrides any stored "cloud_provider=AWS." For graceful conflict handling, have the AI acknowledge the change: "I see you've switched from AWS to GCP - I'll remember that going forward." Let users easily correct stored facts from the memory management UI.
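Here is a small sketch of the conflict path. When the user states a new value in-session, it is persisted as an explicit, full-confidence fact, which also resets its TTL clock. If it replaced a different stored value, an acknowledgement is returned in the wording suggested above. The dict-of-dicts memory layout is a hypothetical assumption.

```python
from datetime import datetime, timezone

def apply_session_statement(memory: dict[str, dict], key: str, new_value: str) -> str:
    """Persist an in-session statement and return an acknowledgement if it changed a stored fact."""
    old = memory.get(key)
    memory[key] = {
        "value": new_value,
        "source": "explicit",  # stated in the current session, so it beats stored or inferred values
        "confidence": 1.0,
        "updated_at": datetime.now(timezone.utc),  # restarts the TTL for this fact
    }
    if old is not None and old["value"] != new_value:
        return f"I see you've switched from {old['value']} to {new_value} - I'll remember that going forward."
    return ""
```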
Q5: What are the privacy and compliance requirements for AI memory?
Under GDPR (EU) and CCPA (California): (1) Informed consent - disclose memory collection before activating it, in plain language. (2) Right to access - users can see all stored facts (your /api/memory/profile endpoint). (3) Right to deletion - individual fact deletion AND full memory wipe, executable immediately (your /api/memory/all endpoint). (4) Data minimization - only store facts that meaningfully improve the AI experience, not everything extractable. (5) Purpose limitation - memory data should only be used to personalize the AI, never for advertising or third-party sharing. Also: audit log all memory access and modifications, encrypt memory data at rest, and have a data retention policy with automatic expiry for old facts.
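The following toy store, a sketch rather than a compliance implementation, shows how access, erasure, retention expiry, and audit logging fit together. The class and method names are hypothetical. In this framing, `export` and `wipe` are what the /api/memory/profile and /api/memory/all endpoints would call. Audit entries record who did what to which key but never the fact's value, so the log itself does not retain erased personal data. Encryption at rest is out of scope here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class AuditedMemoryStore:
    facts: dict[str, dict[str, dict]] = field(default_factory=dict)  # user_id -> key -> record
    audit_log: list[dict] = field(default_factory=list)

    def _audit(self, actor: str, action: str, user_id: str, key: str = "*") -> None:
        # Log the actor, action and key, never the stored value.
        self.audit_log.append({"at": datetime.now(timezone.utc), "actor": actor,
                               "action": action, "user_id": user_id, "key": key})

    def save(self, user_id: str, key: str, value: str, actor: str,
             now: Optional[datetime] = None) -> None:
        saved_at = now if now is not None else datetime.now(timezone.utc)
        self.facts.setdefault(user_id, {})[key] = {"value": value, "saved_at": saved_at}
        self._audit(actor, "write", user_id, key)

    def export(self, user_id: str, actor: str) -> dict[str, str]:
        # Right to access: everything stored about this user.
        self._audit(actor, "read", user_id)
        return {k: r["value"] for k, r in self.facts.get(user_id, {}).items()}

    def wipe(self, user_id: str, actor: str) -> int:
        # Right to erasure: remove all facts for the user immediately.
        removed = len(self.facts.pop(user_id, {}))
        self._audit(actor, "wipe", user_id)
        return removed

    def purge_expired(self, now: datetime, max_age: timedelta) -> int:
        # Retention policy: automatically expire facts older than max_age.
        removed = 0
        for user_id, records in self.facts.items():
            for key in [k for k, r in records.items() if now - r["saved_at"] > max_age]:
                del records[key]
                removed += 1
                self._audit("retention-job", "expire", user_id, key)
        return removed
```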
Q6: How do you prevent memory from degrading response quality?
Three mitigations: (1) In-session context always wins over stored memory. Never let memory override what the user just told you. (2) Cap memory injection token budget - 500-1000 tokens maximum. Long memory blocks crowd out the actual conversation context and reduce response quality. (3) Confidence thresholding - only inject facts with confidence >= 0.5 into the system prompt. Low-confidence inferences add noise. Additionally: monitor response quality metrics segmented by memory-active vs. memory-disabled users. If memory-active users have lower satisfaction rates, the memory injection is hurting rather than helping.
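Mitigations (2) and (3) can be combined in one prompt-building step. Facts below the 0.5 confidence threshold are dropped. The rest are added highest-confidence first until a token budget (defaulting to the low end of the 500-1000 range) is used up. The block header also restates that the user's current messages take precedence. As a simplifying assumption, tokens are approximated as four characters each, and a production system would use the model's tokenizer instead.

```python
def build_memory_block(facts: list[tuple[str, str, float]],
                       max_tokens: int = 500, min_confidence: float = 0.5) -> str:
    """facts are (display_name, value, confidence) tuples."""
    def approx_tokens(text: str) -> int:
        return max(1, len(text) // 4)  # rough heuristic, not a real tokenizer

    header = "Known user context (the user's current messages take precedence):"
    lines = [header]
    used = approx_tokens(header)
    for name, value, confidence in sorted(facts, key=lambda f: f[2], reverse=True):
        if confidence < min_confidence:
            continue  # low-confidence inferences add noise
        line = f"- {name}: {value}"
        cost = approx_tokens(line)
        if used + cost > max_tokens:
            break  # budget spent; the most confident facts are already in
        lines.append(line)
        used += cost
    return "\n".join(lines) if len(lines) > 1 else ""
```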
