Module 11: RAG Engineering

Retrieval-Augmented Generation (RAG) is one of the most widely deployed patterns in production AI systems today. It solves a fundamental problem: a language model knows only what it learned during training, and nothing about your documents, your customers, or anything that happened after its knowledge cutoff. RAG gives the model access to your data at query time - cheaply, and without retraining.

This module covers the full RAG engineering stack: from the theory of why RAG exists, through every layer of the pipeline, to production evaluation and agentic extensions.

What You'll Learn

Lessons

#	Lesson	Key Concepts
01	Why RAG Exists	LLM knowledge limits, hallucination, RAG vs. fine-tuning vs. prompt stuffing, founding paper
02	Document Ingestion and Chunking	Fixed-size, semantic, structure-aware chunking; PDF/DOCX parsing; metadata enrichment
03	Embedding Models in Production	Dense embeddings, model selection, fine-tuning embeddings, embedding pipelines, dimensionality
04	Vector Search in Practice	FAISS, pgvector, Pinecone, Weaviate; ANN algorithms; index tuning; exact vs. approximate search
05	Hybrid Search and Reranking	BM25 sparse retrieval, reciprocal rank fusion, cross-encoder reranking, Cohere Rerank
06	Query Transformation and HyDE	HyDE, multi-query expansion, step-back prompting, query decomposition for complex questions
07	Agentic RAG	Self-correcting retrieval loops, CRAG, multi-hop reasoning, tool-augmented retrieval
08	RAG Evaluation with RAGAS	Faithfulness, answer relevancy, context precision/recall; building golden datasets; CI eval

Prerequisites

Familiarity with LLM APIs and the Anthropic SDK
Python async programming basics
Basic understanding of embeddings and vector similarity (helpful but not required)

The Core Challenge of Production RAG

A RAG system that scores 85% in offline evaluation can still fail users in ways your test set never predicted. The gap between benchmark performance and production quality is wider in RAG than in almost any other AI pattern - because RAG has more failure modes, and they stack on top of each other.

The retrieval can fail (wrong chunks returned). The chunks can be malformed (tables split mid-row, code split mid-function). The context can be too long or too short. The generator can ignore the retrieved context and hallucinate anyway. And unlike a classification model, where you can measure accuracy on a holdout set, RAG failure is often invisible: the model confidently synthesizes an answer from partially correct chunks, and that answer contains subtle factual errors.
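To make the "tables split mid-row" failure concrete, here is a minimal sketch comparing naive fixed-size chunking with a chunker that only cuts at line boundaries. The function names, the sample table, and the 25-character limit are all illustrative assumptions, not part of any library or of this module's reference code.

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Naive chunking: cut every `size` characters, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def line_aware_chunks(text: str, max_size: int) -> list[str]:
    """Pack whole lines into chunks of at most `max_size` characters.

    A single line longer than `max_size` is kept whole rather than split.
    """
    chunks: list[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # Start a new chunk if adding this line would exceed the limit.
        if current and len(current) + len(line) > max_size:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


# Hypothetical CSV-style table; each row ends with a newline.
table = (
    "plan,price,seats\n"
    "starter,10,3\n"
    "team,40,10\n"
    "enterprise,200,100\n"
)

naive = fixed_size_chunks(table, 25)
aware = line_aware_chunks(table, 25)

# A chunk that does not end with a newline stops in the middle of a row.
naive_splits_rows = any(not c.endswith("\n") for c in naive[:-1])
aware_splits_rows = any(not c.endswith("\n") for c in aware)

print("naive splits rows:", naive_splits_rows)
print("line-aware splits rows:", aware_splits_rows)
```

The naive chunker cuts the second row in half, so no single chunk contains the full "starter" record. The line-aware version keeps every row intact, at the cost of uneven chunk sizes.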

This module teaches you to engineer every layer of the stack so failures are visible, measurable, and systematically reduced.

The RAG Stack at a Glance

Common RAG Failure Modes

Layer	Failure	Symptom
Parsing	Table extracted as garbled text	Answers about tabular data are wrong
Chunking	Sentence split mid-entity	Retrieval misses the relevant chunk
Embedding	Wrong model for domain	Similar documents ranked far apart
Retrieval	Low precision top-k	Irrelevant context passed to generator
Reranking	Skipped to save latency	Precision drops 20-30%
Generation	Model ignores context	Hallucinations despite correct retrieval
Evaluation	Only measures offline accuracy	Production failures are invisible

The lessons in this module work through this table layer by layer.
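The "low precision top-k" row is one of the easier failures to make measurable. The sketch below computes precision@k and recall@k for a single query, assuming you already have a ranked list of retrieved chunk IDs and a hand-labeled set of relevant IDs. The IDs and function names are hypothetical, and precision here divides by the number of chunks actually returned, which is one of several conventions.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved chunks that are relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for chunk_id in top_k if chunk_id in relevant)
    return hits / len(top_k)


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant chunks that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


# Hypothetical ranked retrieval result and gold labels for one query.
retrieved_ids = ["c7", "c2", "c9", "c4", "c1"]
relevant_ids = {"c2", "c4", "c8"}

p5 = precision_at_k(retrieved_ids, relevant_ids, k=5)
r5 = recall_at_k(retrieved_ids, relevant_ids, k=5)
print(f"precision@5={p5:.2f} recall@5={r5:.2f}")
```

In this example two of the five retrieved chunks are relevant, and one relevant chunk ("c8") was never retrieved, so three of the five chunks passed to the generator are noise and one needed chunk is missing.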
