Advanced RAG Patterns
Go beyond naive RAG - master query transformation, HyDE, multi-query retrieval, Self-RAG, Corrective RAG, and iterative retrieval patterns for complex questions.
Build agents that control their own retrieval - multi-step reasoning, router agents, ReAct loops, LangGraph stateful pipelines, and production patterns for agentic retrieval systems.
Master the art and science of splitting documents into chunks that maximize retrieval precision - the most underestimated decision in RAG system design.
Master embedding model selection for retrieval - MTEB benchmarks, model families, Matryoshka embeddings, bi-encoders vs cross-encoders, and fine-tuning strategies.
Master Microsoft's GraphRAG - build knowledge graphs from documents, use community detection for global queries, and understand when graph structure beats flat vector search.
Combine BM25 sparse retrieval with dense vector search for best-of-both-worlds performance - understand SPLADE, fusion methods, and when hybrid beats pure dense.
Python patterns for building production LLM applications - API integration, streaming, prompt engineering, token management, tool use, and vector search.
Master Retrieval-Augmented Generation - the dominant pattern for grounding LLMs in external knowledge at production scale.
Embeddings, vector databases, similarity search, RAG pipelines, and production vector search in Python with FAISS, Chroma, Pinecone, and pgvector.
Build rigorous RAG evaluation with RAGAS, TruLens, LLM-as-judge, golden datasets, and production monitoring - measure faithfulness, relevance, and groundedness.
Master the two-stage retrieval-reranking architecture - cross-encoders, ColBERT, LLM-as-reranker, Reciprocal Rank Fusion, and production latency budgets.
Read the 8 most important RAG papers in the right order, from the original Lewis et al. through GraphRAG, with full engineering context between papers.
Master the approximate nearest neighbor algorithms powering vector search - HNSW, IVF, IVF-PQ, ScaNN, and DiskANN with parameter tuning and recall-latency trade-offs.
Attack surfaces unique to RAG architectures - document poisoning, retrieval hijacking, indirect prompt injection, embedding collision, cross-tenant leakage, and defense-in-depth strategies for production RAG deployments.
Compare Pinecone, Qdrant, Weaviate, Milvus, Chroma, and pgvector - understand the engineering trade-offs and build a production vector store.
Understand why LLMs hallucinate, what RAG actually solves, and the decision framework for choosing RAG vs fine-tuning vs prompt stuffing.