Module 7 - Vector Database Engineering

Modern AI applications are defined by their ability to retrieve meaning, not just keywords. Vector databases are the infrastructure layer that makes semantic search, RAG pipelines, recommendation systems, and multimodal retrieval possible at scale.

This module teaches you how vector search actually works - from similarity metrics to approximate nearest neighbor algorithms - and how to operate vector databases reliably in production.

What You Will Learn

Lessons in This Module

#	Lesson	Key Concepts
01	Vector Similarity Search	Cosine, dot product, L2, recall@K, exact vs approximate
02	ANN Algorithms	HNSW, IVF, PQ, IVFPQ, LSH, DiskANN
03	Vector Databases Compared	Pinecone, Weaviate, Qdrant, Chroma, pgvector
04	Embedding Pipelines	Model selection, batching, re-indexing, drift
05	Hybrid Search	BM25 + dense, SPLADE, Reciprocal Rank Fusion
06	Filtering and Metadata	Pre/post filter, ACORN, multi-tenancy, sharding
07	Scalability and Sharding	Horizontal scale, hot-cold tiering, distributed HNSW
08	Production Vector DB	Monitoring, capacity planning, disaster recovery

Key Mental Models

Recall vs Latency is the central tradeoff. Exact search guarantees 100% recall, but its per-query cost scales as $O(n)$ in the number of indexed vectors. Approximate search trades a few percent of recall for orders-of-magnitude speedup. The right operating point depends on your application.
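To make the tradeoff concrete, here is a minimal sketch in plain Python (kept dependency-free; NumPy would be the usual choice in practice). It runs an exact brute-force search, then a deliberately crude "approximate" search that scores only half of the vectors, and measures recall@K between the two. The subset shortcut is an illustrative assumption, not how HNSW or IVF actually select candidates; the function names are hypothetical.

```python
import heapq
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Brute-force search: scores every vector, so cost grows as O(n)."""
    scored = ((cosine(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nlargest(k, scored)]


def subset_top_k(query, vectors, k, candidate_ids):
    """Toy 'approximate' search: scores only the given candidate ids."""
    scored = ((cosine(query, vectors[i]), i) for i in candidate_ids)
    return [i for _, i in heapq.nlargest(k, scored)]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(1000)]
query = [rng.gauss(0, 1) for _ in range(16)]

truth = exact_top_k(query, vectors, k=10)
# Assumed shortcut: look at every other vector, i.e. half the work.
approx = subset_top_k(query, vectors, k=10, candidate_ids=range(0, 1000, 2))
print(f"recall@10 = {recall_at_k(approx, truth):.2f}")
```

Real ANN indexes pick candidates far more cleverly than this, but the evaluation loop is the same: compute ground truth with exact search on a sample of queries, then report recall@K for the approximate index at each latency setting.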

Embeddings are not static. When you upgrade your embedding model, every vector in your index becomes stale, because vectors produced by different models are not comparable with each other. Re-indexing 50M documents takes time and compute, and it requires a zero-downtime migration strategy.
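One common way to migrate without downtime is to build the new index alongside the old one and switch a live alias once the build finishes. The sketch below illustrates that idea with a hypothetical in-memory `VectorStore`; the class, its methods, and the two toy `embed_v1` / `embed_v2` functions are assumptions made for illustration and do not correspond to any particular database's API.

```python
class VectorStore:
    """Hypothetical in-memory store with named indexes and one live alias."""

    def __init__(self):
        self.indexes = {}  # index name -> {doc_id: vector}
        self.live = None   # name of the index that currently serves reads

    def build_index(self, name, docs, embed):
        """Embed every document with the given model into a new index."""
        self.indexes[name] = {doc_id: embed(text) for doc_id, text in docs.items()}

    def promote(self, name):
        """Point the live alias at `name`; returns the previous live index."""
        if name not in self.indexes:
            raise KeyError(name)
        previous, self.live = self.live, name
        return previous

    def get(self, doc_id):
        """Read a vector from whichever index is live."""
        return self.indexes[self.live][doc_id]


def embed_v1(text):
    """Toy stand-in for the old embedding model (1 dimension)."""
    return [float(len(text))]


def embed_v2(text):
    """Toy stand-in for the upgraded embedding model (2 dimensions)."""
    return [float(len(text)), float(text.count(" "))]


docs = {"d1": "vector search", "d2": "hybrid retrieval with filters"}
store = VectorStore()
store.build_index("docs_v1", docs, embed_v1)
store.promote("docs_v1")

# Build the new index while docs_v1 keeps serving reads.
store.build_index("docs_v2", docs, embed_v2)
old = store.promote("docs_v2")  # single switch-over step
del store.indexes[old]          # retire the stale index afterwards
```

Whatever mechanism is used, the query path has to switch models at the same moment as the index: a query embedded with the old model must never be compared against vectors from the new one.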

Filtering changes everything. Adding a metadata filter to a vector query sounds trivial but can destroy index efficiency. Pre-filtering, post-filtering, and hybrid approaches (ACORN) each have different performance characteristics.
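The difference between pre-filtering and post-filtering shows up even on a five-item toy corpus. The following sketch uses brute-force dot-product ranking and a hypothetical `tenant` metadata field; the item layout and function names are assumptions for illustration only.

```python
import heapq


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query, items, k):
    """Brute-force top-k by dot product over (id, vector, metadata) items."""
    return heapq.nlargest(k, items, key=lambda item: dot(query, item[1]))


def pre_filter_search(query, items, k, predicate):
    """Filter first, then rank only the items that match."""
    survivors = [item for item in items if predicate(item[2])]
    return [item[0] for item in top_k(query, survivors, k)]


def post_filter_search(query, items, k, predicate):
    """Rank first, then drop non-matching hits; may return fewer than k."""
    hits = top_k(query, items, k)
    return [item[0] for item in hits if predicate(item[2])]


def is_tenant_b(meta):
    return meta["tenant"] == "b"


# Hypothetical corpus: (id, vector, metadata)
items = [
    (0, [0.9, 0.1], {"tenant": "a"}),
    (1, [0.8, 0.2], {"tenant": "a"}),
    (2, [0.7, 0.3], {"tenant": "b"}),
    (3, [0.2, 0.8], {"tenant": "b"}),
    (4, [0.1, 0.9], {"tenant": "b"}),
]
query = [1.0, 0.0]

print(pre_filter_search(query, items, 2, is_tenant_b))   # [2, 3]
print(post_filter_search(query, items, 2, is_tenant_b))  # []
```

Post-filtering returns nothing here because both of the global top-2 hits belong to tenant `a`. Pre-filtering returns a full result set, but it ranks the matching items by brute force, so it gives up whatever speedup the ANN index would have provided.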

The database is not the hard part. Choosing between Pinecone and Qdrant is less important than getting your embedding pipeline right, your recall evaluation correct, and your metadata schema designed for the queries you'll actually run.

Prerequisites

Familiarity with embeddings and similarity search concepts (Module 05)
Basic Python and NumPy
Understanding of database indexing fundamentals
