Module 6: Algorithms for ML Engineers
Standard algorithm courses teach sorting, searching, and graph traversal in the abstract. ML engineers need the same concepts applied to specific problems: building embedding lookup tables for 100 billion user-item pairs, performing k-nearest-neighbor search in 1536-dimensional space at 10ms latency, efficiently sampling from a multinomial distribution during autoregressive decoding, and understanding why transformer attention is O(n^2) in sequence length.
This module covers algorithms through the lens of problems you will actually encounter.
Why Algorithms Matter in ML
Embedding tables at scale. A recommendation system with 100 billion user-item pairs needs O(1) average lookup. A dense array indexed by raw IDs wastes memory when the ID space is sparse, and a Python dict carries too much per-entry overhead at this scale. Understanding hash maps lets you choose between numpy arrays, custom C hash tables, and approximate structures.
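The sketch below is illustrative only (the sizes, the ID scheme, and the sorted-array alternative are assumptions, not prescribed by this module); it contrasts two exact-lookup layouts you might weigh before reaching for approximate structures:

```python
import numpy as np

rng = np.random.default_rng(0)
num_entities, dim = 100_000, 64                         # illustrative scale
table = rng.standard_normal((num_entities, dim)).astype(np.float32)

# Entity IDs are sparse 64-bit-style hashes, so `table` cannot be indexed by them directly.
sparse_ids = rng.permutation(np.arange(num_entities, dtype=np.int64) * 1_000_003 + 17)

# Option A: Python dict from raw ID -> row index.
# O(1) average lookup, but tens of bytes of per-entry overhead on top of the key itself.
id_to_row = {int(i): r for r, i in enumerate(sparse_ids)}

# Option B: sorted ID array + binary search.
# O(log n) lookup, but only 8 bytes per ID, contiguous, and cache-friendly.
order = np.argsort(sparse_ids)
sorted_ids, sorted_rows = sparse_ids[order], order

def lookup_dict(raw_id: int) -> np.ndarray:
    return table[id_to_row[raw_id]]

def lookup_sorted(raw_id: int) -> np.ndarray:
    pos = np.searchsorted(sorted_ids, raw_id)
    return table[sorted_rows[pos]]

query = int(sparse_ids[42])
assert np.array_equal(lookup_dict(query), lookup_sorted(query))
```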
Vector search. Nearest neighbor search in high-dimensional spaces is a fundamental ML operation - powering RAG retrieval, recommendation candidates, and duplicate detection. Brute force is O(n×d) per query. HNSW gives you approximate O(log n) search. FAISS's IVF gives you configurable accuracy-speed tradeoffs. Choosing correctly requires understanding the algorithms.
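As a minimal sketch, here is the brute-force baseline in NumPy, with a commented-out FAISS IVF variant (assuming `faiss` is installed; the corpus size and parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100_000, 128, 10                              # corpus size, dimension, neighbors
xb = rng.standard_normal((n, d)).astype(np.float32)    # corpus vectors
xq = rng.standard_normal((1, d)).astype(np.float32)    # one query

# Brute force: one O(n*d) pass computing every distance, then a partial top-k selection.
dists = np.linalg.norm(xb - xq, axis=1)                 # (n,) L2 distances
topk = np.argpartition(dists, k)[:k]                    # O(n) selection, unordered
topk = topk[np.argsort(dists[topk])]                    # order only the k winners

# Approximate alternative with FAISS IVF (requires `import faiss`):
# quantizer = faiss.IndexFlatL2(d)
# index = faiss.IndexIVFFlat(quantizer, d, 1024)        # 1024 coarse cells
# index.train(xb); index.add(xb)
# index.nprobe = 8                                      # cells probed per query: speed vs. recall
# D, I = index.search(xq, k)
```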
Attention complexity. Standard transformer attention is O(n^2) in sequence length, which is why 4K-context models were the norm for years. FlashAttention does not change the complexity class; it delivers a large constant-factor speedup by restructuring the computation so the full n×n attention matrix is never read from or written to slow GPU memory. Linear attention does reduce the complexity class to O(n), but at the cost of approximating the attention computation. Understanding these tradeoffs is required to reason about long-context model design.
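To make the O(n^2) term concrete, here is a naive single-head attention in NumPy (sizes are illustrative); the (n, n) score matrix it materializes is exactly what FlashAttention avoids writing out to slow GPU memory:

```python
import numpy as np

def naive_attention(Q, K, V):
    """Single-head attention; Q, K, V each have shape (n, d)."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                    # (n, n): this matrix is the O(n^2) cost
    scores -= scores.max(axis=-1, keepdims=True)     # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                               # (n, d)

# Doubling sequence length quadruples both FLOPs and the memory of `scores`:
# 4K tokens -> ~16.8M entries, 64K tokens -> ~4.3B entries.
n, d = 4096, 64
rng = np.random.default_rng(0)
Q = K = V = rng.standard_normal((n, d)).astype(np.float32)
out = naive_attention(Q, K, V)
assert out.shape == (n, d)
```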
Algorithm Complexity in ML Operations
Lessons in This Module
| # | Lesson | Key Concept |
|---|---|---|
| 1 | Time and Space Complexity for ML | Big-O in practice, when constant factors dominate |
| 2 | Hash Maps and Embeddings | Open addressing, load factors, vocabulary lookup |
| 3 | Trees and Approximate Nearest Neighbors | KD-tree, ball tree, HNSW, IVF, FAISS |
| 4 | Sorting and Sampling at Scale | Reservoir sampling, weighted sampling, top-k |
| 5 | Graph Algorithms for GNNs | BFS/DFS on graphs, subgraph sampling, message passing |
| 6 | Randomized Algorithms in ML | MinHash, LSH, random projections, sketch algorithms |
| 7 | Dynamic Programming in ML | Viterbi, CTC decoding, sequence alignment |
| 8 | Algorithmic Complexity of Attention | O(n^2) analysis, FlashAttention, linear attention |
Key Concepts You Will Master
- HNSW graph structure - the hierarchical navigable small world graph that powers production vector search
- Reservoir sampling - sampling k items from a stream without knowing the stream size (see the sketch after this list)
- Top-k with partial sort - O(n log k) instead of O(n log n) for beam search
- LSH (Locality Sensitive Hashing) - the family of algorithms that enable approximate similarity search
- CTC decoding - the dynamic programming algorithm behind speech recognition output alignment
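Reservoir sampling (Algorithm R) is compact enough to sketch here; a minimal single-pass Python version, assuming uniform sampling from an iterator of unknown length:

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Uniformly sample k items from a stream of unknown length in one pass."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)        # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)         # inclusive; i + 1 items seen so far
            if j < k:
                reservoir[j] = item       # replace with probability k / (i + 1)
    return reservoir

# Example: 5 items drawn uniformly from a stream we never hold in memory.
sample = reservoir_sample((x * x for x in range(1_000_000)), k=5)
print(sample)
```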
Prerequisites
- Basic algorithm knowledge (O notation, sorting, searching)
- Python proficiency
