Module 6: Algorithms for ML Engineers
Standard algorithm courses teach sorting, searching, and graph traversal in the abstract. ML engineers need the same concepts applied to specific problems: building embedding lookup tables for 100 billion user-item pairs, performing k-nearest-neighbor search in 1536-dimensional space at 10ms latency, efficiently sampling from a multinomial distribution during autoregressive decoding, and understanding why transformer attention is O(n^2) in sequence length.
This module covers algorithms through the lens of problems you will actually encounter.
Why Algorithms Matter in ML
Embedding tables at scale. A recommendation system with 100 billion user-item pairs needs O(1) average lookup. A dense array indexed by raw IDs wastes memory when the ID space is sparse, and a Python dict carries too much per-entry overhead at this scale. Understanding hash maps lets you choose between numpy arrays, custom C hash tables, and approximate structures.
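The sketch below is illustrative only (the sizes, the ID scheme, and the sorted-array alternative are assumptions, not prescribed by this module); it contrasts two exact-lookup layouts you might weigh before reaching for approximate structures:

```python
import numpy as np

rng = np.random.default_rng(0)
num_entities, dim = 100_000, 64                         # illustrative scale
table = rng.standard_normal((num_entities, dim)).astype(np.float32)

# Entity IDs are sparse 64-bit-style hashes, so `table` cannot be indexed by them directly.
sparse_ids = rng.permutation(np.arange(num_entities, dtype=np.int64) * 1_000_003 + 17)

# Option A: Python dict from raw ID -> row index.
# O(1) average lookup, but tens of bytes of per-entry overhead on top of the key itself.
id_to_row = {int(i): r for r, i in enumerate(sparse_ids)}

# Option B: sorted ID array + binary search.
# O(log n) lookup, but only 8 bytes per ID, contiguous, and cache-friendly.
order = np.argsort(sparse_ids)
sorted_ids, sorted_rows = sparse_ids[order], order

def lookup_dict(raw_id: int) -> np.ndarray:
    return table[id_to_row[raw_id]]

def lookup_sorted(raw_id: int) -> np.ndarray:
    pos = np.searchsorted(sorted_ids, raw_id)
    return table[sorted_rows[pos]]

query = int(sparse_ids[42])
assert np.array_equal(lookup_dict(query), lookup_sorted(query))
```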
Vector search. Nearest neighbor search in high-dimensional spaces is a fundamental ML operation - powering RAG retrieval, recommendation candidates, and duplicate detection. Brute force is O(n×d) per query. HNSW gives you approximate O(log n) search. FAISS's IVF gives you configurable accuracy-speed tradeoffs. Choosing correctly requires understanding the algorithms.
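As a minimal sketch, here is the brute-force baseline in NumPy, with a commented-out FAISS IVF variant (assuming `faiss` is installed; the corpus size and parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100_000, 128, 10                              # corpus size, dimension, neighbors
xb = rng.standard_normal((n, d)).astype(np.float32)    # corpus vectors
xq = rng.standard_normal((1, d)).astype(np.float32)    # one query

# Brute force: one O(n*d) pass computing every distance, then a partial top-k selection.
dists = np.linalg.norm(xb - xq, axis=1)                 # (n,) L2 distances
topk = np.argpartition(dists, k)[:k]                    # O(n) selection, unordered
topk = topk[np.argsort(dists[topk])]                    # order only the k winners

# Approximate alternative with FAISS IVF (requires `import faiss`):
# quantizer = faiss.IndexFlatL2(d)
# index = faiss.IndexIVFFlat(quantizer, d, 1024)        # 1024 coarse cells
# index.train(xb); index.add(xb)
# index.nprobe = 8                                      # cells probed per query: speed vs. recall
# D, I = index.search(xq, k)
```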
Attention complexity. Standard transformer attention is O(n^2) in sequence length, which is why 4K-context models were the norm for years. FlashAttention does not change the complexity class; it delivers a large constant-factor speedup by restructuring the computation so the full n×n attention matrix is never read from or written to slow GPU memory. Linear attention does reduce the complexity class to O(n), but at the cost of approximating the attention computation. Understanding these tradeoffs is required to reason about long-context model design.
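To make the O(n^2) term concrete, here is a naive single-head attention in NumPy (sizes are illustrative); the (n, n) score matrix it materializes is exactly what FlashAttention avoids writing out to slow GPU memory:

```python
import numpy as np

def naive_attention(Q, K, V):
    """Single-head attention; Q, K, V each have shape (n, d)."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                    # (n, n): this matrix is the O(n^2) cost
    scores -= scores.max(axis=-1, keepdims=True)     # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                               # (n, d)

# Doubling sequence length quadruples both FLOPs and the memory of `scores`:
# 4K tokens -> ~16.8M entries, 64K tokens -> ~4.3B entries.
n, d = 4096, 64
rng = np.random.default_rng(0)
Q = K = V = rng.standard_normal((n, d)).astype(np.float32)
out = naive_attention(Q, K, V)
assert out.shape == (n, d)
```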
Algorithm Complexity in ML Operations
Lessons in This Module
| # | Lesson | Key Concept |
|---|---|---|
| 1 | Time and Space Complexity for ML | Big-O in practice, when constant factors dominate |
| 2 | Hash Maps and Embeddings | Open addressing, load factors, vocabulary lookup |
| 3 | Trees and Approximate Nearest Neighbors | KD-tree, ball tree, HNSW, IVF, FAISS |
| 4 | Sorting and Sampling at Scale | Reservoir sampling, weighted sampling, top-k |
| 5 | Graph Algorithms for GNNs | BFS/DFS on graphs, subgraph sampling, message passing |
| 6 | Randomized Algorithms in ML | MinHash, LSH, random projections, sketch algorithms |
| 7 | Dynamic Programming in ML | Viterbi, CTC decoding, sequence alignment |
| 8 | Algorithmic Complexity of Attention | O(n^2) analysis, FlashAttention, linear attention |
Key Concepts You Will Master
- HNSW graph structure - the hierarchical navigable small world graph that powers production vector search
- Reservoir sampling - sampling k items from a stream without knowing the stream size (see the sketch after this list)
- Top-k with partial sort - O(n log k) instead of O(n log n) for beam search
- LSH (Locality Sensitive Hashing) - the family of algorithms that enable approximate similarity search
- CTC decoding - the dynamic programming algorithm behind speech recognition output alignment
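Reservoir sampling (Algorithm R) is compact enough to sketch here; a minimal single-pass Python version, assuming uniform sampling from an iterator of unknown length:

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Uniformly sample k items from a stream of unknown length in one pass."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)        # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)         # inclusive; i + 1 items seen so far
            if j < k:
                reservoir[j] = item       # replace with probability k / (i + 1)
    return reservoir

# Example: 5 items drawn uniformly from a stream we never hold in memory.
sample = reservoir_sample((x * x for x in range(1_000_000)), k=5)
print(sample)
```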
Prerequisites
- Basic algorithm knowledge (O notation, sorting, searching)
- Python proficiency
