Module 05: ML Architecture Patterns
ML systems that survive contact with production traffic tend to share one thing: they were designed around a set of architectural patterns that the team understood deeply before writing a single line of model code. This module dissects those patterns.
What You Will Learn
Lesson Map
| # | Lesson | Core Pattern | Key Takeaway |
|---|---|---|---|
| 01 | Lambda and Kappa Architecture | Batch + Stream | Lambda = correctness + speed; Kappa = simplicity |
| 02 | Microservices for ML | Service decomposition | Split when teams and scale demand it |
| 03 | Event Sourcing for ML | Immutable event log | Complete auditability and time-travel queries |
| 04 | ML Platform Design | Shared infrastructure | Build developer experience, not just compute |
| 05 | Reproducibility and Auditability | Code + Data + Environment | Every prediction must be replayable |
| 06 | Multi-Tenant ML Platforms | Isolation + Fair scheduling | Serve many teams safely on shared GPU fleets |
Key Concepts Covered
- Lambda Architecture - Nathan Marz's 2011 batch-plus-speed model and where it still wins
- Kappa Architecture - Jay Kreps's 2014 stream-only simplification and its ML implications
- ML Microservices - Feature service, model service, monitoring service, and the distributed-systems tax
- Event Sourcing - Kafka as an event store; temporal queries; the projection pattern
- ML Platform - What Uber Michelangelo, LinkedIn Pro-ML, and Airbnb Bighead got right
- Reproducibility Stack - Git + DVC + MLflow + Docker + seed management
- Multi-Tenancy - Kubernetes namespaces, fair scheduling, data isolation, cost attribution
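To make the event-sourcing entry above concrete, here is a minimal sketch of the projection pattern with a temporal ("time-travel") query. It uses an in-memory list as a stand-in for the event store. The `Event` schema, the `project_spend` function, and the sample data are hypothetical names chosen for illustration, not part of any lesson's code. In a real system the log might live in a Kafka topic.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    """One immutable fact in the log (hypothetical schema for illustration)."""
    ts: int          # event time, e.g. seconds since epoch
    user_id: str
    amount: float


def project_spend(events: Iterable[Event], as_of: int) -> Dict[str, float]:
    """Fold the log into per-user totals, using only events with ts <= as_of."""
    totals: Dict[str, float] = {}
    for e in events:
        if e.ts <= as_of:
            totals[e.user_id] = totals.get(e.user_id, 0.0) + e.amount
    return totals


# Append-only log: events are never updated or deleted, only added.
LOG: List[Event] = [
    Event(ts=100, user_id="u1", amount=20.0),
    Event(ts=200, user_id="u2", amount=5.0),
    Event(ts=300, user_id="u1", amount=7.5),
]

# The current feature view and the view as it stood at ts=250
# both come from replaying the same log with a different cutoff.
current_view = project_spend(LOG, as_of=300)
view_at_250 = project_spend(LOG, as_of=250)
```

Because the view is derived rather than stored as mutable state, the features a model saw at any past moment can be rebuilt by replaying the log up to that timestamp. That replay is what makes auditability and point-in-time training sets possible.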
Prerequisites
Before starting this module, you should be comfortable with:
- Module 03: Feature Engineering and Pipelines
- Module 04: Model Serving and Infrastructure
- Basic Kubernetes concepts (pods, deployments, namespaces)
- Apache Kafka basics (topics, partitions, consumer groups)
Why Architecture Patterns Matter
A model is not a product. A model plugged into a well-designed architecture is a product. The patterns in this module are the difference between a notebook experiment that works once and a system that serves millions of predictions per day, degrades gracefully, recovers from failures, passes compliance audits, and lets multiple teams iterate independently without stepping on each other.
Every pattern here was invented because someone hit a wall in production. Learn the pattern, understand the wall they hit, and you will not need to hit it yourself.
