Module 8 - Kubernetes for ML
Kubernetes has become the operating system of the cloud-native era. For ML engineers, it is no longer optional: Kubeflow runs directly on it, managed platforms such as Vertex AI, SageMaker, and Azure ML offer Kubernetes integrations, and engineering teams expect you to be fluent in it. This module takes you from zero to production-ready, covering the main aspects of running ML on Kubernetes.
What You Will Learn
Module Lessons
| # | Lesson | Key Skills |
|---|---|---|
| 01 | K8s Fundamentals for ML | Pods, Deployments, ConfigMaps, Secrets, PVCs, resource requests, liveness/readiness probes |
| 02 | Pods, Deployments & Services | Core workload patterns, service types, namespaces, networking |
| 03 | GPU Scheduling | NVIDIA device plugin, MIG, time-slicing, node affinity, GPU quotas |
| 04 | Helm for ML | Chart anatomy, parameterized deployments, lifecycle hooks, umbrella charts |
| 05 | Training Jobs on K8s | Job/CronJob, PyTorchJob, TFJob, fault tolerance, spot node handling |
| 06 | Autoscaling ML Workloads | HPA, KEDA, rolling updates, readiness gates for model warmup |
| 07 | KServe Model Serving | Operators, CRDs, KServe, Seldon Core, Argo Workflows |
Why Kubernetes for ML?
The ML lifecycle has three distinct computational phases: experimentation (exploratory, bursty, GPU-heavy), training (long-running, fault-sensitive, distributed), and serving (latency-critical, variable load, cost-sensitive). No single compute paradigm handles all three well.
Kubernetes handles all three through a unified API:
- GPU scheduling - declarative resource requests (`nvidia.com/gpu: 1`) instead of manual allocation
- Distributed training - Training Operator's PyTorchJob manages `torchrun` across pods
- Elastic serving - KEDA scales deployments on custom metrics including GPU utilization
- Reproducibility - container images + declarative manifests = the same environment on every cluster
- Cost control - ResourceQuotas per namespace prevent runaway GPU spend
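To make the GPU scheduling point concrete, here is a minimal sketch of a Pod manifest that requests one GPU, built as a Python dict and printed as JSON (which `kubectl apply -f` accepts alongside YAML). The pod name, container name, and image are hypothetical placeholders, and the example assumes the NVIDIA device plugin is installed so that the `nvidia.com/gpu` resource exists on the nodes.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Pod manifest that asks the scheduler for NVIDIA GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "trainer",  # hypothetical container name
                    "image": image,
                    "resources": {
                        # GPUs are extended resources: they are declared
                        # under limits and cannot be overcommitted.
                        "limits": {"nvidia.com/gpu": gpus},
                    },
                }
            ],
        },
    }


# Hypothetical image reference, for illustration only.
manifest = gpu_pod_manifest("train-demo", "example.com/ml/trainer:latest")
print(json.dumps(manifest, indent=2))
```

The scheduler places this pod only on a node that still has an unallocated GPU, which is what replaces manual allocation.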
Prerequisites
Before starting this module, you should be comfortable with:
- Docker: building images, understanding layers, volumes, networking
- Basic Linux: file permissions, processes, environment variables
- Python ML: you have trained at least one model with PyTorch or TensorFlow
- Module 07 (CI/CD for ML) - particularly how models are packaged as container images
Key Mental Models
Kubernetes is a desired-state engine. You declare what you want (3 replicas of my model server, each with 1 GPU and 8 GB RAM), and Kubernetes continuously reconciles reality toward that state. If a pod crashes, Kubernetes recreates it. If a node goes down, pods are rescheduled. This is fundamentally different from imperative "run this command on that server" thinking.
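The reconciliation idea can be sketched with a toy control loop. This is not how the Kubernetes controllers are implemented; `ToyCluster` and `reconcile` are hypothetical names used only to illustrate "compare observed state with desired state, then act on the difference".

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Stand-in for real cluster state: a list of running pod names."""
    pods: list = field(default_factory=list)
    counter: int = 0

    def start_pod(self) -> None:
        self.counter += 1
        self.pods.append(f"model-server-{self.counter}")

    def stop_pod(self) -> None:
        self.pods.pop()


def reconcile(cluster: ToyCluster, desired_replicas: int) -> None:
    """Drive the observed pod count toward the desired replica count."""
    while len(cluster.pods) < desired_replicas:
        cluster.start_pod()
    while len(cluster.pods) > desired_replicas:
        cluster.stop_pod()


cluster = ToyCluster()
reconcile(cluster, desired_replicas=3)   # initial rollout: 3 pods started
cluster.pods.remove("model-server-2")    # simulate a pod crash
reconcile(cluster, desired_replicas=3)   # the loop notices the gap and fills it
print(cluster.pods)
```

Note that nothing in the calling code says "restart the crashed pod". The caller only restates the desired count, and the loop works out what action is needed.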
Everything is a resource. Pods, Deployments, Services, ConfigMaps, Jobs, CronJobs - all are Kubernetes resources with a spec and a status. Custom resources (CRDs) extend this model: a PyTorchJob is just a Kubernetes resource that the Training Operator knows how to reconcile.
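As an illustration of a custom resource having the same shape as a built-in one, the sketch below builds a PyTorchJob as a plain dict. The field names follow the Kubeflow Training Operator `kubeflow.org/v1` API as commonly documented; treat them as an assumption and check them against the operator version you install. The job name, namespace, image, and the `total_gpus` helper are hypothetical.

```python
IMAGE = "example.com/ml/trainer:latest"  # hypothetical image reference


def replica_spec(replicas: int, gpus_per_pod: int = 1) -> dict:
    """One replica group (Master or Worker) of a PyTorchJob."""
    return {
        "replicas": replicas,
        "restartPolicy": "OnFailure",
        "template": {
            "spec": {
                "containers": [
                    {
                        # Assumed v1 convention: the training container
                        # is named "pytorch". Verify for your version.
                        "name": "pytorch",
                        "image": IMAGE,
                        "resources": {"limits": {"nvidia.com/gpu": gpus_per_pod}},
                    }
                ]
            }
        },
    }


# Same top-level keys as any built-in resource: apiVersion, kind, metadata, spec.
pytorch_job = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "PyTorchJob",
    "metadata": {"name": "demo-train", "namespace": "team-a"},
    "spec": {
        "pytorchReplicaSpecs": {
            "Master": replica_spec(1),
            "Worker": replica_spec(2),
        }
    },
}


def total_gpus(job: dict) -> int:
    """Sum the GPUs requested across all replica groups of the job."""
    total = 0
    for group in job["spec"]["pytorchReplicaSpecs"].values():
        container = group["template"]["spec"]["containers"][0]
        total += group["replicas"] * container["resources"]["limits"]["nvidia.com/gpu"]
    return total


print(total_gpus(pytorch_job))
```

The `status` field is absent on purpose: you write the `spec`, and the controller that owns the resource fills in `status` as it reconciles.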
Namespaces are your blast radius control. A runaway training job in the team-a namespace cannot OOM pods in the team-b namespace if ResourceQuotas are correctly configured. Namespaces are the primary isolation boundary in shared ML clusters.
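A per-namespace quota might look like the sketch below, again written as a Python dict that mirrors the manifest. The quota name and the limits chosen are hypothetical. The `fits_gpu_quota` helper is a simplified illustration of the check the API server performs at admission time, not the real implementation.

```python
quota = {
    "apiVersion": "v1",
    "kind": "ResourceQuota",
    "metadata": {"name": "gpu-quota", "namespace": "team-a"},
    "spec": {
        "hard": {
            # Extended resources such as GPUs are quota'd via the
            # "requests." prefix.
            "requests.nvidia.com/gpu": "4",
            "requests.memory": "64Gi",
        }
    },
}


def fits_gpu_quota(quota: dict, gpus_in_use: int, gpus_requested: int) -> bool:
    """Toy admission check: would this request stay within the GPU quota?"""
    hard_limit = int(quota["spec"]["hard"]["requests.nvidia.com/gpu"])
    return gpus_in_use + gpus_requested <= hard_limit


print(fits_gpu_quota(quota, gpus_in_use=3, gpus_requested=1))
print(fits_gpu_quota(quota, gpus_in_use=3, gpus_requested=2))
```

With a quota like this in team-a, a pod that would push the namespace past four requested GPUs is rejected at creation time, so the overspend never reaches the scheduler.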
