Module 06: Containerization

Why Containers Changed ML Engineering

"Works on my machine" is the original sin of software development. In ML, the problem is worse: the code is not the only thing that has to match. So do the CUDA version, cuDNN version, Python version, library versions, system libraries, and sometimes even the hardware architecture. A data scientist on a MacBook M2, a training cluster running Ubuntu 22.04 with CUDA 12.1, and a production inference server running RHEL 8 with CUDA 11.8 are three completely different environments. Without containers, shipping a model from development to production is an exercise in environment archaeology.
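To make the mismatch concrete, the following is a minimal sketch (not part of the module's lessons) that records a few of the environment facts listed above and reports where two machines differ. The function names `environment_fingerprint` and `diff_fingerprints` are hypothetical, and the default package names are only examples. CUDA and cuDNN versions are deliberately left out because how to query them depends on the framework installed.

```python
import platform
from importlib import metadata


def environment_fingerprint(packages=("numpy", "torch")):
    """Collect a few environment facts that commonly differ between machines."""
    fingerprint = {
        "python": platform.python_version(),
        "os": platform.system(),
        "arch": platform.machine(),
    }
    for name in packages:
        try:
            fingerprint[f"pkg:{name}"] = metadata.version(name)
        except metadata.PackageNotFoundError:
            fingerprint[f"pkg:{name}"] = None  # package is not installed here
    return fingerprint


def diff_fingerprints(a, b):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Print this machine's fingerprint; run it on a second machine and
    # pass both dictionaries to diff_fingerprints to see the drift.
    for key, value in environment_fingerprint().items():
        print(f"{key}: {value}")
```

Running the script on a laptop and on a training node, then diffing the two results, would typically surface differences in `os`, `arch`, and at least one package version. That gap is what a container image is meant to close by fixing those values once, at build time.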

This module covers containers from first principles for ML engineers: why they matter, how to write efficient Dockerfiles, how to build GPU-enabled containers, how to manage images in CI/CD, how to split traffic between model versions with a service mesh, and how to build a complete local ML development environment with Docker Compose that onboards a new team member in 10 minutes instead of 3 days.

Module Map

Learning Objectives

By the end of this module you will be able to:

Write efficient ML Dockerfiles with proper layer ordering and caching strategy
Reduce ML Docker image sizes from gigabytes to hundreds of megabytes using multi-stage builds
Build and run GPU-enabled containers with proper NVIDIA runtime configuration
Set up a container registry workflow with security scanning and environment promotion
Configure Istio service mesh traffic splitting for safe ML model canary deployments
Build a complete local ML development environment with Docker Compose

Prerequisites

Basic Linux command line
Python package management (pip, conda)
Module 05 (CI/CD for ML) recommended - containers are used in those pipelines

Lessons

#	Lesson	Core Problem Solved
01	Docker for ML	"Works on my machine" ML debugging
02	Optimizing ML Docker Images	8 GB image causing 12-minute cold starts
03	GPU Containers	GPU container ignores GPU in production
04	Container Registry and CI	Security incident from unscanned image
05	Service Mesh for ML	5 models × 3 versions routing chaos
06	Docker Compose for ML Dev	3-day onboarding reduced to 10 minutes

Estimated Time

6–8 hours total.
