Docker for ML
Three Environments, Three Failures
Yuki spent two weeks debugging a model that worked perfectly on her MacBook Pro but failed silently in production. The model was a gradient boosted tree trained on tabular insurance claim data. On her machine: Python 3.11, scikit-learn 1.4.2, pandas 2.0.3. The training script ran flawlessly. Predictions matched expectations.
The first deployment failure was on the data preprocessing step: pandas 1.5.3 was installed in
production (the ops team had not updated it), and a DataFrame.pivot_table call behaved
differently in 1.5 vs 2.0. The result: silently wrong features. Not an error - a different answer.
After that was fixed by pinning the pandas version, a second issue surfaced: scikit-learn's
HistGradientBoostingClassifier had a serialization format change between 1.3 and 1.4. The
model file trained on Yuki's machine with 1.4 could not be loaded in the production environment
running 1.3. Exception: ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'long'.
The fixes were applied one at a time over three days. A Dockerfile would have caught all of them in 10 minutes on Yuki's first commit: same Python version, same package versions, same system libraries, everywhere.
This lesson covers Docker fundamentals from an ML perspective - not Docker as a DevOps topic, but Docker as the solution to the most common class of production ML bugs.
Why This Exists
Docker was created at dotCloud (later renamed Docker Inc.) in 2013 and open-sourced the same year. Solomon Hykes demonstrated it at PyCon 2013. The core insight was that Linux namespaces and cgroups - existing kernel features for process isolation - could be packaged into a usable, developer-friendly tool.
For software engineers, containers solved the deployment environment problem. For ML engineers, they solve the experiment-to-production environment problem, which is more complex: not just code but trained model artifacts, preprocessing pipelines, and framework-specific binary formats all need to be environment-consistent from development to deployment.
ML adoption of containers accelerated significantly from around 2017, when NVIDIA's nvidia-docker tooling (the predecessor of today's NVIDIA Container Toolkit) made GPU resources accessible to containerized workloads. Before that, GPU training in containers required significant manual configuration.
How Docker Works: The Mental Model
A Docker container is an ordinary Linux process that the kernel isolates so that it appears to be running on its own machine. It has its own filesystem (the image layers plus a thin writable layer), its own network interfaces, and its own process namespace. Unless you explicitly mount host paths or share namespaces, code inside the container cannot see the host filesystem or host processes.
The key point: the container runs your Python 3.11 and scikit-learn 1.4 regardless of what is installed on the host machine. The host only needs to run Docker. This is why containers solve the environment problem.
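One way to make that guarantee visible is a fail-fast check at container start that compares the running interpreter and installed packages against the versions the model was trained with. The sketch below is illustrative only: `EXPECTED`, `installed_version`, and `check_environment` are hypothetical names, and the version numbers are simply the ones from Yuki's machine.

```python
import sys
from importlib import metadata

# Versions the model was trained with (taken from the story above).
EXPECTED = {"scikit-learn": "1.4.2", "pandas": "2.0.3"}
EXPECTED_PYTHON = (3, 11)


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected, expected_python, lookup=installed_version,
                      running_python=None):
    """Return a list of mismatch messages; an empty list means the environment matches."""
    if running_python is None:
        running_python = tuple(sys.version_info[:2])
    problems = []
    if tuple(running_python) != tuple(expected_python):
        problems.append(f"python: expected {expected_python}, found {running_python}")
    for name, want in expected.items():
        have = lookup(name)
        if have != want:
            problems.append(f"{name}: expected {want}, found {have or 'not installed'}")
    return problems
```

An entrypoint could call `check_environment(EXPECTED, EXPECTED_PYTHON)` and exit with the messages if the list is not empty. Inside a correctly built image the check is redundant, which is the point: it only fires when something runs outside the image it was meant for.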
Docker Images and Layers
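A Docker image is a stack of read-only layers. Instructions that change the filesystem (RUN, COPY, ADD) each create a layer; other instructions such as ENV, LABEL, and CMD only record metadata. Build steps are cached: if an instruction, the files it copies, and all preceding instructions are unchanged, Docker reuses the cached result without re-executing it.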
This caching is the most important performance concept in Dockerfile writing for ML:
Layer 1: FROM python:3.11-slim ← changes rarely → cache hit almost always
Layer 2: RUN pip install torch ← changes rarely → cache hit most of the time
Layer 3: RUN pip install scikit-learn ← changes occasionally → cache hit often
Layer 4: COPY src/ /app/src/ ← changes on every code change → cache miss often
Layer 5: COPY models/ /app/models/ ← changes when model updates → cache miss sometimes
The golden rule: Put instructions that change rarely near the top, instructions that change often near the bottom. This maximizes cache hit rate and minimizes rebuild time.
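The effect of instruction order can be sketched with a toy model of the cache. This is a deliberate simplification, not how Docker is implemented: the hypothetical `layers_to_rebuild` function treats the first instruction whose inputs changed as a cache miss and rebuilds everything after it.

```python
def layers_to_rebuild(instructions, changed):
    """Return the instructions that must be re-executed.

    Simplified model: the first instruction whose inputs changed is a cache
    miss, and every instruction after it is rebuilt as well.
    """
    for index, instruction in enumerate(instructions):
        if instruction in changed:
            return instructions[index:]
    return []


GOOD_ORDER = [
    "FROM python:3.11-slim",
    "COPY requirements.txt .",
    "RUN pip install -r requirements.txt",
    "COPY src/ ./src/",
]
BAD_ORDER = [
    "FROM python:3.11-slim",
    "COPY . .",
    "RUN pip install -r requirements.txt",
]

# Editing a Python file changes the inputs of the COPY that brings in source code.
print(layers_to_rebuild(GOOD_ORDER, {"COPY src/ ./src/"}))
# ['COPY src/ ./src/']
print(layers_to_rebuild(BAD_ORDER, {"COPY . ."}))
# ['COPY . .', 'RUN pip install -r requirements.txt']
```

In the first layout a code edit rebuilds only the final copy step; in the second it also forces a full dependency reinstall.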
Your First ML Dockerfile
# Dockerfile - basic ML inference container
# Base image: Python 3.11 slim (Debian-based, small)
FROM python:3.11-slim
# Metadata
LABEL maintainer="[email protected]"
LABEL description="Fraud detection model inference service"
# Set working directory
WORKDIR /app
# Install system dependencies first (changes rarely)
# These are often needed by Python packages (libgomp for LightGBM, etc.)
RUN apt-get update && apt-get install -y --no-install-recommends \
        libgomp1 \
        libglib2.0-0 \
        curl \
    && rm -rf /var/lib/apt/lists/*
# Copy dependency files first (before code - cache optimization)
COPY requirements.txt .
# Install Python dependencies
# --no-cache-dir: don't store pip cache in image (saves space)
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code (changes frequently - near the bottom)
COPY src/ ./src/
# Create directory for model files
RUN mkdir -p /app/models
# Copy model (may change independently of code)
COPY models/fraud_detector_v3.joblib ./models/
# Non-root user for security
RUN useradd -m -r mluser && chown -R mluser:mluser /app
USER mluser
# Expose inference API port
EXPOSE 8080
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:8080/health || exit 1
# Run the inference server
CMD ["python", "-m", "src.serve.server", "--port", "8080"]
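The HEALTHCHECK above assumes the server answers GET /health. The lesson does not show `src.serve.server`, so the following is only a minimal sketch of what such an endpoint could look like using the standard library; `HealthHandler`, `model_ready`, and `make_server` are hypothetical names, and a real service would more likely use a web framework.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    model_ready = False  # set to True once the model has been loaded

    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        ready = type(self).model_ready
        body = json.dumps({"status": "ok" if ready else "loading"}).encode()
        # A 503 makes `curl -f` exit non-zero, so Docker counts the probe as failed.
        self.send_response(200 if ready else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep health probes out of the request log


def make_server(port=8080):
    """Build (but do not start) an HTTP server exposing /health."""
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
```

Calling `make_server().serve_forever()` would start it. Returning 503 until the model is loaded pairs naturally with `--start-period=60s`: failures during start-up do not count against the retry limit.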
The .dockerignore File
Without .dockerignore, Docker sends your entire project directory to the Docker daemon on
every build. For ML projects, this includes: notebooks (potentially gigabytes), raw training data,
virtual environments (venv/), git history (.git/), and experiment logs. This makes builds
slow and can accidentally include secrets.
# .dockerignore - what NOT to include in Docker builds
# Development artifacts
.git/
.gitignore
*.md
notebooks/
docs/
# Virtual environments (large, wrong platform)
venv/
.venv/
env/
__pycache__/
*.pyc
*.pyo
# ML development artifacts
data/raw/
data/processed/
experiments/
mlruns/
wandb/
*.log
# Large model files NOT needed in this image
# (add specific model files with COPY, not the whole directory)
models/training_checkpoints/
models/experimental/
# Secrets and local config
.env
.env.local
config/secrets.yaml
*.pem
*.key
# Test artifacts
tests/
.pytest_cache/
.coverage
htmlcov/
# CI/CD
.github/
.gitlab-ci.yml
Dockerfile*
docker-compose*.yml
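Before writing a .dockerignore it helps to know which directories actually dominate the build context. The sketch below is a hypothetical helper, not a Docker feature: `size_by_top_level` totals the bytes under each top-level entry of a project so the largest ones can be reviewed first. It does not parse .dockerignore patterns.

```python
import os
from pathlib import Path


def size_by_top_level(root):
    """Map each top-level entry under `root` to its total size in bytes, largest first."""
    root = Path(root)
    sizes = {}
    for entry in root.iterdir():
        if entry.is_symlink():
            continue
        if entry.is_file():
            sizes[entry.name] = entry.stat().st_size
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry):
            for name in filenames:
                path = Path(dirpath) / name
                if not path.is_symlink():
                    total += path.stat().st_size
        sizes[entry.name] = total
    # Largest first: these are the first candidates for .dockerignore.
    return dict(sorted(sizes.items(), key=lambda item: item[1], reverse=True))
```

Running `size_by_top_level(".")` from the project root typically puts entries like `data/`, `.git/`, or `venv/` at the top of the result, which are exactly the entries the file above excludes.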
Base Image Selection for ML
The base image choice has major implications for image size, security, and GPU support:
Avoid Alpine for ML: Alpine uses musl libc instead of glibc. Most prebuilt ML wheels target glibc (manylinux). Some packages, such as numpy, now also publish musl-compatible wheels, but PyTorch and many others do not, so on Alpine they either fail to install or must be compiled from source. The image size savings are rarely worth the compatibility headaches.
-slim vs full: python:3.11-slim omits development tools (gcc, make) and documentation.
This is correct for production inference images - you only need the runtime, not the build tools.
If your dependencies require compilation from source (unusual with binary wheels), use the full
image for building and slim for the final stage (multi-stage build - see Lesson 02).
Managing Large Model Files
Model files are a pain point unique to ML Docker images. A transformer model can be 1-10GB. You have three strategies:
Strategy 1: Bake the model into the image
# Simple, self-contained. Image is large but has no external dependencies.
# Good for: small models (< 100MB), air-gapped environments
COPY models/fraud_detector_v3.joblib /app/models/
Strategy 2: Download model at container startup
# Image is small. Model downloaded from S3/GCS when container starts.
# Good for: large models, frequent model updates without image rebuilds
# Add download logic to your entrypoint script
# (the script below calls the AWS CLI, so the image must also install it)
COPY scripts/download_model.sh /app/
RUN chmod +x /app/download_model.sh
ENTRYPOINT ["/app/download_model.sh"]
CMD ["python", "-m", "src.serve.server"]
#!/bin/bash
# scripts/download_model.sh
# MODEL_BUCKET and MODEL_KEY must be set in the container environment.
set -euo pipefail
MODEL_PATH="/app/models/model.joblib"
if [ ! -f "$MODEL_PATH" ]; then
    echo "Downloading model from s3://$MODEL_BUCKET/$MODEL_KEY..."
    mkdir -p "$(dirname "$MODEL_PATH")"
    aws s3 cp "s3://$MODEL_BUCKET/$MODEL_KEY" "$MODEL_PATH"
fi
exec "$@" # Execute CMD arguments (the Python server)
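The shell script trusts whatever it downloads. A variant of the same idea, sketched in Python, also verifies a SHA-256 checksum and writes through a temporary file so a crashed download never leaves a half-written model behind. Everything here is an assumption for illustration: `ensure_model` is a hypothetical helper, and `fetch` stands in for whatever S3/GCS client call the service really uses.

```python
import hashlib
import os
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte models do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_model(path, fetch, expected_sha256):
    """Download the model if it is missing, then verify its checksum.

    `fetch(destination)` is a stand-in for the real download call and must
    write the model bytes to `destination`.
    """
    path = Path(path)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        partial = path.with_name(path.name + ".partial")
        fetch(partial)
        os.replace(partial, path)  # atomic rename on the same filesystem
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise ValueError(f"checksum mismatch for {path}: {actual}")
    return path
```

Passing the expected checksum in through an environment variable next to `MODEL_KEY` would tie each deployment to one exact model artifact.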
Strategy 3: Mount model via volume
# Image contains no model. Model file mounted at runtime.
# Good for: CI/CD where model version is managed externally
# docker run -v /host/models:/app/models my-ml-image
# docker-compose.yml example
services:
  fraud-model:
    image: fraud-inference:latest
    volumes:
      - /data/models/fraud:/app/models:ro # Read-only mount
    environment:
      - MODEL_PATH=/app/models/fraud_detector_v3.joblib
The right strategy depends on: model size, update frequency, deployment environment, and whether you need fully self-contained images (for reproducibility audits).
Docker Compose for Local ML Development
For local development with multiple services, Docker Compose lets you define the full stack in a single YAML file:
# docker-compose.yml - local ML development stack
version: "3.9"
services:
  # Feature store (local Redis-backed)
  feature-store:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5
  # MLflow tracking server
  mlflow:
    image: ghcr.io/mlflow/mlflow:v2.11.0
    ports:
      - "5000:5000"
    environment:
      - MLFLOW_BACKEND_STORE_URI=sqlite:///mlflow.db
      - MLFLOW_DEFAULT_ARTIFACT_ROOT=/mlartifacts
    volumes:
      - mlflow-data:/mlartifacts
    command: mlflow server --host 0.0.0.0 --port 5000
  # Training service (CPU-only for local dev)
  training:
    build:
      context: .
      dockerfile: Dockerfile.training
    depends_on:
      feature-store:
        condition: service_healthy
      mlflow:
        condition: service_started
    environment:
      - MLFLOW_TRACKING_URI=http://mlflow:5000
      - REDIS_HOST=feature-store
      - REDIS_PORT=6379
    volumes:
      - ./src:/app/src # Live code reload
      - ./data:/app/data:ro # Read-only data mount
      - model-artifacts:/app/models # Shared model store
    profiles:
      - training # Only start with: docker compose --profile training up
  # Inference server
  inference:
    build:
      context: .
      dockerfile: Dockerfile.inference
    ports:
      - "8080:8080"
    depends_on:
      feature-store:
        condition: service_healthy
    environment:
      - REDIS_HOST=feature-store
      - MODEL_PATH=/app/models/fraud_detector_v3.joblib
    volumes:
      - model-artifacts:/app/models:ro # Read-only model access
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s
  # Prometheus for metrics (optional monitoring)
  prometheus:
    image: prom/prometheus:v2.51.0
    ports:
      - "9090:9090"
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml:ro
    profiles:
      - monitoring
volumes:
  mlflow-data:
  model-artifacts:
# Start just the core services (inference + feature store + mlflow)
docker compose up
# Start with training profile
docker compose --profile training up
# Start everything including monitoring
docker compose --profile training --profile monitoring up
# Run a one-off training job
docker compose run --rm training python -m src.training.train --config config/training.yaml
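On the application side, the services read their wiring from the environment variables the Compose file sets. A minimal sketch of how the inference service might do that follows; `InferenceSettings` is a hypothetical class, and only the variable names (`REDIS_HOST`, `REDIS_PORT`, `MODEL_PATH`) come from the Compose file above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceSettings:
    redis_host: str
    redis_port: int
    model_path: str

    @classmethod
    def from_env(cls, env=None):
        """Build settings from environment variables, failing fast on missing ones."""
        env = os.environ if env is None else env
        return cls(
            redis_host=env["REDIS_HOST"],  # required: KeyError if unset
            # The inference service does not set REDIS_PORT, so default to Redis's standard port.
            redis_port=int(env.get("REDIS_PORT", "6379")),
            model_path=env["MODEL_PATH"],  # required: KeyError if unset
        )
```

Because service names double as hostnames on the Compose network, `REDIS_HOST=feature-store` resolves inside the container without any further configuration.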
Production Notes
Pin base image digests: Tags like python:3.11-slim are mutable - the same tag can point
to a different image from one day to the next as security patches are applied. In production, pin to the
digest: python:3.11-slim@sha256:abc123.... Use the digest in your Dockerfile and update it
deliberately when you want to pick up new base image updates.
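Digest pinning is easy to forget, so it can be worth checking in CI. The following is a rough sketch of such a check, not an official tool: the hypothetical `unpinned_base_images` function scans Dockerfile text for FROM lines that lack an @sha256 digest, skipping `scratch` and references to earlier build stages. It does not resolve ARG variables.

```python
import re

FROM_LINE = re.compile(r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE)
STAGE_ALIAS = re.compile(r"\s+AS\s+(\S+)", re.IGNORECASE)


def unpinned_base_images(dockerfile_text):
    """Return base images referenced by FROM without an @sha256 digest."""
    stages = set()
    unpinned = []
    for line in dockerfile_text.splitlines():
        match = FROM_LINE.match(line)
        if not match:
            continue
        image = match.group(1)
        is_stage = image in stages  # e.g. "FROM build" in a multi-stage file
        if image.lower() != "scratch" and not is_stage and "@sha256:" not in image:
            unpinned.append(image)
        alias = STAGE_ALIAS.search(line)
        if alias:
            stages.add(alias.group(1))
    return unpinned
```

A CI step could read the Dockerfile, call this function, and fail the build when the returned list is not empty.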
Non-root users: Always run inference containers as a non-root user. Most public base images
run as root by default. Add RUN useradd -r mluser and USER mluser to your Dockerfile. If
your application writes to disk (logs, temp files), ensure the directories are owned by mluser.
Multi-platform builds: Data scientists often develop on Apple Silicon (M1/M2/M3) while
production runs on x86-64. Build multi-platform images in CI with docker buildx build --platform linux/amd64,linux/arm64. Without this, an arm64-only image built on an M1 Mac either fails on an
x86-64 host with an "exec format error" or, where QEMU emulation is configured, runs emulated, which is much slower and not what you want.
:::tip Use docker scout Before Every Push
docker scout (the replacement for the retired docker scan command) can identify known CVEs in your
image layers before you push. Run docker scout cves my-image:tag as part of your pre-push
workflow. Critical CVEs in base images are common - the earlier you catch them, the better.
:::
:::warning Layer Cache Invalidation Cascade
If any layer cache is invalidated (because the instruction or a preceding instruction changed),
all subsequent layers must also be rebuilt. This means putting COPY requirements.txt . before
RUN pip install is critical: it allows the dependency install layer to be cached even when
application code changes. If you copy all source code first, changing any Python file will
invalidate the pip install layer and trigger a full reinstall on every build.
:::
:::danger Including Training Data in Docker Images
Never include training data in a Docker image. Even partial datasets are usually large (gigabytes),
private, and subject to data governance policies that prohibit storing them in immutable image layers.
Images are shared, cached, and potentially pushed to public registries. Training data belongs in a
data lake (S3/GCS), accessible at training time via environment variables and IAM roles, never baked
into an image.
:::
Interview Q&A
Q: Why are Docker containers useful for ML systems specifically?
ML systems have a more complex environment dependency problem than standard software: not just code and OS libraries, but framework versions (PyTorch, scikit-learn), CUDA and cuDNN versions for GPU workloads, compiled binary formats for model artifacts (scikit-learn 1.3 cannot load models saved with 1.4), and data preprocessing implementations that must be identical between training and serving. Containers encapsulate all of these in a single image, so the userspace that runs in development is the same one that runs in production; only the host kernel and, for GPU workloads, the NVIDIA driver remain outside the image.
Q: How does Docker layer caching work and how do you optimize it for ML images?
Docker caches each Dockerfile instruction as a layer. If an instruction and all preceding instructions are unchanged, Docker uses the cached layer. To maximize cache hits: (1) Install system dependencies first (rarely change), (2) Copy requirements files and install dependencies before copying application code (dependency changes are less frequent than code changes), (3) Copy model files after application code if they change on a different cadence. The pattern: FROM → system deps → requirements copy → pip install → code copy → model copy.
Q: What is the difference between python:3.11-slim and python:3.11-alpine?
python:3.11-slim uses Debian minimal (glibc). python:3.11-alpine uses Alpine Linux (musl libc,
smaller). Alpine sounds appealing for size, but most ML Python packages distribute precompiled
binary wheels built against glibc. Some, such as numpy, now also publish musl-compatible wheels, but
PyTorch and many others do not; those packages must be compiled from source on Alpine, which takes
much longer and often fails. For ML workloads, python:3.11-slim is almost always the right choice for
CPU inference; NVIDIA's CUDA base images are the usual starting point for GPU workloads.
Q: How do you handle large model files in Docker images?
Three strategies: (1) Bake the model into the image (COPY command) - simple, self-contained, but image is large and updates require rebuilding the image; appropriate for small models under 100MB. (2) Download at startup - the image is slim, the model is downloaded from S3/GCS when the container starts; appropriate for large models, updated independently of code. (3) Volume mount - the model is mounted from the host or a shared volume at runtime; appropriate for CI/CD environments where model version is managed externally. Strategy 2 is a common choice for large models.
Q: What security practices should you follow when building ML Docker images?
Key practices: (1) Run as a non-root user - create a dedicated user and switch to it at the
end of the Dockerfile. (2) Use .dockerignore to prevent accidentally including secrets,
credentials, or raw data. (3) Pin dependency versions in requirements.txt (not >= but ==).
(4) Scan images for CVEs with Trivy or Docker Scout before pushing to registry. (5) Never
include credentials in the Dockerfile - use environment variables or secrets management
(AWS Secrets Manager, Kubernetes secrets) at runtime.
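Practice (3) can be enforced mechanically. As a rough sketch under simple assumptions (one requirement per line, no direct URL requirements), the hypothetical `unpinned_requirements` function below lists every entry in a requirements file that is not pinned with ==.

```python
def unpinned_requirements(requirements_text):
    """Return requirement lines that are not pinned with '=='."""
    loose = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, -e, --hash)
            continue
        if "==" not in line:
            loose.append(line)
    return loose


sample = "pandas==2.0.3\nscikit-learn>=1.4\n# tooling\nnumpy\n-r base.txt\n"
print(unpinned_requirements(sample))
# ['scikit-learn>=1.4', 'numpy']
```

Failing the build on a non-empty result keeps a loose specifier like `scikit-learn>=1.4` from silently pulling a different version into the next image.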
