Build and run GPU-enabled containers for ML - covering NVIDIA Container Toolkit, CUDA compatibility, Kubernetes GPU scheduling, and debugging GPU access.

GPU Containers

The GPU That Disappeared in Kubernetes

Rodrigo spent a full day debugging a problem that should not exist: his PyTorch training container worked perfectly on his GPU workstation, but when the same container ran in the company's Kubernetes cluster, it used only the CPU. No error. No warning. Just slow training that should have been fast.

The first sign something was wrong: a training job that took 8 minutes on his workstation was projected to take 6 hours in Kubernetes. He added a check to the training script:

```python
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")
```

Output from the Kubernetes pod: CUDA available: False, GPU count: 0. The GPU nodes were there - kubectl describe node showed nvidia.com/gpu: 4 in the allocatable resources. But the container was not seeing them.

The root cause was a missing resource request in the Kubernetes pod spec. Without resources.limits: nvidia.com/gpu: 1, the NVIDIA device plugin does not allocate a GPU to the pod, so no GPU devices are mounted into the container. Because the training script chose its device based on torch.cuda.is_available(), it silently fell back to CPU. The fix was one line in the pod spec:

```yaml
resources:
  limits:
    nvidia.com/gpu: 1
```

But understanding why that line is necessary requires understanding the full GPU container stack - which this lesson covers.

Why This Exists

Running GPU workloads in containers requires coordination between several layers of the software stack that normally do not interact: the GPU hardware, the NVIDIA kernel driver (on the host), the CUDA runtime libraries (in the container image), and the containerization layer (Docker or containerd with NVIDIA hooks).

Before the NVIDIA Container Toolkit (released 2017, originally called nvidia-docker), GPU access in containers often meant running privileged containers, which bypassed container security boundaries. The toolkit solved this with a cleaner approach: a container runtime hook that injects GPU device files and driver libraries into containers at creation time, so the container itself does not need privileged mode.

The GPU Container Stack

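The key insight: the NVIDIA kernel driver is always on the host, never in the container. The user-space driver libraries that talk to it (libcuda.so, libnvidia-ml.so) must match that kernel driver exactly, so the container toolkit injects them from the host when the container is created. The CUDA runtime and math libraries (libcudart.so, libcublas.so, libcudnn.so) live in the container, either baked into the base image or bundled with the framework (PyTorch pip wheels ship their own). The toolkit also maps the right GPU device files (/dev/nvidia0, /dev/nvidiactl, /dev/nvidia-uvm) into the container.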
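These layers can be checked from inside a running container. The sketch below is a minimal example assuming a Linux container with Python available; the function name inspect_gpu_stack is hypothetical. It only looks for /dev/nvidia* files and tries to load libcuda.so.1 (the usual soname of the CUDA driver library, which the toolkit injects from the host) with ctypes.

```python
import ctypes
import glob
import os


def inspect_gpu_stack(dev_dir: str = "/dev") -> dict:
    """Report which pieces of the GPU container stack are visible from here.

    Hypothetical helper: device files come from the toolkit's device mapping,
    libcuda.so.1 is the driver library injected from the host.
    """
    # Matches /dev/nvidia0, /dev/nvidia1, ... plus /dev/nvidiactl, /dev/nvidia-uvm
    device_files = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(dev_dir, "nvidia*"))
    )
    # Per-GPU nodes are "nvidia" followed only by digits
    gpu_nodes = [name for name in device_files if name[len("nvidia"):].isdigit()]

    try:
        ctypes.CDLL("libcuda.so.1")  # raises OSError if the library is absent
        libcuda_loaded = True
    except OSError:
        libcuda_loaded = False

    return {
        "device_files": device_files,
        "gpu_nodes": gpu_nodes,
        "has_nvidiactl": "nvidiactl" in device_files,
        "libcuda_loaded": libcuda_loaded,
    }
```

If gpu_nodes is empty or libcuda_loaded is False, the container was started without GPU access, regardless of what the image contains.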

CUDA Version Compatibility Matrix

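The most common source of GPU container failures is CUDA version mismatch. The rules:

- Host NVIDIA driver → supports CUDA runtime versions up to a maximum (shown by nvidia-smi)
- Container CUDA runtime → must be ≤ the host driver's max supported CUDA version
- PyTorch CUDA build (torch.version.cuda) → must also be ≤ that maximum. Official pip wheels bundle their own CUDA runtime, so this can differ from the base image's CUDA; keeping the two aligned avoids shipping two CUDA copies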

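Critical rule: the CUDA runtime in the container must be compatible with the driver on the host. Newer drivers are backward compatible: a host with driver 535.x (max CUDA 12.2) can run containers with CUDA 11.8 and CUDA 12.2. But a host with driver 525.x (max CUDA 12.0) cannot reliably run a container built for CUDA 12.2. CUDA's minor version compatibility lets some 12.x applications run on a 12.0-era driver with feature limitations, so treat the version nvidia-smi reports as the safe ceiling.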
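The rule is a numeric comparison on (major, minor). Comparing version strings directly gets cases like "12.10" vs "12.2" wrong. A minimal sketch with hypothetical helper names, implementing only the conservative rule above (it does not model minor version compatibility):

```python
def parse_cuda_version(version: str) -> tuple[int, int]:
    """Turn '12.2' or '12.1.0' into (major, minor) so versions compare numerically."""
    parts = version.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def container_cuda_supported(driver_max_cuda: str, container_cuda: str) -> bool:
    """Conservative rule: container CUDA (major, minor) <= driver's max CUDA.

    Does not model CUDA minor version compatibility, so it can reject some
    combinations that would run with feature limitations.
    """
    return parse_cuda_version(container_cuda) <= parse_cuda_version(driver_max_cuda)


# Driver 535.x reports CUDA 12.2; driver 525.x reports CUDA 12.0
print(container_cuda_supported("12.2", "11.8"))  # True
print(container_cuda_supported("12.0", "12.2"))  # False
```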

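```bash
# Check host driver version and max supported CUDA
nvidia-smi

# Output includes:
# Driver Version: 535.161.07
# CUDA Version: 12.2         ← This is the MAX CUDA version this driver supports

# Check CUDA version inside a container
# Runtime images do not include nvcc; the nvidia/cuda images set CUDA_VERSION
docker run --rm nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04 printenv CUDA_VERSION

# nvcc is only available in -devel images
docker run --rm nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04 nvcc --version
```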
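For automated checks on a host, the driver version and max CUDA version can be scraped from the nvidia-smi banner. This is a sketch with hypothetical helper names; the regexes assume the current "Driver Version: ... CUDA Version: ..." banner layout, which is meant for humans and could change between driver releases.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_smi_header(text: str) -> tuple[Optional[str], Optional[str]]:
    """Extract (driver_version, max_cuda_version) from nvidia-smi's banner text."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
    return (driver.group(1) if driver else None, cuda.group(1) if cuda else None)


def host_driver_info() -> tuple[Optional[str], Optional[str]]:
    """Run nvidia-smi if it is on PATH; return (None, None) if unavailable or failing."""
    if shutil.which("nvidia-smi") is None:
        return None, None
    try:
        result = subprocess.run(["nvidia-smi"], capture_output=True, text=True, timeout=10)
    except subprocess.TimeoutExpired:
        return None, None
    if result.returncode != 0:
        return None, None
    return parse_smi_header(result.stdout)
```

The CUDA value returned here is the ceiling to compare a container's CUDA_VERSION against.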

Installing the NVIDIA Container Toolkit

```bash
# Ubuntu 22.04 setup - run on every GPU host machine
# (assumes the NVIDIA driver is already installed on the host)

# 1. Add NVIDIA container toolkit repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# 2. Install toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# 3. Configure Docker to use the NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# 4. Verify installation
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi

# Expected output: nvidia-smi showing your GPU(s)
# If this fails, the toolkit is not installed correctly
```
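On a default install, step 3 registers a runtime named nvidia in Docker's daemon configuration, /etc/docker/daemon.json. A hypothetical preflight check for provisioning scripts could read that file before attempting the docker run test; the function name is an assumption, and it only checks that the runtime entry exists, not that it works.

```python
import json
from pathlib import Path


def docker_has_nvidia_runtime(config_path: str = "/etc/docker/daemon.json") -> bool:
    """Return True if Docker's daemon config registers a runtime named 'nvidia'."""
    path = Path(config_path)
    if not path.exists():
        return False
    try:
        # An empty file is treated as an empty config
        config = json.loads(path.read_text() or "{}")
    except (OSError, json.JSONDecodeError):
        return False
    runtimes = config.get("runtimes") or {}
    return "nvidia" in runtimes
```

A False result usually means nvidia-ctk runtime configure was not run, or Docker reads its configuration from a different path.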

Running GPU Containers with Docker

```bash
# Grant access to all GPUs
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi

# Grant access to specific GPU (by index)
docker run --rm --gpus '"device=0"' my-ml-image:latest python train.py

# Grant access to multiple specific GPUs
docker run --rm --gpus '"device=0,1"' my-ml-image:latest python train.py

# Grant access to 2 GPUs (let Docker choose which ones)
docker run --rm --gpus 2 my-ml-image:latest python train.py

# Old nvidia-docker2 style (still works but deprecated)
docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all my-ml-image:latest
```

CUDA_VISIBLE_DEVICES: Controlling GPU Allocation in Code

```python
# src/training/gpu_setup.py
"""
GPU device management for ML training.
CUDA_VISIBLE_DEVICES controls which GPUs a process can see; CUDA reads it
once, when it initializes in that process.
"""

import logging
import os

import torch

logger = logging.getLogger(__name__)


def setup_device(
    requested_gpus: int = 1,
    prefer_gpu: bool = True,
) -> torch.device:
    """
    Set up training device with proper CUDA visibility controls.

    Args:
        requested_gpus: Number of GPUs to use (0 for CPU-only)
        prefer_gpu: If False, always use CPU. If True, use the GPU when
            available and fall back to CPU with a warning when it is not.

    Returns:
        torch.device configured for training
    """
    if requested_gpus == 0 or not prefer_gpu:
        return torch.device("cpu")

    if not torch.cuda.is_available():
        logger.warning("GPU requested but not available - falling back to CPU")
        return torch.device("cpu")

    available_gpus = torch.cuda.device_count()
    logger.info(f"Available GPUs: {available_gpus}")

    # Defensive check: is_available() is normally False when no device is visible
    if available_gpus == 0:
        logger.warning("No CUDA devices visible to PyTorch - check CUDA_VISIBLE_DEVICES")
        return torch.device("cpu")

    # Log all available GPUs
    for i in range(available_gpus):
        props = torch.cuda.get_device_properties(i)
        logger.info(
            f"GPU {i}: {props.name}, "
            f"{props.total_memory / 1024**3:.1f}GB VRAM, "
            f"compute capability {props.major}.{props.minor}"
        )

    device = torch.device("cuda:0")
    torch.cuda.set_device(0)
    logger.info(f"Using device: {torch.cuda.get_device_name(0)}")

    return device


def setup_multi_gpu(requested_gpus: int) -> list[int]:
    """
    Select GPUs for multi-GPU training. Returns the list of GPU indices to use.

    CUDA reads CUDA_VISIBLE_DEVICES only when it initializes, and
    torch.cuda.device_count() below typically triggers that initialization.
    Setting the variable here therefore does not restrict this process; it
    only affects child processes started afterwards. To restrict the current
    process, set CUDA_VISIBLE_DEVICES before the first CUDA call (for example
    in the container's environment).
    """
    available = torch.cuda.device_count()

    if requested_gpus > available:
        logger.warning(
            f"Requested {requested_gpus} GPUs but only {available} available. "
            f"Using {available}."
        )
        gpu_ids = list(range(available))
    else:
        # Use first N GPUs by default
        # In practice, use a GPU scheduler or CUDA_VISIBLE_DEVICES env var
        # to select non-contiguous GPUs
        gpu_ids = list(range(requested_gpus))

    # gpu_ids are relative to what this process sees; if CUDA_VISIBLE_DEVICES
    # is already set, translate them back to its entries for child processes
    current = os.environ.get("CUDA_VISIBLE_DEVICES")
    if current:
        entries = [entry.strip() for entry in current.split(",")]
        selected = [entries[i] for i in gpu_ids]
    else:
        selected = [str(i) for i in gpu_ids]

    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(selected)
    logger.info(f"CUDA_VISIBLE_DEVICES for child processes: {os.environ['CUDA_VISIBLE_DEVICES']}")

    return gpu_ids


def assert_gpu_available(min_vram_gb: float = 8.0) -> None:
    """
    Assert GPU is available and meets memory requirements.
    Call at the start of training to fail fast rather than train on CPU
    when GPU was expected.
    """
    if not torch.cuda.is_available():
        raise RuntimeError(
            "GPU not available. Check: (1) NVIDIA driver installed on the host, "
            "(2) container started with --gpus (Docker), "
            "(3) K8s pod spec has an nvidia.com/gpu resource limit, "
            "(4) CUDA_VISIBLE_DEVICES is not set to an empty or invalid value"
        )

    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        vram_gb = props.total_memory / 1024**3
        if vram_gb < min_vram_gb:
            raise RuntimeError(
                f"GPU {i} ({props.name}) has {vram_gb:.1f}GB VRAM, "
                f"minimum required: {min_vram_gb}GB"
            )

    logger.info(f"GPU assertion passed: {torch.cuda.device_count()} GPU(s) available")
```
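Because CUDA reads CUDA_VISIBLE_DEVICES when it initializes, the reliable way to pin each worker to one GPU is to set the variable in the worker's environment before it starts. This is a minimal sketch with a hypothetical launch_one_process_per_gpu helper; in practice, launchers such as torchrun usually assign devices through LOCAL_RANK and torch.cuda.set_device instead.

```python
import os
import subprocess


def launch_one_process_per_gpu(gpu_ids: list[int], worker_cmd: list[str]) -> list[int]:
    """Start one worker per GPU, each seeing exactly one device, and wait for all.

    gpu_ids are indices as seen with CUDA_VISIBLE_DEVICES unset (inside a
    container: 0..N-1 of the allocated GPUs). The variable is set in each
    child's environment before it starts, so the child sees its GPU as cuda:0.
    Returns the workers' exit codes in launch order.
    """
    procs = []
    for gpu_id in gpu_ids:
        # Copy the parent environment and override only CUDA_VISIBLE_DEVICES
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
        procs.append(subprocess.Popen(worker_cmd, env=env))
    return [proc.wait() for proc in procs]


# Example (hypothetical entry point):
# launch_one_process_per_gpu([0, 1], [sys.executable, "-m", "src.training.train"])
```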

GPU Containers in Kubernetes

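The full Kubernetes GPU setup requires the NVIDIA driver and the NVIDIA Container Toolkit on each GPU node, plus the NVIDIA Device Plugin, which advertises nvidia.com/gpu to the scheduler: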

```bash
# Install NVIDIA Device Plugin in Kubernetes cluster
# (runs as a DaemonSet on GPU nodes)
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update

helm upgrade --install nvdp nvdp/nvidia-device-plugin \
  --version 0.15.0 \
  --namespace kube-system \
  --set failOnInitError=false

# Verify device plugin is running (the Helm chart uses the standard app.kubernetes.io labels)
kubectl get pods -n kube-system -l app.kubernetes.io/name=nvidia-device-plugin

# Verify GPU resources are visible on nodes
kubectl describe node <gpu-node-name> | grep nvidia.com/gpu
# Should show: nvidia.com/gpu: 4   (or however many GPUs the node has)
```
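For scripts and CI checks, the same information is available in structured form from kubectl get nodes -o json: once the device plugin is running, each node's status.allocatable map includes nvidia.com/gpu as a string count. A minimal sketch with hypothetical helper names; fetch_nodes_json assumes kubectl is installed and configured for the cluster.

```python
import json
import subprocess


def gpu_allocatable_by_node(nodes_json: dict) -> dict[str, int]:
    """Map node name -> allocatable nvidia.com/gpu from `kubectl get nodes -o json`."""
    result = {}
    for node in nodes_json.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Nodes without the device plugin simply lack the key
        result[name] = int(allocatable.get("nvidia.com/gpu", "0"))
    return result


def fetch_nodes_json() -> dict:
    """Call kubectl and return the parsed node list (raises if kubectl fails)."""
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)
```

A GPU node that reports 0 here usually means the device plugin pod is not running on it.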

```yaml
# kubernetes/training-job.yaml - ML training job with GPU resource request
apiVersion: batch/v1
kind: Job
metadata:
  name: fraud-model-training
  namespace: ml-platform
spec:
  backoffLimit: 0
  activeDeadlineSeconds: 14400  # 4 hour timeout
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: training
          image: gcr.io/myproject/ml-training:v1.2.3
          command: ["python", "-m", "src.training.train"]
          args:
            - "--config=config/training.yaml"
            - "--output-dir=/models/output"
          env:
            - name: MLFLOW_TRACKING_URI
              valueFrom:
                secretKeyRef:
                  name: mlflow-credentials
                  key: tracking-uri
          resources:
            requests:
              cpu: "4"
              memory: "16Gi"
              nvidia.com/gpu: "1"   # optional for GPUs; if set, must equal the limit
            limits:
              cpu: "8"
              memory: "32Gi"
              # This is the critical line - without a GPU limit, no GPU is allocated
              nvidia.com/gpu: "1"
          volumeMounts:
            - name: model-storage
              mountPath: /models
            - name: training-data
              mountPath: /data
              readOnly: true
      volumes:
        - name: model-storage
          persistentVolumeClaim:
            claimName: ml-model-store
        - name: training-data
          persistentVolumeClaim:
            claimName: training-data-pvc
      nodeSelector:
        # Schedule only on GPU nodes (GKE-specific label; other clusters label GPU nodes differently)
        cloud.google.com/gke-accelerator: nvidia-tesla-a100
      tolerations:
        # GPU nodes typically have a taint to prevent non-GPU pods from scheduling on them
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
```
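To catch the missing-limit bug from the opening story before a job reaches the cluster, a CI step can check pod specs. This is a minimal sketch assuming the manifest has already been loaded into a dict (for example with PyYAML, not shown); check_gpu_resources is a hypothetical name. It encodes Kubernetes' extended-resource rules: the GPU count must appear in limits, and a request, if present, must equal the limit.

```python
def check_gpu_resources(pod_spec: dict) -> list[str]:
    """Return problems with nvidia.com/gpu settings in a pod spec (empty list = OK)."""
    problems = []
    total_gpu_limit = 0
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        resources = container.get("resources") or {}
        limit = (resources.get("limits") or {}).get("nvidia.com/gpu")
        request = (resources.get("requests") or {}).get("nvidia.com/gpu")
        if request is not None and limit is None:
            problems.append(f"{name}: nvidia.com/gpu in requests but not in limits")
        elif request is not None and str(request) != str(limit):
            problems.append(f"{name}: nvidia.com/gpu request {request} != limit {limit}")
        if limit is not None:
            total_gpu_limit += int(limit)
    if total_gpu_limit == 0:
        problems.append("no container sets an nvidia.com/gpu limit; the pod gets no GPU")
    return problems


# For the Job above: check_gpu_resources(job_manifest["spec"]["template"]["spec"])
```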

Testing GPU Availability Inside Containers

```python
# scripts/test_gpu_in_container.py
"""
Diagnostic script - run inside container to verify GPU access.
Usage: python scripts/test_gpu_in_container.py
"""

import subprocess
import sys

import torch


def check_nvidia_smi():
    """Test that nvidia-smi is accessible from inside the container."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total,driver_version",
             "--format=csv,noheader"],
            capture_output=True, text=True, timeout=10
        )
    except FileNotFoundError:
        print("nvidia-smi not found in container - NVIDIA toolkit not injecting binaries")
        return False
    except subprocess.TimeoutExpired:
        print("nvidia-smi did not respond within 10 seconds")
        return False

    if result.returncode == 0:
        print("nvidia-smi output:")
        print(result.stdout)
        return True
    print(f"nvidia-smi failed: {result.stderr}")
    return False


def check_pytorch_cuda():
    """Test that PyTorch can see and use GPUs."""
    print(f"PyTorch version: {torch.__version__}")
    print(f"CUDA available: {torch.cuda.is_available()}")

    if not torch.cuda.is_available():
        print("\nCUDA NOT AVAILABLE. Debug checklist:")
        print("  1. Is the container running with --gpus flag (Docker)?")
        print("  2. Does the K8s pod spec have an nvidia.com/gpu resource limit?")
        print("  3. Is NVIDIA Container Toolkit installed on the host?")
        print("  4. Is the container's CUDA version supported by the host driver?")
        print("     Run: nvidia-smi on host to check driver's max CUDA version")
        return False

    print(f"CUDA version (PyTorch build): {torch.version.cuda}")
    print(f"GPU count: {torch.cuda.device_count()}")

    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"  GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f}GB")

    # Perform actual tensor operation on GPU to verify execution
    print("\nRunning GPU computation test...")
    device = torch.device("cuda:0")
    x = torch.randn(1000, 1000, device=device)
    y = torch.mm(x, x.T)
    torch.cuda.synchronize()
    print(f"Matrix multiply on GPU succeeded. Result shape: {y.shape}")
    return True


if __name__ == "__main__":
    print("=" * 50)
    print("GPU Container Diagnostic")
    print("=" * 50)

    smi_ok = check_nvidia_smi()
    pytorch_ok = check_pytorch_cuda()

    if smi_ok and pytorch_ok:
        print("\nAll GPU checks PASSED")
        sys.exit(0)
    else:
        print("\nSome GPU checks FAILED - see above for details")
        sys.exit(1)
```
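check_nvidia_smi above only prints the CSV. When a CI job needs to act on the values, for example to enforce a minimum VRAM, it can parse them. A minimal sketch with a hypothetical parse_gpu_query helper, assuming exactly the --query-gpu fields and --format=csv,noheader flags used in the script, where memory.total carries a MiB unit suffix:

```python
import csv
import io


def parse_gpu_query(csv_text: str) -> list[dict]:
    """Parse the output of
    nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader
    into one dict per GPU."""
    gpus = []
    for fields in csv.reader(io.StringIO(csv_text)):
        if not fields:  # skip blank lines
            continue
        name, memory, driver = (field.strip() for field in fields)
        gpus.append({
            "name": name,
            "memory_mib": int(memory.split()[0]),  # "81920 MiB" -> 81920
            "driver_version": driver,
        })
    return gpus


# e.g. assert all(g["memory_mib"] >= 8 * 1024 for g in parse_gpu_query(result.stdout))
```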

Production Notes

GPU resource limits: In Kubernetes, GPUs are extended resources. You must specify nvidia.com/gpu under limits; if you also specify it under requests, the two values must be equal (unlike CPU/memory, where requests and limits can differ). A GPU request without a limit is rejected. Setting only the limit is enough, since Kubernetes then uses it as the request.

GPU sharing: By default, Kubernetes GPU resources are not divisible - a pod either gets a whole GPU or no GPU. For inference workloads that do not use the full GPU, consider NVIDIA MIG (Multi-Instance GPU) on GPUs that support it, such as A100 and H100, or time-slicing through the NVIDIA device plugin on GPUs without MIG. Time-slicing shares a GPU between pods without memory or fault isolation.

CUDA version pinning: Pin CUDA versions explicitly in your Dockerfile base image tag. Never use nvidia/cuda:latest - CUDA major version upgrades (e.g., 11.x → 12.x) can break PyTorch compatibility in ways that are hard to debug. Pin to the exact tag: nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04.
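Pinning can also be enforced mechanically in CI. This is a hypothetical Dockerfile lint that assumes images are referenced as nvidia/cuda (not docker.io/nvidia/cuda) and treats a tag as pinned when it has a full major.minor.patch version followed by a variant, or when the image is referenced by digest.

```python
import re

# Full CUDA version plus variant, e.g. nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04
PINNED_CUDA_TAG = re.compile(r"^nvidia/cuda:\d+\.\d+\.\d+-[\w.-]+$")


def unpinned_cuda_bases(dockerfile_text: str) -> list[str]:
    """Return nvidia/cuda base images in FROM lines that are not pinned."""
    bad = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Skip flags such as --platform=linux/amd64
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        if not image.startswith("nvidia/cuda") or "@sha256:" in image:
            continue
        if not PINNED_CUDA_TAG.match(image):
            bad.append(image)
    return bad
```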

:::tip Always Include a GPU Availability Check at Startup Add assert_gpu_available() or equivalent at the start of every training script. When a training job runs on CPU instead of GPU due to a misconfiguration, you want it to fail immediately with a clear error, not train for hours before someone notices the wall-clock time is wrong. Fail fast, with a diagnostic message that tells the operator exactly what to check. :::

:::warning CUDA Version Mismatch is Silent A CUDA version mismatch between container and host does not always produce an error. Sometimes torch.cuda.is_available() simply returns False (often with only a UserWarning that the driver is too old), and code that picks its device from that check quietly runs on CPU; sometimes it crashes with an opaque error like CUDA error: unknown error. Always verify GPU access explicitly with a test script before deploying a new container version to production GPU workloads. :::

:::danger Privileged Mode for GPU Access Some older documentation suggests running GPU containers with --privileged. Never do this in production. Privileged mode gives the container root-level access to the host system - effectively breaking container isolation. The NVIDIA Container Toolkit provides GPU access without privileged mode. If you find a guide suggesting privileged mode for GPU access, it is outdated. :::

Interview Q&A

Q: What is the NVIDIA Container Toolkit and what problem does it solve?

The NVIDIA Container Toolkit (previously called nvidia-docker or nvidia-docker2) enables GPU access in containers without requiring privileged mode. It installs a custom OCI runtime hook that intercepts container creation and injects GPU device files (/dev/nvidia0, /dev/nvidiactl) and driver user-space libraries (libcuda.so, libnvidia-ml.so) into the container's namespace. The NVIDIA kernel driver remains on the host - the container sees the GPU through these injected files and libraries.

Q: How do CUDA version compatibility requirements work in GPU containers?

The relationship is: Host NVIDIA Driver → supports maximum CUDA Runtime version. The container's CUDA runtime version must be less than or equal to the host driver's maximum supported CUDA version. NVIDIA drivers are backward compatible - a newer driver can run older CUDA runtimes. For example, driver 535.x supports up to CUDA 12.2, so it can run containers with CUDA 11.8, 12.0, or 12.2. But a host with driver 525.x (max CUDA 12.0) cannot reliably run a container with CUDA 12.2; CUDA minor version compatibility covers some such cases only with feature limitations. Always check nvidia-smi on the host to see the maximum supported CUDA version.

Q: Why does a GPU container work in Docker but fail to use GPU in Kubernetes?

The most common reason: the Kubernetes pod spec is missing the nvidia.com/gpu resource limit. Without it, the NVIDIA device plugin does not allocate GPU devices to the pod, and /dev/nvidia* files are not mounted in the container. The container then has no GPU access, and code that selects its device with torch.cuda.is_available() silently falls back to CPU. Fix: add nvidia.com/gpu: "1" to resources.limits in the pod spec (if you also set it in resources.requests, use the same value). Also check that the NVIDIA device plugin DaemonSet is running and that the pod has tolerations matching any taints on the GPU nodes.

Q: What is CUDA_VISIBLE_DEVICES and how do you use it in ML containers?

CUDA_VISIBLE_DEVICES is an environment variable that controls which GPUs are visible to CUDA applications running in the process. CUDA_VISIBLE_DEVICES=0 shows only GPU 0. CUDA_VISIBLE_DEVICES=0,2 shows GPUs 0 and 2, which the process then sees as cuda:0 and cuda:1. CUDA_VISIBLE_DEVICES="" shows no GPUs (forces CPU mode). In containers, the NVIDIA runtime usually limits visibility itself (typically via NVIDIA_VISIBLE_DEVICES) by mounting only the allocated devices, so they appear inside the container as indices 0..N-1. In multi-GPU training scripts, you can set CUDA_VISIBLE_DEVICES per process, before CUDA initializes in that process, to restrict each process to its assigned GPU for data-parallel training.
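A simplified sketch of those semantics, with hypothetical helper names. It does not model how CUDA treats malformed entries, and entries may be GPU UUIDs as well as indices:

```python
import os
from typing import Mapping, Optional


def visible_device_tokens(env: Optional[Mapping[str, str]] = None) -> Optional[list[str]]:
    """Interpret CUDA_VISIBLE_DEVICES.

    None -> variable unset, all GPUs visible
    []   -> set to an empty string, no GPUs visible
    else -> the listed entries (indices or GPU UUIDs), in order
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return [token.strip() for token in value.split(",") if token.strip()]


def logical_to_physical(env: Optional[Mapping[str, str]] = None) -> Optional[dict[int, str]]:
    """Map the cuda:N index a process uses to the listed device entry."""
    tokens = visible_device_tokens(env)
    return None if tokens is None else dict(enumerate(tokens))


print(logical_to_physical({"CUDA_VISIBLE_DEVICES": "0,2"}))  # {0: '0', 1: '2'}
```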

Q: How do you test that a GPU container is correctly using the GPU before deploying to production?

Include a diagnostic script in your container that: (1) runs nvidia-smi and checks it returns successfully, (2) checks torch.cuda.is_available() returns True, (3) performs an actual tensor operation on GPU and verifies it completes without error. Run this script in CI after building the container and again on the first pod startup in production. Add assertions at the start of training scripts that fail immediately if GPU is not available when expected, rather than silently running on CPU.
