
Dependency Management and Packaging

The Environment That Could Not Be Reproduced

The model had been training in production for three months. The team was happy with its performance. Then the data center migrated to new GPU nodes with CUDA 12.2 while the training pipeline expected CUDA 11.8. The environment setup was pip install -r requirements.txt. No pinned CUDA-compatible PyTorch build. No lock file. No tested Docker image. Just a requirements file with torch>=1.12 and a hope that things would work out.

They did not work out. The new nodes had a different CUDA runtime. PyTorch installed the latest version (which defaulted to CUDA 12.x). The custom Cython extensions the team had written were compiled against the old CUDA toolkit headers. The build succeeded. Training started. Thirty minutes in, a segmentation fault. Not a Python exception - a segfault in native code, deep inside a CUDA kernel, with a stack trace pointing nowhere useful.

It took two days to reproduce the environment on a development machine. It took another day to understand the CUDA/PyTorch/custom-extension compatibility matrix. It took a fourth day to fix the build pipeline to pin CUDA versions properly and test the environment in Docker before deploying it to production.

Four days of senior engineer time because nobody thought carefully about dependency management. Because "pip install -r requirements.txt" felt like enough.

This lesson is about the infrastructure that prevents this story. Python packaging has a long and complicated history, and the current state of the art - pyproject.toml, uv, lock files, and GPU-aware dependency specifications - is the result of hard lessons learned across the ecosystem. Understanding it correctly means your environments are reproducible, your builds are fast, and your CUDA compatibility matrix is explicit, not accidental.


Why This Exists - The Reproducibility Problem

A Python program's behavior depends not just on the code you write, but on every version of every library it imports, and on every version of every library those libraries import, and on the C extensions those libraries compiled against, and on the system libraries those C extensions link to.

When you share code with "just pip install the requirements," you are hoping that every dependency resolves to the same version on every machine. Without lock files, that hope is almost always disappointed. Package X releases a bugfix version that introduces a subtle API change. Package Y drops support for Python 3.9 with no warning. A transitive dependency updates and breaks a direct dependency that had not pinned its own subdependencies.

The goal of modern Python packaging is reproducibility: given the same source code and the same dependency specification, every engineer on the team, every CI run, and every production deployment installs exactly the same set of package versions. Lock files, pinned hashes, and containerized environments are the mechanisms that achieve this.


Historical Context - Python Packaging's Messy Past

Python's packaging history is a long series of problems followed by solutions that created new problems.

distutils (1998, stdlib): The original build system. Allowed packages to define setup.py with compilation instructions. Minimal, inflexible, but it started the ecosystem.

setuptools (2004): Extended distutils with more features. Introduced easy_install and eggs (.egg files). Eggs were the precursor to wheels. setuptools added setup.py develop for editable installs.

virtualenv (2007): Ian Bicking wrote virtualenv to create isolated Python environments. Before virtualenv, all packages were installed globally, causing conflicts when different projects needed different versions.

pip (2008): Ian Bicking wrote pip as a replacement for easy_install. It introduced requirements.txt and installing from archives. pip became the standard tool, but for years it had no real dependency resolver; a proper backtracking resolver only landed in 2020.

wheel (2012): Daniel Holth defined the wheel format (.whl) in PEP 427 as a replacement for eggs. Wheels are zip files with a defined directory structure. Binary wheels contain pre-compiled C extensions. manylinux wheels (2016) extended this to Linux compatibility.

PEP 517/518 (2015/2016): The critical standardization moment. PEP 518 introduced pyproject.toml as the project metadata file. PEP 517 defined a standard build backend interface, decoupling build frontends (pip) from build backends (setuptools, flit, poetry-core, hatchling). This enabled the modern packaging ecosystem.

uv (2024): Astral rewrote the Python package installer in Rust. It is 10-100x faster than pip for resolving and installing dependencies, implements the same interface, and is rapidly becoming the standard for new projects.


pyproject.toml - The Modern Packaging Standard

pyproject.toml consolidates what previously required setup.py, setup.cfg, and MANIFEST.in into a single TOML file.

PEP 517/518 Structure

# pyproject.toml - complete example for an ML library with C extensions

# PEP 518: specifies build-time dependencies
[build-system]
requires = [
"setuptools>=68", # build backend
"wheel>=0.42", # wheel packaging
"Cython>=3.0", # Cython for C extensions
"numpy>=1.24", # needed for numpy C headers during build
]
build-backend = "setuptools.backends.legacy:build" # PEP 517: which tool to build with

# PEP 621: project metadata
[project]
name = "my-ml-library"
version = "0.3.1"
description = "High-performance ML operations with Cython accelerations"
readme = "README.md"
license = { file = "LICENSE" }
requires-python = ">=3.10"
authors = [
{ name = "Your Team", email = "engineering@yourorg.com" }
]
keywords = ["machine-learning", "numpy", "cython", "embeddings"]
classifiers = [
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"License :: OSI Approved :: MIT License",
"Intended Audience :: Science/Research",
"Topic :: Scientific/Engineering :: Artificial Intelligence",
]

# Runtime dependencies
dependencies = [
"numpy>=1.24,<3.0",
"scipy>=1.10",
"pydantic>=2.0",
]

[project.optional-dependencies]
# pip install my-ml-library[gpu]
gpu = [
"torch>=2.0",
"torchvision>=0.15",
]
# pip install my-ml-library[dev]
dev = [
"pytest>=7.0",
"pytest-cov",
"mypy>=1.8",
"ruff>=0.3",
"line-profiler",
"memory-profiler",
]
# pip install my-ml-library[docs]
docs = [
"sphinx>=7.0",
"sphinx-rtd-theme",
]
# pip install my-ml-library[all]
all = [
"my-ml-library[gpu]",
"my-ml-library[dev]",
"my-ml-library[docs]",
]

[project.urls]
Homepage = "https://yourorg.com"
Repository = "https://github.com/yourorg/my-ml-library"
Documentation = "https://docs.yourorg.com"
"Issue Tracker" = "https://github.com/yourorg/my-ml-library/issues"

[project.scripts]
# Creates an executable command: my-ml-train
my-ml-train = "my_ml_library.cli:main"

# Tool configurations (ruff, mypy, pytest, etc.)
[tool.ruff]
line-length = 100
target-version = "py310"

[tool.ruff.lint]
select = ["E", "F", "I", "N", "UP", "B"]

[tool.mypy]
python_version = "3.11"
strict = true

[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "-v --cov=my_ml_library --cov-report=term-missing"

Virtual Environments - Isolation Fundamentals

A virtual environment is an isolated Python installation with its own site-packages. Changes to one environment do not affect others.

# Create a virtual environment
python -m venv .venv

# Activate (Linux/macOS)
source .venv/bin/activate

# Activate (Windows)
.venv\Scripts\activate

# Install packages into the activated environment
pip install numpy torch

# Deactivate
deactivate

# Check what is installed
pip list
pip show numpy # details on a specific package

What virtualenv Adds

virtualenv (third-party, predates the stdlib venv) adds features like:

  • Creating environments from specific Python versions on the system
  • Faster creation
  • Seeding with pip/setuptools automatically
  • Cross-platform Windows support improvements

pip install virtualenv

# Create with a specific Python version
virtualenv --python=python3.11 .venv

# Useful for teams where different engineers have different Python versions

uv - The Modern Package Manager

uv is a Rust-based drop-in replacement for pip, pip-tools, virtualenv, and pyenv. Created by Astral (the same team behind ruff). It is the fastest Python package manager available.

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Or: pip install uv

# Create a virtual environment
uv venv

# Install packages (order of magnitude faster than pip)
uv pip install numpy torch

# Install from pyproject.toml
uv pip install -e ".[dev]"

# Sync from lock file (exactly reproducible)
uv pip sync requirements.lock

# Generate a lock file from requirements.txt
uv pip compile requirements.txt -o requirements.lock

# Compile with extras
uv pip compile pyproject.toml --extra dev --extra gpu -o requirements-dev.lock

uv Speed Comparison

pip install torch numpy scipy (cold cache): ~120 seconds
uv pip install torch numpy scipy (cold cache): ~8 seconds
uv pip install torch numpy scipy (warm cache): ~1.5 seconds

uv parallelizes downloads, uses a shared cache across projects, and implements faster dependency resolution.
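
That shared cache is inspectable and manageable; a quick sketch using uv's cache subcommands (flag-free forms as in current uv releases):

# Show where the shared cache lives (shared across every project and venv)
uv cache dir

# Drop cache entries no longer referenced by any environment
uv cache prune

# Wipe the cache entirely - the next install will be a cold-cache install
uv cache clean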

uv Projects and Workspaces

# Initialize a new project with uv
uv init my-ml-project
cd my-ml-project

# Add a dependency
uv add numpy

# Add a dev dependency
uv add --dev pytest mypy ruff

# Add GPU dependencies as an extra
uv add --optional gpu torch torchvision

# Lock the environment (creates uv.lock)
uv lock

# Sync the environment from lock file
uv sync

# Run a command in the managed environment
uv run python train.py
uv run pytest tests/

The uv.lock file records exact versions and hashes for every package, making environments perfectly reproducible.
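
If other tooling still expects a requirements file, the lock can be rendered into one. A minimal sketch, assuming a recent uv release (the export command and its flags may differ in older versions):

# Render uv.lock as a fully pinned requirements file, hashes included
uv export --format requirements-txt -o requirements-export.txt

# Install the identical pinned set with plain pip on another machine
pip install --require-hashes -r requirements-export.txt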


Poetry vs PDM vs Hatch

Several tools have emerged as opinionated wrappers around pyproject.toml with integrated dependency management:

Poetry: Most mature, excellent lock file, PubGrub dependency resolver, integrated virtual env management. The default choice for many projects. Slower than uv.

PDM: Modern Python Dependency Manager, faster than Poetry; it experimented with PEP 582 (__pypackages__) no-venv installs before that PEP was rejected. Good for teams that want a Poetry-like workflow with better performance.

Hatch: Excellent for managing multiple testing environments (test against Python 3.10, 3.11, 3.12 simultaneously), good for plugins and complex build workflows.

uv: Fastest, most compatible with pip workflows, the best choice for new projects as of 2024-2025.

Dependency Resolution: PubGrub

Poetry and several other tools use the PubGrub algorithm for dependency resolution. PubGrub (by Natalie Weizenbaum, 2018) solves the SAT problem at the heart of version resolution:

Given: a set of packages, each with version requirements on other packages, find a set of versions satisfying all requirements simultaneously, or report why no such set exists.

The key advantage of PubGrub over older resolution approaches is that it produces helpful error messages when resolution fails. Instead of "could not install because of version conflicts," it tells you exactly which pair of packages have incompatible requirements and why.


Lock Files and Reproducibility

A requirements file (requirements.txt) specifies constraints. A lock file specifies exact versions and cryptographic hashes. Lock files make environments perfectly reproducible.

The pip-tools Workflow

pip install pip-tools

# requirements.in: your high-level constraints
# numpy>=1.24
# torch>=2.0
# pydantic>=2.0

# Compile to a lock file
pip-compile requirements.in --output-file requirements.lock

# requirements.lock contains:
# numpy==1.26.4 \
# --hash=sha256:2a02aba9ed12e4ac4eb3ea9421c420301a0c6460d9830d74a9df87efa4912010 \
# --hash=sha256:0711488723b5a5bbcb6e6e8d3e97c...

# Install exactly what the lock file specifies
pip install --require-hashes -r requirements.lock

The --hash entries ensure that even if a package is republished (rare but it happens), you get exactly the bytes you tested with.
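
If you need a pin for an artifact the compiler never saw - a privately built wheel, for example - pip can compute the digest for you. A small sketch (the wheel path is hypothetical):

# Print a --hash=sha256:... line for a local wheel, ready to paste into the lock file
pip hash dist/my_ml_library-0.3.1-cp311-cp311-linux_x86_64.whl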

uv Lock File

# uv lock generates uv.lock automatically
uv lock

# The lock file records exact versions and hashes:
# [[package]]
# name = "numpy"
# version = "1.26.4"
# source = { registry = "https://pypi.org/simple" }
# sdist = { url = "...", hash = "sha256:...", size = ... }
# wheels = [
# { url = "...", hash = "sha256:...", size = ... },
# ]

Building Distributions - sdist and wheel

A Python package is distributed as either a source distribution (sdist) or a wheel:

  • sdist (.tar.gz): source code plus build instructions. Requires the user to have a compiler. Runs setup.py or the equivalent on the user's machine.
  • wheel (.whl): pre-built, platform-specific binary. No compilation required. Much faster to install.
# Build both sdist and wheel
python -m build

# Builds:
# dist/my_ml_library-0.3.1.tar.gz (sdist)
# dist/my_ml_library-0.3.1-cp311-cp311-linux_x86_64.whl (binary wheel)

# For pure Python packages, a universal wheel works on all platforms
# dist/my_ml_library-0.3.1-py3-none-any.whl
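
Those filename fields are load-bearing: installers use them to decide whether a wheel fits the current interpreter and platform. A short annotated breakdown of the binary wheel name above:

# {distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl
# my_ml_library-0.3.1-cp311-cp311-linux_x86_64.whl
#   cp311        -> built for CPython 3.11
#   cp311        -> linked against the CPython 3.11 ABI
#   linux_x86_64 -> platform tag (PyPI rejects bare linux tags; manylinux tags are required for upload)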

manylinux Wheels for C Extensions

Binary wheels are platform-specific. A wheel built on Ubuntu 22.04 will not run on CentOS 7. The manylinux standard solves this:

# Build manylinux2014-compatible wheels in Docker
docker run --rm -v $(pwd):/io quay.io/pypa/manylinux2014_x86_64 bash -c "
cd /io
for PYBIN in /opt/python/cp310-*/bin /opt/python/cp311-*/bin /opt/python/cp312-*/bin; do
\${PYBIN}/pip install cython numpy
\${PYBIN}/pip wheel . --no-deps -w /io/wheelhouse/
done
for whl in /io/wheelhouse/my_ml_library*.whl; do
auditwheel repair \"\$whl\" --plat manylinux2014_x86_64 -w /io/dist/
done
"

Modern projects automate this with cibuildwheel:

# .github/workflows/build-wheels.yml
jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-13, windows-latest]

    steps:
      - uses: actions/checkout@v4
      - uses: pypa/cibuildwheel@v2.16.0
        env:
          CIBW_BUILD: "cp310-* cp311-* cp312-*"
          CIBW_ARCHS_LINUX: "x86_64 aarch64"
          CIBW_BEFORE_BUILD: "pip install cython numpy"
          CIBW_TEST_REQUIRES: "pytest"
          CIBW_TEST_COMMAND: "pytest {project}/tests"

PyPI Publishing

# Install build and twine
pip install build twine

# Build the distribution
python -m build

# Check the distribution
twine check dist/*

# Upload to TestPyPI first
twine upload --repository testpypi dist/*
pip install --index-url https://test.pypi.org/simple/ my-ml-library

# Upload to real PyPI
twine upload dist/*

# Or with uv
uv publish --token $PYPI_TOKEN

GitHub Actions Publish Workflow

# .github/workflows/publish.yml
name: Publish to PyPI

on:
  release:
    types: [published]

jobs:
  build-wheels:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-13, windows-latest]

    steps:
      - uses: actions/checkout@v4
      - uses: pypa/cibuildwheel@v2.16.0
        env:
          CIBW_BUILD: "cp310-* cp311-* cp312-*"
      - uses: actions/upload-artifact@v4
        with:
          name: wheels-${{ matrix.os }}
          path: wheelhouse/

  publish:
    needs: build-wheels
    runs-on: ubuntu-latest
    environment: release  # requires approval in GitHub settings
    permissions:
      id-token: write  # for trusted publishing (no API key needed)

    steps:
      - uses: actions/checkout@v4  # needed so the sdist build has source to work with

      - uses: actions/download-artifact@v4
        with:
          pattern: wheels-*
          merge-multiple: true
          path: dist/

      - name: Build sdist
        run: pipx run build --sdist

      - uses: pypa/gh-action-pypi-publish@release/v1
        # Trusted publishing: no PYPI_TOKEN needed
        # Configure at pypi.org: Settings > Publishing > Trusted Publishers

Private Package Registries

For internal libraries not published to PyPI, private registries provide the same pip install experience with access control.

AWS CodeArtifact

# Authenticate (token expires in 12 hours)
aws codeartifact login \
--tool pip \
--repository my-internal-repo \
--domain my-domain \
--domain-owner 123456789012 \
--region us-east-1

# This sets pip's index URL to your CodeArtifact repository
# Subsequent pip installs check your private registry first, then PyPI

# Install from private registry
pip install my-internal-ml-library

# Publish to private registry
twine upload \
--repository-url https://my-domain-123456789012.d.codeartifact.us-east-1.amazonaws.com/pypi/my-internal-repo/legacy/ \
--username aws \
--password $(aws codeartifact get-authorization-token --domain my-domain --query authorizationToken --output text) \
dist/*

Configuration for Teams

# pyproject.toml - point uv at private registry
[tool.uv]
index-url = "https://my-domain-account.d.codeartifact.region.amazonaws.com/pypi/my-repo/simple/"
extra-index-url = ["https://pypi.org/simple/"]

# Or per-environment using pip.conf
# [global]
# index-url = https://your-registry/simple/
# extra-index-url = https://pypi.org/simple/
# trusted-host = your-registry

Artifactory / Nexus

# pip.conf (Linux: ~/.pip/pip.conf, macOS: ~/Library/Application Support/pip/pip.conf)
# [global]
# index-url = https://your-artifactory.example.com/artifactory/api/pypi/pypi-virtual/simple
# trusted-host = your-artifactory.example.com

# .netrc for credentials (more secure than URL-embedded passwords)
# machine your-artifactory.example.com
# login your-username
# password your-api-key

Docker Layer Caching for Python Dependencies

Docker builds are slow when they reinstall all dependencies on every code change. The solution is separating dependency installation from source code copying, exploiting Docker's layer cache.

The Anti-Pattern

# WRONG - rebuilds all dependencies on every code change
FROM python:3.11-slim

WORKDIR /app
# copies source code - invalidates the cache whenever ANY file changes
COPY . .
# cache already busted above, so this reinstalls every dependency on each code change
RUN pip install -r requirements.txt
CMD ["python", "train.py"]

The Correct Pattern

# CORRECT - dependencies cached unless requirements change
FROM python:3.11-slim

# Install system dependencies (rarely changes - cached)
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
git \
&& rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Copy ONLY the dependency files first (changes rarely)
COPY pyproject.toml requirements.lock ./

# Install dependencies (cached if requirements files unchanged)
RUN pip install --no-cache-dir -r requirements.lock

# NOW copy source code (changes frequently, but layer below is cached)
COPY src/ ./src/
COPY tests/ ./tests/

CMD ["python", "-m", "src.train"]

Multi-Stage Build for Production ML Images

# multi-stage.Dockerfile
# Stage 1: Build dependencies (including C extensions)
FROM python:3.11-slim AS builder

RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential gcc g++ \
&& rm -rf /var/lib/apt/lists/*

WORKDIR /build
COPY pyproject.toml requirements.lock ./

# Install into a prefix directory
RUN pip install --no-cache-dir --prefix=/install -r requirements.lock

# Stage 2: Runtime image (no build tools, smaller image)
FROM python:3.11-slim AS runtime

# Copy only the installed packages from builder
COPY --from=builder /install /usr/local

WORKDIR /app
COPY src/ ./src/

# Verify the environment works
RUN python -c "import torch; print(torch.__version__)"

ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1

CMD ["python", "-m", "src.serve"]

GPU-Enabled Docker Base Images

For training and inference with CUDA:

# GPU training image
FROM nvidia/cuda:12.2.0-cudnn8-devel-ubuntu22.04

# Install Python
RUN apt-get update && apt-get install -y python3.11 python3-pip \
&& rm -rf /var/lib/apt/lists/*

RUN ln -s /usr/bin/python3.11 /usr/bin/python

WORKDIR /app

# Install PyTorch with correct CUDA version (CRITICAL)
# Never just "pip install torch" - it may install the wrong CUDA build
RUN pip install torch==2.1.0 torchvision==0.16.0 \
--index-url https://download.pytorch.org/whl/cu121

COPY pyproject.toml requirements.lock ./
RUN pip install --no-cache-dir -r requirements.lock

COPY src/ ./src/
CMD ["python", "-m", "src.train"]

conda vs pip Ecosystem

conda and pip are complementary but distinct:

Feature | pip | conda
Package types | Python-only | Python + R + C/C++ libraries + CUDA
Package source | PyPI | Anaconda/conda-forge channels
Environment management | venv (separate tool) | Built-in conda env
Dependency resolution | Backtracking via resolvelib (modern pip); PubGrub in uv | SAT solver
Speed | Fast (uv: very fast) | Slower for large installs
Non-Python deps | System-level only | Bundles C libraries

conda's key advantage for ML is that it can install CUDA toolkit, cuDNN, and NCCL alongside Python packages, ensuring version compatibility without requiring a system CUDA installation:

# environment.yml - conda environment specification
name: ml-training
channels:
  - pytorch
  - nvidia
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - cuda-toolkit=12.2  # installs CUDA without system-level install
  - cudnn=8.9
  - pip
  - pip:
      - torch==2.1.0
      - torchvision==0.16.0
      - numpy>=1.24
      - pydantic>=2.0

# Create from environment.yml
conda env create -f environment.yml

# Activate
conda activate ml-training

# Export current environment (with exact versions)
conda env export > environment-lock.yml

# Update packages
conda env update -f environment.yml --prune

GPU Package Management - The CUDA Compatibility Matrix

This is where most ML environment problems originate. Every layer must be compatible:

The Compatibility Matrix

CUDA Toolkit | Minimum Driver Version | PyTorch Build Tag
11.8 | 520.61 | cu118
12.1 | 530.30 | cu121
12.2 | 535.54 | cu122 (limited)
12.4 | 550.54 | cu124

GPU Compute Capability requirements:
Volta (V100): 7.0 - supports CUDA 9.0 through 12.x
Turing (T4): 7.5 - supports CUDA 10.0 through 12.x
Ampere (A100): 8.0 - supports CUDA 11.0 through 12.x
Ada (RTX 4090): 8.9 - supports CUDA 11.8 through 12.x
Hopper (H100): 9.0 - requires CUDA 11.8 minimum
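
A start-of-job guard turns the bottom row of this matrix into an explicit check instead of a mid-training crash. A minimal sketch, assuming your compiled kernels need compute capability 7.0 or newer (MIN_CAPABILITY is a stand-in; adjust it to what you actually build for):

# capability_guard.py - hypothetical fail-fast check before training starts
import torch

MIN_CAPABILITY = (7, 0)  # assumption: Volta-or-newer kernels

def assert_compute_capability(min_cap: tuple[int, int] = MIN_CAPABILITY) -> None:
    if not torch.cuda.is_available():
        raise RuntimeError("CUDA is not available in this environment")
    for i in range(torch.cuda.device_count()):
        capability = torch.cuda.get_device_capability(i)
        if capability < min_cap:
            name = torch.cuda.get_device_name(i)
            raise RuntimeError(
                f"GPU {i} ({name}) has compute capability "
                f"{capability[0]}.{capability[1]}, below required "
                f"{min_cap[0]}.{min_cap[1]}"
            )

if __name__ == "__main__":
    assert_compute_capability()
    print("All visible GPUs meet the minimum compute capability.")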

CUDA Compatibility Checker

# cuda_compat_check.py
"""Check that the CUDA stack is internally consistent."""
import subprocess
import sys


def check_cuda_compatibility():
    issues = []

    # Check nvidia-smi (driver version)
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
        driver_version = result.stdout.strip().split("\n")[0]
        print(f"NVIDIA Driver: {driver_version}")
    except (subprocess.CalledProcessError, FileNotFoundError):
        issues.append("nvidia-smi not found - is a GPU available?")
        driver_version = None

    # Check CUDA toolkit
    try:
        result = subprocess.run(
            ["nvcc", "--version"],
            capture_output=True, text=True, check=True,
        )
        # Extract version from: "release 12.2, V12.2.140"
        line = [l for l in result.stdout.split("\n") if "release" in l][0]
        cuda_version = line.split("release")[1].split(",")[0].strip()
        print(f"CUDA Toolkit: {cuda_version}")
    except (subprocess.CalledProcessError, FileNotFoundError, IndexError):
        print("nvcc not found (OK for runtime-only environments)")
        cuda_version = None

    # Check PyTorch CUDA
    try:
        import torch
        print(f"PyTorch: {torch.__version__}")
        print(f"PyTorch CUDA available: {torch.cuda.is_available()}")
        if torch.cuda.is_available():
            print(f"PyTorch CUDA version: {torch.version.cuda}")
            print(f"cuDNN version: {torch.backends.cudnn.version()}")
            device_count = torch.cuda.device_count()
            for i in range(device_count):
                props = torch.cuda.get_device_properties(i)
                print(f"GPU {i}: {props.name}")
                print(f"  Compute capability: {props.major}.{props.minor}")
                print(f"  Memory: {props.total_memory / 1024**3:.1f} GB")
    except ImportError:
        issues.append("torch not installed")

    if issues:
        print("\nISSUES FOUND:")
        for issue in issues:
            print(f"  - {issue}")
        return False

    print("\nCUDA stack appears consistent.")
    return True


if __name__ == "__main__":
    ok = check_cuda_compatibility()
    sys.exit(0 if ok else 1)

Installing PyTorch with the Correct CUDA Build

# WRONG - installs latest PyTorch which may not match your CUDA version
pip install torch

# RIGHT - specify the CUDA build explicitly
# For CUDA 11.8:
pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1:
pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu121

# For CPU only (no CUDA):
pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cpu

# In requirements.txt - use explicit index URL
# --index-url https://download.pytorch.org/whl/cu121
# torch==2.1.0
# torchvision==0.16.0

# In pyproject.toml with uv:
# [tool.uv.sources]
# torch = { url = "https://download.pytorch.org/whl/cu121/torch-2.1.0+cu121-cp311-cp311-linux_x86_64.whl" }

ML Project Packaging Best Practices

Project Structure

my-ml-project/
├── pyproject.toml           # all project metadata
├── uv.lock                  # exact dependency versions (commit this)
├── README.md
├── src/
│   └── my_ml_project/
│       ├── __init__.py
│       ├── models/
│       │   ├── __init__.py
│       │   └── transformer.py
│       ├── data/
│       │   ├── __init__.py
│       │   └── dataset.py
│       └── training/
│           ├── __init__.py
│           └── trainer.py
├── tests/
│   ├── conftest.py
│   ├── test_models.py
│   └── test_data.py
├── scripts/
│   └── train.py             # entry point scripts
├── docker/
│   ├── Dockerfile.train
│   └── Dockerfile.serve
└── .github/
    └── workflows/
        ├── ci.yml
        └── publish.yml

The src Layout Advantage

The src/ layout prevents accidentally importing from the local directory instead of the installed package. When you run python from the project root, Python adds the current directory to sys.path. Without src/, import my_ml_project finds the local directory even if the package is not installed. With src/, the package must be installed first (an editable install via pip install -e . is enough for development), so the import always resolves to the installed version.
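
The difference is easy to demonstrate from a shell. A quick sketch, run from the root of a src/-layout project (package name taken from the structure above):

# Before installing, the import fails - the package is not on sys.path
python -c "import my_ml_project"   # ModuleNotFoundError
# Without src/, the same import would silently succeed via the local directory

# After an editable install, the import resolves to the installed package
pip install -e .
python -c "import my_ml_project; print(my_ml_project.__file__)"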

Environment Variable Management

# config.py - never hardcode secrets or environment-specific config
from typing import Literal

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Read from environment variables (case-insensitive)
    database_url: str
    redis_url: str = "redis://localhost:6379"
    model_checkpoint_dir: str = "./checkpoints"
    log_level: Literal["DEBUG", "INFO", "WARNING", "ERROR"] = "INFO"
    cuda_device: int = 0

    # Load from .env file in development
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        case_sensitive=False,
    )


# Usage
settings = Settings()
print(settings.database_url)  # reads DATABASE_URL from environment

# .env (never commit this - add to .gitignore)
DATABASE_URL=postgresql://user:password@localhost/mydb
REDIS_URL=redis://localhost:6379
MODEL_CHECKPOINT_DIR=/mnt/checkpoints
LOG_LEVEL=DEBUG

Complete pyproject.toml for a Real ML Project

# pyproject.toml - production ML library
[build-system]
requires = ["setuptools>=68", "wheel", "Cython>=3.0", "numpy>=1.24"]
build-backend = "setuptools.backends.legacy:build"

[project]
name = "fast-embeddings"
version = "1.2.0"
description = "High-performance text embedding library with CUDA and Cython acceleration"
readme = "README.md"
license = { text = "Apache-2.0" }
requires-python = ">=3.10"
authors = [
{ name = "ML Team", email = "ml-team@example.com" }
]
classifiers = [
"Programming Language :: Python :: 3",
"License :: OSI Approved :: Apache Software License",
"Topic :: Scientific/Engineering :: Artificial Intelligence",
]

dependencies = [
"numpy>=1.24,<3.0",
"scipy>=1.10",
"pydantic>=2.0",
"pydantic-settings>=2.0",
"huggingface-hub>=0.20",
"tokenizers>=0.15",
]

[project.optional-dependencies]
gpu = [
"torch>=2.1.0",
"torchvision>=0.16.0",
]
cpu = [
"torch>=2.1.0", # CPU-only torch installed separately
]
dev = [
"pytest>=7.4",
"pytest-cov>=4.0",
"mypy>=1.8",
"ruff>=0.3",
"pre-commit>=3.6",
"jaxtyping>=0.2",
"beartype>=0.17",
]

[project.scripts]
fast-embed = "fast_embeddings.cli:main"
fast-embed-server = "fast_embeddings.server:run"

[project.urls]
Repository = "https://github.com/example/fast-embeddings"
Documentation = "https://docs.example.com/fast-embeddings"

# Tool configurations
[tool.setuptools.packages.find]
where = ["src"]

[tool.ruff]
line-length = 100
target-version = "py310"
[tool.ruff.lint]
select = ["E", "F", "I", "N", "UP", "B", "ANN"]
ignore = ["ANN101", "ANN102"]

[tool.mypy]
python_version = "3.11"
strict = true

[[tool.mypy.overrides]]
module = ["numpy.*", "torch.*"]
ignore_missing_imports = true

[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "-v --cov=fast_embeddings --cov-report=term-missing"
markers = [
"gpu: marks tests that require a GPU (deselect with -m 'not gpu')",
"slow: marks tests that take more than 5 seconds",
]

[tool.uv]
# Use PyTorch's CUDA index for GPU dependencies
[tool.uv.sources]
torch = { index = "pytorch-cu121" }
torchvision = { index = "pytorch-cu121" }

[[tool.uv.index]]
name = "pytorch-cu121"
url = "https://download.pytorch.org/whl/cu121"
explicit = true # only use this index for packages that request it

Production Engineering Notes

Always Commit Lock Files

# .gitignore - what to ignore
.venv/
__pycache__/
*.egg-info/
dist/
build/
*.so
*.pyc
.env

# What to ALWAYS commit:
# uv.lock (or poetry.lock or requirements.lock)
# pyproject.toml
# environment.yml (if using conda)

Lock files belong in version control. The common objection is "they cause merge conflicts." The correct response is: merge conflicts in a lock file force you to explicitly reconcile dependency changes. Silent dependency drift is worse than a merge conflict.
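
When a lock-file conflict does occur, do not hand-merge the hunks; regenerate the file from the merged pyproject.toml instead. A sketch of that workflow with uv (the same shape applies to poetry.lock via poetry lock):

# Mid-merge, with uv.lock conflicted:
git checkout --theirs uv.lock    # take either side wholesale; it gets regenerated next
uv lock                          # re-resolve against the merged pyproject.toml
git add pyproject.toml uv.lock
git commit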

Reproducible Builds in CI

# .github/workflows/ci.yml
name: CI

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v3
        with:
          version: "0.5.x"
          enable-cache: true  # cache uv's package cache across runs

      - name: Set up Python
        run: uv python install 3.11

      - name: Install dependencies
        run: uv sync --frozen  # --frozen: fail if uv.lock is out of date

      - name: Run type checker
        run: uv run mypy --strict src/

      - name: Run linter
        run: uv run ruff check src/

      - name: Run tests
        run: uv run pytest tests/ -m "not gpu"  # skip GPU tests in CPU CI

  test-gpu:
    runs-on: [self-hosted, gpu]  # runs on your GPU runners

    steps:
      - uses: actions/checkout@v4
      - name: Install uv
        uses: astral-sh/setup-uv@v3
      - run: uv sync --frozen
      - run: uv run pytest tests/ -m gpu

Handling Transitive Dependency Security Vulnerabilities

# Audit installed packages for known vulnerabilities
pip install pip-audit
pip-audit

# In a uv-managed project, run pip-audit inside the environment
# (uv itself has no audit subcommand)
uv run pip-audit

# Output shows:
# Found 1 vulnerability in 1 package
# Name Version ID Fix Versions
# ------- ------- ------------------- ----------------
# Pillow 9.2.0 GHSA-56pw-mpj4-fxjw 9.3.0

# pip-audit exits non-zero when it finds vulnerabilities, so adding
# it as a CI step fails the build automatically:
# uv run pip-audit --desc on

Common Mistakes

:::danger Not Pinning the CUDA Build of PyTorch

# WRONG - installs whichever PyTorch build pip thinks is latest
pip install torch

# On a CUDA 11.8 system, this may install a CUDA 12.x build
# Your custom C extensions compiled against CUDA 11.8 headers will segfault

# RIGHT - always specify the CUDA build index
pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu118

# In Dockerfile, this is mandatory - never allow torch to be installed
# without explicitly specifying the CUDA version

This is the bug from the opening scenario. It takes minutes to prevent and days to debug. :::

:::danger Committing .env Files with Credentials

# Add to .gitignore BEFORE creating .env
echo ".env" >> .gitignore
echo "*.env" >> .gitignore
git add .gitignore
git commit -m "add env file to gitignore"

# THEN create .env
echo "DATABASE_URL=postgresql://..." > .env

# If you have already committed a .env file with credentials:
git rm --cached .env
git commit -m "remove env file from tracking"
# Rotate all credentials that were exposed
# (assume they are compromised - anyone who cloned the repo has them)

:::

:::warning Not Using the src Layout in Shared Libraries

# WRONG structure - can import local directory without installing
my-project/
├── my_package/              # importable from anywhere in the project root
│   └── module.py
└── tests/
    └── test_module.py       # "import my_package" finds the local dir

# RIGHT structure - must install the package to import it
my-project/
├── src/
│   └── my_package/          # not importable from root without install
│       └── module.py
└── tests/
    └── test_module.py       # "import my_package" requires "pip install -e ."

The src/ layout catches a whole class of "works on my machine" bugs where tests pass locally (because the local directory is on the Python path) but fail in CI (where only the installed package is available). :::

:::warning Mixing pip and conda Packages Without Care

# DANGEROUS - mixing can cause library conflicts
conda activate myenv
conda install numpy scipy # installs its own libopenblas
pip install torch # may install its own libopenblas

# When torch and scipy both load, they may link to different libopenblas
# versions, causing subtle numerical errors or crashes

# SAFER - pick one manager for a given package type
# Use conda for system-level packages (CUDA, MKL, OpenBLAS)
# Use pip for Python packages that do not have native dependencies
# Or: use conda for everything via conda-forge channel

:::


Interview Q&A

Q: What is the difference between PEP 517 and PEP 518, and why do both matter for pyproject.toml?

A: PEP 518 (2016) solved the bootstrapping problem: before you can build a package, you need to install the tools to build it, but you cannot install those tools without knowing what they are. PEP 518 introduced the [build-system] table in pyproject.toml with a requires list - these are the packages pip installs into an isolated build environment before running the build. This meant you could declare requires = ["Cython>=3.0", "numpy"] and pip would install those before attempting to compile your Cython extensions. PEP 517 (2015) defined the interface between the build frontend (pip) and the build backend (setuptools, flit, poetry-core). The build-backend key specifies which tool actually performs the build. This decoupling enabled tools other than setuptools to become build backends - poetry-core, flit, hatchling, and meson-python all implement the PEP 517 interface. Together, these PEPs enabled the modern Python packaging ecosystem where you can replace setuptools with a better build backend without changing how pip invokes the build.

Q: Why are lock files essential for reproducibility, and what is the difference between a requirements.txt and a lock file?

A: A requirements.txt with constraints like torch>=2.0 specifies a range of acceptable versions. On Monday it resolves to torch 2.1.0. On Tuesday, torch 2.2.0 is released. Wednesday's CI run resolves to 2.2.0. The test suite breaks because 2.2.0 has a breaking change in an API you use. The builds were not reproducible. A lock file records exact versions and cryptographic hashes: torch==2.1.0 --hash=sha256:abc123.... Subsequent installs get exactly torch 2.1.0 with exactly those bytes - not 2.1.1, not a hypothetical malicious 2.1.0 that someone uploaded after compromising the author's account. The hash verification catches supply chain attacks. For ML training specifically, reproducibility goes beyond convenience: you need to know whether a performance regression between two runs is due to code changes or dependency changes. With a lock file, you know every dependency is identical, so the comparison is clean.

Q: Explain Docker layer caching for Python dependencies and why the order of COPY and RUN commands matters.

A: Docker builds images as a stack of layers. Each instruction creates a new layer, and layers are cached - if the instruction and all preceding layers are unchanged, Docker reuses the cached layer without re-running the instruction. The key insight for Python dependencies is that requirements.txt or pyproject.toml changes far less frequently than source code. If you copy all source files first and then install dependencies, Docker invalidates the layer cache on every source change, re-installing all dependencies each time. The correct pattern is: copy only the dependency specification files first, run the install (which gets cached because the specification did not change), then copy the source code. For a project with 100 dependencies, this reduces build time from minutes (reinstall everything) to seconds (copy changed source files) on every code change. The same principle applies to system packages: install system dependencies before Python dependencies, because system packages change even less frequently.

Q: What is the CUDA compatibility matrix, and what happens when there is a mismatch between layers?

A: The CUDA stack is a tower of compatibility layers, each dependent on the layer below it. GPU hardware determines the compute capability (e.g., A100 is 8.0). The NVIDIA driver supports GPUs with specific compute capabilities and provides a maximum CUDA runtime version. The CUDA toolkit (nvcc, headers, runtime library) must be compatible with the driver version. cuDNN must be built against a specific CUDA toolkit version. PyTorch ships different binary builds for each CUDA version (cu118, cu121, cu124) and must match the CUDA runtime available. Custom C extensions (like Cython code using CUDA) are compiled against specific CUDA headers and must match the runtime. Mismatches manifest in different ways depending on where they occur: a driver-toolkit mismatch typically fails during import with a clear error. A PyTorch-CUDA toolkit mismatch may fail silently (wrong build loaded) and only error during the first CUDA operation. A custom extension-CUDA mismatch often produces a segmentation fault in native code with no Python traceback. The preventive solution is always: specify PyTorch with an explicit CUDA index URL, use Docker base images from nvidia/cuda with explicit versions, and test the environment with the cuda_compat_check script before deploying to production.

Q: What is PubGrub and how does it improve dependency resolution error messages?

A: PubGrub is a dependency resolution algorithm developed by Natalie Weizenbaum (of the Dart team) in 2018. The core problem it solves is version SAT (satisfiability): given a set of packages with version constraints, find a set of versions satisfying all constraints simultaneously. Older resolvers handled this poorly: pip before 2020 simply took the first version that satisfied each requirement, and plain backtracking resolvers (pip's resolver since 2020, early npm) try a version, backtrack on conflict, and try another. Both produced resolution failures with unhelpful messages like "cannot install package X because Y requires Z>=2.0 which conflicts with A's requirement for Z<1.5" with no explanation of why those particular versions were chosen. PubGrub uses conflict-driven clause learning, borrowing from modern SAT solvers. When it detects a conflict, it analyzes which combinations of version choices led to it, records the derived constraint (a "cause"), and uses that knowledge to prune the search more aggressively. More importantly, the chain of causes provides a human-readable explanation: "package A 3.0 requires package B >=2.0. Package C 1.5 requires package B <2.0. Therefore A 3.0 and C 1.5 cannot coexist." Poetry adopted PubGrub and its error messages are significantly more actionable than pip's historical failures.

Q: What is the advantage of the src layout over putting the package directory at the project root?

A: The src/ layout enforces that you must install the package (even in editable mode with pip install -e .) before you can import it. Without src/, when you run Python from the project root, Python adds the current directory to sys.path, making the package directory importable directly. This creates a subtle trap: tests that pass locally may fail in CI because locally you are importing from the working directory (which includes uncommitted changes and possibly missing build artifacts like compiled Cython extensions), while CI imports from the installed package. The src/ layout makes these two states identical - both the local developer and CI import from the installed package. A second benefit: if your package has C extensions that need to be compiled, the src/ layout forces you to run the build step before testing, catching build failures early rather than running tests against a partially-installed package. A third benefit: it prevents accidentally shipping test code or scripts in the package because find_packages() and find_packages(where="src") behave predictably.


Summary

Python packaging has evolved from a scattered collection of competing tools into a relatively coherent standard: pyproject.toml as the single source of truth for project metadata, uv or poetry for dependency management, lock files for reproducibility, cibuildwheel for cross-platform wheel building, and explicit CUDA version pinning for GPU-dependent projects.

The investment in packaging infrastructure pays returns on every deployment, every CI run, and every time a new engineer joins the team. "It works on my machine" is not an acceptable answer when you have pyproject.toml, a lock file, and a Docker image that builds identically everywhere.

The four-day debugging saga from the opening scenario was not bad luck. It was the predictable consequence of treating environment setup as an afterthought. Modern Python packaging tools have made reproducibility the default - but you have to use them.
