8 docs tagged with "compilers-and-runtimes"

Cython and C Extensions

Learn how Cython bridges Python and C to deliver C-level performance in Python projects, covering type declarations, typed memoryviews, OpenMP parallelism, and raw C extension modules.

Dependency Management and Packaging

Master Python packaging from pyproject.toml and uv to Docker layer caching, private registries, and the CUDA version compatibility matrix that determines whether your ML environment actually works.

How Python Works Internally

A deep dive into CPython's architecture - from source code to bytecode execution, the GIL, memory management, and the Python object model that every serious Python engineer should understand.

JIT Compilation and numba

Just-in-time compilation from first principles, numba's LLVM backend and type inference system, GPU kernels with numba CUDA, and when JIT compilation delivers real performance gains.

The LLVM compiler infrastructure and MLIR, its multi-level IR, for ML - how they power PyTorch, JAX, TensorFlow, Triton, and IREE, with SSA form, optimization passes, dialect design, and practical code generation for ML workloads.

Profiling Python and C Code

Master the complete profiling toolkit - cProfile, line_profiler, py-spy, Scalene, Valgrind, and PyTorch Profiler - to find and eliminate bottlenecks in Python and ML training code.

Static Analysis and Type Systems

Build type-safe ML codebases using Python type hints, mypy strict mode, pydantic v2 validation, Protocol types, jaxtyping tensor shape annotations, and ruff for fast linting.

torch.compile and XLA

Deep dive into PyTorch's torch.compile architecture - TorchDynamo graph capture, AOTAutograd, TorchInductor code generation, XLA for TPU/GPU, and when compiler-based optimization delivers real ML performance gains.