Module 3: Compilers and Runtimes for ML
A single call to torch.compile(model) can make your model 2-3x faster. But when it does not work (when it silently falls back to eager mode, produces incorrect results, or spends longer compiling than your training run takes) you have no way to diagnose why. That is the cost of treating the compiler as a black box.
This module opens the box. Not to make you write compilers, but to give you enough understanding to use torch.compile correctly, debug compilation failures, understand what XLA and TensorRT are actually doing, and make informed decisions about the compilation-performance tradeoff.
The Compilation Stack
Lessons in This Module
| # | Lesson | Key Concept |
|---|---|---|
| 1 | How Compilers Work | Parsing, IR, optimization passes, code generation |
| 2 | JIT Compilation in Python | CPython bytecode, numba, how JIT differs from AOT |
| 3 | MLIR - Multi-Level IR | Dialect system, lowering passes, why MLIR exists |
| 4 | XLA and JAX Compilation | XLA IR, fusion in XLA, how JAX uses XLA |
| 5 | torch.compile Internals | Dynamo, Inductor, compilation modes, guard failures |
| 6 | TensorRT and Inference Optimization | Graph optimization, layer fusion, precision calibration |
| 7 | ONNX and Cross-Framework Portability | ONNX format, opsets, ONNX Runtime optimization |
| 8 | Ahead-of-Time vs JIT for ML | Tradeoffs, when each approach is appropriate |
Key Concepts You Will Master
- Intermediate representations (IR) - how compilers represent computation before generating hardware code
- Operator fusion - the compiler technique that eliminates intermediate memory reads/writes
- Graph-mode vs eager-mode - the fundamental difference in how PyTorch and JAX execute operations
- torch.compile compilation modes - `default`, `reduce-overhead`, `max-autotune` - and when to use each
- TensorRT calibration - using a representative calibration dataset to choose quantization scales for INT8 inference (FP16 needs no calibration)
- Guard failures - why torch.compile recompiles and how to avoid it
Prerequisites
- GPU architecture fundamentals (helpful, not required)
- PyTorch proficiency
- No prior compiler experience required
© 2026 EngineersOfAI. All rights reserved.
