Module 4: Memory Management for ML
Out-of-memory errors are the most common infrastructure problem in ML engineering. A training job that ran fine at batch size 32 OOMs at batch size 64. An inference server that handles 10 concurrent requests fine crashes at 15. A data pipeline that processes 100k samples fine hangs on 1M. Every one of these problems traces to memory management - how memory is allocated, when it is freed, and whether the allocation patterns match the hardware's capacity.
This module gives you the tools to understand memory at every layer: the OS virtual memory system, Python's reference counting and cyclic garbage collector, PyTorch's caching CUDA memory allocator, and GPU HBM capacity constraints.
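As a tiny preview of the Python layer, the standard-library `tracemalloc` module can attribute heap growth to the lines of code that caused it. This is a minimal sketch (the function name `measure_growth` and its parameters are illustrative, not part of any course material):

```python
import tracemalloc

def measure_growth(n=1000, size=1024):
    """Return the bytes tracemalloc attributes to an allocation burst."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    data = [bytes(size) for _ in range(n)]  # hold references so nothing is freed yet
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Sum per-line size deltas between the two snapshots.
    growth = sum(s.size_diff for s in after.compare_to(before, "lineno"))
    del data
    return growth

print(measure_growth())  # roughly n * size bytes, plus per-object and list overhead
```

The same snapshot-diff pattern scales up to finding which training-loop line is accumulating memory across iterations.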
Memory Layers in an ML System
From top to bottom, an allocation in a training job passes through several layers: PyTorch's caching CUDA allocator (or the Python heap on CPU), the Python object model with its reference counts, the OS virtual memory system, and finally physical DRAM or GPU HBM. Each lesson below targets one or more of these layers.
Lessons in This Module
| # | Lesson | Key Concept |
|---|---|---|
| 1 | Stack vs Heap Allocation | Memory layout, allocation cost, stack frames |
| 2 | Python Memory Model | Reference counting, cyclic GC, memory views |
| 3 | Reference Counting and GC | Python GC internals, weakref, memory leaks |
| 4 | Memory Leaks in ML Training | Accumulating tensors, detach, graph retention |
| 5 | GPU Memory Allocation Patterns | CUDA allocator, caching, fragmentation |
| 6 | Memory Profiling Tools | torch.cuda.memory_summary, memory_profiler, valgrind |
| 7 | Zero-Copy Data Transfer | Pinned memory, DMA transfers, avoiding copies |
| 8 | Memory-Efficient Training Strategies | Gradient checkpointing, activation offloading, mixed precision |
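Lessons 2 and 3 rest on one CPython fact worth internalizing early: an object is freed the instant its reference count hits zero, and `weakref` lets you observe an object without keeping it alive. A minimal sketch (the `Activation` class name is a made-up stand-in for a tensor-like object):

```python
import sys
import weakref

class Activation:
    """Hypothetical stand-in for a tensor-like object."""
    pass

a = Activation()
ref = weakref.ref(a)        # a weak reference does not increase the refcount
print(sys.getrefcount(a))   # typically 2: the name 'a' plus getrefcount's argument
assert ref() is a           # the weakref resolves while 'a' is alive

del a                       # last strong reference gone -> freed immediately
print(ref() is None)        # the weakref now resolves to None
```

This is why a single forgotten strong reference (say, a tensor appended to a logging list) is enough to keep an entire computation graph resident.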
Key Concepts You Will Master
- CUDA memory caching allocator - why `torch.cuda.empty_cache()` does not always fix OOM errors
- Python reference cycle detection - finding and fixing memory leaks in training loops
- Gradient checkpointing - the memory-compute tradeoff that enables training larger models
- Pinned memory - using page-locked host memory to accelerate GPU data transfer
- Fragmentation - why GPU memory fragmentation causes OOM at 60% utilization and how to avoid it
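The reference-cycle concept from the second bullet can be demonstrated with the standard-library `gc` module alone. In this sketch (the `Node` class is illustrative), two objects point at each other, so reference counting alone can never free them; only the cyclic collector can:

```python
import gc

class Node:
    """Minimal object that can participate in a reference cycle (illustrative)."""
    def __init__(self):
        self.other = None

def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # a <-> b: refcounts can never drop to zero on their own
    # Both locals go out of scope here, but the cycle keeps the objects alive.

gc.disable()                 # turn off automatic collection to make the leak visible
make_cycle()
freed = gc.collect()         # manual collection still works; returns unreachable count
gc.enable()
print(freed >= 2)            # at least the two Node objects were unreachable
```

In a real training loop the cycle is usually indirect, e.g. a closure capturing `self` inside a hook, which is why `gc.collect()` sometimes "fixes" a slow host-memory leak.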
Prerequisites
- Computer Architecture
- Operating Systems for ML
- Python proficiency, basic PyTorch
© 2026 EngineersOfAI. All rights reserved.
