Garbage Collection Algorithms
How Python's reference counting and generational garbage collector work, why GC pauses hurt ML serving latency, and how to tune or disable GC for performance-critical workloads.
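The sketch below illustrates the two mechanisms and two common tuning levers. It uses only the standard `gc`, `sys`, and `weakref` modules. `Node`, `gc_paused`, and `prepare_for_serving` are hypothetical names chosen for illustration. The `50_000` threshold is an assumed example value, not a recommendation. `gc.freeze()` needs Python 3.7 or later.

```python
import gc
import sys
import weakref
from contextlib import contextmanager


class Node:
    """Illustrative object type used to build a reference cycle."""

    def __init__(self):
        self.other = None


def refcount_delta() -> int:
    """Binding a second name to an object raises its reference count by one."""
    obj = []
    before = sys.getrefcount(obj)  # may include a temporary ref for the call argument
    alias = obj  # noqa: F841  second reference to the same list
    return sys.getrefcount(obj) - before


def cycle_needs_collector() -> tuple:
    """Return (alive after del, alive after gc.collect()) for a two-node cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collections out of the experiment
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a  # each refcount stays >= 1 via the other node
        probe = weakref.ref(a)  # weak reference: does not keep `a` alive
        del a, b
        alive_after_del = probe() is not None  # refcounting alone cannot free a cycle
        gc.collect()  # the cycle detector finds the unreachable pair and frees it
        alive_after_collect = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_after_del, alive_after_collect


@contextmanager
def gc_paused():
    """Suspend automatic cycle collection around a latency-critical section."""
    # Refcounting still frees acyclic objects immediately; only the
    # generational cycle detector is paused, so cycles created here accumulate.
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


def prepare_for_serving(gen0_threshold: int = 50_000) -> None:
    """Hypothetical warm-up hook, run once after the model is loaded."""
    gc.collect()  # clear garbage left over from loading
    gc.freeze()   # move survivors to a permanent generation future collections skip
    _, t1, t2 = gc.get_threshold()
    gc.set_threshold(gen0_threshold, t1, t2)  # fewer, larger young-generation collections
```

Whether pausing, freezing, or raising thresholds actually reduces tail latency depends on the workload, so measure before and after. The context manager restores the previous GC state because a collector that is disabled and never re-enabled lets cyclic garbage grow without bound. The `gc` documentation also recommends `gc.freeze()` before `fork()` so that child processes avoid needless copy-on-write of the frozen objects.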
Learn how stack frames, heap allocation, and Python's memory model work under the hood - from C struct padding to pymalloc arenas, with production debugging techniques.
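You can observe C struct padding directly from Python with the standard `ctypes` module. `ctypes.Structure` lays out fields using the platform's C alignment rules. This minimal sketch assumes a mainstream platform where a 32-bit int is 4-byte aligned. The class and helper names are illustrative.

```python
import ctypes


class Padded(ctypes.Structure):
    # char, int32, char: 3 padding bytes follow `a` so that `b` is 4-byte
    # aligned, plus 3 tail bytes so that arrays of Padded stay aligned.
    _fields_ = [("a", ctypes.c_char), ("b", ctypes.c_int32), ("c", ctypes.c_char)]


class Reordered(ctypes.Structure):
    # Same fields, largest first: only 2 bytes of tail padding remain.
    _fields_ = [("b", ctypes.c_int32), ("a", ctypes.c_char), ("c", ctypes.c_char)]


def describe(struct_type) -> tuple:
    """Return ([(field name, byte offset, field size), ...], total sizeof)."""
    fields = [
        (name, getattr(struct_type, name).offset, ctypes.sizeof(ftype))
        for name, ftype in struct_type._fields_
    ]
    return fields, ctypes.sizeof(struct_type)
```

On such a platform, `describe(Padded)` reports offsets 0, 4, and 8 and a total of 12 bytes. `Reordered` holds the same data in 8 bytes, which is why field ordering matters for large arrays of records.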
Master the memory math behind training and serving large language models - from mixed precision and gradient checkpointing to ZeRO optimizer stages, KV cache management, and PagedAttention.
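A few helper functions can capture this memory math. They assume the widely used mixed-precision Adam accounting from the ZeRO paper: 2 bytes of fp16 weights, 2 bytes of fp16 gradients, and 12 bytes of fp32 optimizer state per parameter. They also assume a standard KV layout with one K tensor and one V tensor per layer. Activations, gradient-checkpointing savings, and allocator overhead are deliberately excluded. The function names are hypothetical.

```python
def training_state_bytes(num_params: int, zero_stage: int = 0, dp_degree: int = 1) -> float:
    """Per-GPU bytes for weights, gradients, and Adam state in mixed precision.

    Assumed accounting: 2 B/param fp16 weights, 2 B/param fp16 gradients,
    12 B/param fp32 optimizer state (master weights, momentum, variance).
    """
    weights = 2.0 * num_params
    grads = 2.0 * num_params
    optim = 12.0 * num_params
    if zero_stage >= 1:  # stage 1: shard optimizer state across data-parallel ranks
        optim /= dp_degree
    if zero_stage >= 2:  # stage 2: also shard gradients
        grads /= dp_degree
    if zero_stage >= 3:  # stage 3: also shard the fp16 weights
        weights /= dp_degree
    return weights + grads + optim


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for every token of every sequence."""
    # Leading 2: one K tensor and one V tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


def paged_blocks(seq_len: int, block_size: int) -> int:
    """Fixed-size KV blocks a PagedAttention-style allocator reserves for one sequence."""
    # Ceiling division: only the last block can be partially empty,
    # which bounds internal fragmentation to less than one block per sequence.
    return (seq_len + block_size - 1) // block_size
```

For example, `training_state_bytes(7_000_000_000)` gives 112e9 bytes (about 104 GiB) before any sharding. ZeRO stage 3 across 8 ranks reduces that to 14e9 bytes per GPU. With assumed 7B-class dimensions (32 layers, 32 KV heads, head dimension 128), `kv_cache_bytes(32, 32, 128, 4096, 1)` gives 2 GiB for a single 4,096-token fp16 sequence.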
How glibc malloc, jemalloc, tcmalloc, and PyTorch's CUDA caching allocator work - with production techniques for eliminating memory fragmentation in ML training and serving.
Hardware memory models, memory barriers, atomic operations, lock-free data structures, and how memory ordering affects concurrent ML data pipelines and distributed training implementations.
A systematic toolkit for finding and fixing memory leaks in Python ML systems - from tracemalloc snapshots to GPU memory debugging, DataLoader leaks, and long-running service monitoring.
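A minimal version of the tracemalloc snapshot workflow has four steps. Take a snapshot, run the suspect code, take another snapshot, then diff the two by source line. The module-level `_cache` list and `leaky_step` function are hypothetical stand-ins for a real leak. GPU memory and DataLoader leaks need other tools and are not covered by this sketch.

```python
import tracemalloc

_cache = []  # hypothetical cache that is appended to but never trimmed


def leaky_step(n: int = 10_000) -> None:
    """Simulate one request or batch that leaves memory behind."""
    _cache.append(bytearray(n))


def top_growth(steps: int = 100, limit: int = 3) -> list:
    """Return the source lines whose allocations grew most across `steps` calls."""
    ignore_self = [tracemalloc.Filter(False, tracemalloc.__file__)]
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot().filter_traces(ignore_self)
        for _ in range(steps):
            leaky_step()
        after = tracemalloc.take_snapshot().filter_traces(ignore_self)
    finally:
        tracemalloc.stop()  # tracing has overhead; snapshots remain usable
    # compare_to sorts by absolute size_diff, biggest first.
    return after.compare_to(before, "lineno")[:limit]
```

In a long-running service, you would take the snapshots minutes or hours apart instead of around a loop. Printing each `StatisticDiff` shows the file, line, size growth, and allocation count growth.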
Understand memory safety bugs in C/C++, how Rust's ownership model eliminates them at compile time, and why Rust is becoming the language of choice for high-performance ML infrastructure components.
Stack and heap allocation, Python memory model, GPU memory patterns, memory profiling, and zero-copy data transfer - debugging OOM errors and building memory-efficient pipelines.
How to eliminate unnecessary memory copies in ML data pipelines - from sendfile() and mmap() to NumPy views, PyTorch pinned memory, and Apache Arrow Flight for zero-copy data serving.
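The standard library alone can demonstrate the zero-copy idea. `memoryview` slices and `mmap` expose existing memory through the buffer protocol instead of copying it. NumPy arrays also implement the buffer protocol. This minimal sketch uses illustrative function names. Pinned memory and Arrow Flight would require PyTorch or pyarrow, which this example does not use.

```python
import mmap
import zlib


def view_shares_buffer() -> bytes:
    """Writing through a memoryview slice mutates the original buffer: no copy."""
    buf = bytearray(b"abcdef")
    window = memoryview(buf)[2:4]  # a view onto bytes 2..3, not a new buffer
    window[0] = ord("X")
    window.release()
    return bytes(buf)  # bytes() makes one explicit copy for the caller


def crc32_of_file(path: str) -> int:
    """Checksum a non-empty file via mmap instead of read()-ing it into a bytes object."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Release the exported view before the mmap closes, or close() raises BufferError.
            with memoryview(mm) as mv:
                return zlib.crc32(mv)
```

`mmap` refuses to map an empty file, so real code would special-case zero-length inputs. With `mmap`, the operating system pages the file in on demand rather than materializing a full in-memory copy up front.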