9 docs tagged with "memory-management"

Garbage Collection Algorithms

How Python's reference counting and generational garbage collector work, why GC pauses hurt ML serving latency, and how to tune or disable GC for performance-critical workloads.

Heap and Stack Memory

Learn how stack frames, heap allocation, and Python's memory model work under the hood - from C struct padding to pymalloc arenas, with production debugging techniques.

Large-Scale Memory Optimization

Master the memory math behind training and serving large language models - from mixed precision and gradient checkpointing to ZeRO optimizer stages, KV cache management, and PagedAttention.

Memory Allocators for ML

How glibc malloc, jemalloc, tcmalloc, and PyTorch's CUDA caching allocator work - with production techniques for eliminating memory fragmentation in ML training and serving.

Memory Models and Concurrency

Hardware memory models, memory barriers, atomic operations, lock-free data structures, and how memory ordering affects concurrent ML data pipelines and distributed training implementations.

Memory Profiling and Debugging

A systematic toolkit for finding and fixing memory leaks in Python ML systems - from tracemalloc snapshots to GPU memory debugging, DataLoader leaks, and long-running service monitoring.

Memory Safety and Rust

Understand memory safety bugs in C/C++, how Rust's ownership model eliminates them at compile time, and why Rust is becoming the language of choice for high-performance ML infrastructure components.

Module 4: Memory Management for ML

Stack and heap allocation, the Python memory model, GPU memory patterns, memory profiling, and zero-copy data transfer, with a focus on debugging OOM errors and building memory-efficient pipelines.

Zero-Copy and Data Transfer

How to eliminate unnecessary memory copies in ML data pipelines - from sendfile() and mmap() to NumPy views, PyTorch pinned memory, and Apache Arrow Flight for zero-copy data serving.