9 docs tagged with "memory-management"

Garbage Collection Algorithms

How Python's reference counting and generational garbage collector work, why GC pauses hurt ML serving latency, and how to tune or disable GC for performance-critical workloads.

Heap and Stack Memory

Learn how stack frames, heap allocation, and Python's memory model work under the hood - from C struct padding to pymalloc arenas, with production debugging techniques.

Large-Scale Memory Optimization

Master the memory math behind training and serving large language models - from mixed precision and gradient checkpointing to ZeRO optimizer stages, KV cache management, and PagedAttention.

Memory Allocators for ML

How glibc malloc, jemalloc, tcmalloc, and PyTorch's CUDA caching allocator work - with production techniques for eliminating memory fragmentation in ML training and serving.

Memory Models and Concurrency

Hardware memory models, memory barriers, atomic operations, lock-free data structures, and how memory ordering affects concurrent ML data pipelines and distributed training implementations.

Memory Profiling and Debugging

A systematic toolkit for finding and fixing memory leaks in Python ML systems - from tracemalloc snapshots to GPU memory debugging, DataLoader leaks, and long-running service monitoring.

Memory Safety and Rust

Understand memory safety bugs in C/C++, how Rust's ownership model eliminates them at compile time, and why Rust is becoming the language of choice for high-performance ML infrastructure components.

Module 4: Memory Management for ML

Stack and heap allocation, the Python memory model, GPU memory patterns, memory profiling, and zero-copy data transfer, with a focus on debugging OOM errors and building memory-efficient pipelines.

Zero-Copy and Data Transfer

How to eliminate unnecessary memory copies in ML data pipelines - from sendfile() and mmap() to NumPy views, PyTorch pinned memory, and Apache Arrow Flight for zero-copy data serving.