CPU Memory Architecture for ML
How the CPU memory hierarchy - L1/L2/L3 caches, DRAM, and NUMA topology - shapes ML data pipelines, DataLoader performance, and large model loading strategies on multi-socket servers.
The complete GPU memory hierarchy - registers, L1/shared memory, L2 cache, and HBM - with capacity, bandwidth, and latency at each level, and how data flows through the hierarchy during kernel execution.
High Bandwidth Memory vs GDDR6X - how 3D stacking with Through-Silicon Vias enables HBM3 to deliver 3.35 TB/s on H100, why GDDR6X tops out at about 1 TB/s, the economics of each, and how memory bandwidth constrains LLM inference throughput.
Learn to apply the Roofline model to diagnose whether GPU kernels are memory-bound or compute-bound, calculate arithmetic intensity, and use roofline plots to guide real optimization decisions.
How to compute exact GPU memory requirements for LLM training and inference - model weights, optimizer states, activations, KV cache - and how to plan GPU cluster configurations for target models.
CPython's memory allocator layers, the pymalloc arena system, reference counting, cyclic GC generations, and how memory is actually freed.
Master Python's memory management at the engineering level - CPython reference counting (ob_refcnt), cyclic garbage collection, aliasing, weak references, del semantics, sys.getrefcount, memory leaks from closures and global caches, and real-world debugging strategies.
How agents store, retrieve, and manage knowledge across interactions - working memory, episodic memory, semantic memory, procedural memory, and cross-session persistence.
Master Python mutability at the engineering level - the object model (id, type, value), pass-by-object-reference, the mutable default argument anti-pattern, frozen dataclasses, += behavior differences, string concatenation performance, and designing reliable concurrent systems.
Understand PCIe bandwidth limitations for CPU-GPU data transfer, NVLink for high-speed GPU-to-GPU communication, NVSwitch topology in DGX systems, and how to design systems that avoid interconnect bottlenecks in multi-GPU AI training.
Master Python list internals at the engineering level - dynamic array architecture, CPython memory layout, over-allocation strategy, amortized O(1) append, and production-level list pitfalls.
Object memory overhead, __slots__, generators, memory-mapped files, and GC tuning - reducing Python's memory footprint in production.
PyObject layout, type objects, reference counting, small integer cache, string interning, and the cost of Python's dynamic type system.
Master Python variables at the engineering level - name binding, object references, mutation vs reassignment, identity vs equality, copy semantics, and common traps.
Solve 12 practice problems on Python variables in memory. Covers the memory model, is vs ==, and mutable vs immutable objects. Includes hints and solutions.
Master shallow and deep copy in Python at the engineering level - assignment aliasing, copy.copy() vs copy.deepcopy(), ASCII memory diagrams, the memo dict, circular references, custom __copy__/__deepcopy__ protocols, and production-ready patterns for defensive copying.
How storage IO bottlenecks GPU utilization in ML training, NVMe and distributed filesystem characteristics, data loading patterns with WebDataset and DALI, prefetching strategies, and designing checkpointing that does not stall your cluster.
Master Python tuple internals at the engineering level - CPython PyTupleObject layout, immutability vs constancy, hashability mechanics, named tuples, structural unpacking, and when to choose tuples over lists in production code.
How CUDA Unified Memory works under the hood, when it helps versus hurts performance, and how PyTorch's caching allocator and memory pools eliminate allocation overhead in production ML systems.
A deep engineering dive into how Python stores variables in memory - stack frames vs heap objects, reference counting, garbage collection, pointer arithmetic, and the complete object model that governs Python's behavior.