Containers and Namespaces
How Linux namespaces, cgroups, and overlay filesystems power container isolation for multi-tenant ML serving, GPU workloads, and reproducible training environments.
Master Linux file systems for ML workloads - VFS, ext4/XFS, page cache, direct I/O, mmap, io_uring, and how to tune I/O for maximum training data throughput and checkpoint speed.
Kernel bypass networking for ML clusters - DPDK architecture, RDMA and InfiniBand for GPU-to-GPU communication, NCCL's bypass path, io_uring, eBPF, and when these techniques matter for AllReduce latency.
Systematic Linux performance tuning for ML workloads - sysctl parameters, CPU governors, NUMA balancing, transparent huge pages, IRQ affinity, NIC tuning, and grub options that matter for training throughput and inference latency.
Understand the Linux CFS scheduler, nice values, CPU affinity, real-time scheduling, cgroups, NUMA, and how Kubernetes CPU throttling destroys ML training throughput - with concrete fixes.
Learn how processes, threads, and coroutines work at the OS level, and how to choose the right concurrency model for ML workloads - data loading, inference, and async API calls.
Unix signals, graceful shutdown patterns, shared memory, pipes, Unix domain sockets, and ZeroMQ for building reliable multi-process ML training and serving systems.
Understand virtual memory layout, page tables, TLB, huge pages, and page faults - and how these OS mechanisms directly affect PyTorch training, large model loading, and ML dataset memory mapping.