Module 5: Networking for Distributed AI
Distributed training means moving gradients between GPUs. Distributed inference means moving requests between services. At the scale at which modern AI systems operate, the network, not the compute, is often the bottleneck.
When you scale from 8 GPUs to 64, AllReduce communication can dominate total training time if your network is not configured correctly. When you serve a 70B model across 4 H100s with tensor parallelism, every generated token requires inter-GPU communication in every transformer layer (typically two all-reduces per layer in a Megatron-style layout), so per-message latency adds up quickly. Understanding networking lets you predict these costs, diagnose bottlenecks, and make architecture decisions that respect physical constraints.
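As a rough illustration of how such costs can be predicted, the sketch below applies the standard bandwidth-only model of ring AllReduce, in which each GPU sends 2(N-1)/N times the gradient buffer size. The function names, the 7B-parameter model, the 2-byte (fp16) gradients, and the 25 GB/s per-GPU bandwidth are illustrative assumptions, not figures from this module; the model ignores per-message latency, compute/communication overlap, and hierarchical algorithms.

```python
def ring_allreduce_bytes_per_gpu(num_params: int, bytes_per_param: int, num_gpus: int) -> float:
    """Bytes each GPU sends during one ring AllReduce of the full gradient buffer."""
    if num_gpus < 2:
        return 0.0  # a single GPU has nothing to exchange
    buffer_bytes = num_params * bytes_per_param
    # Reduce-scatter and all-gather each move (N-1)/N of the buffer per GPU.
    return 2 * (num_gpus - 1) / num_gpus * buffer_bytes


def ring_allreduce_seconds(num_params: int, bytes_per_param: int,
                           num_gpus: int, link_bytes_per_s: float) -> float:
    """Bandwidth-only time estimate; ignores latency and compute overlap."""
    sent = ring_allreduce_bytes_per_gpu(num_params, bytes_per_param, num_gpus)
    return sent / link_bytes_per_s


if __name__ == "__main__":
    params = 7_000_000_000  # assumed model size (hypothetical)
    fp16_bytes = 2          # assumed bytes per gradient element
    link = 25e9             # assumed usable bandwidth per GPU, bytes/s
    for n in (8, 64):
        t = ring_allreduce_seconds(params, fp16_bytes, n, link)
        print(f"{n:>2} GPUs: {t:.2f} s per AllReduce")
```

In this model the per-GPU volume approaches twice the buffer size as N grows, so the bandwidth term barely changes between 8 and 64 GPUs. Larger jobs tend to slow down because the ring has to cross slower inter-node links and accumulates more latency-bound steps, which is what the `link_bytes_per_s` assumption stands in for here.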
Networking in the ML System
Lessons in This Module
| # | Lesson | Key Concept |
|---|---|---|
| 1 | TCP/IP Fundamentals for Engineers | IP routing, TCP flow control, latency vs bandwidth |
| 2 | RDMA - Remote Direct Memory Access | Bypassing the OS kernel, zero-copy GPU-to-GPU |
| 3 | Collective Operations and AllReduce | Ring-AllReduce, tree reduce, NCCL algorithms |
| 4 | Bandwidth, Latency, and Throughput | Little's Law, queuing theory basics for serving |
| 5 | gRPC for Model Serving | Protocol Buffers, server streaming, bidirectional streaming |
| 6 | Distributed File Systems for Training | NFS, Lustre, S3 - throughput requirements and limits |
| 7 | Network Bottlenecks in Distributed Training | Diagnosing communication overhead, profiling NCCL |
| 8 | Service Mesh for AI Microservices | Istio/Envoy for ML serving, observability, circuit breaking |
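As a small preview of the Little's Law material in lesson 4, the sketch below applies L = λW to a serving workload: the average number of in-flight requests equals the arrival rate times the mean latency. The function names, the 50 requests/s load, the 2 s latency, and the 16 concurrent slots per replica are hypothetical values chosen only for illustration.

```python
import math


def in_flight_requests(arrival_rate_rps: float, mean_latency_s: float) -> float:
    """Little's Law: average concurrency L = lambda * W."""
    return arrival_rate_rps * mean_latency_s


def replicas_needed(arrival_rate_rps: float, mean_latency_s: float,
                    slots_per_replica: int) -> int:
    """Replicas required to hold the average in-flight load (no headroom added)."""
    concurrency = in_flight_requests(arrival_rate_rps, mean_latency_s)
    return math.ceil(concurrency / slots_per_replica)


if __name__ == "__main__":
    rate, latency, slots = 50.0, 2.0, 16  # all assumed values
    print("average in-flight requests:", in_flight_requests(rate, latency))
    print("replicas needed:", replicas_needed(rate, latency, slots))
```

This only sizes for the average; real deployments would usually add headroom for bursts, which the queuing theory basics in the same lesson address.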
Key Concepts You Will Master
- AllReduce communication volume - calculating gradient communication overhead for any model size
- RDMA - how remote direct memory access takes the CPU and OS kernel out of the data path for inter-GPU communication
- NCCL topology detection - how NCCL detects the hardware topology and selects a communication algorithm (such as ring or tree) to match it
- gRPC streaming - using server-side streaming for token-by-token LLM responses
- Network bottleneck identification - distinguishing compute-bound from communication-bound training
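To make the last item concrete, here is a minimal sketch of one way to tell compute-bound from communication-bound training: compare the communication time that is not hidden behind compute with the total step time. The function names, the `overlap` parameter, and the 0.5 threshold are assumptions made for this example; in practice the inputs would come from profiler measurements.

```python
def exposed_comm_fraction(compute_s: float, comm_s: float, overlap: float = 0.0) -> float:
    """Fraction of a training step spent waiting on communication.

    overlap is the share of communication hidden behind compute (0.0 to 1.0).
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0.0 and 1.0")
    exposed = comm_s * (1.0 - overlap)  # communication not hidden by compute
    step = compute_s + exposed
    return exposed / step if step > 0 else 0.0


def classify_step(compute_s: float, comm_s: float, overlap: float = 0.0,
                  threshold: float = 0.5) -> str:
    """Label a step using an arbitrary, assumed threshold on the exposed fraction."""
    fraction = exposed_comm_fraction(compute_s, comm_s, overlap)
    return "communication-bound" if fraction > threshold else "compute-bound"


if __name__ == "__main__":
    # Hypothetical timings in seconds per training step.
    print(classify_step(compute_s=1.0, comm_s=0.2))
    print(classify_step(compute_s=0.5, comm_s=2.0))
```

The threshold is a judgment call; the more useful signal is how the exposed fraction changes as GPUs are added.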
Prerequisites
- Computer Architecture
- Basic networking knowledge (IP addresses, ports, HTTP)
© 2026 EngineersOfAI. All rights reserved.
