9 docs tagged with "gpu-architecture"

Ampere, Hopper, and Ada Architectures

What changed across GPU generations for AI - A100 vs H100 vs H200 vs RTX 4090, NVLink bandwidth, the Transformer Engine, FP8 support, and architecture selection for training and inference.

GPU vs CPU Architecture

Why GPUs dominate deep learning - the SIMT execution model, throughput vs latency optimization, and the fundamental design tradeoffs between CPU and GPU silicon.

Memory Hierarchy in GPUs

Registers, L1/L2 cache, shared memory, and HBM - GPU memory hierarchy latency numbers, bandwidth characteristics, and how to write code that uses each level effectively.

Module 1: GPU Architecture

How GPUs work at the silicon level - streaming multiprocessors, tensor cores, the memory hierarchy, and the roofline model that underpins most ML performance optimizations.

PCIe and NVLink Interconnects

Host-to-device PCIe bandwidth, GPU-to-GPU NVLink and NVSwitch, the interconnect hierarchy in multi-GPU systems, and how interconnect bandwidth shapes model parallelism strategies.

Roofline Model and Bottleneck Analysis

Arithmetic intensity, roofline model construction, identifying compute-bound vs memory-bound operations, and using the roofline to guide optimization decisions.

Selecting GPUs for Training vs Inference

H100 vs A100 vs L40S vs RTX 4090 vs A10G - a practical decision framework for matching GPU specifications to training and inference workload requirements.

Streaming Multiprocessors

The SM is the fundamental execution unit of every NVIDIA GPU - warp schedulers, register files, shared memory, occupancy, and how thread block configuration determines performance.

Tensor Cores and Mixed Precision

How tensor cores accelerate matrix multiplication, BF16 vs FP16 vs FP8 vs TF32, mixed-precision training implementation, and the performance impact of precision choices.