Hardware and Silicon for AI
Most ML engineers treat hardware as a black box. They pick a GPU instance type, watch their training script run, and accept whatever throughput they get. The engineers who treat hardware as a first-class concern achieve 2-10x better utilization, catch bottlenecks before they become blockers, and make better architectural decisions because they understand the physical constraints their code runs under.
This track closes that gap.
Why Hardware Knowledge Matters
When your training run is slow, the explanation lives somewhere in the hardware stack - compute, memory, or interconnect. When your inference costs are too high, the answer is almost always in memory - how you're using it, how much you need, how efficiently you're moving it. When you're choosing between H100s and TPUs for a workload, the right answer depends on understanding the memory bandwidth, interconnect topology, and compute characteristics of both.
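The roofline model mentioned in Module 1 makes this concrete: divide a kernel's FLOPs by the bytes it moves, and compare that arithmetic intensity to the hardware's compute-to-bandwidth ratio. A back-of-envelope sketch, using rough published H100 SXM specs as illustrative assumptions (not measured values):

```python
# Roofline check: is a kernel memory-bound or compute-bound?
# Peak numbers are rough published specs for an NVIDIA H100 SXM,
# used here as assumptions for illustration.
PEAK_FLOPS = 989e12  # ~989 TFLOP/s dense BF16 tensor-core throughput
PEAK_BW = 3.35e12    # ~3.35 TB/s HBM3 memory bandwidth

def roofline(flops: float, bytes_moved: float) -> str:
    """Classify a kernel by arithmetic intensity (FLOPs per byte moved)."""
    intensity = flops / bytes_moved
    ridge = PEAK_FLOPS / PEAK_BW  # ~295 FLOPs/byte on this part
    bound = "compute-bound" if intensity > ridge else "memory-bound"
    return f"{intensity:.1f} FLOPs/byte -> {bound} (ridge ~{ridge:.0f})"

N = 4096
# Square BF16 matmul: 2*N^3 FLOPs, reads/writes 3 matrices of N^2 * 2 bytes
print("GEMM:", roofline(2 * N**3, 3 * N**2 * 2))
# BF16 elementwise add over the same data: N^2 FLOPs, same bytes moved
print("add :", roofline(N**2, 3 * N**2 * 2))
```

The same data volume flips from compute-bound to memory-bound depending on how much work you do per byte - which is why fusing elementwise ops into a matmul is free, but speeding up the matmul itself requires more FLOPs per byte or more bandwidth.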
The engineers who understand hardware make better decisions at every level:
- They write model architectures that fit the hardware (not the other way around)
- They choose the right quantization strategy because they understand what memory bandwidth actually costs
- They know why FlashAttention is fast (it's not the math - it's the memory access pattern)
- They debug OOM errors without guessing
- They design distributed training topologies that do not bottleneck on the network
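The quantization point above reduces to napkin arithmetic: at batch size 1, autoregressive decode streams the entire weight set through HBM once per token, so weight bytes divided by memory bandwidth bounds tokens per second. A sketch, using a 70B-parameter model and H100-class bandwidth as illustrative assumptions:

```python
# Why quantization pays: when decode is weight-bandwidth-bound,
# tokens/s <= bandwidth / model_bytes. All numbers are illustrative
# assumptions, not benchmarks.
HBM_BW = 3.35e12  # bytes/s, roughly an H100's HBM3 bandwidth

def max_decode_tps(n_params: float, bytes_per_param: float) -> float:
    """Upper bound on decode tokens/s at batch size 1."""
    model_bytes = n_params * bytes_per_param
    return HBM_BW / model_bytes

for name, bpp in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"70B @ {name}: <= {max_decode_tps(70e9, bpp):.0f} tokens/s")
```

Halving bytes per parameter doubles the decode ceiling before any kernel work happens - that is what "memory bandwidth actually costs" means in practice.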
What This Track Covers
Seven modules covering the full hardware stack for AI:
| Module | Topic | Key Skills |
|---|---|---|
| 1 | GPU Architecture | SMs, tensor cores, memory hierarchy, roofline model |
| 2 | CUDA Programming | Kernels, thread blocks, memory coalescing, profiling |
| 3 | Custom Silicon | TPUs, Trainium, Groq LPU, Cerebras, Gaudi |
| 4 | Kernel Optimization | Triton, FlashAttention, operator fusion, torch.compile |
| 5 | Memory Systems | HBM, KV cache, quantization, memory profiling |
| 6 | Distributed Training Hardware | NVLink, InfiniBand, AllReduce, fault tolerance |
| 7 | Inference Hardware | Cost-per-token, batching, edge hardware, serving stack |
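As a taste of the memory-systems material in Module 5, KV cache size is pure multiplication: two tensors (K and V) per layer, per KV head, per head dimension, per token. A sketch using Llama-2-7B-like shapes as an assumption for illustration (check your model's actual config):

```python
# KV cache sizing: bytes per cached token =
#   2 (K and V) * n_layers * n_kv_heads * head_dim * dtype_bytes.
# Shapes follow a Llama-2-7B-like config (illustrative assumption).
def kv_cache_bytes(n_layers, n_kv_heads, head_dim,
                   seq_len, batch, dtype_bytes=2):
    per_token = 2 * n_layers * n_kv_heads * head_dim * dtype_bytes
    return per_token * seq_len * batch

gb = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128,
                    seq_len=4096, batch=8, dtype_bytes=2) / 2**30
print(f"KV cache: {gb:.1f} GiB")  # grows linearly with batch and seq_len
```

At these shapes the cache reaches 16 GiB at batch 8 - often larger than the quantized weights themselves, which is why serving stacks obsess over cache layout, eviction, and quantization.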
Who This Track Is For
- ML Engineers who want to stop treating hardware as a black box and start using it as a lever.
- AI Infrastructure Engineers building training clusters or inference serving systems.
- Research Engineers implementing custom kernels and optimizing model architectures for specific hardware.
- Senior Engineers making hardware procurement and architecture decisions.
Prerequisites
- Comfortable with Python and PyTorch
- Basic understanding of neural network training
- Math for AI Track helpful but not required
The Payoff
Understanding hardware does not make you a hardware engineer. It makes you a better ML engineer - one who writes code that respects the machine it runs on, and understands why certain optimizations work when others do not.
Start with GPU Architecture if you are new to the hardware stack.
Start with CUDA Programming if you want to write your own kernels.
Start with Kernel Optimization if you are optimizing an existing model.
