Module 7 - Inference Hardware

Inference-optimized hardware, quantization tradeoffs, KV cache management, speculative decoding, TensorRT, batching, and edge inference.