Module 7 - Inference and Optimization

Quantization, KV caching, speculative decoding, continuous batching, and serving infrastructure.