Module 5: Memory Systems for AI
HBM, DRAM, cache hierarchies, KV cache management, PagedAttention, and quantization as memory compression - understanding memory is understanding why LLM inference costs what it costs.