Module 15 - Cost Management for ML

You cannot optimize what you cannot measure. And most ML teams cannot tell you what a single prediction costs.

What This Module Covers

Machine learning systems are expensive to build and operate. A single training run can cost thousands of dollars. Inference at scale can cost millions per year. Storage for experiments, artifacts, and feature data compounds quietly. Most ML teams have no clear picture of where the money goes.

This module teaches cost management as an engineering discipline: how to measure ML costs precisely, how to reduce them systematically without sacrificing model quality, and how to create organizational structures that make ML teams accountable for their spending.

Module Map

Lessons at a Glance

#	Lesson	Core Question
01	ML Infrastructure Cost Model	How much does our recommendation system cost per request?
02	Training Cost Optimization	How do we reduce a $50K training run to $18K without losing accuracy?
03	Inference Cost Optimization	How do we reduce serving cost from $0.08 to $0.02 per request?
04	Cloud FinOps for ML	How do we fix a quarterly cloud bill 3× over budget in 4 weeks?
05	Build vs. Buy Economics	Is self-hosting MLflow really cheaper than paying for W&B?
06	Cost Attribution and Accountability	Who is responsible for the $400K/year compute bill?

Key Concepts

Unit economics: Cost expressed per unit of business value - cost per prediction, cost per user, cost per training run. These metrics connect engineering decisions to business outcomes.
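As a minimal sketch of a unit-economics metric, the snippet below derives cost per prediction from monthly serving spend. The `ServingCost` fields and every number in the example are illustrative assumptions, not figures from this module, and the calculation ignores storage, networking, and engineering time.

```python
from dataclasses import dataclass


@dataclass
class ServingCost:
    """Monthly serving inputs. All values used below are illustrative assumptions."""
    instance_hourly_usd: float  # price of one serving instance per hour
    instance_count: int         # average number of instances running
    hours_per_month: float      # roughly 730 for an always-on service
    requests_per_month: int     # total predictions served in the month


def cost_per_prediction(c: ServingCost) -> float:
    """Return the average compute cost of one prediction in USD."""
    if c.requests_per_month <= 0:
        raise ValueError("requests_per_month must be positive")
    monthly_usd = c.instance_hourly_usd * c.instance_count * c.hours_per_month
    return monthly_usd / c.requests_per_month


example = ServingCost(
    instance_hourly_usd=1.00,
    instance_count=4,
    hours_per_month=730,
    requests_per_month=10_000_000,
)
unit_cost = cost_per_prediction(example)
print(f"${unit_cost:.6f} per prediction")
```

Dividing by a business denominator (requests, users, training runs) is what turns a monthly bill into a number an engineer can act on.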

FinOps: Financial Operations - the practice of bringing financial accountability to cloud spending. In ML, this means tagging resources by model, team, and use case so that costs are attributable.
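A sketch of what tag-based attribution might look like in practice is shown below. The `line_items` records are hypothetical stand-ins for rows in a cloud billing export, and the tag keys (`team`, `model`, `use_case`) are assumed to mirror the dimensions named above. Real exports have different schemas.

```python
from collections import defaultdict

# Hypothetical billing line items; real billing exports carry many more fields.
line_items = [
    {"cost_usd": 1200.0, "tags": {"team": "search", "model": "ranker-v2", "use_case": "training"}},
    {"cost_usd": 800.0, "tags": {"team": "search", "model": "ranker-v2", "use_case": "serving"}},
    {"cost_usd": 500.0, "tags": {"team": "ads", "model": "ctr-v5", "use_case": "serving"}},
    {"cost_usd": 300.0, "tags": {}},  # a resource nobody tagged
]


def cost_by_tag(items, key):
    """Sum cost per value of one tag key; untagged spend gets its own bucket."""
    totals = defaultdict(float)
    for item in items:
        totals[item["tags"].get(key, "untagged")] += item["cost_usd"]
    return dict(totals)


by_team = cost_by_tag(line_items, "team")
by_use_case = cost_by_tag(line_items, "use_case")
for team, usd in sorted(by_team.items()):
    print(f"{team}: ${usd:,.2f}")
```

Keeping an explicit `untagged` bucket is a deliberate choice in this sketch: the size of that bucket shows how much spend is still unattributable.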

Compute-optimal training (Chinchilla): The finding from DeepMind (2022) that, for a fixed compute budget, most large language models of the time were under-trained: they had too many parameters for the amount of data they were trained on. Training a smaller model on more data is often more cost-effective than training a larger model on less.
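The trade-off can be sketched with two widely used approximations: training compute is roughly C ≈ 6·N·D FLOPs for N parameters and D training tokens, and the compute-optimal ratio is roughly 20 tokens per parameter. Both are rules of thumb associated with the Chinchilla result rather than exact laws, and the function name and default below are assumptions for illustration.

```python
import math


def compute_optimal_split(flops_budget: float, tokens_per_param: float = 20.0):
    """Split a FLOPs budget into (parameters, tokens).

    Assumes C ~= 6 * N * D and D ~= tokens_per_param * N, which gives
    N = sqrt(C / (6 * tokens_per_param)). Both are approximations.
    """
    if flops_budget <= 0 or tokens_per_param <= 0:
        raise ValueError("budget and ratio must be positive")
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


# Illustrative budget, roughly the compute of a 70B-parameter model
# trained on 1.4T tokens under the 6*N*D approximation.
budget = 6 * 70e9 * 1.4e12
n_params, n_tokens = compute_optimal_split(budget)
print(f"params ~ {n_params:.3g}, tokens ~ {n_tokens:.3g}")
```

Under these assumptions, doubling the parameter count at a fixed budget halves the tokens the model can see, which is the cost argument for preferring the smaller model.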

Reserved instances vs. on-demand vs. spot: Three cloud pricing models with very different cost profiles. Training workloads that can tolerate interruption benefit from spot pricing (often 60–90% below on-demand). Always-on serving needs capacity that cannot be reclaimed, so it is usually run on reserved instances, with on-demand covering bursts.
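The spot discount is not pure savings, because interrupted jobs repeat some work after restarting from a checkpoint. The sketch below compares the two options for a training job. The hourly price, the 70% discount, and the 15% interruption overhead are illustrative assumptions, not quoted cloud prices.

```python
def training_cost(useful_hours: float, hourly_usd: float, overhead_fraction: float = 0.0) -> float:
    """Cost of a job that needs `useful_hours` of productive compute.

    `overhead_fraction` is the extra billed time lost to interruptions
    (restarts plus steps recomputed since the last checkpoint).
    """
    if useful_hours < 0 or hourly_usd < 0 or overhead_fraction < 0:
        raise ValueError("inputs must be non-negative")
    return useful_hours * (1.0 + overhead_fraction) * hourly_usd


on_demand_price = 30.0                     # assumed USD per hour for a GPU node
spot_price = on_demand_price * (1 - 0.70)  # assumed 70% spot discount

on_demand_total = training_cost(100, on_demand_price)
spot_total = training_cost(100, spot_price, overhead_fraction=0.15)
net_savings = 1 - spot_total / on_demand_total

print(f"on-demand: ${on_demand_total:,.0f}")
print(f"spot:      ${spot_total:,.0f}")
print(f"net savings: {net_savings:.1%}")
```

In this example the headline 70% discount shrinks to a smaller net saving once rework is counted, which is why checkpoint frequency matters when running on spot capacity.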

Chargeback vs. showback: Showback means showing teams what they cost without billing them. Chargeback means actually billing teams for their consumption. Both change behavior, but chargeback changes it more.

Why Cost Management Is an Engineering Problem

ML cost management is not a finance problem dressed up in engineering clothes. The decisions that drive ML costs are made by engineers:

Which model architecture to use (100M parameters vs. 1B parameters)
Whether to use spot instances with checkpoint-and-restart
Whether to apply quantization to reduce inference compute
Which features to compute in real-time vs. batch
Whether to cache model outputs for repeated queries
Which experiments to run and for how long

Finance teams cannot make these decisions. They can report costs. Engineers can reduce them. The ML engineer who understands cost levers has a significant advantage: they can frame infrastructure decisions as business cases rather than technical preferences.
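One of the levers listed above, caching model outputs for repeated queries, can be framed as a business case with a few lines of arithmetic. The sketch below assumes every request pays a cache lookup and only misses pay for model inference. The per-request costs and the hit rate are hypothetical.

```python
def effective_cost_per_request(model_cost_usd: float, lookup_cost_usd: float, hit_rate: float) -> float:
    """Average cost per request with an output cache in front of the model.

    Every request pays for the lookup; only cache misses pay for inference.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return lookup_cost_usd + (1.0 - hit_rate) * model_cost_usd


baseline = effective_cost_per_request(0.001, 0.0, hit_rate=0.0)        # no cache
with_cache = effective_cost_per_request(0.001, 0.00001, hit_rate=0.4)  # assumed 40% repeats

print(f"baseline:   ${baseline:.5f} per request")
print(f"with cache: ${with_cache:.5f} per request")
```

The same shape of calculation applies to the other levers: state the unit cost before, the unit cost after, and the assumption that connects them.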

Prerequisites

Module 11 (Model Training at Scale) - distributed training, mixed precision
Module 12 (Model Serving) - inference architecture, autoscaling
Module 13 (MLOps Platforms) - experiment tracking, artifact management
