Module 15 - Cost Management for ML
You cannot optimize what you cannot measure. And most ML teams cannot tell you what a single prediction costs.
What This Module Covers
Machine learning systems are expensive to build and operate. A single training run can cost thousands of dollars. Inference at scale can cost millions per year. Storage for experiments, artifacts, and feature data compounds quietly. Most ML teams have no clear picture of where the money goes.
This module teaches cost management as an engineering discipline: how to measure ML costs precisely, how to reduce them systematically without sacrificing model quality, and how to create organizational structures that make ML teams accountable for their spending.
Lessons at a Glance
| # | Lesson | Core Question |
|---|---|---|
| 01 | ML Infrastructure Cost Model | How much does our recommendation system cost per request? |
| 02 | Training Cost Optimization | How do we reduce the cost of a training run to $18K without losing accuracy? |
| 03 | Inference Cost Optimization | How do we reduce serving cost to $0.02 per request? |
| 04 | Cloud FinOps for ML | How do we fix a quarterly cloud bill 3× over budget in 4 weeks? |
| 05 | Build vs. Buy Economics | Is self-hosting MLflow really cheaper than paying for W&B? |
| 06 | Cost Attribution and Accountability | Who is responsible for the $400K/year compute bill? |
Key Concepts
Unit economics: Cost expressed per unit of business value - cost per prediction, cost per user, cost per training run. These metrics connect engineering decisions to business outcomes.
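As a minimal sketch of the arithmetic behind a unit-economics metric, cost per prediction can be computed as total monthly spend divided by monthly prediction volume. The cost buckets, the `MonthlyCosts` class, and all dollar figures below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCosts:
    """Hypothetical monthly cost buckets for one model, in USD."""
    serving_compute: float
    training_compute: float
    storage: float

    def total(self) -> float:
        return self.serving_compute + self.training_compute + self.storage


def cost_per_prediction(costs: MonthlyCosts, predictions: int) -> float:
    """Fully loaded cost of one prediction: monthly spend / monthly volume."""
    if predictions <= 0:
        raise ValueError("predictions must be positive")
    return costs.total() / predictions


# Illustrative numbers only, not taken from a real bill.
costs = MonthlyCosts(serving_compute=30_000.0, training_compute=8_000.0, storage=2_000.0)
unit_cost = cost_per_prediction(costs, predictions=50_000_000)
print(f"${unit_cost:.4f} per prediction")
```

Including training and storage in the numerator is a design choice: it yields a fully loaded figure, whereas dividing serving compute alone gives the marginal cost of one more prediction.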
FinOps: Financial Operations - the practice of bringing financial accountability to cloud spending. In ML, this means tagging resources by model, team, and use case so that costs are attributable.
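The tagging idea can be sketched as a simple aggregation over billing line items. The record layout, tag keys, and amounts below are assumptions for illustration; real cloud billing exports use provider-specific schemas.

```python
from collections import defaultdict

# Hypothetical billing line items; real exports carry many more fields.
line_items = [
    {"cost_usd": 1200.0, "tags": {"team": "search", "model": "ranker-v3"}},
    {"cost_usd": 800.0, "tags": {"team": "search", "model": "query-embed"}},
    {"cost_usd": 2500.0, "tags": {"team": "recs", "model": "two-tower"}},
    {"cost_usd": 400.0, "tags": {}},  # untagged resource
]


def cost_by_tag(items, key):
    """Sum cost per value of a tag key; untagged spend goes to 'UNTAGGED'."""
    totals = defaultdict(float)
    for item in items:
        totals[item["tags"].get(key, "UNTAGGED")] += item["cost_usd"]
    return dict(totals)


by_team = cost_by_tag(line_items, "team")
# Share of spend that cannot be attributed to any team.
untagged_share = by_team.get("UNTAGGED", 0.0) / sum(by_team.values())
print(by_team, f"untagged: {untagged_share:.1%}")
```

Tracking the untagged share explicitly is useful because attribution is only as good as tag coverage.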
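Compute-optimal training (Chinchilla): The finding from DeepMind (Hoffmann et al., 2022) that, for a fixed compute budget, most large models of the time were under-trained relative to their parameter count. Model size and training tokens should be scaled together, so a smaller model trained on more tokens is often more cost-effective than a larger model trained on fewer.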
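A rough sketch of how this translates into a sizing rule, under two commonly used approximations that are assumptions here rather than exact results: training compute is about 6 FLOPs per parameter per token (C ≈ 6·N·D), and the compute-optimal ratio is roughly 20 tokens per parameter. The function name is hypothetical.

```python
import math
from typing import Tuple

TOKENS_PER_PARAM = 20.0       # rule-of-thumb ratio (assumption)
FLOPS_PER_PARAM_TOKEN = 6.0   # C ~= 6 * N * D approximation (assumption)


def compute_optimal(flops_budget: float) -> Tuple[float, float]:
    """Return (parameters N, tokens D) that spend the budget with D = 20 * N.

    Substituting D = 20 N into C = 6 N D gives C = 120 N^2,
    so N = sqrt(C / 120).
    """
    if flops_budget <= 0:
        raise ValueError("flops_budget must be positive")
    n = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    return n, TOKENS_PER_PARAM * n


n, d = compute_optimal(5.88e23)
print(f"{n:.2e} params, {d:.2e} tokens")
```

With a budget of about 5.88e23 FLOPs, this recovers roughly 70B parameters and 1.4T tokens, which matches the configuration reported for Chinchilla itself. Treat the constants as starting points; the fitted ratio varies with data and architecture.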
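Reserved instances vs. on-demand vs. spot: Three cloud pricing tiers with very different cost profiles. On-demand is the flexible, full-price baseline. Spot capacity is heavily discounted (often 60–90%) but can be reclaimed by the provider, so it suits training workloads that checkpoint and can tolerate interruption. Reserved or committed-use capacity trades a usage commitment for a lower rate, which fits always-on serving with a predictable baseline load.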
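The spot discount is not free: interruptions waste some compute on restarts and recomputation since the last checkpoint. A minimal sketch of that trade-off follows, with a hypothetical hourly rate, discount, and overhead fraction chosen only for illustration.

```python
def training_cost(hours, hourly_rate, discount=0.0, overhead=0.0):
    """Cost of a job that needs `hours` of useful compute.

    discount: fraction off the on-demand rate (0.7 means 70% off).
    overhead: extra fraction of hours lost to interruptions and restarts.
    """
    if not 0.0 <= discount < 1.0 or overhead < 0.0:
        raise ValueError("discount must be in [0, 1) and overhead >= 0")
    return hours * (1.0 + overhead) * hourly_rate * (1.0 - discount)


# Illustrative numbers: 100 useful hours at a $30/hour on-demand rate.
on_demand = training_cost(hours=100, hourly_rate=30.0)
spot = training_cost(hours=100, hourly_rate=30.0, discount=0.7, overhead=0.15)
savings = 1.0 - spot / on_demand
print(f"on-demand ${on_demand:.0f}, spot ${spot:.0f}, savings {savings:.1%}")
```

Under this simple model, spot stays cheaper as long as `(1 + overhead) * (1 - discount) < 1`, which is why frequent checkpointing (low overhead) matters more as the discount shrinks.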
Chargeback vs. showback: Showback means showing teams what they cost without billing them. Chargeback means actually billing teams for their consumption. Both change behavior, but chargeback changes it more.
Why Cost Management Is an Engineering Problem
ML cost management is not a finance problem dressed up in engineering clothes. The decisions that drive ML costs are made by engineers:
- Which model architecture to use (100M parameters vs. 1B parameters)
- Whether to use spot instances with checkpoint-and-restart
- Whether to apply quantization to reduce inference compute
- Which features to compute in real-time vs. batch
- Whether to cache model outputs for repeated queries
- Which experiments to run and for how long
Finance teams cannot make these decisions. They can report costs. Engineers can reduce them. The ML engineer who understands cost levers has a significant advantage: they can frame infrastructure decisions as business cases rather than technical preferences.
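As one illustration of framing a lever from the list above as a business case, the effect of output caching on serving cost can be estimated from the cache hit rate. This is a simplified model, and the per-request costs and hit rate are hypothetical.

```python
def effective_cost_per_request(model_cost, cache_cost, hit_rate):
    """Expected cost per request when a fraction `hit_rate` is served from cache.

    Every request pays the cache lookup; only misses pay for model inference.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    return cache_cost + (1.0 - hit_rate) * model_cost


# Illustrative numbers only.
baseline = effective_cost_per_request(model_cost=0.002, cache_cost=0.0, hit_rate=0.0)
cached = effective_cost_per_request(model_cost=0.002, cache_cost=0.0001, hit_rate=0.4)
print(f"baseline ${baseline:.4f}, cached ${cached:.4f} per request")
```

In this model the cache pays for itself only when `hit_rate * model_cost` exceeds `cache_cost`, which turns "should we cache?" into a measurable question about query repetition.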
Prerequisites
- Module 11 (Model Training at Scale) - distributed training, mixed precision
- Module 12 (Model Serving) - inference architecture, autoscaling
- Module 13 (MLOps Platforms) - experiment tracking, artifact management
