
Module 15 - Cost Management for ML

You cannot optimize what you cannot measure. And most ML teams cannot tell you what a single prediction costs.

What This Module Covers

Machine learning systems are expensive to build and operate. A single training run can cost thousands of dollars. Inference at scale can cost millions per year. Storage for experiments, artifacts, and feature data compounds quietly. Most ML teams have no clear picture of where the money goes.

This module teaches cost management as an engineering discipline: how to measure ML costs precisely, how to reduce them systematically without sacrificing model quality, and how to create organizational structures that make ML teams accountable for their spending.


Module Map


Lessons at a Glance

#  | Lesson                              | Core Question
---|-------------------------------------|--------------------------------------------------------------------
01 | ML Infrastructure Cost Model        | How much does our recommendation system cost per request?
02 | Training Cost Optimization          | How do we reduce a $50K training run to $18K without losing accuracy?
03 | Inference Cost Optimization         | How do we reduce serving cost from $0.08 to $0.02 per request?
04 | Cloud FinOps for ML                 | How do we fix a quarterly cloud bill 3× over budget in 4 weeks?
05 | Build vs. Buy Economics             | Is self-hosting MLflow really cheaper than paying for W&B?
06 | Cost Attribution and Accountability | Who is responsible for the $400K/year compute bill?

Key Concepts

Unit economics: Cost expressed per unit of business value - cost per prediction, cost per user, cost per training run. These metrics connect engineering decisions to business outcomes.
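
As a rough illustration of unit economics, the sketch below derives cost per prediction from fleet cost and traffic. The function name and all numbers are hypothetical, and it assumes serving instances are billed by the hour regardless of load.

```python
def cost_per_prediction(hourly_instance_cost: float,
                        instance_count: int,
                        requests_per_hour: float) -> float:
    """Serving cost per request, assuming a fleet billed by the hour."""
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    return hourly_instance_cost * instance_count / requests_per_hour


# Hypothetical fleet: 4 instances at $3.00/hour serving 600,000 requests/hour.
unit_cost = cost_per_prediction(3.00, 4, 600_000)
print(f"${unit_cost:.6f} per prediction")
```

The same fleet at half the traffic costs twice as much per prediction, which is why utilization matters as much as instance price.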

FinOps: Financial Operations - the practice of bringing financial accountability to cloud spending. In ML, this means tagging resources by model, team, and use case so that costs are attributable.
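
A minimal sketch of tag-based attribution, assuming billing line items have already been exported as dictionaries with a `cost_usd` field and a `tags` mapping. The record layout, resource names, and tag keys are illustrative assumptions, not any cloud provider's export format.

```python
from collections import defaultdict


def cost_by_tag(line_items, tag_key):
    """Sum cost per value of one tag; items missing the tag go to 'untagged'."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += item["cost_usd"]
    return dict(totals)


# Hypothetical billing export.
billing = [
    {"resource": "gpu-node-1", "cost_usd": 1200.0, "tags": {"team": "search", "model": "ranker-v3"}},
    {"resource": "gpu-node-2", "cost_usd": 800.0, "tags": {"team": "ads", "model": "ctr-v7"}},
    {"resource": "bucket-exp", "cost_usd": 150.0, "tags": {"team": "search"}},
    {"resource": "orphan-disk", "cost_usd": 90.0, "tags": {}},
]

by_team = cost_by_tag(billing, "team")
print(by_team)
```

Keeping an explicit "untagged" bucket makes unattributed spend visible instead of silently dropping it.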

Compute-optimal training (Chinchilla): The insight from DeepMind (2022) that most large models of the time were under-trained relative to their parameter count. For a fixed compute budget, training a smaller model on more data is often more cost-effective than training a larger model on less.
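
To make the trade-off concrete, the sketch below splits a fixed compute budget between parameters and training tokens. It rests on two widely quoted approximations that are assumptions here, not exact results: training FLOPs ≈ 6 × parameters × tokens, and roughly 20 training tokens per parameter at the compute-optimal point.

```python
import math

# Both constants are rule-of-thumb assumptions, not exact values.
FLOPS_PER_PARAM_TOKEN = 6.0   # training FLOPs ~= 6 * params * tokens
TOKENS_PER_PARAM = 20.0       # approximate compute-optimal data-to-parameter ratio


def compute_optimal_size(flops_budget: float) -> tuple[float, float]:
    """Return (params, tokens) that spend flops_budget at the assumed ratio."""
    params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


# Budget equivalent to a 70B-parameter model trained on 1.4T tokens.
budget = 6.0 * 70e9 * 1.4e12
params, tokens = compute_optimal_size(budget)
print(f"params ~ {params:.3g}, tokens ~ {tokens:.3g}")
```

Under these assumptions, a model with far more parameters than `params` would see too few tokens for the same budget.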

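Reserved instances vs. on-demand vs. spot: Three cloud pricing models with very different cost profiles. On-demand is the flexible, full-price baseline. Training workloads that can tolerate interruption benefit from spot pricing (often a 60–90% discount). Always-on serving is usually better suited to reserved capacity, because spot instances can be reclaimed at any time.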
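
The spot discount is partly offset by work that must be redone after interruptions. The sketch below compares the two under stated assumptions: the hourly rate, the 70% discount, and the 10% rework overhead are hypothetical numbers, and `rework_fraction` is an illustrative parameter for compute lost between checkpoints.

```python
def training_cost(gpu_hours: float,
                  hourly_rate: float,
                  discount: float = 0.0,
                  rework_fraction: float = 0.0) -> float:
    """Job cost; rework_fraction is extra compute redone after interruptions."""
    effective_hours = gpu_hours * (1.0 + rework_fraction)
    return effective_hours * hourly_rate * (1.0 - discount)


# Hypothetical job: 1,000 GPU-hours at $4.00/hour on-demand.
on_demand = training_cost(1000, 4.0)
# Same job on spot: 70% discount, 10% of work redone after preemptions.
spot = training_cost(1000, 4.0, discount=0.70, rework_fraction=0.10)

print(f"on-demand: ${on_demand:,.0f}, spot: ${spot:,.0f}")
```

Frequent checkpointing lowers `rework_fraction`, which is what makes the spot discount usable for long training jobs.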

Chargeback vs. showback: Showback means showing teams what they cost without billing them. Chargeback means actually billing teams for their consumption. Both change behavior, but chargeback changes it more.


Why Cost Management Is an Engineering Problem

ML cost management is not a finance problem dressed up in engineering clothes. The decisions that drive ML costs are made by engineers:

  • Which model architecture to use (100M parameters vs. 1B parameters)
  • Whether to use spot instances with checkpoint-and-restart
  • Whether to apply quantization to reduce inference compute
  • Which features to compute in real-time vs. batch
  • Whether to cache model outputs for repeated queries
  • Which experiments to run and for how long
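
As one example of these levers, the sketch below caches model outputs for repeated queries using the standard library's `functools.lru_cache`. The `expensive_model` function is a stand-in for a real inference call, and the query list is made up; the point is only that repeated queries do not reach the model.

```python
from functools import lru_cache

model_calls = 0  # counts how often the (simulated) model actually runs


def expensive_model(query: str) -> int:
    """Stand-in for a real inference call; returns a dummy score."""
    global model_calls
    model_calls += 1
    return len(query)


@lru_cache(maxsize=1024)
def cached_predict(query: str) -> int:
    """Serve repeated queries from memory instead of re-running the model."""
    return expensive_model(query)


queries = ["shoes", "laptops", "shoes", "shoes", "laptops", "phones"]
results = [cached_predict(q) for q in queries]
info = cached_predict.cache_info()

print(f"{len(queries)} requests, {model_calls} model calls, {info.hits} cache hits")
```

Whether caching pays off depends on how often queries repeat and on whether slightly stale outputs are acceptable for the use case.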

Finance teams cannot make these decisions. They can report costs. Engineers can reduce them. The ML engineer who understands cost levers has a significant advantage: they can frame infrastructure decisions as business cases rather than technical preferences.


Prerequisites

  • Module 11 (Model Training at Scale) - distributed training, mixed precision
  • Module 12 (Model Serving) - inference architecture, autoscaling
  • Module 13 (MLOps Platforms) - experiment tracking, artifact management
© 2026 EngineersOfAI. All rights reserved.