Module 05: CI/CD for ML
Why CI/CD for ML Is Different
Software CI/CD is a well-understood problem: you push code, tests run, and a binary deploys. Pass or fail is largely deterministic. ML is different. Your "artifact" is a trained model whose quality depends on data, hyperparameters, and randomness, and conventional unit tests exercise none of these. A broken model can pass every lint check, type check, and unit test and still ship predictions that are confidently wrong.
This module teaches you to build CI/CD pipelines that catch ML-specific failures: models that regress on demographic subgroups, pipelines that silently drop training rows, and retraining jobs that never fire when data drifts. You will wire together GitHub Actions, GitLab CI, evaluation gates, and automated retraining into a system you can trust in production.
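To make two of these failure modes concrete, here is a minimal sketch of the kind of check a CI job could run after training. It is an illustration, not the module's actual implementation: the function names (`subgroup_accuracy`, `evaluation_gate`, `check_row_retention`), the 0.02 accuracy tolerance, and the 0.99 retention floor are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    passed: bool
    failures: list


def subgroup_accuracy(y_true, y_pred, groups):
    """Return accuracy per subgroup as a dict {group: accuracy}."""
    correct, total = {}, {}
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] = total.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + int(t == p)
    return {g: correct[g] / total[g] for g in total}


def evaluation_gate(y_true, y_pred, groups, baseline, max_drop=0.02):
    """Fail if any subgroup falls more than max_drop below its baseline.

    `baseline` maps each subgroup to the accuracy of the currently
    deployed model (hypothetical values supplied by the caller).
    """
    current = subgroup_accuracy(y_true, y_pred, groups)
    failures = []
    for g, base_acc in baseline.items():
        acc = current.get(g)
        if acc is None:
            failures.append(f"{g}: no evaluation rows")
        elif base_acc - acc > max_drop:
            failures.append(f"{g}: accuracy {acc:.3f} vs baseline {base_acc:.3f}")
    return GateResult(passed=not failures, failures=failures)


def check_row_retention(rows_in, rows_out, min_retention=0.99):
    """Return False when a pipeline step drops more rows than allowed."""
    if rows_in == 0:
        return False
    return rows_out / rows_in >= min_retention
```

A model can look acceptable in aggregate while one subgroup collapses, which is why the gate compares each subgroup against its own baseline rather than a single overall score. In a CI job, a failed `GateResult` or a `False` from `check_row_retention` would translate into a non-zero exit code that blocks the deploy step.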
Module Map
Learning Objectives
By the end of this module you will be able to:
- Explain the dual CI problem in ML and why standard software pipelines are insufficient
- Build a practical ML test suite covering unit tests, integration tests, and model validation
- Write GitHub Actions workflows with conditional training triggers and GPU runner support
- Configure GitLab CI DAG pipelines for end-to-end ML lifecycle management
- Design multi-metric evaluation gates with statistical significance checks
- Implement trigger-based automated retraining with human-in-the-loop approval steps
- Architect continuous training systems that safely update models every few hours
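As a preview of the trigger-based retraining objective, the sketch below scores drift on a single numeric feature with the Population Stability Index (PSI) and maps the score to a pipeline action. Treat it as one possible approach under stated assumptions: the function names, the equal-width binning, the 0.2 threshold (a common rule of thumb, not a value taken from this module), and the `"await_approval"` state standing in for a human-in-the-loop step are all hypothetical.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a reference sample and a recent sample of one feature."""
    lo, hi = min(expected), max(expected)
    # Fall back to a width of 1.0 when the reference sample is constant.
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def retraining_decision(psi, threshold=0.2, require_approval=True):
    """Map a drift score to a pipeline action (assumed action names)."""
    if psi < threshold:
        return "skip"
    return "await_approval" if require_approval else "retrain"
```

A scheduled job could compute the PSI for each monitored feature and start retraining only when the decision is not `"skip"`, which avoids both retraining on every run and never retraining at all.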
Prerequisites
- Familiarity with Git branching and pull requests
- Python and basic ML model training (any framework)
- Basic Linux command line
- Module 04 (experiment tracking) recommended but not required
Lessons
| # | Lesson | Core Problem Solved |
|---|---|---|
| 01 | CI/CD for ML vs Software | Standard CI passes broken models |
| 02 | Testing ML Code | Building an ML test suite from zero |
| 03 | GitHub Actions for ML | Conditional training on data/code change |
| 04 | GitLab CI for ML | Enterprise ML pipeline from data to production |
| 05 | Model Evaluation Gates | Failing the gate on a demographic subgroup regression |
| 06 | Automated Retraining Pipelines | Retraining a fraud model every 48 hours |
| 07 | Continuous Training | Keeping a news ranking model fresh every 4 hours |
Estimated Time
7–9 hours total. Each lesson is self-contained; follow the module sequentially or jump to the lesson most relevant to your current pain point.
