Module 04: Statistics for ML
"Without data, you're just another person with an opinion. But without statistics, data is just noise."
The Production Reality
You've trained a new model. Your offline metrics look great - accuracy is up 1.2%, NDCG improved. But your engineering manager asks: "Is that improvement real, or just lucky random variation?" You push to production. Two weeks later: "Did this model actually lift conversion, or did we just happen to run the experiment during the holiday season?"
Every ML engineer eventually faces these questions. The answers live in statistics.
Statistics is not a collection of formulas to memorize. It is the formal language for reasoning under uncertainty - and ML systems are uncertainty machines. You train on noisy data, you evaluate on finite test sets, you deploy into a world that shifts. Statistical thinking lets you separate signal from noise at every stage of the ML lifecycle.
This module bridges probability theory (Module 03) and the full machine learning curriculum. By the end, you will have the statistical toolkit to:
- Design rigorous experiments that prove your model works
- Calculate how many samples you need to detect a real improvement
- Understand why your offline evaluation results often disagree with online A/B test results
- Communicate uncertainty in model performance to non-technical stakeholders
- Avoid the statistical traps that lead to shipping models that don't actually help users
Module Map
How Statistics Powers ML Engineering
1. Model Training
| Concept | Where it appears in ML |
|---|---|
| Maximum Likelihood Estimation (MLE) | Cross-entropy loss IS negative log-likelihood |
| Regularisation as MAP estimation | L2 penalty IS MAP estimation with a zero-mean Gaussian prior on weights |
| Bias-Variance tradeoff | Underfitting vs overfitting |
| Consistency of estimators | "More data always helps" - but how much? |
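
The first two rows of this table can be checked numerically. The sketch below uses made-up labels, predicted probabilities, weights, and a prior scale `sigma` (all hypothetical) to show that binary cross-entropy matches the average Bernoulli negative log-likelihood, and that the negative log of a zero-mean Gaussian prior is an L2 penalty plus a constant that does not depend on the weights.

```python
import numpy as np
from scipy.stats import norm

def bernoulli_nll(y, p):
    """Average negative log-likelihood of labels y under Bernoulli(p)."""
    likelihood = np.where(y == 1, p, 1.0 - p)
    return -np.mean(np.log(likelihood))

def binary_cross_entropy(y, p):
    """Standard binary cross-entropy loss, averaged over examples."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical labels and predicted probabilities (illustration only).
y = np.array([1, 0, 1, 1, 0])
p = np.array([0.9, 0.2, 0.7, 0.6, 0.1])

nll = bernoulli_nll(y, p)
bce = binary_cross_entropy(y, p)
print("cross-entropy == Bernoulli NLL:", np.isclose(nll, bce))

# Hypothetical weights and prior standard deviation (illustration only).
w = np.array([0.5, -1.2, 2.0])
sigma = 1.5

# Negative log-density of w under independent N(0, sigma^2) priors.
neg_log_prior = -norm.logpdf(w, loc=0.0, scale=sigma).sum()

# L2 penalty with strength 1 / (2 * sigma^2), plus a weight-independent constant.
l2_penalty = np.sum(w ** 2) / (2 * sigma ** 2)
const = len(w) * 0.5 * np.log(2 * np.pi * sigma ** 2)
print("Gaussian prior == L2 + const:", np.isclose(neg_log_prior, l2_penalty + const))
```

Because the constant does not depend on `w`, minimising the negative log-posterior and minimising loss plus L2 penalty pick out the same weights.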
2. Model Evaluation
| Concept | Where it appears in ML |
|---|---|
| Confidence intervals | "Our model achieves 87.3% accuracy ± 0.4%" |
| Hypothesis testing | "Is model A significantly better than model B?" |
| Bootstrap resampling | Robust metric estimation on small test sets |
| Multiple testing correction | Comparing dozens of hyperparameter configurations |
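
As a rough illustration of the first and third rows, the sketch below builds two 95% intervals for accuracy on a simulated test set: a normal-approximation (Wald) interval and a percentile bootstrap interval. The test set size, the underlying accuracy of 0.87, and the number of bootstrap resamples are assumptions chosen only for the example.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical test set: 1 = correct prediction, 0 = incorrect.
n = 2000
correct = rng.binomial(1, 0.87, size=n)
acc = correct.mean()

# Normal-approximation (Wald) 95% interval for a proportion.
z = norm.ppf(0.975)
se = np.sqrt(acc * (1 - acc) / n)
wald_ci = (acc - z * se, acc + z * se)

# Percentile bootstrap: resample test examples with replacement,
# recompute accuracy each time, and take the 2.5th and 97.5th percentiles.
boot_accs = np.array([
    rng.choice(correct, size=n, replace=True).mean()
    for _ in range(2000)
])
boot_ci = tuple(np.percentile(boot_accs, [2.5, 97.5]))

print(f"accuracy      = {acc:.3f}")
print(f"Wald 95% CI   = ({wald_ci[0]:.3f}, {wald_ci[1]:.3f})")
print(f"bootstrap 95% = ({boot_ci[0]:.3f}, {boot_ci[1]:.3f})")
```

For a simple metric like accuracy the two intervals should nearly coincide. The bootstrap earns its keep for metrics such as F1 that have no convenient closed-form standard error. In both cases the interval width shrinks roughly in proportion to one over the square root of the test set size.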
3. Experimentation & Deployment
| Concept | Where it appears in ML |
|---|---|
| A/B testing (two-sample tests; ANOVA for multiple variants) | Controlled online experiments |
| Statistical power | "How long should we run the experiment?" |
| Causal inference | "Did the model cause the improvement or did confounders?" |
| Effect size (Cohen's d) | Minimum detectable effect for business KPIs |
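
To make the power row concrete, here is a minimal sketch of a per-arm sample size calculation for a two-sided two-proportion z-test, using the standard normal-approximation formula. The function name `samples_per_arm` and the conversion rates are hypothetical, chosen only for illustration.

```python
import math
from scipy.stats import norm

def samples_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-sided two-proportion z-test.

    Uses n = (z_{1-alpha/2} + z_{power})^2 * (p1*q1 + p2*q2) / (p2 - p1)^2.
    """
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Hypothetical baseline conversion of 10%, with two candidate lifts.
n_small_lift = samples_per_arm(0.10, 0.11)  # detect 10% -> 11%
n_large_lift = samples_per_arm(0.10, 0.12)  # detect 10% -> 12%

print("per-arm n for a 1-point lift:", n_small_lift)
print("per-arm n for a 2-point lift:", n_large_lift)
```

Halving the minimum detectable effect roughly quadruples the required sample size, which is why the minimum effect worth detecting should be agreed before the experiment starts. Dividing the per-arm sample size by expected daily traffic per arm gives a first estimate of experiment duration.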
Lesson-by-Lesson Real-World Use Cases
| Lesson | Real-World ML Use Case |
|---|---|
| 01 Estimation Theory | Training neural networks (cross-entropy = MLE); Bayesian regularisation |
| 02 Hypothesis Testing | Model comparison tests; feature importance validation; detecting data drift |
| 03 Confidence Intervals | Reporting model performance with uncertainty bounds |
| 04 Bootstrap & Resampling | Evaluating variance in F1-score; k-fold cross-validation as resampling |
| 05 Regression Analysis | Linear models as ML foundation; understanding logistic regression deeply |
| 06 ANOVA & Experimental Design | A/B testing new model variants; hyperparameter ablation studies |
| 07 Causal Inference | Why offline recommendation metrics lie; online A/B as ground truth |
| 08 Statistical Power | Deciding sample size before launching an experiment |
Prerequisites
Before starting this module, you should be comfortable with:
- Module 01 - Linear Algebra: Matrix operations, eigendecomposition (for regression)
- Module 02 - Calculus: Derivatives, optimization (for MLE derivations)
- Module 03 - Probability Theory: Random variables, distributions, expectation, Bayes' theorem
You do not need to have taken a formal statistics course. This module is self-contained from a statistics perspective, building everything from probability foundations.
:::note Required Python Libraries
```python
# All code in this module uses:
import numpy as np
import scipy.stats as stats
import statsmodels.api as sm
import matplotlib.pyplot as plt
from scipy.stats import t, norm, chi2, f
```
Install with: `pip install numpy scipy statsmodels matplotlib`
:::
Learning Objectives
By the end of this module, you will be able to:
Conceptual Understanding
- Explain why cross-entropy loss is equivalent to Maximum Likelihood Estimation
- Correctly interpret a p-value (and identify common misconceptions)
- Explain what a 95% confidence interval means - and what it does NOT mean
- Distinguish correlation from causation using the potential outcomes framework
Mathematical Skills
- Derive the MLE estimator for Gaussian and Bernoulli distributions
- Compute t-, chi-squared, and F-test statistics by hand
- Construct bootstrap confidence intervals from scratch
- Calculate required sample size given power, effect size, and significance level
Engineering Skills
- Write production-quality A/B test analysis code in Python
- Choose the right statistical test for model comparison
- Apply multiple testing corrections when comparing many model variants
- Detect confounders in offline evaluation scenarios
Interview Readiness
- Answer "What is a p-value?" without falling into the "probability that the null is true" trap
- Explain the bias-variance tradeoff in terms of estimation theory
- Design a sample size calculation for a new ML experiment
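
One way to internalise the p-value definition is to compute one by simulation. The sketch below is a simple two-sided permutation test, one of several valid choices. It estimates how often shuffling the group labels produces a difference in means at least as large as the one observed. That frequency is the probability of data this extreme assuming the null is true, not the probability that the null is true. The function name and the per-example scores are hypothetical.

```python
import numpy as np

def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between a and b."""
    rng = np.random.default_rng(seed)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_permutations):
        # Under the null, group labels are exchangeable, so shuffle them.
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:len(a)].mean() - shuffled[len(a):].mean())
        hits += int(diff >= observed)
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical per-example scores for three models (illustration only).
rng = np.random.default_rng(1)
scores_a = rng.normal(0.80, 0.05, size=50)
scores_b_same = rng.normal(0.80, 0.05, size=50)    # same distribution as A
scores_b_better = rng.normal(0.85, 0.05, size=50)  # genuinely shifted

p_null = permutation_p_value(scores_a, scores_b_same)
p_shift = permutation_p_value(scores_a, scores_b_better)
print(f"A vs identical model: p = {p_null:.4f}")
print(f"A vs shifted model:   p = {p_shift:.4f}")
```

When both models share a distribution, the p-value is approximately uniform on (0, 1), so it falls below 0.05 about 5% of the time by chance alone. That is the false positive rate the significance level controls.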
How Statistics Connects to the Rest of the Curriculum
Probability Theory (Module 03)
        │
        ▼
Statistics for ML (Module 04) ◄──── This module
        │
        ├──► Bayesian Statistics (Module 06)
        │         └─ Priors, posteriors, MCMC
        │
        ├──► Statistical Learning Theory (Module 07)
        │         └─ PAC learning, VC dimension, generalisation bounds
        │
        └──► Information Theory (Module 05)
                  └─ Entropy, KL divergence, cross-entropy
Statistics is the connective tissue of the math curriculum. MLE from this module explains why cross-entropy loss works. Confidence intervals connect to PAC learning bounds. Hypothesis testing IS what you're doing every time you compare models.
The Three Core Questions of ML Statistics
Every statistical concept in this module answers one of three fundamental ML questions:
Question 1: What should I estimate? Estimation Theory answers this - how to extract parameter values from data, and how to quantify uncertainty in those estimates.
Question 2: Is this result real or noise? Hypothesis Testing, Confidence Intervals, Bootstrap, and Power Analysis answer this - the formal machinery for distinguishing signal from sampling variation.
Question 3: Did my intervention cause the outcome? Causal Inference answers this - the hardest question in ML, and the one most engineers get wrong.
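
Question 3 is easy to get wrong because a confounder can manufacture an apparent effect. The simulation below is a toy setup in which every probability is an assumption. A holiday period both raises conversion and makes exposure to the new model more likely, while the model itself has zero true effect. A naive comparison reports a lift. Stratifying on the confounder, one simple adjustment among many, removes it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Confounder: whether the session happened during a holiday period.
holiday = rng.random(n) < 0.5

# Non-random exposure: the new model is shown far more often during holidays.
new_model = rng.random(n) < np.where(holiday, 0.8, 0.2)

# Conversion depends on the holiday only; the model's true effect is zero.
converted = rng.random(n) < np.where(holiday, 0.10, 0.05)

# Naive comparison ignores the confounder.
naive_lift = converted[new_model].mean() - converted[~new_model].mean()

# Stratified comparison: compute the lift within each holiday stratum,
# then average the strata weighted by their size.
strata_lifts, strata_sizes = [], []
for h in (False, True):
    mask = holiday == h
    lift = converted[mask & new_model].mean() - converted[mask & ~new_model].mean()
    strata_lifts.append(lift)
    strata_sizes.append(mask.sum())
adjusted_lift = np.average(strata_lifts, weights=strata_sizes)

print(f"naive lift:    {naive_lift:+.4f}")
print(f"adjusted lift: {adjusted_lift:+.4f}")
```

Stratification only works here because the confounder is observed. A randomised A/B test breaks the link between exposure and every confounder, observed or not.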
Work through each lesson in order. The lessons build on each other: you need hypothesis testing to understand ANOVA, you need ANOVA to understand A/B testing design, and you need all of it to understand why sample size calculation matters.
Let's begin.
