Module 04: Statistics for ML

"Without data, you're just another person with an opinion. But without statistics, data is just noise."

The Production Reality

You've trained a new model. Your offline metrics look great - accuracy is up 1.2%, NDCG improved. But your engineering manager asks: "Is that improvement real, or just lucky random variation?" You push to production. Two weeks later: "Did this model actually lift conversion, or did we just happen to run the experiment during the holiday season?"

Every ML engineer eventually faces these questions. The answers live in statistics.

Statistics is not a collection of formulas to memorize. It is the formal language for reasoning under uncertainty - and ML systems are uncertainty machines. You train on noisy data, you evaluate on finite test sets, you deploy into a world that shifts. Statistical thinking lets you separate signal from noise at every stage of the ML lifecycle.

This module bridges probability theory (Module 03) and the full machine learning curriculum. By the end, you will have the statistical toolkit to:

Design rigorous experiments that prove your model works
Calculate how many samples you need to detect a real improvement
Understand why offline evaluation metrics often disagree with online A/B test results
Communicate uncertainty in model performance to non-technical stakeholders
Avoid the statistical traps that lead to shipping models that don't actually help users

Module Map

How Statistics Powers ML Engineering

1. Model Training

| Concept | Where it appears in ML |
|---|---|
| Maximum Likelihood Estimation (MLE) | Cross-entropy loss IS negative log-likelihood |
| Regularisation as MAP estimation | L2 penalty IS a zero-mean Gaussian prior on weights |
| Bias-Variance tradeoff | Underfitting vs overfitting |
| Consistency of estimators | "More data always helps" - but how much? |
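The first two rows of this table can be checked numerically. The sketch below assumes a softmax classifier and uses made-up logits and labels. It computes the mean cross-entropy against one-hot targets and the mean negative log-likelihood of a categorical model, and shows they match. It then checks that the negative log-density of a zero-mean Gaussian prior with standard deviation `sigma` equals an L2 penalty with coefficient 1/(2·sigma²), plus a constant that does not depend on the weights. All values (`logits`, `labels`, `w`, `sigma`) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical toy data: 4 examples, 3 classes
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 1.5, 0.3],
                   [-0.5, 0.2, 2.2],
                   [1.0, 1.0, 1.0]])
labels = np.array([0, 1, 2, 0])

def softmax(z):
    # Subtract the row-wise max for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

probs = softmax(logits)

# Cross-entropy with one-hot targets: -sum_k y_k * log p_k, averaged over examples
one_hot = np.eye(logits.shape[1])[labels]
cross_entropy = -(one_hot * np.log(probs)).sum(axis=1).mean()

# Categorical negative log-likelihood: -log p(y_i | x_i), averaged over examples
nll = -np.log(probs[np.arange(len(labels)), labels]).mean()

print(f"cross-entropy = {cross_entropy:.6f}, NLL = {nll:.6f}")

# L2 penalty vs an independent zero-mean Gaussian prior N(0, sigma^2) on each weight
w = np.array([0.3, -1.2, 0.8])   # hypothetical weights
sigma = 2.0                      # hypothetical prior standard deviation
neg_log_prior = -norm.logpdf(w, loc=0.0, scale=sigma).sum()
l2_penalty = (w ** 2).sum() / (2 * sigma ** 2)       # lambda * ||w||^2 with lambda = 1/(2 sigma^2)
const = len(w) * np.log(sigma * np.sqrt(2 * np.pi))  # does not depend on w
print(f"-log prior = {neg_log_prior:.6f}, L2 + const = {l2_penalty + const:.6f}")
```

The constant does not depend on `w`. So minimising the summed negative log-likelihood plus λ‖w‖² with λ = 1/(2σ²) gives the same weights as MAP estimation under that prior. If the data term is averaged over n examples instead of summed, λ is divided by n.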

2. Model Evaluation

| Concept | Where it appears in ML |
|---|---|
| Confidence intervals | "Our model achieves 87.3% accuracy ± 0.4%" |
| Hypothesis testing | "Is model A significantly better than model B?" |
| Bootstrap resampling | Robust metric estimation on small test sets |
| Multiple testing correction | Comparing dozens of hyperparameter configurations |
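An interval like "87.3% ± 0.4%" can come from a normal approximation to the binomial or from the bootstrap. The sketch below computes both on a simulated correctness vector; the 1,000-example test set and ~87% accuracy are hypothetical. It then inverts the normal-approximation formula to estimate how many test examples a ±0.4% half-width at 87.3% accuracy would require. `wald_accuracy_ci` and `bootstrap_accuracy_ci` are illustrative helper names, not library functions.

```python
import numpy as np
from scipy.stats import norm

def wald_accuracy_ci(n_correct, n, level=0.95):
    """Normal-approximation (Wald) CI for accuracy. Reasonable when n is large
    and accuracy is not too close to 0 or 1."""
    p_hat = n_correct / n
    z = norm.ppf(1 - (1 - level) / 2)
    half_width = z * np.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def bootstrap_accuracy_ci(correct, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI: resample test examples with replacement."""
    rng = np.random.default_rng(seed)
    n = len(correct)
    idx = rng.integers(0, n, size=(n_boot, n))   # n_boot resampled test sets
    boot_acc = correct[idx].mean(axis=1)         # accuracy on each resample
    alpha = 1 - level
    return np.quantile(boot_acc, [alpha / 2, 1 - alpha / 2])

# Hypothetical test set: per-example correctness simulated at ~87% accuracy
rng = np.random.default_rng(42)
correct = (rng.random(1000) < 0.87).astype(float)

lo_w, hi_w = wald_accuracy_ci(correct.sum(), len(correct))
lo_b, hi_b = bootstrap_accuracy_ci(correct)
print(f"acc={correct.mean():.3f}  Wald=({lo_w:.3f}, {hi_w:.3f})  bootstrap=({lo_b:.3f}, {hi_b:.3f})")

# Test-set size for a ±0.4% half-width at 87.3% accuracy (Wald, 95% level)
z = norm.ppf(0.975)
n_needed = z ** 2 * 0.873 * 0.127 / 0.004 ** 2
print(f"examples needed: {n_needed:,.0f}")
```

The last calculation comes out at roughly 26,600 examples. A "± 0.4%" claim on a test set of only a few thousand examples therefore deserves scrutiny.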

3. Experimentation & Deployment

| Concept | Where it appears in ML |
|---|---|
| A/B/n testing (ANOVA) | Controlled online experiments |
| Statistical power | "How long should we run the experiment?" |
| Causal inference | "Did the model cause the improvement or did confounders?" |
| Effect size (Cohen's d) | Minimum detectable effect for business KPIs |
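"How long should we run the experiment?" usually reduces to a per-group sample size, which you then divide by daily traffic per arm. Below is a minimal sketch of the standard normal-approximation formulas, assuming a two-sided test at α = 0.05 with 80% power. The 5.0% baseline conversion and +0.5 percentage-point minimum detectable effect are hypothetical, and the helper names are illustrative.

```python
import math
from scipy.stats import norm

def n_per_group_proportions(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample z-test on proportions:
    n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p2 - p1)^2
    (unpooled normal approximation)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    delta = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)

def n_per_group_cohens_d(d, alpha=0.05, power=0.80):
    """Normal-approximation per-group n for comparing two means with
    standardized effect size d (Cohen's d): n = 2 (z_a + z_b)^2 / d^2."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

# Hypothetical scenario: 5.0% baseline conversion, want to detect 5.5%
n_prop = n_per_group_proportions(0.050, 0.055)
# A "small" standardized effect in Cohen's convention
n_d = n_per_group_cohens_d(0.2)
print(n_prop, n_d)
```

With these inputs the proportion calculation asks for roughly 31,000 users per arm, and d = 0.2 needs about 393 observations per group. Halving the minimum detectable effect roughly quadruples the required sample, because the effect appears squared in the denominator.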

Lesson-by-Lesson Real-World Use Cases

| Lesson | Real-World ML Use Case |
|---|---|
| 01 Estimation Theory | Training neural networks (cross-entropy = MLE); Bayesian regularisation |
| 02 Hypothesis Testing | Model comparison tests; feature importance validation; detecting data drift |
| 03 Confidence Intervals | Reporting model performance with uncertainty bounds |
| 04 Bootstrap & Resampling | Evaluating variance in F1-score; k-fold cross-validation as resampling |
| 05 Regression Analysis | Linear models as ML foundation; understanding logistic regression deeply |
| 06 ANOVA & Experimental Design | A/B testing new model variants; hyperparameter ablation studies |
| 07 Causal Inference | Why offline recommendation metrics lie; online A/B as ground truth |
| 08 Statistical Power | Deciding sample size before launching an experiment |

Prerequisites

Before starting this module, you should be comfortable with:

Module 01 - Linear Algebra: Matrix operations, eigendecomposition (for regression)
Module 02 - Calculus: Derivatives, optimization (for MLE derivations)
Module 03 - Probability Theory: Random variables, distributions, expectation, Bayes' theorem

You do not need to have taken a formal statistics course. This module is self-contained from a statistics perspective, building everything from probability foundations.

:::note Required Python Libraries

```python
# All code in this module uses:
import numpy as np
import scipy.stats as stats
import statsmodels.api as sm
import matplotlib.pyplot as plt
from scipy.stats import t, norm, chi2, f
```

Install with: `pip install numpy scipy statsmodels matplotlib`

:::

Learning Objectives

By the end of this module, you will be able to:

Conceptual Understanding

Explain why cross-entropy loss is equivalent to Maximum Likelihood Estimation
Correctly interpret a p-value (and identify common misconceptions)
Explain what a 95% confidence interval means - and what it does NOT mean
Distinguish correlation from causation using the potential outcomes framework

Mathematical Skills

Derive the MLE estimator for Gaussian and Bernoulli distributions
Compute t-tests, chi-squared tests, and F-statistics by hand
Construct bootstrap confidence intervals from scratch
Calculate required sample size given power, effect size, and significance level
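The Bernoulli case listed above is a useful preview. Here is the derivation in compact form, for observations x_1, ..., x_n in {0, 1} with unknown success probability p. Dividing the negative log-likelihood by n gives exactly the binary cross-entropy loss, which is the link the Model Training table draws.

$$
\begin{aligned}
\mathcal{L}(p) &= \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} \\
\ell(p) = \log \mathcal{L}(p) &= \sum_{i=1}^{n} \big[\, x_i \log p + (1-x_i)\log(1-p) \,\big] \\
\frac{d\ell}{dp} &= \frac{\sum_{i} x_i}{p} - \frac{n - \sum_{i} x_i}{1-p} = 0
\quad\Longrightarrow\quad \hat{p}_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^{n} x_i
\end{aligned}
$$

The second derivative, $-\sum_i x_i/p^2 - (n - \sum_i x_i)/(1-p)^2$, is negative for 0 < p < 1, so the stationary point is a maximum. The MLE is simply the sample proportion of successes.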

Engineering Skills

Write production-quality A/B test analysis code in Python
Choose the right statistical test for model comparison
Apply multiple testing corrections when comparing many model variants
Detect confounders in offline evaluation scenarios

Interview Readiness

Answer "What is a p-value?" without falling into the "probability that the null hypothesis is true" trap
Explain the bias-variance tradeoff in terms of estimation theory
Design a sample size calculation for a new ML experiment

How Statistics Connects to the Rest of the Curriculum

```text
Probability Theory (Module 03)
         │
         ▼
Statistics for ML (Module 04)  ◄──── This module
         │
         ├──► Information Theory (Module 05)
         │         └─ Entropy, KL divergence, cross-entropy
         │
         ├──► Bayesian Statistics (Module 06)
         │         └─ Priors, posteriors, MCMC
         │
         └──► Statistical Learning Theory (Module 07)
                   └─ PAC learning, VC dimension, generalisation bounds
```

Statistics is the connective tissue of the math curriculum. MLE from this module explains why cross-entropy loss works. Confidence intervals connect to PAC learning bounds. Hypothesis testing IS what you're doing every time you compare models.
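When two models are scored on the same test set, their errors are paired, so a paired test is the natural choice. One standard option is McNemar's test, though later lessons may use others. It ignores examples that both models get right or both get wrong, and asks whether the disagreements split evenly. The exact version below uses `scipy.stats.binomtest`, and the continuity-corrected chi-squared version uses `scipy.stats.chi2`. The helper name `mcnemar` and the disagreement counts are hypothetical.

```python
import numpy as np
from scipy.stats import binomtest, chi2

def mcnemar(correct_a, correct_b, exact=True):
    """McNemar's test for two classifiers scored on the SAME test examples.
    Only discordant pairs (one model right, the other wrong) carry information."""
    correct_a = np.asarray(correct_a, dtype=bool)
    correct_b = np.asarray(correct_b, dtype=bool)
    b = int(np.sum(correct_a & ~correct_b))  # A right, B wrong
    c = int(np.sum(~correct_a & correct_b))  # A wrong, B right
    if b + c == 0:
        return b, c, 1.0  # no disagreements: no evidence of a difference
    if exact:
        # Under H0 each discordant pair is equally likely to favour A or B
        return b, c, binomtest(b, b + c, p=0.5).pvalue
    stat = (abs(b - c) - 1) ** 2 / (b + c)  # continuity-corrected chi-squared, 1 df
    return b, c, chi2.sf(stat, df=1)

# Hypothetical 1,000-example test set:
# both right on 850, both wrong on 88, only A right on 40, only B right on 22
model_a = np.concatenate([np.ones(850), np.zeros(88), np.ones(40), np.zeros(22)]).astype(bool)
model_b = np.concatenate([np.ones(850), np.zeros(88), np.zeros(40), np.ones(22)]).astype(bool)

b, c, p_exact = mcnemar(model_a, model_b)
_, _, p_chi2 = mcnemar(model_a, model_b, exact=False)
print(f"acc A={model_a.mean():.3f}, acc B={model_b.mean():.3f}, b={b}, c={c}")
print(f"exact p={p_exact:.4f}, chi-squared p={p_chi2:.4f}")
```

Here model A is 1.8 points more accurate (89.0% vs 87.2%), and both versions of the test give p < 0.05. The same gap backed by far fewer disagreements could easily fail to reach significance.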

The Three Core Questions of ML Statistics

Every statistical concept in this module answers one of three fundamental ML questions:

Question 1: How do I estimate what I care about from data? Estimation Theory answers this - how to extract parameter values from data, and how to quantify uncertainty in those estimates.

Question 2: Is this result real or noise? Hypothesis Testing, Confidence Intervals, Bootstrap, and Power Analysis answer this - the formal machinery for distinguishing signal from sampling variation.

Question 3: Did my intervention cause the outcome? Causal Inference answers this - the hardest question in ML, and the one most engineers get wrong.
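The holiday-season worry from the opening can be simulated directly. In this sketch, every rate is a hypothetical assumption: holiday traffic converts better regardless of the model, and the new model adds a true 0.2-percentage-point lift. Comparing users before and after a launch that coincides with the holiday attributes the seasonal jump to the model. Randomizing assignment within the same period recovers the true effect, up to sampling noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
true_lift = 0.002  # hypothetical: the new model adds 0.2 pp conversion

# Hypothetical confounder: holiday traffic converts better regardless of model
holiday = rng.random(n) < 0.5
base_rate = np.where(holiday, 0.06, 0.04)

# Naive before/after rollout: the new model ships exactly when the holiday
# period starts, so "treated" is perfectly aligned with "holiday".
treated_naive = holiday
conv_naive = rng.random(n) < base_rate + true_lift * treated_naive
naive_estimate = conv_naive[treated_naive].mean() - conv_naive[~treated_naive].mean()

# Randomized A/B test: assignment by coin flip, independent of the season
treated_ab = rng.random(n) < 0.5
conv_ab = rng.random(n) < base_rate + true_lift * treated_ab
ab_estimate = conv_ab[treated_ab].mean() - conv_ab[~treated_ab].mean()

print(f"true lift={true_lift:.4f}  naive={naive_estimate:.4f}  randomized={ab_estimate:.4f}")
```

The naive estimate comes out at roughly ten times the true lift, because treatment and season are perfectly confounded. Randomization breaks that link, so the seasonal effect averages out across both arms.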

Work through each lesson in order. The lessons build on each other: you need hypothesis testing to understand ANOVA, you need ANOVA to understand A/B testing design, and you need all of it to understand why sample size calculation matters.

Let's begin.
