Module 14 - Bayesian ML

Why Probability Is Not Optional

Every ML model makes predictions, but not every model tells you how far to trust them. A gradient-boosted tree says "fraud probability: 0.87." A neural network says "cat: 0.99." A regression model says "house price: $412,000." Each output is a point estimate: a single number with no indication of how uncertain the model is about it. Even the scores that look like probabilities say nothing about whether they are calibrated.

In production, this is a problem. A self-driving system that reports 99% confidence is dangerous if that confidence is miscalibrated. A drug discovery model that ignores uncertainty can misallocate $10M in experiments. A sensor that gives a point estimate without error bounds is of little use to engineers who need to know whether a reading is trustworthy.

Bayesian ML is the principled answer. It replaces point estimates with probability distributions - not just "what is the most likely answer" but "what is the full range of plausible answers, and how confident am I?" This unlocks active learning (query the most uncertain points), out-of-distribution detection (high uncertainty signals novel inputs), and calibrated decision-making (act differently when uncertain).

This module covers the full Bayesian toolkit from first principles to production deployment. You will understand when Bayesian methods are worth their computational cost, and when simpler uncertainty heuristics suffice.

Module Map

Lessons in This Module

#	Lesson	Key Concepts
01	The Probabilistic Perspective on ML	Bayes' theorem, MLE vs MAP vs full Bayesian, conjugate priors, epistemic vs aleatoric uncertainty
02	Gaussian Processes	GP prior, kernel functions, posterior predictive, sparse GPs, kernel engineering
03	Bayesian Linear Regression	Conjugate Gaussian prior, posterior closed-form, ridge connection, predictive uncertainty, evidence maximisation
04	Bayesian Neural Networks	Variational inference, ELBO, mean-field approximation, MC Dropout, Laplace approximation
05	Variational Autoencoders	VAE as latent variable model, ELBO derivation, reparameterisation trick, posterior collapse
06	Uncertainty Quantification	Epistemic vs aleatoric decomposition, calibration, ECE, reliability diagrams, temperature scaling
07	Conformal Prediction	Distribution-free coverage, split conformal, Mondrian conformal, prediction sets
08	Bayesian Optimisation	Surrogate model, acquisition functions (EI, UCB, PI), hyperparameter tuning, BoTorch

Core Bayesian Concepts

Prior $P(\theta)$ : What you believe about parameters before seeing data. Can encode domain knowledge or be uninformative.

Likelihood $P(\mathcal{D}|\theta)$ : How probable is the observed data given a particular parameter setting? Many standard ML loss functions are negative log-likelihoods: squared error corresponds to Gaussian noise, and cross-entropy to a categorical likelihood.

Posterior $P(\theta|\mathcal{D})$ : Updated beliefs after seeing data. Combines prior and likelihood via Bayes' theorem:

$P(\theta|\mathcal{D}) = \frac{P(\mathcal{D}|\theta)\,P(\theta)}{P(\mathcal{D})}$

Epistemic uncertainty: Uncertainty about model parameters - reducible with more data. High in low-data regions. The "I don't know" kind.

Aleatoric uncertainty: Irreducible noise inherent in the data - sensor noise, measurement error, fundamental stochasticity. Cannot be reduced by gathering more data.
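The definitions above can be made concrete with the simplest conjugate model. The following is a minimal sketch, not part of the original module: it assumes a Beta prior on the success probability $\theta$ of a Bernoulli likelihood, so the posterior is again a Beta distribution and Bayes' theorem reduces to adding counts. The function names, the uniform prior, and the value `true_theta = 0.3` are illustrative assumptions. The predictive variance of the next observation splits, by the law of total variance, into an epistemic part $\mathrm{Var}[\theta|\mathcal{D}]$ and an aleatoric part $\mathbb{E}[\theta(1-\theta)|\mathcal{D}]$.

```python
import random


def beta_posterior(a, b, successes, failures):
    """Conjugate update: Beta(a, b) prior + Bernoulli data -> Beta posterior."""
    return a + successes, b + failures


def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


random.seed(0)
true_theta = 0.3     # hypothetical ground truth, used only to simulate data
a0, b0 = 1.0, 1.0    # uniform (uninformative) prior, an assumption

results = []
for n in (10, 100, 10_000):
    data = [random.random() < true_theta for _ in range(n)]
    k = sum(data)
    a, b = beta_posterior(a0, b0, k, n - k)
    mean, epistemic = beta_mean_var(a, b)      # Var[theta | D]
    aleatoric = mean * (1 - mean) - epistemic  # E[theta (1 - theta) | D]
    results.append((n, mean, epistemic, aleatoric))
    print(f"n={n:>6}  posterior mean={mean:.3f}  "
          f"epistemic={epistemic:.6f}  aleatoric={aleatoric:.4f}")
```

In this sketch the epistemic term shrinks as `n` grows, while the aleatoric term settles near $\theta(1-\theta)$ and does not go away however much data is collected.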

When Is Bayesian ML Worth the Cost?

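Bayesian methods are computationally expensive. Exact posterior inference is usually intractable outside conjugate models, so in practice the posterior is approximated, and the approximation has its own cost. Here is a practical decision guide: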

Scenario	Use Bayesian?	Why
Safety-critical decisions (medical, autonomous)	Yes	Calibrated uncertainty prevents overconfident errors
Small datasets (fewer than 1,000 samples)	Yes	Prior acts as a regulariser and reduces overfitting
Active learning / sequential experiments	Yes	Uncertainty drives exploration
Hyperparameter optimisation	Yes	Bayesian optimisation typically needs fewer evaluations than grid search when each trial is expensive
Large-scale classification (ImageNet)	No	Point estimates work well; MC Dropout is a cheap approximation if uncertainty is needed
Online production inference (low latency)	Usually no	Posterior sampling is usually too slow; apply post-hoc calibration instead
Anomaly / OOD detection	Partial	Uncertainty score as OOD signal; conformal for coverage guarantees
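As an illustration of the cheaper, post-hoc end of this guide, the following is a minimal sketch of split conformal prediction for regression. It is an assumption-laden example rather than the module's own implementation: the data are synthetic, the point predictor is an ordinary least-squares line, and all function names are hypothetical. The idea is to take a held-out calibration set, compute absolute residuals, and use a finite-sample-corrected quantile of those residuals as the half-width of every prediction interval.

```python
import math
import random


def conformal_halfwidth(abs_residuals, alpha):
    """Half-width q so that [pred - q, pred + q] has marginal coverage
    of at least 1 - alpha, assuming exchangeable data."""
    n = len(abs_residuals)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(abs_residuals)[k - 1]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def make_data(rng, n):
    """Synthetic data (an assumption): y = 2x + unit Gaussian noise."""
    xs = [rng.uniform(-3, 3) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


rng = random.Random(0)
x_train, y_train = make_data(rng, 500)
x_cal, y_cal = make_data(rng, 1000)
x_test, y_test = make_data(rng, 5000)

slope, intercept = fit_line(x_train, y_train)


def predict(x):
    return slope * x + intercept


cal_residuals = [abs(y - predict(x)) for x, y in zip(x_cal, y_cal)]
q = conformal_halfwidth(cal_residuals, alpha=0.1)

covered = [abs(y - predict(x)) <= q for x, y in zip(x_test, y_test)]
coverage = sum(covered) / len(covered)
print(f"half-width q={q:.3f}, empirical test coverage={coverage:.3f}")
```

The predictor is fitted once and never resampled, so the cost at inference time is a single addition and subtraction per prediction. The guarantee is marginal (averaged over inputs), not conditional on a particular input.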
