Module 14 - Bayesian ML

Why Probability Is Not Optional

Every ML model makes predictions, but not every model tells you how much to trust them. A gradient boosted tree says "fraud probability: 0.87." A neural network says "cat: 0.99." A regression model says "house price: $412,000." Each of these is a point estimate: a single number with no indication of how uncertain the model is about it.

In production, this is a problem. A self-driving car that reports 99% confidence is dangerous if that confidence is miscalibrated. A drug discovery model that ignores uncertainty can misallocate $10M in experiments. A sensor reading delivered without error bounds is of little use to engineers who need to know whether it is trustworthy.

Bayesian ML is the principled answer. It replaces point estimates with probability distributions, asking not just "what is the most likely answer?" but "what is the full range of plausible answers, and how confident am I?" This unlocks active learning (query the most uncertain points), out-of-distribution detection (high uncertainty signals novel inputs), and calibrated decision-making (act differently when uncertain).

This module covers the full Bayesian toolkit from first principles to production deployment. You will understand when Bayesian methods are worth their computational cost, and when simpler uncertainty heuristics suffice.



Lessons in This Module

| # | Lesson | Key Concepts |
|---|--------|--------------|
| 01 | The Probabilistic Perspective on ML | Bayes' theorem, MLE vs MAP vs full Bayesian, conjugate priors, epistemic vs aleatoric uncertainty |
| 02 | Gaussian Processes | GP prior, kernel functions, posterior predictive, sparse GPs, kernel engineering |
| 03 | Bayesian Linear Regression | Conjugate Gaussian prior, posterior closed-form, ridge connection, predictive uncertainty, evidence maximisation |
| 04 | Bayesian Neural Networks | Variational inference, ELBO, mean-field approximation, MC Dropout, Laplace approximation |
| 05 | Variational Autoencoders | VAE as latent variable model, ELBO derivation, reparameterisation trick, posterior collapse |
| 06 | Uncertainty Quantification | Epistemic vs aleatoric decomposition, calibration, ECE, reliability diagrams, temperature scaling |
| 07 | Conformal Prediction | Distribution-free coverage, split conformal, Mondrian conformal, prediction sets |
| 08 | Bayesian Optimisation | Surrogate model, acquisition functions (EI, UCB, PI), hyperparameter tuning, BoTorch |

Core Bayesian Concepts

Prior $P(\theta)$: What you believe about parameters before seeing data. Can encode domain knowledge or be uninformative.

Likelihood $P(\mathcal{D}\mid\theta)$: How probable is the observed data given a particular parameter setting? Many standard ML loss functions, such as cross-entropy and squared error, are negative log-likelihoods under a particular noise model.
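As a short worked example of the link between likelihood and loss, assume (purely for illustration) i.i.d. observations $y_i = f_\theta(x_i) + \epsilon_i$ with Gaussian noise $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$ of known variance. The symbols $f_\theta$, $x_i$, $y_i$, $\sigma^2$ and $n$ are introduced here and are not part of the definitions in this section. Under that assumption the negative log-likelihood is:

$$
-\log P(\mathcal{D}\mid\theta)
= \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - f_\theta(x_i)\bigr)^2
+ \frac{n}{2}\log\bigl(2\pi\sigma^2\bigr)
$$

The second term does not depend on $\theta$, so minimising this quantity over $\theta$ is the same as minimising the sum of squared errors.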

Posterior $P(\theta\mid\mathcal{D})$: Updated beliefs after seeing data. Combines prior and likelihood via Bayes' theorem:

$$
P(\theta\mid\mathcal{D}) = \frac{P(\mathcal{D}\mid\theta)\,P(\theta)}{P(\mathcal{D})}
$$
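To make the prior-to-posterior update concrete, here is a minimal sketch using one of the few cases where Bayes' theorem has a closed form: a Beta prior on a Bernoulli success rate. The function names and the ten observations are hypothetical and chosen only for illustration.

```python
def beta_bernoulli_update(alpha, beta, observations):
    """Posterior Beta parameters after observing a list of 0/1 outcomes.

    With a Beta(alpha, beta) prior and a Bernoulli likelihood, the posterior
    is Beta(alpha + successes, beta + failures).
    """
    successes = sum(observations)
    failures = len(observations) - successes
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1))


# Hypothetical data: 7 positives out of 10 trials.
observations = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]

prior = (1.0, 1.0)  # Beta(1, 1) is the uniform prior on [0, 1]
posterior = beta_bernoulli_update(*prior, observations)

print("posterior parameters:", posterior)  # (8.0, 4.0)
print("posterior mean:", beta_mean(*posterior))
print("prior variance:", beta_variance(*prior))
print("posterior variance:", beta_variance(*posterior))
```

The posterior is a full distribution over the success rate, not a single number. Its variance is smaller than the prior's, reflecting what the ten observations taught us.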

Epistemic uncertainty: Uncertainty about model parameters - reducible with more data. High in low-data regions. The "I don't know" kind.

Aleatoric uncertainty: Irreducible noise inherent in the data - sensor noise, measurement error, fundamental stochasticity. Cannot be reduced by gathering more data.
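The two kinds of uncertainty can be separated explicitly in Bayesian linear regression. The sketch below assumes a known noise variance and a zero-mean isotropic Gaussian prior on the weights; the function names, the synthetic inputs, and the query point are all hypothetical. Under these assumptions the predictive variance at a point $x$ is $x^\top \Sigma x + \sigma^2$, where the first term is epistemic and the second is aleatoric.

```python
import numpy as np


def blr_weight_covariance(X, noise_var, prior_var):
    """Posterior covariance of w for y = X w + eps.

    Assumes eps ~ N(0, noise_var) with known noise_var and a prior
    w ~ N(0, prior_var * I). With known noise, this covariance depends
    only on the inputs X, not on the targets y.
    """
    d = X.shape[1]
    precision = np.eye(d) / prior_var + X.T @ X / noise_var
    return np.linalg.inv(precision)


def predictive_variance_parts(x, weight_cov, noise_var):
    """Split the predictive variance at x into (epistemic, aleatoric)."""
    epistemic = float(x @ weight_cov @ x)  # shrinks as data accumulates
    aleatoric = float(noise_var)           # fixed by the noise model
    return epistemic, aleatoric


rng = np.random.default_rng(0)
noise_var, prior_var = 0.25, 1.0

# Synthetic inputs in [-1, 1], with a bias column prepended.
inputs = rng.uniform(-1.0, 1.0, size=1000)
X_all = np.column_stack([np.ones_like(inputs), inputs])

# Query point at input 3.0, well outside the training range.
x_query = np.array([1.0, 3.0])

results = {}
for n in (10, 1000):
    cov = blr_weight_covariance(X_all[:n], noise_var, prior_var)
    results[n] = predictive_variance_parts(x_query, cov, noise_var)
    epistemic, aleatoric = results[n]
    print(f"n={n:5d}  epistemic={epistemic:.4f}  aleatoric={aleatoric:.4f}")
```

Going from 10 to 1,000 training points reduces the epistemic term, while the aleatoric term stays at the assumed noise variance regardless of how much data is collected.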


When Is Bayesian ML Worth the Cost?

Bayesian methods are computationally expensive. Exact posterior inference is intractable for most models beyond simple conjugate cases, so in practice you pay for an approximation. Here is a practical decision guide:

| Scenario | Use Bayesian? | Why |
|----------|---------------|-----|
| Safety-critical decisions (medical, autonomous) | Yes | Calibrated uncertainty helps prevent overconfident errors |
| Small datasets (fewer than 1,000 samples) | Yes | The prior regularises and reduces overfitting |
| Active learning / sequential experiments | Yes | Uncertainty drives exploration |
| Hyperparameter optimisation | Yes | Bayesian optimisation typically needs fewer evaluations than grid search when each evaluation is expensive |
| Large-scale classification (ImageNet) | No | Point estimates usually suffice; MC Dropout is a cheap approximation when uncertainty is needed |
| Online production inference (low latency) | Usually no | Posterior sampling is too slow; calibrate post hoc instead |
| Anomaly / OOD detection | Partial | Use an uncertainty score as the OOD signal; use conformal prediction for coverage guarantees |
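As an illustration of the cheaper, non-Bayesian route mentioned in the last rows, here is a minimal sketch of split conformal prediction wrapped around an ordinary point predictor. The synthetic data, the least-squares line used as the predictor, and all function names are assumptions made for this example. The only requirement of the method is that calibration and test points are exchangeable.

```python
import math

import numpy as np


def conformal_quantile(scores, alpha):
    """Finite-sample conformal quantile of calibration scores.

    Returns the k-th smallest score with k = ceil((n + 1) * (1 - alpha)),
    or infinity when the calibration set is too small for that level.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")
    return float(np.sort(scores)[k - 1])


rng = np.random.default_rng(0)


def sample(n):
    """Synthetic regression data: y = 1.5 x + Gaussian noise."""
    x = rng.uniform(-2.0, 2.0, size=n)
    y = 1.5 * x + rng.normal(0.0, 0.5, size=n)
    return x, y


x_train, y_train = sample(200)
x_cal, y_cal = sample(500)
x_test, y_test = sample(2000)

# Any point predictor works; here, a least-squares line fit on the training split.
slope, intercept = np.polyfit(x_train, y_train, deg=1)


def predict(x):
    return slope * x + intercept


# Calibrate: absolute residuals on held-out data give the interval half-width.
alpha = 0.1  # target miscoverage, i.e. roughly 90% coverage
half_width = conformal_quantile(np.abs(y_cal - predict(x_cal)), alpha)

# Prediction interval for a new x is [predict(x) - half_width, predict(x) + half_width].
covered = np.abs(y_test - predict(x_test)) <= half_width
coverage = float(covered.mean())
print(f"half-width: {half_width:.3f}  empirical test coverage: {coverage:.3f}")
```

The point predictor is left untouched and inference cost is one extra addition and subtraction per prediction, which is why this approach suits low-latency settings. The trade-off in this simple form is that the interval has the same width everywhere, so it does not flag individual inputs as more or less uncertain.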
© 2026 EngineersOfAI. All rights reserved.