Bias-Variance Tradeoff
The formal decomposition of prediction error into bias, variance, and noise - with production diagnostics, learning curves, double descent, and ensemble strategies.
A comprehensive guide to cross-validation - k-fold, stratified, repeated, LOOCV, group CV, time-series CV, nested CV, and common pitfalls such as data leakage.
How raw data is encoded as vectors in feature spaces - tabular, text, image, time-series, and graph data - including the curse of dimensionality and practical feature engineering with sklearn.
Precision, recall, F1, AUC-ROC, AUC-PR, log loss, MCC - the complete guide to classification evaluation with business context, code, and when each metric matters.
A comprehensive guide to regression evaluation - MAE, MSE, RMSE, R², MAPE, Huber loss, residual diagnostics, business-aligned metrics, and production monitoring patterns.
Why models fail to generalize - the formal definition of generalization gap, diagnosing overfitting and underfitting, regularization strategies, and distribution shift in production.
Complete overview of the ML Foundations module - 12 lessons covering the core concepts every ML engineer must know before building production systems.
Framing machine learning through probability - MLE, MAP estimation, prior-posterior reasoning, cross-entropy as negative log-likelihood, calibration, Bayesian deep learning, and uncertainty quantification.
The mathematical foundations of machine learning - PAC learning, VC dimension, Rademacher complexity, sample complexity, generalization bounds, and the theory behind why regularization works.
A deep engineering guide to the core ML paradigms - supervised, unsupervised, semi-supervised, self-supervised, and reinforcement learning - with data requirements, use cases, and when to choose each.
The complete ML engineering workflow from problem framing through data, features, model training, evaluation, deployment, and monitoring - and where projects actually fail.
A deep dive into data splitting - why the split matters, how to partition data correctly, data leakage patterns, temporal splits, group splits, and production-grade evaluation design.
Three precise ways to think about ML - optimization, compression, and function approximation - with production context, taxonomy, and when ML is the wrong tool.