ANOVA and Experimental Design - Comparing Multiple Models and A/B Tests
Master Analysis of Variance, F-statistics, one-way and two-way ANOVA, and rigorous A/B test design for ML model comparison and hyperparameter ablations.
AR, MA, ARMA, ARIMA, and SARIMA models - derivation, parameter estimation, Box-Jenkins methodology, diagnostic checking, and Python implementation with statsmodels. The classical forecasting baseline every ML engineer must know.
ACF and PACF functions, lag plots, correlograms, Ljung-Box test, and identifying ARIMA orders from autocorrelation structure. Essential for time series model selection in ML and forecasting.
A deep engineering dive into forward mode and reverse mode automatic differentiation, computational graphs, PyTorch autograd internals, custom gradient functions, and when to use torch.no_grad().
Bayes factors, marginal likelihood, BIC and AIC from a Bayesian perspective, Occam's razor via model evidence, and practical model selection in ML.
How Bayesian thinking transforms ML - uncertainty quantification, priors as regularization, probabilistic programming, and principled model comparison. Module map and learning objectives.
Sequential Bayesian updating, online learning, the Beta-Bernoulli stream, and the Kalman filter as Bayesian updating - how beliefs evolve as data arrives.
The philosophical divide between Bayesian and frequentist probability - concrete examples, when each approach is better, and how the choice shapes ML system design.
Mathematical decomposition of generalization error into bias, variance, and noise - with formal derivations, practical examples, and the modern double-descent perspective in deep learning.
Master bootstrap resampling, permutation tests, jackknife, and cross-validation as statistical tools for ML model evaluation. Build everything from scratch in NumPy.
A complete module map showing how derivatives, gradients, backpropagation, gradient descent, and optimization algorithms connect to training every major ML model.
Understand the potential outcomes framework, confounders, average treatment effect, difference-in-differences, and why offline evaluation of recommendation systems fails.
A deep engineering dive into the chain rule, computational graphs, forward and backward passes, and how PyTorch autograd implements backpropagation to train networks of any depth.
Cointegration, Johansen test, error correction models, Granger causality, and their applications in pairs trading, causal feature selection, and financial ML. Essential for multi-series time series analysis.
Bernoulli, Binomial, Multinomial, Gaussian, Exponential, Beta, Dirichlet - the probability distributions that appear throughout machine learning, and which models output them.
Markov, Chebyshev, and Hoeffding inequalities, the Central Limit Theorem, and the Law of Large Numbers - bounding probabilities and understanding generalization in machine learning.
Conditional probability, Bayes' theorem, prior and posterior, total probability - the engine behind Naive Bayes, Bayesian inference, and generative vs discriminative model design.
Master confidence intervals for ML engineering - correct interpretation of CIs, construction for means and proportions, bootstrap CIs, and uncertainty quantification for model evaluation metrics.
A deep engineering dive into convex functions, convex sets, loss landscape geometry, saddle points, local vs global minima, and why deep learning works despite non-convexity.
Cross-entropy loss derived from KL divergence and maximum likelihood estimation - binary cross-entropy, categorical cross-entropy, focal loss, and label smoothing.
Shannon's source coding theorem, Huffman coding, arithmetic coding, lossless vs lossy compression, and why language model perplexity is a compression measure.
A deep engineering dive into single-variable derivatives, partial derivatives, gradient vectors, and Jacobians - the mathematical foundation behind every gradient-based ML training algorithm.
A deep engineering dive into dot products, orthogonality, vector projection, Gram-Schmidt orthogonalization, and least squares - the mathematical heart of the transformer attention mechanism.
A deep engineering dive into eigenvalues, eigenvectors, and eigendecomposition - the mathematics behind PCA, PageRank, spectral clustering, and graph neural networks.
Shannon entropy, self-information, binary entropy, differential entropy, and why uncertainty quantification drives decision trees, perplexity, and Bayesian ML.
Master Maximum Likelihood Estimation and Maximum A Posteriori estimation. Understand why cross-entropy loss IS negative log-likelihood, and how the bias-variance tradeoff applies to estimators.
Expected value, linearity of expectation, variance, covariance, and higher moments - the summary statistics that define how ML models behave over data distributions.
Deep engineering guide to IEEE 754 floating-point, machine epsilon, catastrophic cancellation, float16/bfloat16/float32 in deep learning, and numerical stability techniques for production ML systems.
Discrete Fourier Transform, Fast Fourier Transform, power spectrum, frequency-domain features, and Fourier-based positional encodings in transformers. Essential for audio ML, IoT, and sequence model design.
Gaussian process (GP) priors over functions, kernel functions (RBF, Matérn, periodic), GP regression with posterior mean and variance, hyperparameter optimization, and Bayesian optimization for ML hyperparameter tuning.
Why classical theory fails for deep learning - double descent, benign overfitting, implicit regularization of SGD, neural tangent kernel, and modern PAC-Bayes bounds.
A deep engineering dive into gradient descent derivation, learning rate theory, convergence conditions, batch vs mini-batch vs SGD, momentum, and learning rate schedules with complete Python implementations.
Engineering guide to core graph algorithms - BFS, DFS, Dijkstra's shortest path, topological sort, minimum spanning tree, PageRank, and how they enable graph-based feature engineering for ML.
Deep engineering guide to graph theory fundamentals - vertices, edges, directed vs undirected, weighted graphs, paths, cycles, connectivity, and their roles in knowledge graphs, citation networks, and molecular ML.
Engineering guide to graph representation formats - adjacency matrix, adjacency list, edge list, incidence matrix, and memory/compute trade-offs for GNN workloads in PyTorch Geometric and DGL.
Engineering guide to the graph-theoretic foundations of GNNs - message passing framework, GCN/GraphSAGE/GAT, Weisfeiler-Leman expressive power, over-smoothing, and PyTorch Geometric implementation.
Hierarchical Bayesian models, partial pooling, multilevel regression, and the connection to multi-task learning - sharing information across groups with sparse data.
Master hypothesis testing for ML engineering - correct interpretation of p-values, Type I/II errors, t-tests, chi-squared tests, and multiple testing corrections for model comparison.
Statistical manifolds, Fisher information matrix, natural gradient descent, and why second-order optimization methods like K-FAC and Shampoo are geometrically principled.
Engineering guide to iterative methods for linear systems - conjugate gradient, GMRES, preconditioning, and when iterative solvers beat direct methods in large-scale ML workloads.
Joint distributions, marginalization, conditional distributions from joint, independence, covariance matrices, and their role in graphical models and latent variable models.
Kullback-Leibler divergence - asymmetry, forward vs reverse KL, Jensen-Shannon divergence, and applications in VAEs and PPO reinforcement learning.
A deep engineering dive into constrained optimization, the Lagrangian function, KKT conditions, and their ML applications in SVMs, L1/L2 regularization, and trust region methods.
A complete module map showing how vectors, matrices, eigenvalues, SVD, and tensors connect to every major ML algorithm - from attention to PCA to backpropagation.
A complete engineering reference for NumPy linear algebra - np.linalg module, solving systems, decompositions, performance tips, numerical stability, and PyTorch torch.linalg equivalents.
A deep engineering dive into linear maps, kernel (null space), image (column space), rank-nullity theorem, and change of basis - the geometry behind every neural network layer.
Why MCMC is needed, Metropolis-Hastings algorithm, Gibbs sampling, convergence diagnostics (R-hat, trace plots), and practical Bayesian inference with PyMC.
A deep engineering dive into matrix multiplication, transpose, inverse, rank, and determinant - and how they power attention mechanisms, backpropagation, and neural network layers.
The Minimum Description Length (MDL) principle, Kolmogorov complexity, regularization as compression, and information-theoretic model selection - Occam's razor formalized.
How Shannon's information theory underpins every loss function, compression algorithm, and generative model in modern ML engineering.
Overview of numerical methods for AI - floating-point precision, linear solvers, automatic differentiation, sparse matrices, and why numerical stability determines whether your model trains or diverges.
Overview of graph theory for ML - graph fundamentals, algorithms, spectral methods, network models, and graph neural networks. Connects to GNNs, knowledge graphs, social networks, and molecular ML.
Overview of time series mathematics - stationarity, autocorrelation, Fourier analysis, ARIMA, state-space models, Kalman filter, cointegration, and wavelets. Critical for financial ML, IoT, and sequential model design.
How probability theory underpins every machine learning algorithm - from loss functions to generative models to uncertainty quantification.
Mutual information, feature selection, pointwise mutual information in word2vec, and the information bottleneck principle in deep learning.
A deep engineering dive into L1, L2, L∞, Frobenius, and nuclear norms - and how the geometry of different norms determines which model weights go to zero in Lasso vs Ridge regularization.
Engineering guide to finite difference methods, central difference formulas, step size selection, truncation vs rounding error, and gradient checking to validate automatic differentiation implementations.
Engineering guide to numerical integration methods - quadrature rules, Monte Carlo integration, importance sampling, and applications in Bayesian inference, variational methods, and normalizing constants.
Engineering guide to condition numbers, ill-conditioned matrices, LU/QR/Cholesky factorizations, why you should never invert a matrix, and the numerical stability of neural network backpropagation.
The online learning model, regret bounds, Perceptron algorithm, Follow-The-Leader and Follow-The-Regularized-Leader, Hedge algorithm, and connections to streaming ML and online ad auctions.
A deep engineering dive into the math behind SGD with momentum, AdaGrad, RMSProp, Adam, AdamW, learning rate schedules, gradient clipping, and when to use each optimizer for ML training.
Probably Approximately Correct framework - sample complexity, consistent learners, finite hypothesis classes, and the formal foundation of why data size matters in ML.
A complete derivation of Principal Component Analysis from first principles - covariance matrices, eigendecomposition, explained variance, and PCA via SVD - the way sklearn actually does it.
Choosing priors, conjugate distributions, posterior derivation via Bayes' theorem, MAP estimation, and sensitivity analysis - the foundations of practical Bayesian ML.
Kolmogorov axioms, sample spaces, events, conditional probability, and independence - the formal foundations of all probabilistic reasoning in machine learning.
Rademacher complexity as a data-dependent measure of hypothesis class richness - definition, connection to VC dimension, generalization bounds, and why it gives tighter guarantees for ML.
Engineering guide to random graph models - Erdős-Rényi model, Barabási-Albert scale-free networks, small-world networks, degree distributions, and generating synthetic graph data for ML.
Discrete and continuous random variables, PMFs, PDFs, CDFs, and transformations - the formal tools for describing model outputs as probability distributions.
Deep dive into the OLS derivation for linear regression, multiple regression, R-squared, logistic regression as a GLM, and Ridge/Lasso from a statistical perspective.
Regularization as Occam's razor - Tikhonov regularization, structural risk minimization, the connection between dropout and Bayesian inference, and early stopping as regularization.
Engineering guide to root-finding algorithms - bisection method, Newton-Raphson, secant method, convergence rates, and ML connections including learning rate scheduling and fixed-point iterations.
Inverse CDF, rejection sampling, importance sampling, MCMC, and Monte Carlo integration - the algorithms that power Bayesian inference, data augmentation, and generative modeling.
Engineering guide to sparse matrix storage formats (CSR, CSC, COO, LIL), sparse operations in SciPy, and why sparsity is fundamental to attention masks, graph adjacency matrices, and embedding tables in production ML.
Engineering guide to spectral graph theory - graph Laplacian, spectral decomposition, graph Fourier transform, spectral clustering, and the connection to Graph Convolutional Networks.
State space representation, Kalman filter derivation, smoothing, sensor fusion, connection to RNNs and LSTMs, and implementation in Python. The mathematical backbone of optimal sequential estimation.
Engineering guide to strict and weak stationarity, ergodicity, unit roots, Augmented Dickey-Fuller test, differencing, and why failing to check stationarity breaks ML time series models.
The mathematical theory of generalization - why ML models work, when they fail, and how to bound their error. Module map and learning objectives for PAC learning, VC dimension, and modern generalization theory.
Master statistical power, effect size, sample size calculation, and power analysis for ML experiments and A/B tests. Know exactly when to stop an experiment and how many examples you need to detect model improvements.
How statistical theory powers ML model evaluation, A/B testing, and production AI systems. Module map, prerequisites, and learning objectives.
A deep engineering dive into Singular Value Decomposition, LU, QR, and Cholesky decompositions - the mathematical tools behind recommender systems, image compression, and numerically stable ML computations.
A deep engineering dive into Taylor expansions, why gradient descent uses first-order approximations, how Newton's method uses curvature, quasi-Newton methods, and their practical implications for ML optimization.
A deep engineering dive into tensors as generalizations of matrices - shapes, axes, contractions, Einstein summation, broadcasting, and vectorization - the computational substrate of every deep learning model.
ELBO derivation, mean-field variational inference, VI vs MCMC tradeoffs, the reparameterization trick, and variational autoencoders - scalable approximate Bayesian inference.
Vapnik-Chervonenkis dimension - shattering, VC dimension of common classifiers, the Fundamental Theorem of Statistical Learning, and why model capacity determines generalization.
A deep engineering dive into vectors, vector spaces, norms, and inner products - the mathematical foundation behind embeddings, cosine similarity, and KNN in ML systems.
Continuous and discrete wavelet transforms, mother wavelets, multiresolution analysis, wavelet denoising, and connections to WaveNet and modern audio neural networks. Simultaneous time-frequency analysis beyond Fourier.