86 docs tagged with "math-for-ai"

ANOVA and Experimental Design - Comparing Multiple Models and A/B Tests

Master Analysis of Variance, F-statistics, one-way and two-way ANOVA, and rigorous A/B test design for ML model comparison and hyperparameter ablations.

ARIMA Models

AR, MA, ARMA, ARIMA, and SARIMA models - derivation, parameter estimation, Box-Jenkins methodology, diagnostic checking, and Python implementation with statsmodels. The classical forecasting baseline every ML engineer must know.

Autocorrelation and Partial Autocorrelation

ACF and PACF functions, lag plots, correlograms, Ljung-Box test, and identifying ARIMA orders from autocorrelation structure. Essential for time series model selection in ML and forecasting.

Automatic Differentiation - How PyTorch Really Computes Gradients

A deep engineering dive into forward mode and reverse mode automatic differentiation, computational graphs, PyTorch autograd internals, custom gradient functions, and when to use torch.no_grad().

Bayesian Model Comparison

Bayes factors, marginal likelihood, BIC and AIC from a Bayesian perspective, Occam's razor via model evidence, and practical model selection in ML.

Bayesian Statistics - Module Overview

How Bayesian thinking transforms ML - uncertainty quantification, priors as regularization, probabilistic programming, and principled model comparison. Module map and learning objectives.

Bayesian Updating

Sequential Bayesian updating, online learning, the Beta-Bernoulli stream, and the Kalman filter as Bayesian updating - how beliefs evolve as data arrives.

Bayesian vs Frequentist Statistics

The philosophical divide between Bayesian and frequentist probability - concrete examples, when each approach is better, and how the choice shapes ML system design.

Bias-Variance Tradeoff

Mathematical decomposition of generalization error into bias, variance, and noise - with formal derivations, practical examples, and the modern double-descent perspective in deep learning.

Bootstrap and Resampling - Robust Uncertainty Estimation for ML

Master bootstrap resampling, permutation tests, jackknife, and cross-validation as statistical tools for ML model evaluation. Build everything from scratch in NumPy.

Calculus and Optimization for Machine Learning - Module Overview

A complete module map showing how derivatives, gradients, backpropagation, gradient descent, and optimization algorithms connect to training every major ML model.

Causal Inference Basics - Why Correlation Misleads and Online A/B Tests Win

Understand the potential outcomes framework, confounders, average treatment effect, difference-in-differences, and why offline evaluation of recommendation systems fails.

Chain Rule and Backpropagation - How Neural Networks Learn

A deep engineering dive into the chain rule, computational graphs, forward and backward passes, and how PyTorch autograd implements backpropagation to train networks of any depth.

Cointegration and Granger Causality

Cointegration, Johansen test, error correction models, Granger causality, and their applications in pairs trading, causal feature selection, and financial ML. Essential for multi-series time series analysis.

Common Probability Distributions

Bernoulli, Binomial, Multinomial, Gaussian, Exponential, Beta, Dirichlet - the probability distributions that appear throughout machine learning, and which models output them.

Concentration Inequalities

Markov, Chebyshev, and Hoeffding inequalities, the Central Limit Theorem, and the Law of Large Numbers - bounding probabilities and understanding generalization in machine learning.

Conditional Probability and Bayes' Theorem

Conditional probability, Bayes' theorem, prior and posterior, total probability - the engine behind Naive Bayes, Bayesian inference, and generative vs discriminative model design.

Confidence Intervals - Quantifying Uncertainty in Model Metrics

Master confidence intervals for ML engineering - correct interpretation of CIs, construction for means and proportions, bootstrap CIs, and uncertainty quantification for model evaluation metrics.

Convex Functions and Optimization - Why Some Problems Are Easy and Others Are Not

A deep engineering dive into convex functions, convex sets, loss landscape geometry, saddle points, local vs global minima, and why deep learning works despite non-convexity.

Cross-Entropy and Loss Functions

Cross-entropy loss derived from KL divergence and maximum likelihood estimation - binary cross-entropy, categorical cross-entropy, focal loss, and label smoothing.

Data Compression Fundamentals

Shannon's source coding theorem, Huffman coding, arithmetic coding, lossless vs lossy compression, and why language model perplexity is a compression measure.

Derivatives and Gradients - The Compass of Training

A deep engineering dive into single-variable derivatives, partial derivatives, gradient vectors, and Jacobians - the mathematical foundation behind every gradient-based ML training algorithm.

Dot Products and Projections - The Math Behind Attention

A deep engineering dive into dot products, orthogonality, vector projection, Gram-Schmidt orthogonalization, and least squares - the mathematical heart of the transformer attention mechanism.

Eigenvalues and Eigenvectors - How PageRank Ranks the Internet

A deep engineering dive into eigenvalues, eigenvectors, and eigendecomposition - the mathematics behind PCA, PageRank, spectral clustering, and graph neural networks.

Entropy and Information

Shannon entropy, self-information, binary entropy, differential entropy, and why uncertainty quantification drives decision trees, perplexity, and Bayesian ML.

Estimation Theory - MLE, MAP, and the Foundations of ML Training

Master Maximum Likelihood Estimation and Maximum A Posteriori estimation. Understand why cross-entropy loss is the negative log-likelihood, and how the bias-variance tradeoff applies to estimators.

Expectation, Variance, and Moments

Expected value, linearity of expectation, variance, covariance, and higher moments - the summary statistics that define how ML models behave over data distributions.

Floating-Point Arithmetic - Precision, Overflow, and Mixed Precision Training

Deep engineering guide to IEEE 754 floating-point, machine epsilon, catastrophic cancellation, float16/bfloat16/float32 in deep learning, and numerical stability techniques for production ML systems.

Fourier Analysis for ML Engineers

Discrete Fourier Transform, Fast Fourier Transform, power spectrum, frequency-domain features, and Fourier-based positional encodings in transformers. Essential for audio ML, IoT, and sequence model design.

Gaussian Processes

GP priors over functions, kernel functions (RBF, Matérn, periodic), GP regression with posterior mean and variance, hyperparameter optimization, and Bayesian optimization for ML hyperparameter tuning.

Generalisation Bounds in Deep Learning

Why classical theory fails for deep learning - double descent, benign overfitting, implicit regularisation by SGD, the neural tangent kernel, and modern PAC-Bayes bounds.

Gradient Descent Mechanics - The Engine of Every Training Loop

A deep engineering dive into gradient descent derivation, learning rate theory, convergence conditions, batch vs mini-batch vs SGD, momentum, and learning rate schedules with complete Python implementations.

Graph Algorithms - BFS, DFS, Dijkstra, PageRank, and ML Feature Engineering

Engineering guide to core graph algorithms - BFS, DFS, Dijkstra's shortest path, topological sort, minimum spanning tree, PageRank, and how they enable graph-based feature engineering for ML.

Graph Fundamentals - Vertices, Edges, Paths, and Graph Types in ML

Deep engineering guide to graph theory fundamentals - vertices, edges, directed vs undirected, weighted graphs, paths, cycles, connectivity, and their roles in knowledge graphs, citation networks, and molecular ML.

Graph Representations - Adjacency Matrix, Edge List, and ML Tradeoffs

Engineering guide to graph representation formats - adjacency matrix, adjacency list, edge list, incidence matrix, and memory/compute trade-offs for GNN workloads in PyTorch Geometric and DGL.

Graph Theory for GNNs - Message Passing, Expressiveness, and Over-Smoothing

Engineering guide to the graph-theoretic foundations of GNNs - message passing framework, GCN/GraphSAGE/GAT, Weisfeiler-Leman expressive power, over-smoothing, and PyTorch Geometric implementation.

Hierarchical Models

Hierarchical Bayesian models, partial pooling, multilevel regression, and the connection to multi-task learning - sharing information across groups with sparse data.

Hypothesis Testing - p-values, t-tests, and Model Comparison

Master hypothesis testing for ML engineering - correct interpretation of p-values, Type I/II errors, t-tests, chi-squared tests, and multiple testing corrections for model comparison.

Information Geometry

Statistical manifolds, Fisher information matrix, natural gradient descent, and why second-order optimization methods like K-FAC and Shampoo are geometrically principled.

Iterative Solvers - Conjugate Gradient, Krylov Methods, and Large-Scale ML

Engineering guide to iterative methods for linear systems - conjugate gradient, GMRES, preconditioning, and when iterative solvers beat direct methods in large-scale ML workloads.

Joint and Marginal Distributions

Joint distributions, marginalization, deriving conditional distributions from the joint, independence, covariance matrices, and their role in graphical models and latent variable models.

KL Divergence

Kullback-Leibler divergence - asymmetry, forward vs reverse KL, Jensen-Shannon divergence, and applications in VAEs and PPO reinforcement learning.

Lagrange Multipliers - Constrained Optimization and the Math Behind SVMs

A deep engineering dive into constrained optimization, the Lagrangian function, KKT conditions, and their ML applications in SVMs, L1/L2 regularization, and trust region methods.

Linear Algebra for Machine Learning - Module Overview

A complete module map showing how vectors, matrices, eigenvalues, SVD, and tensors connect to every major ML algorithm - from attention to PCA to backpropagation.

Linear Algebra in NumPy - A Complete Engineering Reference

A complete engineering reference for NumPy linear algebra - np.linalg module, solving systems, decompositions, performance tips, numerical stability, and PyTorch torch.linalg equivalents.

Linear Transformations - The Geometry of Neural Network Layers

A deep engineering dive into linear maps, kernel (null space), image (column space), rank-nullity theorem, and change of basis - the geometry behind every neural network layer.

Markov Chain Monte Carlo

Why MCMC is needed, Metropolis-Hastings algorithm, Gibbs sampling, convergence diagnostics (R-hat, trace plots), and practical Bayesian inference with PyMC.

Matrix Operations - The Engine of the Neural Network Forward Pass

A deep engineering dive into matrix multiplication, transpose, inverse, rank, and determinant - and how they power attention mechanisms, backpropagation, and neural network layers.

Minimum Description Length

MDL principle, Kolmogorov complexity, regularization as compression, and information-theoretic model selection - Occam's razor formalized.

Module 05 - Information Theory

How Shannon's information theory underpins every loss function, compression algorithm, and generative model in modern ML engineering.

Module 08 - Numerical Methods for ML Engineering

Overview of numerical methods for AI - floating-point precision, linear solvers, automatic differentiation, sparse matrices, and why numerical stability determines whether your model trains or diverges.

Module 09 - Graph Theory for ML Engineering

Overview of graph theory for ML - graph fundamentals, algorithms, spectral methods, network models, and graph neural networks. Connects to GNNs, knowledge graphs, social networks, and molecular ML.

Module 10 - Time Series Mathematics for ML Engineering

Overview of time series mathematics - stationarity, autocorrelation, Fourier analysis, ARIMA, state-space models, Kalman filter, cointegration, and wavelets. Critical for financial ML, IoT, and sequential model design.

Module 3 - Probability Theory Overview

How probability theory underpins every machine learning algorithm - from loss functions to generative models to uncertainty quantification.

Mutual Information

Mutual information, feature selection, pointwise mutual information in word2vec, and the information bottleneck principle in deep learning.

Norms and Distance Metrics - The Geometry of Regularization

A deep engineering dive into L1, L2, L∞, Frobenius, and nuclear norms - and how the geometry of different norms determines which model weights go to zero in Lasso vs Ridge regularization.

Numerical Differentiation - Finite Differences, Gradient Checking, and Autodiff

Engineering guide to finite difference methods, central difference formulas, step size selection, truncation vs rounding error, and gradient checking to validate automatic differentiation implementations.

Numerical Integration - Quadrature, Monte Carlo, and Bayesian Inference

Engineering guide to numerical integration methods - quadrature rules, Monte Carlo integration, importance sampling, and applications in Bayesian inference, variational methods, and normalizing constants.

Numerical Linear Algebra - Condition Numbers, Solvers, and Backprop Stability

Engineering guide to condition numbers, ill-conditioned matrices, LU/QR/Cholesky factorizations, why you should avoid explicitly inverting a matrix, and the numerical stability of neural network backpropagation.

Online Learning Theory

The online learning model, regret bounds, Perceptron algorithm, Follow-The-Leader and Follow-The-Regularised-Leader, Hedge algorithm, and connections to streaming ML and online ad auctions.

Optimization Algorithms Deep Dive - SGD, Adam, AdamW, and Beyond

A deep engineering dive into the math behind SGD with momentum, AdaGrad, RMSProp, Adam, AdamW, learning rate schedules, gradient clipping, and when to use each optimizer for ML training.

PAC Learning

Probably Approximately Correct framework - sample complexity, consistent learners, finite hypothesis classes, and the formal foundation of why data size matters in ML.

PCA from Linear Algebra - Eigendecomposition of Covariance Matrices

A complete derivation of Principal Component Analysis from first principles - covariance matrices, eigendecomposition, explained variance, and PCA via SVD - the way sklearn actually does it.

Prior and Posterior Distributions

Choosing priors, conjugate distributions, posterior derivation via Bayes' theorem, MAP estimation, and sensitivity analysis - the foundations of practical Bayesian ML.

Probability Axioms and Events

Kolmogorov axioms, sample spaces, events, conditional probability, and independence - the formal foundations of all probabilistic reasoning in machine learning.

Rademacher Complexity

Rademacher complexity as a data-dependent measure of hypothesis class richness - definition, connection to VC dimension, generalization bounds, and why it gives tighter guarantees for ML.

Random Graphs and Network Models - Erdős-Rényi, Scale-Free, and Synthetic Data

Engineering guide to random graph models - Erdős-Rényi model, Barabási-Albert scale-free networks, small-world networks, degree distributions, and generating synthetic graph data for ML.

Random Variables and Distributions

Discrete and continuous random variables, PMFs, PDFs, CDFs, and transformations - the formal tools for describing model outputs as probability distributions.

Regression Analysis - OLS, Logistic Regression, and Regularised Models

Deep dive into linear regression OLS derivation, multiple regression, R-squared, logistic regression as a GLM, and Ridge/Lasso from a statistical perspective.

Regularisation Theory

Regularisation as Occam's razor - Tikhonov regularisation, structural risk minimisation, the connection between dropout and Bayesian inference, and early stopping as regularisation.

Root-Finding Algorithms - Bisection, Newton-Raphson, and ML Applications

Engineering guide to root-finding algorithms - bisection method, Newton-Raphson, secant method, convergence rates, and ML connections including learning rate scheduling and fixed-point iterations.

Sampling Methods

Inverse CDF, rejection sampling, importance sampling, MCMC, and Monte Carlo integration - the algorithms that power Bayesian inference, data augmentation, and generative modeling.

Sparse Matrix Methods - CSR/CSC Formats, Efficient Operations, and ML Sparsity

Engineering guide to sparse matrix storage formats (CSR, CSC, COO, LIL), sparse operations in SciPy, and why sparsity is fundamental to attention masks, graph adjacency matrices, and embedding tables in production ML.

Spectral Graph Theory - Graph Laplacian, Eigenvalues, and Spectral Clustering

Engineering guide to spectral graph theory - graph Laplacian, spectral decomposition, graph Fourier transform, spectral clustering, and the connection to Graph Convolutional Networks.

State Space Models and the Kalman Filter

State space representation, Kalman filter derivation, smoothing, sensor fusion, connection to RNNs and LSTMs, and implementation in Python. The mathematical backbone of optimal sequential estimation.

Stationarity and Ergodicity - The Prerequisites for Time Series ML

Engineering guide to strict and weak stationarity, ergodicity, unit roots, Augmented Dickey-Fuller test, differencing, and why failing to check stationarity breaks ML time series models.

Statistical Learning Theory - Module Overview

The mathematical theory of generalization - why ML models work, when they fail, and how to bound their error. Module map and learning objectives for PAC learning, VC dimension, and modern generalization theory.

Statistical Power and Sample Size - How Many Samples Do You Actually Need?

Master statistical power, effect size, sample size calculation, and power analysis for ML experiments and A/B tests. Know exactly when to stop an experiment and how many examples you need to detect model improvements.

Statistics for ML - Module Overview

How statistical theory powers ML model evaluation, A/B testing, and production AI systems. Module map, prerequisites, and learning objectives.

SVD and Matrix Decompositions - The Netflix Algorithm and JPEG Compression

A deep engineering dive into Singular Value Decomposition, LU, QR, and Cholesky decompositions - the mathematical tools behind recommender systems, image compression, and numerically stable ML computations.

Taylor Series and Approximations - The Mathematics Behind Gradient Descent and Newton's Method

A deep engineering dive into Taylor expansions, why gradient descent uses first-order approximations, how Newton's method uses curvature, quasi-Newton methods, and their practical implications for ML optimization.

Tensors for Deep Learning - From Matrices to Multi-Dimensional Arrays

A deep engineering dive into tensors as generalizations of matrices - shapes, axes, contractions, Einstein summation, broadcasting, and vectorization - the computational substrate of every deep learning model.

Variational Inference

ELBO derivation, mean-field variational inference, VI vs MCMC tradeoffs, the reparameterization trick, and variational autoencoders - scalable approximate Bayesian inference.

VC Dimension

Vapnik-Chervonenkis dimension - shattering, VC dimension of common classifiers, the Fundamental Theorem of Statistical Learning, and why model capacity determines generalization.

Vectors and Vector Spaces - The Language of Embeddings

A deep engineering dive into vectors, vector spaces, norms, and inner products - the mathematical foundation behind embeddings, cosine similarity, and KNN in ML systems.

Wavelets and Multiscale Analysis

Continuous and discrete wavelet transforms, mother wavelets, multiresolution analysis, wavelet denoising, and connections to WaveNet and modern audio neural networks. Simultaneous time-frequency analysis beyond Fourier.