86 docs tagged with "math-for-ai"

ARIMA Models

AR, MA, ARMA, ARIMA, and SARIMA models - derivation, parameter estimation, Box-Jenkins methodology, diagnostic checking, and Python implementation with statsmodels. The classical forecasting baseline every ML engineer must know.

Autocorrelation and Partial Autocorrelation

ACF and PACF, lag plots, correlograms, the Ljung-Box test, and identifying ARIMA orders from autocorrelation structure. Essential for time series model selection in ML and forecasting.

Bayesian Model Comparison

Bayes factors, marginal likelihood, BIC and AIC from a Bayesian perspective, Occam's razor via model evidence, and practical model selection in ML.

Bayesian Statistics - Module Overview

How Bayesian thinking transforms ML - uncertainty quantification, priors as regularization, probabilistic programming, and principled model comparison. Module map and learning objectives.

Bayesian Updating

Sequential Bayesian updating, online learning, the Beta-Bernoulli stream, and the Kalman filter as Bayesian updating - how beliefs evolve as data arrives.

Bayesian vs Frequentist Statistics

The philosophical divide between Bayesian and frequentist probability - concrete examples, when each approach is better, and how the choice shapes ML system design.

Bias-Variance Tradeoff

Mathematical decomposition of generalization error into bias, variance, and noise - with formal derivations, practical examples, and the modern double-descent perspective in deep learning.

Cointegration and Granger Causality

Cointegration, Johansen test, error correction models, Granger causality, and their applications in pairs trading, causal feature selection, and financial ML. Essential for multi-series time series analysis.

Common Probability Distributions

Bernoulli, Binomial, Multinomial, Gaussian, Exponential, Beta, Dirichlet - the probability distributions that appear throughout machine learning, and which models output them.

Concentration Inequalities

Markov, Chebyshev, and Hoeffding inequalities, the Central Limit Theorem, and the Law of Large Numbers - bounding probabilities and understanding generalization in machine learning.

Conditional Probability and Bayes' Theorem

Conditional probability, Bayes' theorem, prior and posterior, total probability - the engine behind Naive Bayes, Bayesian inference, and generative vs discriminative model design.

Cross-Entropy and Loss Functions

Cross-entropy loss derived from KL divergence and maximum likelihood estimation - binary cross-entropy, categorical cross-entropy, focal loss, and label smoothing.

Data Compression Fundamentals

Shannon's source coding theorem, Huffman coding, arithmetic coding, lossless vs lossy compression, and why language model perplexity is a compression measure.

Entropy and Information

Shannon entropy, self-information, binary entropy, differential entropy, and why uncertainty quantification drives decision trees, perplexity, and Bayesian ML.

Expectation, Variance, and Moments

Expected value, linearity of expectation, variance, covariance, and higher moments - the summary statistics that define how ML models behave over data distributions.

Fourier Analysis for ML Engineers

Discrete Fourier Transform, Fast Fourier Transform, power spectrum, frequency-domain features, and Fourier-based positional encodings in transformers. Essential for audio ML, IoT, and sequence model design.

Gaussian Processes

GP priors over functions, kernel functions (RBF, Matérn, periodic), GP regression with posterior mean and variance, hyperparameter optimization, and Bayesian optimization for ML hyperparameter tuning.

Generalisation Bounds in Deep Learning

Why classical theory fails for deep learning - double descent, benign overfitting, implicit regularisation of SGD, neural tangent kernel, and modern PAC-Bayes bounds.

Hierarchical Models

Hierarchical Bayesian models, partial pooling, multilevel regression, and the connection to multi-task learning - sharing information across groups with sparse data.

Information Geometry

Statistical manifolds, Fisher information matrix, natural gradient descent, and why second-order optimization methods like K-FAC and Shampoo are geometrically principled.

Joint and Marginal Distributions

Joint distributions, marginalization, conditional distributions derived from the joint, independence, covariance matrices, and their role in graphical models and latent variable models.

KL Divergence

Kullback-Leibler divergence - asymmetry, forward vs reverse KL, Jensen-Shannon divergence, and applications in VAEs and PPO reinforcement learning.

Markov Chain Monte Carlo

Why MCMC is needed, Metropolis-Hastings algorithm, Gibbs sampling, convergence diagnostics (R-hat, trace plots), and practical Bayesian inference with PyMC.

Minimum Description Length

MDL principle, Kolmogorov complexity, regularization as compression, and information-theoretic model selection - Occam's razor formalized.

Module 05 - Information Theory

How Shannon's information theory underpins every loss function, compression algorithm, and generative model in modern ML engineering.

Module 08 - Numerical Methods for ML Engineering

Overview of numerical methods for AI - floating-point precision, linear solvers, automatic differentiation, sparse matrices, and why numerical stability determines whether your model trains or diverges.

Module 09 - Graph Theory for ML Engineering

Overview of graph theory for ML - graph fundamentals, algorithms, spectral methods, network models, and graph neural networks. Connects to GNNs, knowledge graphs, social networks, and molecular ML.

Module 10 - Time Series Mathematics for ML Engineering

Overview of time series mathematics - stationarity, autocorrelation, Fourier analysis, ARIMA, state-space models, Kalman filter, cointegration, and wavelets. Critical for financial ML, IoT, and sequential model design.

Mutual Information

Mutual information, feature selection, pointwise mutual information in word2vec, and the information bottleneck principle in deep learning.

Online Learning Theory

The online learning model, regret bounds, Perceptron algorithm, Follow-The-Leader and Follow-The-Regularised-Leader, Hedge algorithm, and connections to streaming ML and online ad auctions.

PAC Learning

Probably Approximately Correct framework - sample complexity, consistent learners, finite hypothesis classes, and the formal foundation of why data size matters in ML.

Prior and Posterior Distributions

Choosing priors, conjugate distributions, posterior derivation via Bayes' theorem, MAP estimation, and sensitivity analysis - the foundations of practical Bayesian ML.

Probability Axioms and Events

Kolmogorov axioms, sample spaces, events, conditional probability, and independence - the formal foundations of all probabilistic reasoning in machine learning.

Rademacher Complexity

Rademacher complexity as a data-dependent measure of hypothesis class richness - definition, connection to VC dimension, generalization bounds, and why it can give tighter guarantees than VC-based bounds.

Random Variables and Distributions

Discrete and continuous random variables, PMFs, PDFs, CDFs, and transformations - the formal tools for describing model outputs as probability distributions.

Regularisation Theory

Regularisation as Occam's razor - Tikhonov regularisation, structural risk minimisation, the connection between dropout and Bayesian inference, and early stopping as regularisation.

Sampling Methods

Inverse CDF, rejection sampling, importance sampling, MCMC, and Monte Carlo integration - the algorithms that power Bayesian inference, data augmentation, and generative modeling.

State Space Models and the Kalman Filter

State space representation, Kalman filter derivation, smoothing, sensor fusion, connection to RNNs and LSTMs, and implementation in Python. The mathematical backbone of optimal sequential estimation.

Statistical Learning Theory - Module Overview

The mathematical theory of generalization - why ML models work, when they fail, and how to bound their error. Module map and learning objectives for PAC learning, VC dimension, and modern generalization theory.

Statistics for ML - Module Overview

How statistical theory powers ML model evaluation, A/B testing, and production AI systems. Module map, prerequisites, and learning objectives.

Variational Inference

ELBO derivation, mean-field variational inference, VI vs MCMC tradeoffs, the reparameterization trick, and variational autoencoders - scalable approximate Bayesian inference.

VC Dimension

Vapnik-Chervonenkis dimension - shattering, VC dimension of common classifiers, the Fundamental Theorem of Statistical Learning, and why model capacity determines generalization.

Wavelets and Multiscale Analysis

Continuous and discrete wavelet transforms, mother wavelets, multiresolution analysis, wavelet denoising, and connections to WaveNet and modern audio neural networks. Simultaneous time-frequency analysis beyond Fourier.