Generalized Linear Models
Understand the GLM framework - link functions, exponential family distributions, Poisson regression for count data, Gamma regression for positive continuous targets, IRLS algorithm, overdispersion, and deviance-based model comparison.
Gradient Descent From Scratch
Implement gradient descent for linear regression from first principles - derive the gradient, analyze the loss landscape, choose the learning rate via the Lipschitz constant of the gradient, implement momentum and gradient clipping, and analyze convergence via the condition number.
Linear Regression Internals
Deep dive into linear regression - OLS derivation, normal equations, geometric interpretation as projection, Gauss-Markov theorem, residual diagnostics, Cook's distance, VIF, multicollinearity, and full NumPy implementation.
Logistic Regression Deep Dive
Master logistic regression from first principles - sigmoid derivation, log-likelihood to cross-entropy, decision boundary geometry, softmax multiclass, probability calibration with ECE, class imbalance handling, and full NumPy implementation.
Maximum Likelihood Estimation
Understand MLE from first principles - derive OLS from Gaussian noise, cross-entropy from Bernoulli, Fisher information, Cramér-Rao bound, and the deep connection between MLE and empirical risk minimization.
Module 02 - Linear Models
Master linear models from first principles - the mathematical foundation underlying deep learning, neural networks, and modern ML systems.
Polynomial Features and Kernel Methods
Extend linear models to nonlinear patterns - polynomial basis expansion, curse of dimensionality, Mercer's theorem for valid kernels, RBF kernel via infinite-dimensional feature space, kernel ridge regression dual form, Nyström and random Fourier features for scalability.
Regularization - L1, L2, and ElasticNet
Master regularization from first principles - bias-variance decomposition, L2 Bayesian interpretation as Gaussian prior, L1 sparsity via subdifferential geometry, elastic net path algorithms, coordinate descent for LASSO, and cross-validation for lambda selection.
Stochastic and Mini-Batch Gradient Descent
Master SGD and mini-batch gradient descent - gradient noise as implicit regularization, convergence proof sketch with a decreasing learning rate, batch size vs generalization, linear scaling rule, cyclical learning rates, full PyTorch DataLoader training, and distributed SGD.