Module 02: Linear Models

Why Linear Models Still Matter in 2026

Every ML engineer eventually realizes the same thing: linear models are not a stepping stone - they are the foundation. When a feature team at Google wants to understand why their ranking model changed behavior, they reach for linear probes. When a production system at Stripe needs to explain a fraud decision, they fall back to logistic regression. When PyTorch implements a fully connected layer (nn.Linear), it computes y = Wx + b - a linear model at its core.

Linear models teach you three things more cleanly than any other model class:

Optimization intuition - gradient descent, convergence, learning rates. You cannot understand training deep networks without understanding these on linear models first.
Statistical grounding - what assumptions are you making? What breaks when they're violated? Linear models make the implicit explicit.
Interpretability baseline - in production, you often need to justify predictions. Linear models give you exact feature contributions: every prediction is the bias plus a sum of per-feature terms w_j x_j.
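To make the interpretability point concrete, here is a minimal sketch of how a linear score splits into per-feature contributions. The feature names, weights, and input values are invented for illustration and do not come from any real system.

```python
import numpy as np

# Hypothetical fitted model: names, weights, and bias are made up for this sketch.
feature_names = ["amount", "account_age_days", "num_prior_disputes"]
w = np.array([0.8, -0.5, 1.2])
b = -0.3

# One input row, assumed to be already standardized.
x = np.array([1.5, 2.0, 0.0])

contributions = w * x            # per-feature term w_j * x_j
score = contributions.sum() + b  # same value as w @ x + b

for name, c in zip(feature_names, contributions):
    print(f"{name:>20s}: {c:+.2f}")
print(f"{'bias':>20s}: {b:+.2f}")
print(f"{'score':>20s}: {score:+.2f}")
```

Because the score is an exact sum, the printed terms account for the whole prediction with nothing left over, which is what post-hoc explanation methods for nonlinear models can only approximate.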

This module builds linear models from first principles, derives every formula, and connects theory to production engineering decisions.

Module Map

Lesson Table

#	Lesson	Core Concept	You Will Build
01	Linear Regression Internals	OLS, Normal Equation, Geometric Interpretation	Regression from scratch + residual diagnostics
02	Gradient Descent From Scratch	∇L = (2/n)Xᵀ(Xw-y), convergence	Full GD with loss curves
03	Stochastic and Mini-Batch GD	SGD noise, epoch dynamics	Three-variant comparison
04	Logistic Regression Deep Dive	Sigmoid, cross-entropy, softmax	Binary + multi-class classifier
05	Regularization: L1, L2, ElasticNet	Sparsity, shrinkage, paths	Regularization path plots
06	Polynomial Features and Kernels	Feature expansion, kernel trick	Kernel SVM classifier
07	Maximum Likelihood Estimation	MLE → OLS → Cross-entropy	MLE from scratch for Gaussian + Bernoulli
08	Generalized Linear Models	Link functions, exponential family	Poisson + Gamma regression
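As a preview of Lessons 01 and 02, the sketch below fits the same synthetic regression problem two ways: by solving the normal equation and by running gradient descent with the gradient from the table, ∇L = (2/n)Xᵀ(Xw-y). The data, learning rate, and iteration count are arbitrary choices for illustration, and the model has no intercept to keep the code short.

```python
import numpy as np

# Synthetic data: sizes, true weights, and noise level are illustrative only.
rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

# Closed form: solve the normal equation (X^T X) w = X^T y.
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on L(w) = (1/n) * ||Xw - y||^2.
w_gd = np.zeros(d)
lr = 0.1
for _ in range(500):
    grad = (2.0 / n) * X.T @ (X @ w_gd - y)
    w_gd -= lr * grad

print("normal equation: ", w_ols)
print("gradient descent:", w_gd)
```

Both routes minimize the same convex loss, so they should agree up to numerical tolerance. The difference is in cost: the closed form needs a d×d solve, while gradient descent only needs matrix-vector products.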

How Linear Models Connect to the Rest of ML

Linear Models → Neural Networks

A single-layer neural network with no activation function, trained with squared error, is linear regression. Apply a sigmoid to the output and train with cross-entropy - it becomes logistic regression. Use softmax over several outputs - multi-class classification. The entire deep learning stack is linear maps composed with nonlinearities.

$\text{Neural net layer: } \mathbf{h} = \sigma(\mathbf{W}\mathbf{x} + \mathbf{b})$

Understanding gradient descent on linear models (Lesson 02) directly transfers to understanding backpropagation.
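The layer equation above can be sketched in a few lines of NumPy. The helper name dense and the random weights are assumptions made for this example. The point is only that one affine map yields a regression output, a logistic regression probability, or a multi-class distribution, depending on the activation applied to it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dense(x, W, b, activation=None):
    """Hypothetical helper: compute activation(Wx + b), or Wx + b if no activation."""
    z = W @ x + b
    return z if activation is None else activation(z)

rng = np.random.default_rng(0)
x = rng.normal(size=4)

# One output unit, no activation: a linear regression prediction.
W1, b1 = rng.normal(size=(1, 4)), np.zeros(1)
y_hat = dense(x, W1, b1)

# Same unit with a sigmoid: a logistic regression probability.
p = dense(x, W1, b1, sigmoid)

# Three output units with softmax: a multi-class distribution.
W3, b3 = rng.normal(size=(3, 4)), np.zeros(3)
probs = dense(x, W3, b3, softmax)

print(y_hat, p, probs)
```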

Regularization → Weight Decay

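L2 regularization in linear models is equivalent to weight decay in neural network training when the optimizer is plain SGD. With adaptive optimizers such as Adam the two differ, which is why AdamW applies the decay separately from the gradient step. L1 regularization supplies the intuition behind sparsity-inducing techniques such as pruning. These are not separate ideas.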
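For a plain gradient step, the link between an L2 penalty and weight decay can be checked numerically. The sketch below is a single update on random data. The penalty is written as (lam/2)·||w||² so that its gradient is lam·w. The values of lr and lam are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)
w0 = rng.normal(size=4)
lr, lam = 0.05, 0.1  # illustrative values

def data_grad(w):
    """Gradient of the mean squared error (1/n) * ||Xw - y||^2."""
    return (2.0 / len(y)) * X.T @ (X @ w - y)

# (a) L2 penalty in the loss: the gradient gains an extra lam * w term.
w_l2 = w0 - lr * (data_grad(w0) + lam * w0)

# (b) Weight decay: shrink the weights, then step on the data loss alone.
w_wd = (1.0 - lr * lam) * w0 - lr * data_grad(w0)

print(np.allclose(w_l2, w_wd))
```

The two updates are the same expression rearranged. The equivalence breaks once the optimizer rescales gradients per parameter, because the lam·w term then gets rescaled in (a) but not in (b).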

Logistic Regression → Softmax Classifier → LLM Output Head

The output head of a typical autoregressive language model is a linear layer followed by softmax - exactly multi-class logistic regression over the vocabulary. The math in Lesson 04 is the math at the top of GPT-style models.
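A toy version of such an output head is sketched below. The sizes, the random weights, and the target index are placeholders. The sketch makes no claim about any particular model's internals. The last lines also check that softmax over two classes reduces to a sigmoid of the logit difference, which is the sense in which softmax generalizes binary logistic regression.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
hidden_dim, vocab_size = 8, 5  # toy sizes, chosen for illustration

h = rng.normal(size=hidden_dim)                # final hidden state at one position
W = rng.normal(size=(vocab_size, hidden_dim))  # output projection
b = np.zeros(vocab_size)

logits = W @ h + b
probs = softmax(logits)        # distribution over the next token
target = 2                     # hypothetical index of the observed token
loss = -np.log(probs[target])  # cross-entropy for this position

# Two-class softmax equals a sigmoid of the logit difference.
z = np.array([0.3, -1.1])
p_softmax = softmax(z)[0]
p_sigmoid = 1.0 / (1.0 + np.exp(-(z[0] - z[1])))

print(probs, loss, p_softmax, p_sigmoid)
```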

GLMs → Probabilistic ML

Understanding that least-squares regression is maximum likelihood under Gaussian noise (Lesson 07) and that you can swap that noise distribution (Lesson 08) opens the door to Bayesian ML, variational inference, and probabilistic forecasting.
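The Gaussian connection can be written out in two lines. Assume $y_i = \mathbf{w}^\top\mathbf{x}_i + \varepsilon_i$ with independent noise $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ (the symbols $\varepsilon_i$ and $\sigma^2$ are introduced here for the sketch). The log-likelihood of the data is then

$$
\log p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \mathbf{w}^\top\mathbf{x}_i\right)^2
$$

The first term does not depend on $\mathbf{w}$, so maximizing the likelihood is the same as minimizing the sum of squared residuals:

$$
\hat{\mathbf{w}}_{\text{MLE}} = \arg\min_{\mathbf{w}} \sum_{i=1}^{n}\left(y_i - \mathbf{w}^\top\mathbf{x}_i\right)^2 = \hat{\mathbf{w}}_{\text{OLS}}
$$

Replacing the Gaussian with a Bernoulli or Poisson likelihood changes the loss in the same way, which is the step Lesson 08 generalizes.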

Prerequisites

Calculus: partial derivatives, chain rule
Linear algebra: matrix multiply, transpose, inverse
Python: comfortable with NumPy, basic familiarity with scikit-learn

Production Context

Linear models appear in production in ways that might surprise you:

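Ranking systems: Companies such as LinkedIn and Twitter have used linear models as the first-stage scorer in multi-stage ranking pipelines. These scorers handle very high request volumes, where a latency budget of around 10ms leaves little room for anything heavier than a dot product per candidate.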

Calibration: After training a complex tree or neural model, ML teams often fit a logistic regression on the model's raw scores to calibrate probabilities, a technique known as Platt scaling. The entire calibration layer is Lesson 04.
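A minimal sketch of this calibration step, assuming a synthetic held-out set in which the upstream model's logits are overconfident by a factor of 3. The function name fit_platt, the data, and the optimizer settings are illustrative. In practice a library routine would normally do this job.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(p, y):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_platt(scores, labels, lr=0.1, steps=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the log loss."""
    a, b = 1.0, 0.0
    for _ in range(steps):
        err = sigmoid(a * scores + b) - labels  # d(log loss) / d(logit)
        a -= lr * np.mean(err * scores)
        b -= lr * np.mean(err)
    return a, b

# Synthetic held-out set: the true logit is t, but the model reports 3 * t.
rng = np.random.default_rng(0)
true_logit = rng.normal(size=5000)
labels = (rng.random(5000) < sigmoid(true_logit)).astype(float)
raw_scores = 3.0 * true_logit

a, b = fit_platt(raw_scores, labels)
calibrated = sigmoid(a * raw_scores + b)

print("slope a:", a, "intercept b:", b)
print("log loss before:", log_loss(sigmoid(raw_scores), labels))
print("log loss after: ", log_loss(calibrated, labels))
```

The calibrator has only two parameters, so it cannot change the ranking of examples. It only rescales scores so that they behave like probabilities.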

Interpretability under regulation: In finance and healthcare, regulators often require model explanations. Logistic regression gives one coefficient per feature, each readable as a change in log-odds. Even when the underlying model is complex, teams often maintain a "shadow" linear model for compliance.

Feature selection: Lasso (L1 regularization, Lesson 05) is used in genomics, drug discovery, and other high-dimensional settings to pick out a small set of predictive features, for example ~50 out of 50,000 candidates.
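The selection effect can be seen on a small synthetic problem. The sketch below solves the Lasso with proximal gradient descent (ISTA), which is one of several possible solvers and not necessarily the one Lesson 05 uses. The problem sizes, the penalty lam, and the true coefficients are all made up for the example.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink toward zero, clip at zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, steps=500):
    """Minimize (1/(2n)) * ||Xw - y||^2 + lam * ||w||_1 by proximal gradient."""
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the smooth part
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Synthetic data: 400 candidate features, only the first 5 matter.
rng = np.random.default_rng(0)
n, d, k = 200, 400, 5
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:k] = [3.0, -2.0, 2.5, -3.0, 2.0]
y = X @ w_true + 0.1 * rng.normal(size=n)

w_hat = lasso_ista(X, y, lam=0.3)
selected = np.flatnonzero(w_hat)
print("selected feature indices:", selected)
```

The soft-threshold step is what sets coefficients exactly to zero. An L2 penalty would shrink them but leave all 400 nonzero.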

This module will make you fluent in all of it.
