Automatic Differentiation - How PyTorch Really Computes Gradients
A deep engineering dive into forward-mode and reverse-mode automatic differentiation, computational graphs, PyTorch autograd internals, custom gradient functions, and when to use torch.no_grad().
Calculus and Optimization for Machine Learning - Module Overview
A complete module map showing how derivatives, gradients, backpropagation, gradient descent, and optimization algorithms connect to the training of every major ML model.
Chain Rule and Backpropagation - How Neural Networks Learn
A deep engineering dive into the chain rule, computational graphs, forward and backward passes, and how PyTorch autograd implements backpropagation to train networks of any depth.
Convex Functions and Optimization - Why Some Problems Are Easy and Others Are Not
A deep engineering dive into convex functions, convex sets, loss landscape geometry, saddle points, local vs global minima, and why deep learning works despite non-convexity.
Derivatives and Gradients - The Compass of Training
A deep engineering dive into single-variable derivatives, partial derivatives, gradient vectors, and Jacobians - the mathematical foundation behind every gradient-based ML training algorithm.
Gradient Descent Mechanics - The Engine of Every Training Loop
A deep engineering dive into gradient descent derivation, learning rate theory, convergence conditions, batch vs mini-batch vs SGD, momentum, and learning rate schedules with complete Python implementations.
Lagrange Multipliers - Constrained Optimization and the Math Behind SVMs
A deep engineering dive into constrained optimization, the Lagrangian function, KKT conditions, and their ML applications in SVMs, L1/L2 regularization, and trust-region methods.
Optimization Algorithms Deep Dive - SGD, Adam, AdamW, and Beyond
A deep engineering dive into the math behind SGD with momentum, AdaGrad, RMSProp, Adam, AdamW, learning rate schedules, gradient clipping, and when to use each optimizer for ML training.
Taylor Series and Approximations - The Mathematics Behind Gradient Descent and Newton's Method
A deep engineering dive into Taylor expansions, why gradient descent uses first-order approximations, how Newton's method uses curvature, quasi-Newton methods, and their practical implications for ML optimization.