Optimization Algorithms Deep Dive - SGD, Adam, AdamW, and Beyond
An engineering deep dive into the math behind SGD with momentum, AdaGrad, RMSProp, Adam, and AdamW, plus learning rate schedules, gradient clipping, and guidance on when to use each optimizer for ML training.
