12 docs tagged with "neural-networks"

Activation Functions

Complete guide to activation functions - sigmoid saturation proofs, dying ReLU mechanics, GELU/Swish/SiLU for modern transformers, PReLU, ELU, SELU, Mish, and a full selection guide with NumPy and PyTorch implementations.

Backpropagation From Scratch

Full chain rule derivation on computational graphs, Jacobian matrices and vector-Jacobian products, reverse-mode vs forward-mode autodiff, a NumPy 3-layer MLP implementation, PyTorch custom autograd Functions, and numerical gradient checking - every concept a senior engineer needs to debug, extend, and explain backprop under pressure.

Batch Normalization

Batch normalization mechanics, train vs eval mode pitfalls, loss landscape smoothing theory, Layer Norm, Group Norm, Instance Norm, RMS Norm, pre-norm vs post-norm in transformers, and production freeze patterns - with full PyTorch implementations.

Dropout and Regularization

Complete guide to dropout mechanics and inverted scaling, L1 vs L2 regularization and weight decay math, Monte Carlo Dropout for uncertainty, Batch Normalization as an implicit regularizer, label smoothing cross-entropy derivation, DropConnect and DropPath variants, and a production-quality regularized training loop in PyTorch.

Learning Rate Scheduling

Every major learning rate schedule - step decay, cosine annealing, SGDR warm restarts, linear warmup, 1cycle policy, LR finder - with full PyTorch implementations, the warmup mechanics for Adam, polynomial decay, and a complete selection guide.

Module 04: Neural Networks

A comprehensive engineering-focused guide to neural networks - from the perceptron to training dynamics, optimization, and production debugging.

Optimizers: Adam, SGD, RMSProp

Complete optimizer guide - SGD momentum, Nesterov, AdaGrad, RMSProp, Adam bias correction derivation, AdamW decoupled weight decay, LAMB, Lion, AMSGrad - with NumPy Adam from scratch, PyTorch implementations, and the SGD vs Adam generalization debate.

Perceptron and MLP

From the McCulloch-Pitts neuron to multi-layer perceptrons - the mathematical foundations of deep learning, XOR proof, universal approximation, forward pass mechanics, depth vs width theory, and full NumPy and PyTorch implementations.

PyTorch Fundamentals

PyTorch tensors, autograd, neural network modules, training loops, GPU acceleration, and production patterns for deep learning.

Training Dynamics and Debugging

Systematic debugging toolkit for neural network training - loss landscape geometry and flat minima, gradient flow analysis with per-layer norm plots, learning rate finder algorithm, cyclical LR and warmup schedules, gradient clipping strategies, NaN detection hooks, TensorBoard and W&B integration patterns, and a complete pre-training checklist with runnable code.

Universal Approximation Theorem

The Universal Approximation Theorem rigorously explained - Cybenko 1989, Hornik 1991, Leshno 1993, depth separation (Telgarsky 2015/2016), Barron's theorem, NTK, Lottery Ticket Hypothesis, double descent, and NumPy demonstrations of approximation quality vs width.

Weight Initialization

Why weight initialization determines whether deep networks train or collapse - symmetry breaking failure, Xavier/Glorot derivation, He/Kaiming for ReLU, LSUV, orthogonal init, bias strategies, and full NumPy experiments measuring gradient flow across 10 layers.