Cross-Entropy and Loss Functions
Cross-entropy loss derived from KL divergence and maximum likelihood estimation - binary cross-entropy, categorical cross-entropy, focal loss, and label smoothing.
Shannon's source coding theorem, Huffman coding, arithmetic coding, lossless vs lossy compression, and why language model perplexity is a compression measure.
Shannon entropy, self-information, binary entropy, differential entropy, and why uncertainty quantification drives decision trees, perplexity, and Bayesian ML.
Statistical manifolds, Fisher information matrix, natural gradient descent, and why second-order optimization methods like K-FAC and Shampoo are geometrically principled.
Kullback-Leibler divergence - asymmetry, forward vs reverse KL, Jensen-Shannon divergence, and applications in VAEs and PPO reinforcement learning.
MDL principle, Kolmogorov complexity, regularization as compression, and information-theoretic model selection - Occam's razor formalized.
How Shannon's information theory underpins the loss functions, compression algorithms, and generative models of modern ML engineering.
Mutual information, feature selection, pointwise mutual information in word2vec, and the information bottleneck principle in deep learning.