8 docs tagged with "information-theory"

Cross-Entropy and Loss Functions

Cross-entropy loss derived from KL divergence and maximum likelihood estimation - binary cross-entropy, categorical cross-entropy, focal loss, and label smoothing.

Data Compression Fundamentals

Shannon's source coding theorem, Huffman coding, arithmetic coding, lossless vs lossy compression, and why language model perplexity is a compression measure.

Entropy and Information

Shannon entropy, self-information, binary entropy, differential entropy, and why uncertainty quantification drives decision trees, perplexity, and Bayesian ML.

Information Geometry

Statistical manifolds, Fisher information matrix, natural gradient descent, and why second-order optimization methods like K-FAC and Shampoo are geometrically principled.

KL Divergence

Kullback-Leibler divergence - asymmetry, forward vs reverse KL, Jensen-Shannon divergence, and applications in VAEs and in PPO for reinforcement learning.

Minimum Description Length

MDL principle, Kolmogorov complexity, regularization as compression, and information-theoretic model selection - Occam's razor formalized.

Module 05 - Information Theory

How Shannon's information theory underpins the loss functions, compression algorithms, and generative models used in modern ML engineering.

Mutual Information

Mutual information, its use in feature selection, pointwise mutual information in word2vec, and the information bottleneck principle in deep learning.