144 docs tagged with "ml"

Activation Functions

Complete guide to activation functions - sigmoid saturation proofs, dying ReLU mechanics, GELU/Swish/SiLU for modern transformers, PReLU, ELU, SELU, Mish, and a full selection guide with NumPy and PyTorch implementations.

Anomaly Detection in Sequences

Master anomaly detection for sequential data - from statistical baselines to LSTM autoencoders. Learn why standard methods fail on time series, how to pick thresholds, and how to build production-grade systems that catch real anomalies without drowning your team in false alarms.

Autoencoders

Neural network autoencoders for unsupervised representation learning - undercomplete, denoising, sparse, contractive variants with PyTorch on MNIST, anomaly detection, and sparse autoencoders for LLM interpretability.

Backpropagation From Scratch

Full chain rule derivation on computational graphs, Jacobian matrices and vector-Jacobian products, reverse-mode vs forward-mode autodiff, NumPy 3-layer MLP implementation, PyTorch custom autograd Functions, and numerical gradient checking - every concept a senior engineer needs to debug, extend, and explain backprop under pressure.

Batch Normalization

Batch normalization mechanics, train vs eval mode pitfalls, loss landscape smoothing theory, Layer Norm, Group Norm, Instance Norm, RMS Norm, pre-norm vs post-norm in transformers, and production freeze patterns - with full PyTorch implementations.

Bias-Variance Tradeoff

The formal decomposition of prediction error into bias, variance, and noise - with production diagnostics, learning curves, double descent, and ensemble strategies.

Classifier-Free Guidance - Steering Diffusion with Text

Complete derivation of CFG from classifier guidance through the Ho-Salimans implicit classifier insight - the guidance scale trade-off, negative prompting mechanics, dynamic thresholding, CFG++ variants, and production sampling implementations.

Convolutional Neural Networks

From first principles - why CNNs exist, how the convolution operation works, weight sharing, hierarchical feature learning, receptive fields, 1x1 convolutions, and depthwise separable convolutions with PyTorch.

Cross-Validation

A comprehensive guide to cross-validation - k-Fold, stratified, repeated, LOOCV, group CV, time-series CV, nested CV, and common pitfalls including data leakage.

Data Augmentation

Theoretically-grounded data augmentation for computer vision - geometric and photometric transforms, CutMix, MixUp, AugMix, RandAugment, Albumentations, and Test-Time Augmentation in production.

Data Representation and Feature Spaces

How raw data is encoded as vectors in feature spaces - tabular, text, image, time-series, and graph data - including the curse of dimensionality and practical feature engineering with sklearn.

DBSCAN and Density-Based Clustering

Master DBSCAN, OPTICS, HDBSCAN, and Mean Shift - density-based clustering algorithms that discover arbitrarily shaped clusters, handle varying densities, and identify anomalies without specifying the number of clusters.

DDIM and Accelerated Diffusion Sampling

How DDIM reduces 1000-step DDPM sampling to 10-50 steps via a non-Markovian process, the eta parameter, DDIM inversion for image editing, and DPM-Solver as the current production standard.

DDPMs - The Mathematical Foundation of Diffusion Models

The complete mathematical derivation of Denoising Diffusion Probabilistic Models - forward process, reverse process, ELBO objective, noise schedule comparison, U-Net architecture, and why predicting noise works better than predicting clean images.

Decision Trees Internals

Deep dive into decision tree internals - recursive binary splitting, CART, Gini and entropy impurity, pruning, and a full from-scratch NumPy implementation for classification and regression.

Deep Q-Networks (DQN)

Scale Q-learning to high-dimensional inputs with neural networks. Learn the DQN architecture, experience replay, target networks, Double DQN, Dueling DQN, Prioritized Experience Replay, and Rainbow. Full PyTorch implementation included.

Dropout and Regularization

Complete guide to dropout mechanics and inverted scaling, L1 vs L2 regularization and weight decay math, Monte Carlo Dropout for uncertainty, Batch Normalization as implicit regularizer, label smoothing cross-entropy derivation, DropConnect and DropPath variants, and a production-quality regularized training loop in PyTorch.

Dynamic Programming for RL

Policy evaluation, policy iteration, and value iteration - solving MDPs exactly when you know the environment model. Master the theoretical foundation that all model-free RL approximates.

Evaluation Metrics for Classification

Precision, recall, F1, AUC-ROC, AUC-PR, log loss, MCC - the complete guide to classification evaluation with business context, code, and when each metric matters.

Evaluation Metrics for Regression

A comprehensive guide to regression evaluation - MAE, MSE, RMSE, R², MAPE, Huber loss, residual diagnostics, business-aligned metrics, and production monitoring patterns.

Feature Importance and SHAP

Master all three feature importance types, TreeSHAP for exact Shapley values, SHAP interaction values, feature selection with SHAP, data leakage detection, fairness analysis, and production importance drift monitoring.

Feature Importance Methods - Beyond SHAP

Permutation importance, impurity-based importance, partial dependence plots, ALE, H-statistics, Sobol indices, and production monitoring - the complete toolkit for understanding which features drive your model's decisions, and when each method lies to you.

Generalization, Overfitting, and Underfitting

Why models fail to generalize - the formal definition of generalization gap, diagnosing overfitting and underfitting, regularization strategies, and distribution shift in production.

Generalized Linear Models

Understand the GLM framework - link functions, exponential family distributions, Poisson regression for count data, Gamma regression for positive continuous targets, IRLS algorithm, overdispersion, and deviance-based model comparison.

GNNs for Recommender Systems

How LightGCN, PinSage, and NGCF use graph neural networks on user-item interaction graphs to capture multi-hop collaborative filtering signals at billion-scale.

Gradient Boosting From Scratch

Understand gradient boosting from first principles - additive models, functional gradient descent, pseudo-residuals for any loss function, shrinkage, stochastic boosting, and bias-variance tradeoffs versus Random Forest.

Gradient Descent From Scratch

Implement gradient descent for linear regression from first principles - derive the gradient, analyze the loss landscape, understand learning rate via Lipschitz constants, implement momentum, gradient clipping, and convergence analysis via condition number.

Graph Attention Networks

GAT - learning which neighbors matter via attention over graph edges. Multi-head attention, GATv2's dynamic attention, heterophilic graphs, and training on Cora with PyTorch Geometric.

Graph Convolutional Networks

GCN derivation from spectral graph theory to efficient spatial message passing. Symmetric normalization, renormalization trick, over-smoothing, and training on Cora with PyG.

Graph Representation for ML

Node embeddings from shallow methods to GNNs - DeepWalk, Node2Vec, LINE, spectral embeddings, manual features, and their fundamental limitations. How to featurize nodes, edges, and graphs.

GraphSAGE and Inductive Learning

GraphSAGE - sample and aggregate for inductive GNNs that generalize to unseen nodes. Neighbor sampling, mini-batch training, unsupervised learning, and PinSage for billion-scale recommendations.

Hierarchical Clustering

Agglomerative and divisive hierarchical clustering - linkage criteria, dendrograms, cophenetic correlation, and production-scale strategies for discovering multi-scale data structure.

HuggingFace Ecosystem

Use the HuggingFace ecosystem end-to-end - transformers, datasets, Trainer API, PEFT/LoRA for efficient fine-tuning, the Hub for sharing models, and tokenizer internals.

Information Gain, Gini Impurity, and Entropy

A deep dive into how decision trees choose splits - Shannon entropy, information gain, Gini impurity, gain ratio, regression variance reduction, and the multi-valued feature bias every practitioner must understand.

K-Means Clustering

Master K-means clustering - Lloyd's algorithm convergence proof, K-means++ initialization with D² weighting, silhouette analysis, elbow method, Mini-batch K-means for large datasets, and customer segmentation pipelines.

Knowledge Graph Embeddings

TransE, RotatE, CompGCN - embedding entities and relations in vector spaces to predict missing facts in knowledge graphs, enabling AI systems to reason about structured world knowledge.

Learning Rate Scheduling

Every major learning rate schedule - step decay, cosine annealing, SGDR warm restarts, linear warmup, 1cycle policy, LR finder - with full PyTorch implementations, the warmup mechanics for Adam, polynomial decay, and a complete selection guide.

LightGBM and CatBoost

Master LightGBM's GOSS and EFB algorithms, CatBoost's ordered target statistics, and learn when to choose each framework for large-scale tabular machine learning.

Linear Regression Internals

Deep dive into linear regression - OLS derivation, normal equations, geometric interpretation as projection, Gauss-Markov theorem, residual diagnostics, Cook's distance, VIF, multicollinearity, and full NumPy implementation.

Logistic Regression Deep Dive

Master logistic regression from first principles - sigmoid derivation, log-likelihood to cross-entropy, decision boundary geometry, softmax multiclass, probability calibration with ECE, class imbalance handling, and full NumPy implementation.

LSTM and GRU Deep Dive

Master Long Short-Term Memory and Gated Recurrent Units - the architectures that solved vanishing gradients and powered a decade of sequence modeling breakthroughs.

Machine Learning - Engineering Track

A structured, production-grade Machine Learning curriculum - from the math that matters to models that deploy. Built for engineers who want to understand how ML works, not just how to call an API.

Maximum Likelihood Estimation

Understand MLE from first principles - derive OLS from Gaussian noise, cross-entropy from Bernoulli, Fisher information, Cramér-Rao bound, and the deep connection between MLE and empirical risk minimization.

MDP and the RL Framework

Master Markov Decision Processes - the mathematical foundation of all reinforcement learning. Understand states, actions, rewards, value functions, the Bellman equations, and how real-world systems are modeled as MDPs.

Message Passing Neural Networks

MPNN - the unified framework showing GCN, GraphSAGE, and GAT are special cases of a single message-passing paradigm with a fundamental 1-WL expressivity ceiling.

Module 01: ML Foundations - Overview

Complete overview of the ML Foundations module - 12 lessons covering the core concepts every ML engineer must know before building production systems.

Module 02 - Linear Models

Master linear models from first principles - the mathematical foundation underlying deep learning, neural networks, and modern ML systems.

Module 03 - Tree Models and Ensembles

Master decision trees and ensemble methods from first principles - the model family that dominates tabular ML competitions and powers production fraud, pricing, and ranking systems worldwide.

Module 04: Neural Networks

A comprehensive engineering-focused guide to neural networks - from the perceptron to training dynamics, optimization, and production debugging.

Module 05 - Computer Vision

A comprehensive module on computer vision covering CNNs, modern architectures, object detection, segmentation, data augmentation, and Vision Transformers using PyTorch.

Module 07: Unsupervised Learning

Learn unsupervised learning algorithms - clustering, dimensionality reduction, and generative models - as applied in production ML systems.

Module 09: ML with Python - Overview

Master the complete ML Python stack - NumPy, Pandas, scikit-learn, PyTorch, HuggingFace, and Weights & Biases - the tools every ML engineer uses every day.

Module 10 - ML System Design

End-to-end ML system design - from problem framing through deployment, feedback loops, and responsible AI. Master the skills that separate ML engineers who ship from those who only experiment.

Module 11 - Reinforcement Learning

A comprehensive module covering RL fundamentals through modern alignment techniques including RLHF and DPO, connecting classical theory to LLM training.

Module 13 - Graph Neural Networks

Master graph neural networks for drug discovery, fraud detection, and recommendations. GCN, GAT, GraphSAGE, MPNN, and knowledge graph embeddings with PyTorch Geometric.

Module 14 - Bayesian ML

Master Bayesian machine learning - from prior/posterior reasoning through Gaussian processes, Bayesian neural networks, and uncertainty quantification to conformal prediction and Bayesian optimisation.

Module 15 - Diffusion Models

Master diffusion models from first principles - DDPM, score matching, DDIM acceleration, latent diffusion, classifier-free guidance, fine-tuning, and evaluation across image, audio, and molecular domains.

Module 8 - Recommender Systems

Learn how modern recommendation engines work - from collaborative filtering and matrix factorization to neural two-tower models and learning to rank - as applied in production systems at Netflix, Amazon, and Spotify.

NumPy for ML

Master NumPy for machine learning - broadcasting, vectorization, linear algebra, memory layout, einsum, and the performance patterns every ML engineer needs.

Object Detection: YOLO and R-CNN

Two-stage and one-stage object detection architectures - from sliding windows and R-CNN to Faster R-CNN, YOLOv8, FPN, anchor boxes, NMS, IoU, and mAP - with full PyTorch implementations.

Optimizers: Adam, SGD, RMSProp

Complete optimizer guide - SGD momentum, Nesterov, AdaGrad, RMSProp, Adam bias correction derivation, AdamW decoupled weight decay, LAMB, Lion, AMSGrad - with NumPy Adam from scratch, PyTorch implementations, and the SGD vs Adam generalization debate.

Pandas for ML

Pandas for machine learning engineers - DataFrame operations, missing data, groupby feature aggregation, time series, memory optimization, and building leakage-free feature matrices.

PCA Dimensionality Reduction

Principal Component Analysis via eigendecomposition and SVD - covariance geometry, reconstruction error, Kernel PCA, Incremental PCA, whitening, and production use for preprocessing and anomaly detection.

Perceptron and MLP

From the McCulloch-Pitts neuron to multi-layer perceptrons - the mathematical foundations of deep learning, XOR proof, universal approximation, forward pass mechanics, depth vs width theory, and full NumPy and PyTorch implementations.

Policy Gradient Methods

Directly optimize policies with gradient ascent - REINFORCE derivation, the log-derivative trick, variance reduction with baselines, actor-critic, A2C/A3C, and entropy regularization. The foundation for PPO and RLHF.

Polynomial Features and Kernel Methods

Extend linear models to nonlinear patterns - polynomial basis expansion, curse of dimensionality, Mercer's theorem for valid kernels, RBF kernel via infinite-dimensional feature space, kernel ridge regression dual form, Nyström and random Fourier features for scalability.

Pooling, Strides, and Padding

Why spatial downsampling exists, how max pooling and strided convolutions compare, how padding controls output dimensions, receptive field growth, dilated convolutions, transposed convolutions, and when to use each - with PyTorch examples.

Probabilistic View of Machine Learning

Framing machine learning through probability - MLE, MAP estimation, prior-posterior reasoning, cross-entropy as negative log-likelihood, calibration, Bayesian deep learning, and uncertainty quantification.

Pruning and Depth Control

How to prevent decision tree overfitting through pre-pruning parameters, cost-complexity post-pruning, weakest-link pruning, MDL principle, and production-grade tuning strategies.

PyTorch DataLoaders and Datasets

Build custom PyTorch Datasets and high-performance DataLoaders - batching, num_workers, pin_memory, samplers, WebDataset for streaming, custom collate_fn, and profiling.

PyTorch Foundations

PyTorch fundamentals for ML engineers - tensors, autograd, nn.Module, device management, reproducibility, mixed precision training, and the computation graph that makes debugging natural.

PyTorch Training Loop

Write production-grade PyTorch training loops - learning rate scheduling, gradient accumulation, mixed precision, checkpointing, early stopping, and debugging.

Q-Learning and SARSA

Model-free temporal difference learning - Q-learning for off-policy control and SARSA for on-policy control. Understand TD vs MC vs DP, convergence conditions, eligibility traces, Double Q-learning, and implement Q-tables in NumPy.

Random Forests

Master Random Forests from first principles - bagging variance reduction math, feature randomization, OOB error estimation, Extra-Trees, bias-variance decomposition, MDI vs permutation importance, and production deployment patterns.

Regularization - L1, L2, and ElasticNet

Master regularization from first principles - bias-variance decomposition, L2 Bayesian interpretation as Gaussian prior, L1 sparsity via subdifferential geometry, elastic net path algorithms, coordinate descent for LASSO, and cross-validation for lambda selection.

RNNs and the Vanishing Gradient Problem

How recurrent neural networks process sequential data through shared hidden states, and why vanishing gradients cripple their ability to learn long-range dependencies.

Scikit-Learn Pipelines

Build production-grade scikit-learn Pipelines - ColumnTransformer, custom transformers, caching, cross-validation without leakage, hyperparameter search, and model serialization.

Semantic Segmentation

Pixel-wise classification with FCN, U-Net, DeepLab atrous convolutions, encoder-decoder architectures, instance segmentation with Mask R-CNN, and full PyTorch U-Net implementation.

Stacking and Blending

Master stacking and blending ensemble techniques - out-of-fold meta-learning, data leakage prevention, model diversity, snapshot ensembling, temporal ensembling, Kaggle competition patterns, and production deployment tradeoffs.

Statistical Learning Theory

The mathematical foundations of machine learning - PAC learning, VC dimension, Rademacher complexity, sample complexity, generalisation bounds, and the theory behind why regularisation works.

Stochastic and Mini-Batch Gradient Descent

Master SGD and mini-batch gradient descent - gradient noise as implicit regularization, convergence proof sketch with a decreasing learning rate, batch size vs generalization, linear scaling rule, cyclic LR, full PyTorch DataLoader training, and distributed SGD.

t-SNE and UMAP

Non-linear dimensionality reduction with t-SNE and UMAP - crowding problem, KL divergence optimization, perplexity, Barnes-Hut approximation, UMAP topological foundations, and production-safe usage.

Temporal Convolutional Networks (TCNs)

Master Temporal Convolutional Networks - causal and dilated convolutions, receptive field math, residual blocks, and when TCNs outperform LSTMs and Transformers in production sequence modeling.

The ML Workflow - End to End

The complete ML engineering workflow from problem framing through data, features, model training, evaluation, deployment, and monitoring - and where projects actually fail.

Time Series Forecasting Patterns

Master the core patterns, classical methods, and deep learning approaches for time series forecasting - including the most critical mistake practitioners make with train/test splits.

Train / Validation / Test Split Strategy

A deep dive into data splitting - why the split matters, how to partition data correctly, data leakage patterns, temporal splits, group splits, and production-grade evaluation design.

Training Dynamics and Debugging

Systematic debugging toolkit for neural network training - loss landscape geometry and flat minima, gradient flow analysis with per-layer norm plots, learning rate finder algorithm, cyclical LR and warmup schedules, gradient clipping strategies, NaN detection hooks, TensorBoard and W&B integration patterns, and a complete pre-training checklist with runnable code.

Transfer Learning and Fine-Tuning

How pretrained ImageNet features transfer across domains, why it works, and the complete engineering playbook for fine-tuning in PyTorch - from feature extraction to progressive unfreezing with discriminative learning rates.

Universal Approximation Theorem

The Universal Approximation Theorem rigorously explained - Cybenko 1989, Hornik 1991, Leshno 1993, depth separation (Telgarsky 2015/2016), Barron's theorem, NTK, Lottery Ticket Hypothesis, double descent, and NumPy demonstrations of approximation quality vs width.

Variational Autoencoders

Master Variational Autoencoders - ELBO derivation, reparameterization trick, β-VAE disentanglement, VQ-VAE discrete latent spaces, conditional VAE, and PyTorch implementation for MNIST generation and anomaly detection.

Vision Transformers (ViT)

How Vision Transformers apply self-attention to image patches - architecture, patch embeddings, positional encoding, DeiT, Swin Transformer, fine-tuning strategies, and production trade-offs against CNNs.

Weight Initialization

Why weight initialization determines whether deep networks train or collapse - symmetry breaking failure, Xavier/Glorot derivation, He/Kaiming for ReLU, LSUV, orthogonal init, bias strategies, and full NumPy experiments measuring gradient flow across 10 layers.

What is Machine Learning?

Three precise ways to think about ML - optimization, compression, and function approximation - with production context, taxonomy, and when ML is the wrong tool.

Why Graphs for ML

When tabular data fails - graph formalism, adjacency matrix, Laplacian, graph types, real-world datasets, the Weisfeiler-Lehman test, and why CNNs cannot handle graph-structured data.

XGBoost Deep Dive

Master XGBoost internals - the 7 innovations over vanilla gradient boosting, optimal leaf weights, gain calculation, hyperparameter tuning, and production deployment with ONNX and GPU training.