__init__ and Object Construction - Two-Phase Creation at Engineering Depth
Understand how Python actually constructs objects - the difference between __new__ and __init__, two-phase creation, mutable default argument traps, super().__init__() in inheritance chains, and factory patterns with classmethods.
__init_subclass__ - The Modern Alternative to Metaclasses
Master __init_subclass__ for subclass registration, definition-time validation, plugin registries, and keyword arguments in class statements - the Pythonic replacement for most metaclass use cases.
__set_name__ - The Descriptor Naming Protocol
Understand __set_name__, Python's descriptor self-naming protocol - how it eliminates name redundancy, how type.__new__ calls it, and how Django, Pydantic, and SQLAlchemy use it to build self-configuring field systems.
01 - Agent Risk Taxonomy
Eight categories of agent risk, the confused deputy problem, severity matrices, and a Python risk assessment module.
01 - Task Decomposition
How agents break complex goals into ordered, dependency-tracked subtasks. Hierarchical decomposition, DAG representation, dynamic replanning, and full Python implementation.
02 - Minimal Footprint Principle
Least privilege, reversibility preference, scope confirmation, and a Python minimal-footprint agent wrapper.
02 - Planning with LLMs
Zero-shot, chain-of-thought, Tree of Thoughts, ReWOO, and MCTS-guided planning. When LLM plans fail and how to recover. Full Python implementation of Tree of Thoughts.
03 - Checkpointing and Recovery
How to save agent state mid-run, resume after failures, design idempotent actions, and build production-grade checkpoint systems with SQLite and S3.
03 - Prompt Injection in Agents
Indirect prompt injection attacks, real-world examples, detection and defense strategies, and a Python injection defense system.
04 - Guardrails and Action Validation
Pre- and post-action guardrails, composable validators, denylist enforcement, rate limiting, and a complete Python guardrail pipeline.
04 - Handling Ambiguity and Clarification
How agents detect ambiguous instructions, decide when to ask vs. proceed, design targeted clarification questions, and avoid the overly-cautious anti-pattern.
05 - Interruption and Human-in-the-Loop
When and how agents pause for human judgment. Action classification, async approval workflows, Slack-based HITL, and resuming after interruption.
06 - Evaluation of Long-Horizon Tasks
How to evaluate multi-step agent trajectories. Task completion, path quality, error recovery, efficiency, and LLM-as-judge. Benchmarks and trajectory scorers.
3D-VCD: Hallucination Mitigation in 3D-LLM Embodied Agents through Visual Contrastive Decoding
Large multimodal models are increasingly used as the reasoning core of embodied agents operating in 3D environments, yet they remain prone to hallucinat...
3DTCR: A Physics-Based Generative Framework for Vortex-Following 3D Reconstruction to Improve Tropical Cyclone Intensity Forecasting
Tropical cyclone (TC) intensity forecasting remains challenging as current numerical and AI-based weather models fail to satisfactorily represent extrem...
3DTV: A Feedforward Interpolation Network for Real-Time View Synthesis
Real-time free-viewpoint rendering requires balancing multi-camera redundancy with the latency constraints of interactive applications. We address this...
4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding
Dynamic spatial reasoning from monocular video is essential for bridging visual intelligence and the physical world, yet remains challenging for vision-...
A 1/R Law for Kurtosis Contrast in Balanced Mixtures
Kurtosis-based Independent Component Analysis (ICA) weakens in wide, balanced mixtures. We prove a sharp redundancy law: for a standardized projection w...
A Bayesian Updating Framework for Long-term Multi-Environment Trial Data in Plant Breeding
In variety testing, multi-environment trials (MET) are essential for evaluating the genotypic performance of crop plants. A persistent challenge in the...
A Benchmark for Interactive World Models with a Unified Action Generation Framework
Achieving Artificial General Intelligence (AGI) requires agents that learn and interact adaptively, with interactive world models providing scalable env...
A Constrained RL Approach for Cost-Efficient Delivery of Latency-Sensitive Applications
Next-generation networks aim to provide performance guarantees to real-time interactive services that require timely and cost-efficient packet delivery....
A Dataset is Worth 1 MB
A dataset server must often distribute the same large payload to many clients, incurring massive communication costs. Since clients frequently operate o...
A Dirac-Frenkel-Onsager principle: Instantaneous residual minimization with gauge momentum for nonlinear parametrizations of PDE solutions
Dirac-Frenkel instantaneous residual minimization evolves nonlinear parametrizations of PDE solutions in time, but ill-conditioning can render the param...
A distributed semismooth Newton based augmented Lagrangian method for distributed optimization
This paper proposes a novel distributed semismooth Newton based augmented Lagrangian method for solving a class of optimization problems over networks,...
A Federated Many-to-One Hopfield model for associative Neural Networks
Federated learning enables collaborative training without sharing raw data, but struggles under client heterogeneity and streaming distribution shifts,...
A Foundation Model for Zero-Shot Logical Rule Induction
Inductive Logic Programming (ILP) learns interpretable logical rules from data. Existing methods are transductive: their learned parameters are bound to...
A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens
Anticipating diverse future states is a central challenge in video world modeling. Discriminative world models produce a deterministic prediction that i...
A Hybrid Approach for Closing the Sim2real Appearance Gap in Game Engine Synthetic Datasets
Video game engines have been an important source for generating large volumes of visual synthetic datasets for training and evaluating computer vision a...
A Learning-based Multi-Frame Visual Feature Framework for Real-Time Driver Fatigue Detection
Published at NAACL 2025.
A multimodal slice discovery framework for systematic failure detection and explanation in medical image classification
Despite advances in machine learning-based medical image classifiers, the safety and reliability of these systems remain major concerns in practical set...
A New Kernel Regularity Condition for Distributed Mirror Descent: Broader Coverage and Simpler Analysis
Existing convergence of distributed optimization methods in non-Euclidean geometries typically rely on kernel assumptions: (i) global Lipschitz smoothne...
A Note on How to Remove the $\ln\ln T$ Term from the Squint Bound
In Orabona and Pál [2016], we introduced the shifted KT potentials, to remove the $\ln \ln T$ factor in the parameter-free learning with expert bound. I...
A note on the area under the likelihood and the fake evidence for model selection
Improper priors are not allowed for the computation of the Bayesian evidence $Z=p(\mathbf{y})$ (a.k.a., marginal likelihood), since in this case $Z$ is not...
A Novel Computational Framework for Causal Inference: Tree-Based Discretization with ILP-Based Matching
Causal inference is essential for data-driven decision-making, as it aims to uncover causal relationships from observational data. However, identifying...
A novel hybrid approach for positive-valued DAG learning
Causal discovery from observational data remains a fundamental challenge in machine learning and statistics, particularly when variables represent inher...
A Practical Analysis of Human Alignment with *PO
Published at NAACL 2025.
A Predictive View on Streaming Hidden Markov Models
We develop a predictive-first optimisation framework for streaming hidden Markov models. Unlike classical approaches that prioritise full posterior reco...
A Proper Scoring Rule for Virtual Staining
Generative virtual staining (VS) models for high-throughput screening (HTS) can provide an estimated posterior distribution of possible biological featu...
A Quantitative Characterization of Forgetting in Post-Training
Continual post-training of generative models is widely used, yet a principled understanding of when and why forgetting occurs remains limited. We develo...
A recipe for scalable attention-based MLIPs: unlocking long-range accuracy with all-to-all node attention
Machine-learning interatomic potentials (MLIPs) have advanced rapidly, with many top models relying on strong physics-based inductive biases. However, a...
A Reference Architecture of Reinforcement Learning Frameworks
The surge in reinforcement learning (RL) applications gave rise to diverse supporting technology, such as RL frameworks. However, the architectural patt...
A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression
As model capabilities advance, research has increasingly shifted toward long-horizon, multi-turn terminal-centric agentic tasks, where raw environment f...
A Semantic-Aware Layer-Freezing Approach to Computation-Efficient Fine-Tuning of Language Models
Published at ACL 2025.
A Stein Identity for q-Gaussians with Bounded Support
Stein's identity is a fundamental tool in machine learning with applications in generative models, stochastic optimization, and other problems involving...
A Systematic Study of Cross-Modal Typographic Attacks on Audio-Visual Reasoning
As audio-visual multi-modal large language models (MLLMs) are increasingly deployed in safety-critical applications, understanding their vulnerabilities...
A Temporally Augmented Graph Attention Network for Affordance Classification
Graph attention networks (GATs) provide one of the best frameworks for learning node representations in relational data, but existing variants such as...
A theory of learning data statistics in diffusion models, from easy to hard
While diffusion models have emerged as a powerful class of generative models, their learning dynamics remain poorly understood. We address this issue fi...
A Training-free LLM-based Approach to General Chinese Character Error Correction
Published at ACL 2025.
A Tsetlin Machine-driven Intrusion Detection System for Next-Generation IoMT Security
The rapid adoption of the Internet of Medical Things (IoMT) is transforming healthcare by enabling seamless connectivity among medical devices, systems,...
A two-step sequential approach for hyperparameter selection in finite context models
Finite-context models (FCMs) are widely used for compressing symbolic sequences such as DNA, where predictive performance depends critically on the cont...
A unified perspective on fine-tuning and sampling with diffusion and flow models
We study the problem of training diffusion and flow generative models to sample from target distributions defined by an exponential tilting of a base de...
A Variational Estimator for $L_p$ Calibration Errors
Calibration - the problem of ensuring that predicted probabilities align with observed class frequencies - is a basic desideratum for reliable ML prediction.
A^2RD: Agentic Autoregressive Diffusion for Long Video Consistency
Synthesizing consistent and coherent long video remains a fundamental challenge. Existing methods suffer from semantic drift and narrative collapse over...
A^2TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping
Reinforcement learning for agentic large language models (LLMs) typically relies on a sparse, trajectory-level outcome reward, making it difficult to ev...
AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research
Published at ACL 2025.
Abstain-R1: Calibrated Abstention and Post-Refusal Clarification via Verifiable RL
Reinforcement fine-tuning improves the reasoning ability of large language models, but it can also encourage them to answer unanswerable queries by gues...
Abstract Base Classes - Enforcing Interfaces at Engineering Depth
Master Python's ABC system - abc.ABC, @abstractmethod, ABCMeta, virtual subclasses via register(), collections.abc built-in protocols, using ABCs in type hints, and the ABCs vs typing.Protocol trade-off.
AcademiClaw: When Students Set Challenges for AI Agents
Benchmarks within the OpenClaw ecosystem have thus far evaluated exclusively assistant-level tasks, leaving the academic-level capabilities of OpenClaw...
Accelerating Speculative Decoding with Block Diffusion Draft Trees
Speculative decoding accelerates autoregressive language models by using a lightweight drafter to propose multiple future tokens, which the target model...
Accurate and Efficient Hybrid-Ensemble Atmospheric Data Assimilation in Latent Space with Uncertainty Quantification
Data assimilation (DA) combines model forecasts and observations to estimate the optimal state of the atmosphere with its uncertainty, providing initial...
Accurate and Reliable Uncertainty Estimates for Deterministic Predictions Extensions to Under and Overpredictions
Computational models support high-stakes decisions across engineering and science, and practitioners increasingly seek probabilistic predictions to quan...
Accurate and scalable exchange-correlation with deep learning
Density Functional Theory (DFT) underpins much of modern computational chemistry and materials science. Yet, the reliability of DFT-derived predictions...
ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation
Selecting LLM-generated code candidates using LLM-generated tests is challenging because the tests themselves may be incorrect. Existing methods either...
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
Recent progress in reasoning models has substantially advanced long-horizon mathematical and scientific problem solving, with several systems now reachi...
Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models
The advent of agentic multimodal models has empowered systems to actively interact with external environments. However, current agents suffer from a pro...
Action Images: End-to-End Policy Learning via Multiview Video Generation
World action models (WAMs) have emerged as a promising direction for robot policy learning, as they can leverage powerful video backbones to model the f...
Activation Functions
Complete guide to activation functions - sigmoid saturation proofs, dying ReLU mechanics, GELU/Swish/SiLU for modern transformers, PReLU, ELU, SELU, Mish, and a full selection guide with NumPy and PyTorch implementations.
Active Bipartite Ranking with Smooth Posterior Distributions
In this article, bipartite ranking, a statistical learning problem involved in many applications and widely studied in the passive context, is approache...
Active Few-Shot Learning for Text Classification
How to intelligently select which examples to annotate when you only have a handful of labeled samples per class. Combines active learning with few-shot text classification to minimize annotation cost - directly applicable to intent detection, content moderation, and domain-specific NLP tasks.
Active Learning
Selecting the most informative samples for labeling - uncertainty sampling, diversity strategies, query-by-committee, and LLM-based active learning for text classification.
Ad Click Prediction at Scale
End-to-end design of a production ad click prediction system - covering Wide and Deep learning, feature engineering at scale, online learning, calibration, and serving under 10ms.
AdaCubic: An Adaptive Cubic Regularization Optimizer for Deep Learning
A novel regularization technique, AdaCubic, is proposed that adapts the weight of the cubic term. The heart of AdaCubic is an auxiliary optimization pro...
Adam's Law: Textual Frequency Law on Large Language Models
While textual frequency has been validated as relevant to human cognition in reading speed, its relatedness to Large Language Models (LLMs) is seldom st...
Adaptive Combinatorial Experimental Design: Pareto Optimality for Decision-Making and Inference
In this paper, we provide the first investigation into adaptive combinatorial experimental design, focusing on the trade-off between regret minimization...
Adaptive Conditional Forest Sampling for Spectral Risk Optimisation under Decision-Dependent Uncertainty
Minimising a spectral risk objective, defined as a convex combination of expected cost and Conditional Value-at-Risk (CVaR), is challenging when the unc...
Adaptive Learning Systems
Learn how adaptive learning systems model student knowledge state and sequence educational content using IRT, CAT, spaced repetition, and multi-armed bandits to maximize learning outcomes.
Adaptive multi-fidelity optimization with fast learning rates
In multi-fidelity optimization, biased approximations of varying costs of the target function are available. This paper studies the problem of optimizin...
Adaptive Querying with AI Persona Priors
We study adaptive querying for learning user-dependent quantities of interest, such as responses to held-out items and psychometric indicators, within t...
ADD for Multi-Bit Image Watermarking
As generative models enable rapid creation of high-fidelity images, societal concerns about misinformation and authenticity have intensified. A promisin...
Advanced Event Loop
Master event loop internals including selectors, callbacks, timers, custom policies, uvloop, run_in_executor, and signal handling for production async systems.
Advanced Generic Patterns
Master Self type, TypeVarTuple, recursive types, generic protocols, and generic type aliases for framework-level type-safe design including builder patterns and tensor shape typing.
Advanced PEFT Methods
Beyond LoRA - Prefix Tuning, Prompt Tuning, IA3, AdaLoRA, VeRA, and LoftQ. When to reach for each method, how they compare on parameter count and quality, and practical implementation with the PEFT library.
Advanced Prompting Techniques
Master self-refinement, Tree of Thought, ReAct, meta-prompting, and other advanced techniques for reliable, sophisticated LLM behavior in production.
Advanced RAG Patterns
Go beyond naive RAG - master query transformation, HyDE, multi-query retrieval, Self-RAG, Corrective RAG, and iterative retrieval patterns for complex questions.
Advanced Spark Performance Tuning for ML Workloads
Systematic techniques to diagnose and eliminate Spark bottlenecks - data skew, shuffle overhead, memory pressure, and suboptimal joins - reducing job time and cost by 10x.
Advancing Language Models through Instruction Tuning: Recent Progress and Challenges
Published at EMNLP 2025.
Advancing Polish Language Modeling through Tokenizer Optimization in the Bielik v3 7B and 11B Series
The development of the Bielik v3 PL series, encompassing both the 7B and 11B parameter variants, represents a significant milestone in the field of lang...
Adversarial Examples
Crafting inputs that reliably cause model failures - attack techniques, transferability, and robust defense strategies for production AI systems.
AERA Chat: An Interactive Platform for Automated Explainable Student Answer Assessment
Published at EMNLP 2025.
Agent Communication Protocols
How agents pass information: message formats, schemas, synchronous vs async, routing, error propagation, and tracing through multi-agent systems.
Agent Evaluation
Measuring LLM agent performance through trajectory analysis, benchmark suites, LLM-as-judge, failure taxonomies, and production monitoring strategies.
Agent Safety and Guardrails
Implementing defense-in-depth safety for production LLM agents - prompt injection defense, input/output guardrails, tool sandboxing, HITL confirmation, and audit logging.
Agent vs Chatbot vs Workflow
Precise technical definitions for chatbots, workflows, and AI agents - with decision criteria, cost/reliability tradeoffs, and code examples of all three for the same task.
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
Large language models are increasingly expected to serve as general-purpose agents that interact with external, stateful tool environments. The Model Co...
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
Published at EMNLP 2025.
AgentGL: Towards Agentic Graph Learning with LLMs via Reinforcement Learning
Large Language Models (LLMs) increasingly rely on agentic capabilities - iterative retrieval, tool use, and decision-making - to overcome the limits of stat...
Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks
We study parallel test-time scaling for long-horizon agentic tasks such as agentic search and deep research, where multiple rollouts are generated in pa...
Agentic AI Systems Should Be Designed as Marginal Token Allocators
This position paper argues that agentic AI systems should be designed and evaluated as marginal token allocation economies rather than as text generator...
Agentic Code Editing
How coding agents read, navigate, and surgically modify existing codebases: edit strategies, minimal diffs, regression prevention, and multi-file coordination.
Agentic Design Patterns
The 5 core patterns from Anthropic's research - prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer - with full Python implementations.
Agentic RAG
Build RAG systems that reason, iterate, and self-correct - covering Self-RAG, FLARE, ReAct tool-augmented RAG, RAPTOR, and Corrective RAG with full production implementations using the Anthropic SDK.
Agentic RAG
Build agents that control their own retrieval - multi-step reasoning, router agents, ReAct loops, LangGraph stateful pipelines, and production patterns for agentic retrieval systems.
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
As AI systems move from generating text to accomplishing goals through sustained interaction, the ability to model environment dynamics becomes a centra...
Agents Explore but Agents Ignore: LLMs Lack Environmental Curiosity
LLM-based agents are assumed to integrate environmental observations into their reasoning: discovering highly relevant but unexpected information should...
AgentSearchBench: A Benchmark for AI Agent Search in the Wild
The rapid growth of AI agent ecosystems is transforming how complex tasks are delegated and executed, creating a new challenge of identifying suitable a...
AgentSPEX: An Agent SPecification and EXecution Language
Language-model agent systems commonly rely on reactive prompting, in which a single instruction guides the model through an open-ended sequence of reaso...
AgentSwing: Adaptive Parallel Context Management Routing for Long-Horizon Web Agents
As large language models (LLMs) evolve into autonomous agents for long-horizon information-seeking, managing finite context capacity has become a critic...
Agnostic learning in (almost) optimal time via Gaussian surface area
The complexity of learning a concept class under Gaussian marginals in the difficult agnostic model is closely related to its $L_1$-approximability by l...
AgriIR: A Scalable Framework for Domain-Specific Knowledge Retrieval
This paper introduces AgriIR, a configurable retrieval augmented generation (RAG) framework designed to deliver grounded, domain-specific answers while...
AI Agents Can Already Autonomously Perform Experimental High Energy Physics
Large language model-based AI agents are now able to autonomously execute substantial portions of a high energy physics (HEP) analysis pipeline with min...
AI Co-Mathematician: Accelerating Mathematicians with Agentic AI
We introduce the AI co-mathematician, a workbench for mathematicians to interactively leverage AI agents to pursue open-ended research. The AI co-mathem...
AI Error Handling and Fallbacks
Graceful degradation, retry logic, circuit breakers, fallback model chains, and user-facing error messages for production AI systems.
AI Feature Flags and Rollouts
Safely rolling out AI features with canary deployments, quality-gated rollouts, A/B testing, and kill switches.
AI in Litigation Support
Timeline extraction, deposition analysis, exhibit classification, chronology building, and the AI systems that help litigators prepare and try cases.
AI Product Architecture
End-to-end architecture for a production AI product from API to database.
AI Product Design Principles
Principles for designing AI products that build trust, degrade gracefully, and solve the last-mile problem between model capability and user value.
AI Regulation and FDA Compliance
Regulatory landscape for healthcare AI - FDA SaMD classification, 510(k) vs PMA clearance, EU AI Act, HIPAA compliance for AI, bias auditing, and post-market surveillance for deployed medical AI systems.
AI Safety Evaluations
Safety benchmarks, capability evaluations, LLM judges, uplift assessments, and how labs like Anthropic use evaluation-gated deployment through Responsible Scaling Policies.
AI scientists produce results without reasoning scientifically
Large language model (LLM)-based systems are increasingly deployed to conduct scientific research autonomously, yet whether their reasoning adheres to t...
AI Security Governance
Organizational security policies, risk classification frameworks, compliance programs, lifecycle governance, model cards, incident response, and vendor risk management for responsible AI system deployment.
AI-Powered Assessment
Learn how AI systems automatically score essays, grade short answers, generate feedback, detect plagiarism, and audit for bias in educational assessment pipelines.
AIPOM: Agent-aware Interactive Planning for Multi-Agent Systems
Published at EMNLP 2025.
Airflow for ML Pipelines
Orchestrate ML training pipelines with Airflow - data quality gates, KubernetesPodOperator training, champion/challenger evaluation, and conditional deployment.
AJ-Bench: Benchmarking Agent-as-a-Judge for Environment-Aware Evaluation
As reinforcement learning continues to scale the training of large language model-based agents, reliably verifying agent behaviors in complex environmen...
Alerting and Incident Response for ML
ML-specific alerting design, alert taxonomy, routing with PagerDuty and OpsGenie, on-call runbook design for ML models, post-mortem templates, and reducing MTTD and MTTR for ML incidents.
Alerting on LLM Quality Degradation
Build production alerting systems for LLM quality - threshold alerts, statistical process control, anomaly detection, deployment correlation, runbooks, and Prometheus/Grafana integration.
Amortized Optimal Transport from Sliced Potentials
We propose a novel amortized optimization method for predicting optimal transport (OT) plans across multiple pairs of measures by leveraging Kantorovich...
Ampere, Hopper, and Ada Architectures
What changed across GPU generations for AI - A100 vs H100 vs H200 vs RTX 4090, NVLink bandwidth, Transformer Engine, FP8 support, and architecture selection for training and inference.
An adaptive wavelet-based PINN for problems with localized high-magnitude source
In recent years, physics-informed neural networks (PINNs) have gained significant attention for solving differential equations, although they suffer fro...
An Address Intelligence Framework for E-commerce Deliveries
Published at EMNLP 2025.
An automatic counting algorithm for the quantification and uncertainty analysis of the number of microglial cells trainable in small and heterogeneous datasets
Counting immunopositive cells on biological tissues generally requires either manual annotation or (when available) automatic rough systems, for scannin...
An Efficient Unsupervised Federated Learning Approach for Anomaly Detection in Heterogeneous IoT Networks
Federated learning (FL) is an effective paradigm for distributed environments such as the Internet of Things (IoT), where data from diverse devices with...
An Open-Source, Open Data Approach to Activity Classification from Triaxial Accelerometry in an Ambulatory Setting
The accelerometer has become an almost ubiquitous device, providing enormous opportunities in healthcare monitoring beyond step counting or other averag...
An Optimal Transport-driven Approach for Cultivating Latent Space in Online Incremental Learning
In online incremental learning, data continuously arrives with substantial distributional shifts, creating a significant challenge because previous samp...
Analysing LLM Persona Generation and Fairness Interpretation in Polarised Geopolitical Contexts
Published at EACL 2026.
Anisotropic Modality Align
Training multimodal large language models has long been limited by the scarcity of high-quality paired multimodal data. Recent studies show that the sha...
Annotation Pipelines
Data labeling workflows, annotation guidelines, inter-annotator agreement, conflict resolution, and quality control for training data that powers AI systems.
Anomaly Detection in Sequences
Master anomaly detection for sequential data - from statistical baselines to LSTM autoencoders. Learn why standard methods fail on time series, how to pick thresholds, and how to build production-grade systems that catch real anomalies without drowning your team in false alarms.
Anomaly Detection on Sensor Data
Learn how to detect anomalies in industrial sensor data using statistical baselines, isolation forests, LSTM autoencoders, multivariate deep learning methods, and real-time streaming architectures.
AnomalyVFM -- Transforming Vision Foundation Models into Zero-Shot Anomaly Detectors
Zero-shot anomaly detection aims to detect and localise abnormal regions in the image without access to any in-domain training images. While recent appr...
Anthropogenic Regional Adaptation in Multimodal Vision-Language Model
While the field of vision-language (VL) has achieved remarkable success in integrating visual and textual information across multiple languages and doma...
ANTIC: Adaptive Neural Temporal In-situ Compressor
The persistent storage requirements for high-resolution, spatiotemporally evolving fields governed by large-scale and high-dimensional partial different...
AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model
Sparse-view 3D reconstruction is essential for modeling scenes from casual captures, but remains challenging for non-generative reconstruction. Existing...
Apache Airflow Architecture
Deep dive into Apache Airflow - DAGs, Scheduler internals, Executors, Operators, XCom, and production patterns for reliable pipeline orchestration.
Apache Airflow for ML
Learn how to use Apache Airflow to orchestrate production ML pipelines - DAG authoring, executors, XCom patterns, and avoiding the most common Airflow pitfalls.
Apache Flink Fundamentals
Apache Flink for stateful stream processing - DataStream API, windows, watermarks, state backends, checkpointing, and PyFlink for ML feature computation.
Apache Hudi
Hudi's copy-on-write vs merge-on-read and upsert patterns.
Apache Iceberg
Iceberg table format, ACID transactions, schema evolution, and time travel.
Apache Kafka Architecture - The Nervous System of Real-Time ML
A deep dive into Kafka's distributed commit log, partitions, replication, consumer groups, compacted topics, and the architectural decisions that make it the standard event transport for production ML systems.
Apache Spark Architecture
How Spark's distributed execution model works - RDDs, DataFrames, DAG planning, Catalyst optimization, and Tungsten execution - explained for engineers building ML data pipelines.
APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music
Music popularity prediction has attracted growing research interest, with relevance to artists, platforms, and recommendation systems. However, the expl...
Appear2Meaning: A Cross-Cultural Benchmark for Structured Cultural Metadata Inference from Images
Recent advances in vision-language models (VLMs) have improved image captioning for cultural heritage. However, inferring structured cultural metadata (...
Apple Silicon for AI
Apple M-series unified memory architecture for ML inference - how the ANE, GPU, and CPU share one memory pool, why this matters for local LLMs, and how to run models with MLX and llama.cpp on Apple Silicon.
Approximate Nearest Neighbor Algorithms
Deep dive into HNSW, IVF, Product Quantization, IVFPQ, LSH, and DiskANN - how each algorithm trades recall for speed and how to choose the right one for your dataset.
Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study
Despite the growing popularity of Multimodal Domain Generalization (MMDG) for enhancing model robustness, it remains unclear whether reported performanc...
Argumentation and Judgement Factors: LLM-based Discovery and Application in Insurance Disputes.
Argumentation and Judgement Factors: LLM-based Disco... - published at EACL 2026.
ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration
This report describes ARIS (Auto-Research-in-sleep), an open-source research harness for autonomous research, including its architecture, assurance mech...
ARM vs x86 for AI Workloads
Comprehensive comparison of ARM and x86 architectures for ML workloads - ISA design, power efficiency, Apple Silicon unified memory, AWS Graviton3 inference, and performance-per-watt analysis for production AI systems.
Artifact Management & Experiment Organization
Managing ML artifacts at scale - naming conventions, tagging, parent-child relationships, archival policies, and tracing which of 2,000 runs produced the production model.
ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics
We present ArtifactNet, a lightweight framework that detects AI-generated music by reframing the problem as forensic physics -- extracting and analyzing...
Artificial Intelligence for Detecting Fetal Orofacial Clefts and Advancing Medical Education
Orofacial clefts are among the most common congenital craniofacial abnormalities, yet accurate prenatal detection remains challenging due to the scarcit...
ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack
Large language models (LLMs), despite being safety-aligned, exhibit brittle refusal behaviors that can be circumvented by simple linguistic changes. As...
ASRank: Zero-Shot Re-Ranking with Answer Scent for Document Retrieval.
ASRank: Zero-Shot Re-Ranking with Answer Scent for D... - published at NAACL 2025.
Assessing Deanonymization Risks with Stylometry-Assisted LLM Agent
The rapid advancement of large language models (LLMs) has enabled powerful authorship inference capabilities, raising growing concerns about unintended...
Assessing Pancreatic Ductal Adenocarcinoma Vascular Invasion: the PDACVI Benchmark
Surgical resection remains the only potentially curative treatment for pancreatic ductal adenocarcinoma (PDAC), and eligibility depends on accurate asse...
Asymptotic and Finite-Time Guarantees for Langevin-Based Temperature Annealing in InfoNCE
The InfoNCE loss in contrastive learning depends critically on a temperature parameter, yet its dynamics under fixed versus annealed schedules remain po...
Async Context Managers
Master async resource management with __aenter__/__aexit__, asynccontextmanager, AsyncExitStack, and production patterns for connection pools and sessions.
Async Generators and Async Iterators
Build streaming data pipelines with async for, async yield, __aiter__/__anext__, async comprehensions, and finalization protocols for production async iteration.
Async LLM Calls
Asynchronous LLM call patterns for high-throughput applications - concurrency control with semaphores, producer-consumer queues, token bucket rate limiting, circuit breakers, and async orchestration patterns.
Async Synchronization Patterns
Implement bounded concurrency, rate limiting, and circuit breakers with asyncio locks, semaphores, events, conditions, and barriers.
ATANT: An Evaluation Framework for AI Continuity
We present ATANT (Automated Test for Acceptance of Narrative Truth), an open evaluation framework for measuring continuity in AI systems: the ability to...
AtManRL: Towards Faithful Reasoning via Differentiable Attention Saliency
Large language models (LLMs) increasingly rely on chain-of-thought (CoT) reasoning to solve complex tasks. Yet ensuring that the reasoning trace both co...
Attention as Explanation - What Transformers Are (and Aren't) Looking At
When attention weights help explain transformer decisions, when they mislead, and the debate between attention-as-explanation and attention-is-not-explanation.
Attention Is All You Need
The 2017 Vaswani et al. paper that replaced recurrent networks with pure attention - and why it changed everything.
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
As the foundational architecture of modern machine learning, Transformers have driven remarkable progress across diverse AI domains. Despite their trans...
Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music
We present Audio Flamingo Next (AF-Next), the next-generation and most capable large audio-language model in the Audio Flamingo series, designed to adva...
Audio-Language Models
How modern AI systems process speech and audio - from Whisper's spectrogram-based ASR to end-to-end audio understanding in GPT-4o and Gemini.
Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing
Recent progress in multimodal models has spurred rapid advances in audio understanding, generation, and editing. However, these capabilities are typical...
Audio-Visual Intelligence in Large Foundation Models
Audio-Visual Intelligence (AVI) has emerged as a central frontier in artificial intelligence, bridging auditory and visual modalities to enable machines...
Augmented Lagrangian Multiplier Network for State-wise Safety in Reinforcement Learning
Safety is a primary challenge in real-world reinforcement learning (RL). Formulating safety requirements as state-wise constraints has become a prominen...
AURA: Always-On Understanding and Real-Time Assistance via Video Streams
Video Large Language Models (VideoLLMs) have achieved strong performance on many video understanding tasks, but most existing systems remain offline and...
Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes
We study auto research as a closed empirical loop driven by external measurement. Each submitted trial carries a hypothesis, an executable code edit, an...
Autoencoders
Neural network autoencoders for unsupervised representation learning - undercomplete, denoising, sparse, contractive variants with PyTorch on MNIST, anomaly detection, and sparse autoencoders for LLM interpretability.
AutoGen Conversational Agents
Microsoft AutoGen v0.4 - event-driven multi-agent runtime, AgentChat teams, code execution, and production patterns for conversational AI systems.
AutoGen Deep Dive
Microsoft AutoGen v0.4: async conversational multi-agent systems, actor model architecture, group chat patterns, and MagenticOne.
Automated Instruction Revision (AIR): A Structured Comparison of Task Adaptation Strategies for LLM
This paper studies Automated Instruction Revision (AIR), a rule-induction-based method for adapting large language models (LLMs) to downstream tasks usi...
Automated Retraining Pipelines
Build fully automated trigger-based model retraining pipelines - from drift detection through training to production deployment, with human-in-the-loop approval.
Automatically Discovering How Misogyny is Framed on Social Media.
Automatically Discovering How Misogyny is Framed on... - published at NAACL 2025.
Automating Database-Native Function Code Synthesis with LLMs
Database systems incorporate an ever-growing number of functions in their kernels (a.k.a., database native functions) for scenarios like new application...
Autoregressive Decoding
Understand how LLMs generate tokens one at a time, why decoding is memory-bandwidth bound, and how to reason about inference latency with the roofline model.
AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery
Autonomous scientific research is significantly advanced thanks to the development of AI agents. One key step in this process is finding the right scien...
Autoscaling ML Workloads
Horizontal Pod Autoscaler, KEDA event-driven autoscaling for GPU metrics, zero-downtime rolling updates with readiness gates, and autoscaling patterns for production ML serving.
AUTOSUMM: A Comprehensive Framework for LLM-Based Conversation Summarization.
AUTOSUMM: A Comprehensive Framework for LLM-Based Co... - published at ACL 2025.
AvatarPointillist: AutoRegressive 4D Gaussian Avatarization
We introduce AvatarPointillist, a novel framework for generating dynamic 4D Gaussian avatars from a single portrait image. At the core of our method is...
AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation
Text-to-Audio-Video (T2AV) generation is rapidly becoming a core interface for media creation, yet its evaluation remains fragmented. Existing benchmark...
Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation
We present JoyAI-Image, a unified multimodal foundation model for visual understanding, text-to-image generation, and instruction-guided image editing....
AWQ In-Depth
How Activation-aware Weight Quantization protects salient weights to achieve near-lossless INT4 compression, and how to deploy AWQ models with AutoAWQ and vLLM.
AWQ: Activation-Aware Weight Quantization
AWQ protects the 1% of weights that matter most - how activation statistics reveal salient weights, how scaling preserves them without extra memory, why AWQ outperforms GPTQ at INT4 for production inference, and how to configure Marlin kernels for maximum throughput.
AWS Data Services
S3, Glue, Athena, EMR, and the AWS data engineering ecosystem.
AWS SageMaker for MLOps
Master the complete AWS SageMaker ecosystem for end-to-end ML workflows - training jobs, pipelines, model registry, feature store, and production inference at scale.
AWS Trainium and Inferentia
Deep dive into AWS custom AI chips - Trainium for training and Inferentia for inference, NeuronCore-v2 architecture, the Neuron SDK compilation pipeline, and real-world cost-performance tradeoffs versus GPU instances.
Axolotl and TRL Training Frameworks
Using Axolotl and HuggingFace TRL for LoRA and QLoRA fine-tuning - configuration files, SFTTrainer, DPO training, and distributed multi-GPU fine-tuning setups.
Azure ML for MLOps
Master the Azure Machine Learning platform for enterprise ML workflows - workspaces, component-based pipelines, managed endpoints, MLflow integration, and responsible AI.
Back to Repair: A Minimal Denoising Network for Time Series Anomaly Detection
We introduce JuRe (Just Repair), a minimal denoising network for time series anomaly detection that exposes a central finding: architectural complexity...
Back-of-the-Envelope Estimation for ML Systems
How to estimate storage, compute, memory, and infrastructure requirements for ML systems before writing a line of code - including the 6PD training compute rule and model sizing.
Backdoor Attacks on Decentralised Post-Training
Decentralised post-training of large language models utilises data and pipeline parallelism techniques to split the data and the model. Unfortunately, d...
Backpropagation From Scratch
Full chain rule derivation on computational graphs, Jacobian matrices and vector-Jacobian products, reverse-mode vs forward-mode autodiff, numpy 3-layer MLP implementation, PyTorch custom autograd Functions, and numerical gradient checking - every concept a senior engineer needs to debug, extend, and explain backprop under pressure.
Balanced Aggregation: Understanding and Fixing Aggregation Bias in GRPO
Reinforcement learning with verifiable rewards (RLVR) has become a central paradigm for improving reasoning and code generation in large language models...
Balancing Fidelity, Utility, and Privacy in Synthetic Cardiac MRI Generation: A Comparative Study
Deep learning in cardiac MRI (CMR) is fundamentally constrained by both data scarcity and privacy regulations. This study systematically benchmarks thre...
Batch Inference Pipelines
Designing efficient batch inference pipelines for scoring millions of examples - architecture, GPU utilization, failure recovery, and production patterns.
Batch Normalization
Batch normalization mechanics, train vs eval mode pitfalls, loss landscape smoothing theory, Layer Norm, Group Norm, Instance Norm, RMS Norm, pre-norm vs post-norm in transformers, and production freeze patterns - with full PyTorch implementations.
Batch Normalization for Neural Networks on Complex Domains
Riemannian neural networks have proven effective in solving a variety of machine learning tasks. The key to their success lies in the development of pri...
Batch Orchestration Patterns for ML Pipelines
How to orchestrate complex batch ML pipelines with Airflow and modern alternatives, replacing cron and its silent failures, unmanaged dependencies, and lack of visibility.
Batch Processing with LLMs
Efficiently processing large document sets with LLM batch APIs - Anthropic Batch API, cost optimization, monitoring, checkpointing, and production patterns for overnight and large-scale LLM workloads.
Batch Processing with Spark for ML Pipelines
How Apache Spark processes terabyte-scale training data - architecture, DataFrames, partitioning, joins, and integration with Delta Lake for ML feature engineering.
Batched Kernelized Bandits: Refinements and Extensions
In this paper, we consider the problem of black-box optimization with noisy feedback revealed in batches, where the unknown function to optimize has a b...
Batching Strategies for Inference
How static, dynamic, and continuous batching work - and how to go from 20% GPU utilization to 85% without increasing latency.
Batching Strategies for LLM Serving
Static batching, dynamic batching, continuous batching, chunked prefill, and prefill-decode disaggregation for LLM inference throughput and latency optimization.
Bayesian Additive Distribution Regression
Distribution regression, where the goal is to predict a scalar response from a distribution-valued predictor, arises naturally in settings where observa...
Bayesian Linear Regression - Uncertainty Estimates for Every Prediction
How placing a prior on linear regression weights gives a full posterior distribution over predictions - with closed-form solutions, predictive uncertainty, and connections to ridge regression.
Bayesian Neural Networks - Uncertainty Quantification for Deep Learning
How to place priors on neural network weights and approximate the posterior with variational inference or Monte Carlo dropout - with production trade-offs.
Bayesian Optimisation - Efficient Hyperparameter Search and Black-Box Optimization
How Bayesian Optimisation uses Gaussian Processes and acquisition functions to find near-optimal hyperparameters in far fewer evaluations than grid or random search - with full Python implementation using BoTorch and Optuna.
Bayesian X-Learner: Calibrated Posterior Inference for Heterogeneous Treatment Effects under Heavy-Tailed Outcomes
Conditional Average Treatment Effect (CATE) estimation in practice demands three properties simultaneously: heterogeneous effects $τ(x)$, calibrated unc...
Behavior-dLDS: A decomposed linear dynamical systems model for neural activity partially constrained by behavior
Brain-wide recordings of large-scale networks of neurons now provide an unprecedented view into how the brain drives behavior. However, brain activity c...
Benchmarking and Building Zero-Shot Hindi Retrieval Model with Hindi-BEIR and NLLB-E5.
Benchmarking and Building Zero-Shot Hindi Retrieval... - published at NAACL 2025.
Benchmarking Compressed Models
How to systematically evaluate accuracy-efficiency tradeoffs in quantized, pruned, and distilled models - perplexity, task-specific capabilities, latency, throughput, and automated regression detection.
Benchmarking Local Model Performance
Measuring local LLM inference speed - tokens per second, time to first token, memory usage, and systematic comparison across quantization levels, models, and hardware configurations.
Benchmarks: MMLU, HumanEval, and HELM
Navigate the LLM benchmark ecosystem - what each benchmark actually measures, saturation, contamination, and how to build benchmarks that can't be gamed.
Benchmarks: WebArena and OSWorld
Understanding computer use agent benchmarks - WebArena, OSWorld, ScreenSpot, Mind2Web. Current SOTA results, what the numbers mean, and how to evaluate your own agent.
Benign Fine-Tuning Breaks Safety Alignment in Audio LLMs
Prior work shows that fine-tuning aligned models on benign data degrades safety in text and vision modalities, and that proximity to harmful content in...
Better Learning-Augmented Spanning Tree Algorithms via Metric Forest Completion
We present improved learning-augmented algorithms for finding an approximate minimum spanning tree (MST) for points in an arbitrary metric space. Our wo...
BEVLM: Distilling Semantic Knowledge from LLMs into Bird's-Eye View Representations
The integration of Large Language Models (LLMs) into autonomous driving has attracted growing interest for their strong reasoning and semantic understan...
Beyond "Not Novel Enough": Enriching Scholarly Critique with LLM-Assisted Feedback.
Beyond "Not Novel Enough": Enriching Schol... - published at EACL 2026.
Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning
In real-world Tool-Integrated Reasoning (TIR) scenarios, where LLMs interleave reasoning with external tool calls, a major source of inefficiency is tha...
Beyond Augmented-Action Surrogates for Multi-Expert Learning-to-Defer
Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed a...
Beyond Distribution Sharpening: The Importance of Task Rewards
Frontier models have demonstrated exceptional capabilities following the integration of task-reward-based reinforcement learning (RL) into their trainin...
Beyond Gaussian Bottlenecks: Topologically Aligned Encoding of Vision-Transformer Feature Spaces
Modern visual world modeling systems increasingly rely on high-capacity architectures and large-scale data to produce plausible motion, yet they often f...
Beyond Grid Search: Leveraging Bayesian Optimization for Accelerating RAG Pipeline Optimization.
Beyond Grid Search: Leveraging Bayesian Optimization... - published at EACL 2026.
Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval
Transferring knowledge from a cross-encoder teacher via Knowledge Distillation (KD) has become a standard paradigm for training retrieval models. While...
Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems
LLM-based autonomous agents have demonstrated strong capabilities in reasoning, planning, and tool use, yet remain limited when tasks require sustained...
Beyond Mixtures and Products for Ensemble Aggregation: A Likelihood Perspective on Generalized Means
Density aggregation is a central problem in machine learning, for instance when combining predictions from a Deep Ensemble. The choice of aggregation re...
Beyond NNGP: Large Deviations and Feature Learning in Bayesian Neural Networks
We study wide Bayesian neural networks focusing on the rare but statistically dominant fluctuations that govern posterior concentration, beyond Gaussian...
Beyond Perception Errors: Semantic Fixation in Large Vision-Language Models
Large vision-language models (VLMs) often rely on familiar semantic priors, but existing evaluations do not cleanly separate perception failures from ru...
Beyond Prompts: Unconditional 3D Inversion for Out-of-Distribution Shapes
Text-driven inversion of generative models is a core paradigm for manipulating 2D or 3D content, unlocking numerous applications such as text-based edit...
Beyond Retrieval: A Multitask Benchmark and Model for Code Search
Code search has usually been evaluated as first-stage retrieval, even though production systems rely on broader pipelines with reranking and developer-s...
Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction
Modern retrieval systems, whether lexical or semantic, expose a corpus through a fixed similarity interface that compresses access into a single top-k r...
Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
The standard post-training recipe for large multimodal models (LMMs) applies supervised fine-tuning (SFT) on curated demonstrations followed by reinforc...
Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD
It is currently difficult to distill discrete diffusion models. In contrast, continuous diffusion literature has many distillation methods th...
Beyond Stochastic Exploration: What Makes Training Data Valuable for Agentic Search
Reinforcement learning (RL) has become an effective approach for advancing the reasoning capabilities of large language models (LLMs) through the strate...
Beyond Text-Dominance: Understanding Modality Preference of Omni-modal Large Language Models
Native Omni-modal Large Language Models (OLLMs) have shifted from pipeline architectures to unified representation spaces. However, this native integrat...
BidirLM: From Text to Omnimodal Bidirectional Encoders by Adapting and Composing Causal LLMs
Transforming causal generative language models into bidirectional encoders offers a powerful alternative to BERT-style architectures. However, current a...
BioTool: A Comprehensive Tool-Calling Dataset for Enhancing Biomedical Capabilities of Large Language Models
Despite the success of large language models (LLMs) on general-purpose tasks, their performance in highly specialized domains such as biomedicine remain...
BlenderRAG: High-Fidelity 3D Object Generation via Retrieval-Augmented Code Synthesis
Automatic generation of executable Blender code from natural language remains challenging, with state-of-the-art LLMs producing frequent syntactic error...
BLEU, ROUGE, and Generation Metrics
Master reference-based generation metrics - BLEU, ROUGE, BERTScore, BLEURT - and know exactly when each one lies to you.
BLISSNet: Deep Operator Learning for Fast and Accurate Flow Reconstruction from Sparse Sensor Measurements
Reconstructing fluid flows from sparse sensor measurements is a fundamental challenge in science and engineering. Widely separated measurements and comp...
BMdataset: A Musicologically Curated LilyPond Dataset
Symbolic music research has relied almost exclusively on MIDI-based datasets; text-based engraving formats such as LilyPond remain unexplored for music...
BOOKCOREF: Coreference Resolution at Book Scale.
BOOKCOREF: Coreference Resolution at Book Scale. - published at ACL 2025.
BOOKMARKS: Efficient Active Storyline Memory for Role-playing
Memory systems are critical for role-playing agents (RPAs) to maintain long-horizon consistency. However, existing RPA memory methods (e.g., profiling)...
Boosting deep Reinforcement Learning using pretraining with Logical Options
Deep reinforcement learning agents are often misaligned, as they over-exploit early reward signals. Recently, several symbolic approaches have addressed...
Boosting Visual Instruction Tuning with Self-Supervised Guidance
Multimodal large language models (MLLMs) perform well on many vision-language tasks but often struggle with vision-centric problems that require fine-gr...
BoSS: A Best-of-Strategies Selector as an Oracle for Deep Active Learning
Active learning (AL) aims to reduce annotation costs while maximizing model performance by iteratively selecting valuable instances. While foundation mo...
Breaking the Tuning Barrier: Zero-Hyperparameters Yield Multi-Corner Analysis Via Learned Priors
Yield Multi-Corner Analysis validates circuits across 25+ Process-Voltage-Temperature corners, resulting in a combinatorial simulation cost of $O(K im...
Bridging Attribution and Open-Set Detection using Graph-Augmented Instance Learning in Synthetic Speech.
Bridging Attribution and Open-Set Detection using Gr... - published at EACL 2026.
Browser Agents
Building practical browser agents using Playwright and LLMs - DOM manipulation, visual navigation, session management, anti-bot handling, and complete Python implementation.
Budget-Sensitive Discovery Scoring: A Formally Verified Framework for Evaluating AI-Guided Scientific Selection
Scientific discovery increasingly relies on AI systems to select candidates for expensive experimental validation, yet no principled, budget-aware evalu...
Build Systems and CI/CD for ML
How build systems and CI/CD pipelines keep ML projects reproducible, tested, and safely deployable - covering Make, Bazel, DVC, MLflow, GitHub Actions, and canary deployments.
Build vs Buy Analysis
A rigorous financial and risk framework for deciding when to build ML infrastructure in-house vs use managed services - applied to feature stores, vector DBs, LLMs, and more.
Build vs. Buy Economics for ML Tools
Economic analysis for ML tooling decisions - TCO framework, self-hosted vs. managed analysis, hidden costs of self-hosting, and a full financial case for W&B vs. MLflow.
Building an Evaluation Harness
Building a production evaluation harness for LLMs - lm-evaluation-harness architecture, custom task integration, CI/CD evaluation gates, versioned evaluation datasets, and automated regression detection.
Building an MCP Server
Hands-on guide to building a production-quality MCP filesystem server in Python using the official MCP SDK - complete with 4 tools, resources, MCP Inspector testing, and Claude Desktop integration.
Building Embedding Pipelines
Design production embedding pipelines - model selection, batch ingestion, incremental indexing, zero-downtime model upgrades, embedding drift detection, normalization, and dimensionality reduction.
Building Golden Datasets
Learn how to construct, annotate, validate, and maintain golden datasets that serve as the ground truth foundation for all AI system evaluation - covering annotation guidelines, inter-annotator agreement, adversarial generation, dataset versioning, and drift detection.
Building Your Own Coding Agent
Build a complete, functional coding agent from scratch in Python. Architecture decisions, repo maps, context management, system prompts, safety, and the full 500-line agent.
Bytecode Inspection - Inside the code Object
Understand Python bytecode and the code object at engineering depth - all co_ attributes explained, how .pyc files work, reading bytecode with marshal, the line number table, closures in bytecode, and practical uses in debuggers and test frameworks.
C and C++ for ML Systems
Learn why C and C++ form the foundation of every major ML framework, and how to read, write, and debug C++ code as an ML systems engineer.
C Extensions and FFI - When Python Isn't Fast Enough
Master ctypes, cffi, Cython, and pybind11 for calling C/C++ from Python - loading shared libraries, writing CPython extensions, and accelerating hot paths with compiled code.
C-GenReg: Training-Free 3D Point Cloud Registration by Multi-View-Consistent Geometry-to-Image Generation with Probabilistic Modalities Fusion
We introduce C-GenReg, a training-free framework for 3D point cloud registration that leverages the complementary strengths of world-scale generative pr...
C2: Scalable Rubric-Augmented Reward Modeling from Binary Preferences
Rubric-augmented verification guides reward models with explicit evaluation criteria, yielding more reliable judgments than single-model verification. H...
Caching for ML Serving
How to reduce ML serving cost and latency with result caching, semantic similarity caching, KV cache for transformers, prefix caching for LLMs, and feature caching with Redis.
Caching Strategies
Four caching layers for LLM applications - exact match, semantic similarity, provider prefix caching, and KV cache - with implementation patterns and production tradeoffs.
Caching Strategies - Trading Memory for Speed
Master functools.lru_cache, functools.cache, TTL caches, memoization patterns, cache invalidation, cachetools, Redis caching, and cache stampede prevention.
Cactus: Accelerating Auto-Regressive Decoding with Constrained Acceptance Speculative Sampling
Speculative sampling (SpS) has been successful in accelerating the decoding throughput of auto-regressive large language models by leveraging smaller dr...
Can LLMs Learn to Reason Robustly under Noisy Supervision?
Reinforcement Learning with Verifiable Rewards (RLVR) effectively trains reasoning models that rely on abundant perfect labels, but its vulnerability to...
Can Natural Image Autoencoders Compactly Tokenize fMRI Volumes for Long-Range Dynamics Modeling?
Modeling long-range spatiotemporal dynamics in functional Magnetic Resonance Imaging (fMRI) remains a key challenge due to the high dimensionality of th...
Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key
Reinforcement learning (RL) has been applied to improve large language model (LLM) reasoning, yet the systematic study of how training scales with task...
Canary and Blue-Green Deployments for ML Models
Safe model rollout strategies - canary deployments for gradual traffic migration, blue-green for instant switch, and automated rollback triggers.
Cards Against Contamination: TCG-Bench for Difficulty-Scalable Multilingual LLM Reasoning.
Cards Against Contamination: TCG-Bench for Difficult... - published at EACL 2026.
Cascade and Funnel Architecture
How multi-stage ranking systems reduce millions of candidates to a final ranked list within strict latency budgets - the architecture behind every major search and recommendation system.
CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment
Large language models (LLMs) have become a central foundation of modern artificial intelligence, yet their lifecycle remains constrained by a rigid sepa...
Case Studies: Production LLM Systems
Five detailed production LLM architectures - GitHub Copilot, Notion AI, customer support bots, enterprise RAG, and code review agents - with real architecture decisions, scale numbers, and lessons learned.
Case-Grounded Evidence Verification: A Framework for Constructing Evidence-Sensitive Supervision
Evidence-grounded reasoning requires more than attaching retrieved text to a prediction: a model should make decisions that depend on whether the provid...
Causal Cellular Context Transfer Learning (C3TL): An Efficient Architecture for Prediction of Unseen Perturbation Effects
Predicting the effects of chemical and genetic perturbations on quantitative cell states is a central challenge in computational biology, molecular medi...
Causal Interpretation of Neural Network Computations with Contribution Decomposition
Understanding how neural networks transform inputs into outputs is crucial for interpreting and manipulating their behavior. Most existing approaches an...
Causal Language Modeling and GPT
Learn how GPT-style autoregressive models work, the evolution from GPT-1 to GPT-4, sampling strategies, and why causal LM became the dominant paradigm for LLMs.
Causality Elicitation from Large Language Models
Large language models (LLMs) are trained on enormous amounts of data and encode knowledge in their parameters. We propose a pipeline to elicit causal re...
Cerebras Wafer Scale Engine
How Cerebras builds the world's largest chip by using the entire silicon wafer as one device, eliminating inter-chip communication overhead for large model training and delivering linear scaling without distributed training frameworks.
Certified and accurate computation of function space norms of deep neural networks
Neural network methods for PDEs require reliable error control in function space norms. However, trained neural networks can typically only be probed at...
CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information.
CFSP: An Efficient Structured Pruning Framework for... - published at COLING 2025.
Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation
Iterative Retrieval-Augmented Generation (iRAG) has emerged as a powerful paradigm for answering complex multi-hop questions by progressively retrieving...
Chain-of-Thought Degrades Visual Spatial Reasoning Capabilities of Multimodal LLMs
Multimodal Reasoning Models (MRMs) leveraging Chain-of-Thought (CoT) based thinking have revolutionized mathematical and logical problem-solving. Howeve...
Chain-of-Thought Prompting
Learn how to unlock multi-step reasoning in LLMs by making them think out loud - and why this simple technique dramatically improves accuracy on complex tasks.
Chain-of-Thought Reasoning at Inference Time
How chain-of-thought prompting transforms model reasoning - from the Wei et al. 2022 breakthrough to self-consistency, process supervision, and the faithfulness problem.
Challenges of Evaluating Agents
Why evaluating agentic systems is fundamentally harder than evaluating static models - the multi-path problem, compound errors, latent failures, and how to build an evaluation mindset.
Characterization of Gaussian Universality Breakdown in High-Dimensional Empirical Risk Minimization
We study high-dimensional convex empirical risk minimization (ERM) under general non-Gaussian data designs. By heuristically extending the Convex Gaussi...
Chasing the Public Score: User Pressure and Evaluation Exploitation in Coding Agent Workflows
Frontier coding agents are increasingly used in workflows where users supervise progress primarily through repeated improvement of a public score, namel...
Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language
At present, executable visual workflows have emerged as a mainstream paradigm in real-world industrial deployments, offering strong reliability and cont...
Choosing an Orchestrator
A decision framework for selecting the right ML pipeline orchestrator - comparing Airflow, Prefect, Kubeflow Pipelines, Metaflow, ZenML, and Dagster across team size, maturity, and infrastructure requirements.
Choosing an Orchestrator for Your AI Data Stack
What Airflow, Prefect, Dagster, and Temporal each do for AI systems, how ML pipeline complexity and team maturity determine which orchestrator fits best, and how to apply a structured decision framework to select the right tool for production AI data pipelines.
Choosing Custom Silicon vs GPUs
A complete decision framework for AI accelerator selection - how to evaluate NVIDIA GPUs, TPUs, Trainium, Gaudi, Groq, and custom ASICs across workload fit, TCO, ecosystem maturity, and team capability.
Chunk-wise Attention Transducers for Fast and Accurate Streaming Speech-to-Text
We propose Chunk-wise Attention Transducer (CHAT), a novel extension to RNN-T models that processes audio in fixed-size chunks while employing cross-att...
CI/CD for ML
Build automated CI/CD pipelines for machine learning - from unit tests on transforms to canary deployments - so model degradation gets caught before it reaches users.
CI/CD for ML vs Software
Understand why standard software CI/CD is insufficient for ML and what additional stages you need to catch real failures.
CityRAG: Stepping Into a City via Spatially-Grounded Video Generation
We address the problem of generating a 3D-consistent, navigable environment that is spatially grounded: a simulation of a real location. Existing video...
Classes and Objects - Python's Object Model at Engineering Depth
Understand Python classes and objects at the engineering level - class vs instance namespace, attribute resolution, type as metaclass, class body execution, and the shared mutable attribute trap.
Classifier-Free Guidance - Steering Diffusion with Text
Complete derivation of CFG from classifier guidance through the Ho-Salimans implicit classifier insight - the guidance scale trade-off, negative prompting mechanics, dynamic thresholding, CFG++ variants, and production sampling implementations.
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents
Large language models are increasingly deployed as autonomous agents executing multi-step workflows in real-world software environments. However, existi...
ClawArena: Benchmarking AI Agents in Evolving Information Environments
AI agents deployed as persistent assistants must maintain correct beliefs as their information environment evolves. In practice, evidence is scattered a...
ClawBench: Can AI Agents Complete Everyday Online Tasks?
AI agents may be able to automate your inbox, but can they automate other routine aspects of your life? Everyday online tasks offer a realistic yet unso...
ClawEnvKit: Automatic Environment Generation for Claw-Like Agents
Constructing environments for training and evaluating claw-like agents remains a manual, human-intensive process that does not scale. We argue that what...
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
GUI agents drive applications through their visual interfaces instead of programmatic APIs, interacting with arbitrary software via taps, swipes, and ke...
ClawNet: Human-Symbiotic Agent Network for Cross-User Autonomous Cooperation
Current AI agent frameworks have made remarkable progress in automating individual tasks, yet all existing systems serve a single user. Human productivi...
ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces
Large language model (LLM) agents are increasingly deployed to automate productivity tasks (e.g., email, scheduling, document management), but evaluatin...
Clean Architecture - Dependencies Point Inward
Implement Uncle Bob's Clean Architecture in Python with proper layering, the dependency rule, domain models, service layers, repositories, and framework boundaries.
CLEAR: Unlocking Generative Potential for Degraded Image Understanding in Unified Multimodal Models
Image degradation from blur, noise, compression, and poor illumination severely undermines multimodal understanding in real-world settings. Unified mult...
Clinical NLP and EHR Systems
Building NLP pipelines on Electronic Health Records - named entity recognition for clinical text, negation detection, de-identification for HIPAA compliance, and fine-tuning BERT variants on medical corpora.
CLIP and Contrastive Learning
How CLIP uses contrastive learning on 400M image-text pairs to build a shared semantic embedding space - enabling zero-shot classification without labeled data.
CLoPA: Continual Low Parameter Adaptation of Interactive Segmentation for Medical Image Annotation
Interactive segmentation enables clinicians to guide annotation, but existing zero-shot models like nnInteractive fail to consistently reach expert-leve...
Closures Deep Dive - Free Variables, Cell Objects, and nonlocal
Master Python closures at CPython depth - free variables, cell objects, __closure__, co_freevars, the UnboundLocalError trap, the nonlocal keyword, late binding, factory functions, memoization, and when to use a closure vs a class.
Cloud Cost Management
Implement full FinOps practice for ML teams - from commitment-based discounts and tagging strategies to budget alerts and spot instance automation.
Cloud FinOps for ML
Financial operations for ML cloud spend - FinOps maturity model, reserved instances, spot strategy, multi-account cost attribution, and ML budget forecasting.
Cloud ML Cost Optimization
Master cloud cost management for ML workloads - spot instance strategies, storage optimization, inference cost reduction, FinOps tooling, and real-world cost reduction from $80K to $31K/month.
Cloud vs On-Prem GPU Infrastructure
Total cost of ownership analysis for cloud GPU instances vs on-premises clusters, break-even analysis, spot instance economics, Kubernetes GPU scheduling, and FinOps strategies for GPU compute at scale.
CNN Architectures - AlexNet to ResNet, EfficientNet, and ConvNeXt
The full evolution of CNN architectures from handcrafted features to AlexNet, VGG, GoogLeNet, ResNet, EfficientNet, and ConvNeXt - with the engineering story behind every breakthrough.
Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks
Long-horizon interactive environments are a testbed for evaluating agents' skill-usage abilities. These environments demand multi-step reasoning, the cha...
CocoaBench: Evaluating Unified Digital Agents in the Wild
LLM agents now perform strongly in software engineering, deep research, GUI automation, and various other applications, while recent agent scaffolds and...
Code and Math Specialized Models
How domain-specific pre-training and fine-tuning on code and math data produces models that outperform general LLMs on programming and reasoning tasks - and when to use them in production.
Code Coverage - Measuring What You Test (and What You Miss)
Master code coverage at engineering depth - line vs branch vs condition coverage, coverage.py internals with sys.settrace, pytest-cov, .coveragerc configuration, pragma no cover, coverage in CI, and mutation testing with mutmut to find tests that pass but don't catch bugs.
Code Generation Evaluation
Evaluating LLMs on code generation tasks - HumanEval, MBPP, LiveCodeBench, SWE-bench, pass@k metric, EvalPlus, execution-based evaluation, security testing, and building sandboxed evaluation environments.
Code World Model Preparedness Report
This report documents the preparedness assessment of Code World Model (CWM), a model for code generation and reasoning about code from Meta. We conducte...
Code-Switching Information Retrieval: Benchmarks, Analysis, and the Limits of Current Retrievers
Code-switching is a pervasive linguistic phenomenon in global communication, yet modern information retrieval systems remain predominantly designed for,...
CodeGenWrangler: Data Wrangling task automation using Code-Generating Models.
CodeGenWrangler: Data Wrangling task automation usin... - published at NAACL 2025.
CodeTaxo: Enhancing Taxonomy Expansion with Limited Examples via Code Language Prompts.
CodeTaxo: Enhancing Taxonomy Expansion with Limited... - published at ACL 2025.
CodeTracer: Towards Traceable Agent States
Code agents are advancing rapidly, but debugging them is becoming increasingly difficult. As frameworks orchestrate parallel tool calls and multi-stage...
Coevolving Representations in Joint Image-Feature Diffusion
Joint image-feature generative modeling has recently emerged as an effective strategy for improving diffusion training by coupling low-level VAE latents...
Cog-DRIFT: Exploration on Adaptively Reformulated Instances Enables Learning from Hard Reasoning Problems
Reinforcement learning from verifiable rewards (RLVR) has improved the reasoning abilities of LLMs, yet a fundamental limitation remains: models cannot...
Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots.
Cognitive Kernel: An Open-source Agent System toward... - published at NAACL 2025.
CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation
Synthesizing human-object interaction (HOI) videos has broad practical value in e-commerce, digital advertising, and virtual marketing. However, curren...
COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics
Activation steering methods enable inference-time control of large language model (LLM) behavior without retraining, but current approaches face a funda...
Collaborative Filtering - How Netflix Knows You Better Than You Know Yourself
Learn how user-based and item-based collaborative filtering work from first principles - the math behind cosine similarity and Pearson correlation, how Amazon's item-to-item CF changed the industry, and how to build production-grade recommendation engines.
Collective Kernel EFT for Pre-activation ResNets
In finite-width deep neural networks, the empirical kernel $G$ evolves stochastically across layers. We develop a collective kernel effective field theo...
Combee: Scaling Prompt Learning for Self-Improving Language Model Agents
Recent advances in prompt learning allow large language model agents to acquire task-relevant knowledge from inference-time context without parameter ch...
ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models
In this paper, we study an under-explored but important factor of diffusion generative models, i.e., the combinatorial complexity. Data samples are gene...
Comparing and Selecting Models
Systematic model comparison and selection - metric design, statistical significance testing, champion-challenger frameworks, and making defensible production promotion decisions.
Comparing Classical and Quantum Variational Classifiers on the XOR Problem
Quantum machine learning applies principles such as superposition and entanglement to data processing and optimization. Variational quantum models opera...
COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling
Large language models (LLMs) often exhibit performance disparities across languages, with naive multilingual fine-tuning frequently degrading performanc...
Competition-Aware CPC Forecasting with Near-Market Coverage
Cost-per-click (CPC) in paid search is a volatile auction outcome generated by a competitive landscape that is only partially observable from any single...
Complexity Analysis for ML Engineers
Learn how Big-O notation, time and space complexity, and amortized analysis apply directly to ML systems - from understanding why O(n^2) attention broke transformers to profiling GPU kernels.
Compliance Monitoring Systems
Regulatory change detection, gap analysis automation, policy compliance checking, and building AI systems that track regulatory requirements across jurisdictions.
Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models
Large Language Models (LLMs) are known to acquire reasoning capabilities through shared inference patterns in pre-training data, which are further elici...
Composition vs Inheritance - When to Use Each at Engineering Depth
Master the is-a vs has-a distinction, understand why "favour composition over inheritance" exists, implement the delegation pattern, use mixins, refactor inheritance to composition, and apply dependency injection with typing.Protocol for structural typing.
Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models
Compositional generalization, the ability to recognize familiar parts in novel contexts, is a defining property of intelligent systems. Although modern...
Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations.
Compress to Impress: Unleashing the Potential of Com... - published at COLING 2025.
Computer Use Architecture
How Anthropic's Computer Use API works - the screenshot-action loop, the three tools, coordinate systems, and building a working computer use agent with Docker.
Computer Vision for Quality Control
Learn how AI-powered visual inspection systems detect manufacturing defects using anomaly detection, semantic segmentation, and real-time inline inspection pipelines.
Computer Vision Systems
Production computer vision at scale - autonomous vehicle perception with 30 cameras at 100Hz, real-time object detection, model compression for edge, active learning, and quality metrics.
Computing Equilibrium beyond Unilateral Deviation
Most familiar equilibrium concepts, such as Nash and correlated equilibrium, guarantee only that no single player can improve their utility by deviating...
Concentration and Calibration in Predictive Bayesian Inference
Predictive Bayesian inference (PBI) represents a model- and prior-agnostic approach to standard Bayesian inference which allows users to quantify uncerta...
Concrete Jungle: Towards Concreteness Paved Contrastive Negative Mining for Compositional Understanding
Vision-Language Models demonstrate remarkable capabilities but often struggle with compositional reasoning, exhibiting vulnerabilities regarding word or...
Concurrency Primitives
Master mutexes, condition variables, atomics, lock-free programming, and thread pools - the concurrency building blocks behind every high-throughput ML data pipeline and inference server.
Conditioning Protein Generation via Hopfield Pattern Multiplicity
Protein sequence generation via stochastic attention produces plausible family members from small alignments without training, but treats all stored seq...
Configuration Management - Environment-Driven Apps
Externalize and validate application configuration with python-dotenv, pydantic-settings, secrets management, multi-environment configs, and the 12-factor config principle.
Conformal Prediction - Distribution-Free Uncertainty with Guaranteed Coverage
Conformal prediction constructs prediction sets with provable finite-sample coverage guarantees under only the exchangeability assumption - no distributional assumptions required. Complete Python implementation for classification and regression.
Conformalized Neural Networks for Federated Uncertainty Quantification under Dual Heterogeneity
Federated learning (FL) faces challenges in uncertainty quantification (UQ). Without reliable UQ, FL systems risk deploying overconfident models at unde...
CONSCIENTIA: Can LLM Agents Learn to Strategize? Emergent Deception and Trust in a Multi-Agent NYC Simulation
As large language models (LLMs) are increasingly deployed as autonomous agents, understanding how strategic behavior emerges in multi-agent environments...
Consistency and Availability in ML Systems
How CAP theorem, eventual consistency, and training-serving skew apply to ML systems - feature stores, model versioning, multi-region serving, and when consistency actually matters.
Constitutional AI
How Anthropic replaced human feedback with AI feedback guided by explicit principles - the Constitutional AI technique, RLAIF, and how it enables scalable alignment.
Constrained Decoding - How It Works
The mathematics of constrained decoding - finite-state machines, token masking, context-free grammars, and how the Outlines library achieves guaranteed JSON schema conformance at generation time.
Container Registry and CI
Manage ML container images in CI/CD pipelines - registry choices, image tagging, multi-architecture builds, Trivy scanning, and environment promotion workflows.
Containers and Namespaces
How Linux namespaces, cgroups, and overlay filesystems power container isolation for multi-tenant ML serving, GPU workloads, and reproducible training environments.
Content Generation for Education
Learn how LLMs generate educational content - questions, explanations, worked examples, and quizzes - with quality control, Bloom's taxonomy alignment, and hallucination mitigation.
Content-Based Filtering - Recommending by What Items Are Made Of
Learn how content-based filtering builds item feature vectors, constructs user profiles, and scores unseen items using TF-IDF and cosine similarity - no user overlap required.
Context Compression Techniques
How LLMLingua, AutoCompressors, GIST tokens, and selective compression reduce long contexts to fewer tokens while preserving the information needed to answer queries.
Context Management at Scale
Managing context windows, conversation history, and state across sessions - sliding window, summarization compression, hierarchical context, KV cache management, and context budget allocation for production LLM systems.
Context Unrolling in Omni Models
We present Omni, a unified multimodal model natively trained on diverse modalities, including text, images, videos, 3D geometry, and hidden representati...
Context Window Extension - YaRN, LongRoPE, LongLoRA
How position interpolation, NTK-aware scaling, YaRN, and LongLoRA extend pretrained models to context windows far beyond their original training length.
Context Window Management
Engineering strategies for managing context windows in production LLM applications - history truncation, compression, RAG ordering, and prompt caching design.
Context-Value-Action Architecture for Value-Driven Large Language Model Agents
Large Language Models (LLMs) have shown promise in simulating human behavior, yet existing agents often exhibit behavioral rigidity, a flaw frequently m...
Continual Learning and Domain Adaptation
Learn how to adapt open-source language models to specialized domains through continual pre-training, manage catastrophic forgetting with EWC and data mixing, and evaluate domain knowledge gain versus general capability loss.
Continuous Adversarial Flow Models
We propose continuous adversarial flow models, a type of continuous-time flow model trained with an adversarial objective. Unlike flow matching, which u...
Continuous Batching
Learn how continuous batching eliminates GPU idle time by replacing finished sequences immediately rather than waiting for the longest request in a batch to complete.
Continuous Eval in CI/CD
Design and implement a full CI/CD pipeline for AI systems - covering PR-level linting, merge-level regression, pre-deployment evaluation gates, production monitoring with statistical process control, anomaly detection, automated rollback, and observability tracing from query to feedback.
Continuous Latent Diffusion Language Model
Large language models have achieved remarkable success under the autoregressive paradigm, yet high-quality text generation need not be tied to a fixed l...
Continuous Orthogonal Mode Decomposition: Haptic Signal Prediction in Tactile Internet
The Tactile Internet demands sub-millisecond latency and ultra-high reliability, as high latency or packet loss could lead to haptic control instability...
Continuous Training
Design continuous training systems that safely update models every few hours - covering CT maturity levels, warm-starting, failure modes, and monitoring.
Continuous-Time Distribution Matching for Few-Step Diffusion Distillation
Step distillation has become a leading technique for accelerating diffusion models, among which Distribution Matching Distillation (DMD) and Consistency...
Contract Analysis and NLP
Clause extraction, obligation detection, risk identification, and building NLP systems for commercial contract analysis at law firm and enterprise scale.
Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks
Interpretability tools are increasingly used to analyze failures of Large Language Models (LLMs), yet prior work largely focuses on short prompts or toy...
Controllable Style Arithmetic with Language Models.
Controllable Style Arithmetic with Language Models. - published at ACL 2025.
Convergent Evolution: How Different Language Models Learn Similar Number Representations
Language models trained on natural text learn to represent numbers using periodic features with dominant periods at T=2, 5, 10. In this paper, we identi...
Convolutional Neural Networks
From first principles - why CNNs exist, how the convolution operation works, weight sharing, hierarchical feature learning, receptive fields, 1x1 convolutions, and depthwise separable convolutions with PyTorch.
Cortex 2.0: Grounding World Models in Real-World Industrial Deployment
Industrial robotic manipulation demands reliable long-horizon execution across embodiments, tasks, and changing object distributions. While Vision-Langu...
Cost and Performance Trade-offs in Data Infrastructure
How to reason about the latency-throughput-cost triangle, diagnose expensive Spark jobs, optimize cloud data costs with partitioning and caching, and fix data skew that silently kills pipeline performance.
Cost Attribution and Accountability
Making ML teams own their costs - tagging strategy, per-model cost dashboards, chargeback model design, cost anomaly detection, and engineering incentives for cost efficiency.
Cost Management and Budget Alerts
Track LLM spend per user, team, and feature in real time. Enforce hard budget limits and trigger alerts before costs spiral - because the invoice arrives 30 days too late.
Cost Optimization Patterns
Practical LLM cost reduction - semantic caching, model routing, prompt compression, Anthropic prompt caching, output length control, cost attribution, and monitoring for production AI systems.
Counterfactual Evaluation
Evaluate new ML policies using logged data from an old policy - inverse propensity scoring, doubly robust estimators, and offline policy evaluation for when A/B tests are too expensive.
Counterfactual Explanations - What Would Have to Change for a Different Decision?
Counterfactual explanations answer 'what would need to change?' - the most actionable form of ML explanation, and the basis for GDPR compliance in automated decision-making.
Counting as a minimal probe of language model reliability
Large language models perform strongly on benchmarks in mathematical reasoning, coding and document analysis, suggesting a broad ability to follow instr...
Counting to Four is still a Chore for VLMs
Vision-language models (VLMs) have achieved impressive performance on complex multimodal reasoning tasks, yet they still fail on simple grounding skill...
Coverage-Aware Web Crawling for Domain-Specific Supplier Discovery via a Web-Knowledge-Web Pipeline
Identifying the full landscape of small and medium-sized enterprises (SMEs) in specialized industry sectors is critical for supply-chain resilience, yet...
CPCANet: Deep Unfolding Common Principal Component Analysis for Domain Generalization
Domain Generalization (DG) aims to learn representations that remain robust under out-of-distribution (OOD) shifts and generalize effectively to unseen...
cProfile and pstats - Function-Level Profiling
Master deterministic profiling with cProfile and pstats - reading profile output, sorting and filtering results, snakeviz visualization, profiling overhead, and real-world endpoint profiling.
CPU Memory Architecture for ML
How CPU memory hierarchy - L1/L2/L3 caches, DRAM, and NUMA topology - shapes ML data pipelines, DataLoader performance, and large model loading strategies on multi-socket servers.
CPU Pipeline and Instruction Execution
Learn how modern CPUs execute billions of instructions per second through pipelining, out-of-order execution, branch prediction, and superscalar design - and why these details matter for every ML engineer.
CPython Architecture - The Interpreter at Engineering Depth
Understand CPython's architecture at engineering depth - the execution pipeline, the eval loop, PyObject memory layout, integer caching, string interning, the small object allocator, and alternative Python implementations.
Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts
Large language models (LLMs) can struggle to memorize factual knowledge in their parameters, often leading to hallucinations and poor performance on kno...
Craw4LLM: Efficient Web Crawling for LLM Pretraining.
Craw4LLM: Efficient Web Crawling for LLM Pretraining. - published at ACL 2025.
CreativeGame: Toward Mechanic-Aware Creative Game Generation
Large language models can generate plausible game code, but turning this capability into iterative creative improvement remains difficult. In practice,...
CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing
Recent advances in large language models have led to strong performance on reasoning and environment-interaction tasks, yet their ability for creative p...
CrewAI
CrewAI v0.80+: role-based multi-agent systems with Crew, Agent, Task, Process, and Flow - the most production-friendly multi-agent framework.
CrewAI Multi-Agent Systems
CrewAI in production - agents, tasks, crews, memory systems, Flows, and deep-dive patterns for role-based multi-agent pipelines.
Cross-Modal Emotion Transfer for Emotion Editing in Talking Face Video
Talking face generation has gained significant attention as a core application of generative models. To enhance the expressiveness and realism of synthe...
Cross-Session Persistence
How to build agents whose memory survives restarts - architecture, storage backends, session restoration, and privacy-aware memory pruning for production systems.
Cross-Tokenizer LLM Distillation through a Byte-Level Interface
Cross-tokenizer distillation (CTD), the transfer of knowledge from a teacher to a student language model when the two use different tokenizers, remains...
Crowded in B-Space: Calibrating Shared Directions for LoRA Merging
Merging separately trained LoRA adapters is a practical alternative to joint multi-task training, but it often hurts performance. Existing methods usual...
Cryptographic Hashing
Master data hashing vs password hashing - hashlib, bcrypt, argon2, salting, timing attacks, constant-time comparison, and why MD5/SHA1 are broken for passwords.
CT-1: Vision-Language-Camera Models Transfer Spatial Reasoning Knowledge to Camera-Controllable Video Generation
Camera-controllable video generation aims to synthesize videos with flexible and physically plausible camera movements. However, existing methods either...
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
GPU kernel optimization is fundamental to modern deep learning but remains a highly specialized task requiring deep hardware expertise. Despite strong p...
CUDA Programming Model
Learn the CUDA programming model from first principles - host vs device execution, kernel launch syntax, the NVCC compilation pipeline, and how to write and compile your first GPU kernel from Python using torch.utils.cpp_extension.
CUDA Streams and Async Execution
Learn how CUDA streams enable concurrent GPU execution, how to overlap data transfers with computation using double buffering, how CUDA events work for synchronization and timing, and how PyTorch streams integrate with training pipelines for maximum throughput.
CUE-R: Beyond the Final Answer in Retrieval-Augmented Generation
As language models shift from single-shot answer generation toward multi-step reasoning that retrieves and consumes evidence mid-inference, evaluating t...
CUFE@NLU of Devanagari Script Languages 2025: Language Identification using fastText.
CUFE@NLU of Devanagari Script Languages 2025: Langua... - published at COLING 2025.
CUFE@VarDial 2025 NorSID: Multilingual BERT for Norwegian Dialect Identification and Intent Detection.
CUFE@VarDial 2025 NorSID: Multilingual BERT for Norw... - published at COLING 2025.
CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves
We introduce CurveBench, a benchmark for hierarchical topological reasoning from visual input. CurveBench consists of 756 images of pairwise non-interse...
Custom Awaitables
Build awaitable objects with the __await__ protocol, understand how coroutines and Futures work under the hood, and create custom async primitives.
Custom Data Monitoring
Building custom monitoring with Great Expectations and statistical tests.
Customer Lifetime Value
CLV prediction with BG/NBD probabilistic models, Gamma-Gamma monetary value, deep learning on purchase sequences, RFM segmentation, and the ML systems that drive acquisition and retention budget decisions.
Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning
Parallel reasoning enhances Large Reasoning Models (LRMs) but incurs prohibitive costs due to futile paths caused by early errors. To mitigate this, pat...
CylinderDepth: Cylindrical Spatial Attention for Multi-View Consistent Self-Supervised Surround Depth Estimation
Self-supervised surround-view depth estimation enables dense, low-cost 3D perception with a 360° field of view from multiple minimally overlapping image...
Cython and C Extensions
Learn how Cython bridges Python and C to deliver C-level performance in Python projects, covering type declarations, typed memoryviews, OpenMP parallelism, and raw C extension modules.
D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models
The landscape of high-performance image generation models is currently shifting from the inefficient multi-step ones to the efficient few-step counterpa...
Dagster for Data Assets
Asset-based orchestration, software-defined assets, and Dagster's lineage model.
DARE - Delta Weight Sparsification
How DARE randomly drops delta weights and rescales the remainder to dramatically reduce interference when merging multiple fine-tuned models.
DARE: Diffusion Large Language Models Alignment and Reinforcement Executor
Diffusion large language models (dLLMs) are emerging as a compelling alternative to dominant autoregressive models, replacing strictly sequential token...
DASR: Distributed Adaptive Scene Recognition - A Multi-Agent Cloud-Edge Framework for Language-Guided Scene Detection.
Published at EMNLP 2025.
Data Catalog and Discovery
Apache Atlas, DataHub, Amundsen - cataloguing data for ML teams.
Data Collection Strategy - Building the Moat Before Training the Model
Learn how to design data collection and labeling strategies that determine a model's fate before a line of training code is written - the most underestimated skill in ML engineering.
Data Contracts
Enforcing data quality agreements between producers and consumers - schema contracts with Pandera and Great Expectations, statistical contracts, SLA contracts, CI integration, and violation alerting.
Data Drift Detection
Detecting when input data distributions change in production - KS test, PSI, chi-squared, Wasserstein distance, MMD, univariate vs. multivariate drift, reference window selection, and EvidentlyAI.
Data Driven Optimization of GPU efficiency for Distributed LLM Adapter Serving
Large Language Model (LLM) adapters enable low-cost model specialization, but introduce complex caching and scheduling challenges in distributed serving...
Data Engineering with Python
The complete Python toolkit for data engineering - pandas memory optimization, PyArrow columnar processing, DuckDB analytical SQL, Polars lazy evaluation, and pipeline testing with pandera.
Data Governance for AI Training Datasets
What column-level security, data lineage, and cataloguing do for AI systems, why they matter when regulated AI training data requires auditability and access controls across the lakehouse, and how to implement governance with Apache Atlas and Unity Catalog in production AI data pipelines.
Data Incident Management
Runbooks, on-call rotations, and root cause analysis for data incidents.
Data Lake and Data Warehouse for ML
The evolution from database to data lake to lakehouse - when to use each storage architecture for ML training data, feature engineering, and model serving.
Data Lake vs Warehouse vs Lakehouse for AI Workloads
What each storage architecture does for AI systems, why the choice matters when ML teams need both raw unstructured data and structured query access on the same platform, and how to choose and implement the right architecture in production AI data pipelines.
Data Lineage
Column-level lineage, impact analysis, and tools like OpenLineage and DataHub.
Data Modelling for ML
How to design data models for machine learning - point-in-time correctness, entity-centric tables, SCD Type 2, label leakage prevention, and the training-serving skew problem.
Data Pipeline Patterns for AI/ML Workflows
ETL vs ELT, Lambda vs Kappa architecture, idempotency, exactly-once semantics, backfill strategies, watermarking for late data, and how to design pipelines that reliably serve both model training and real-time inference.
Data Platform Cost Optimisation for AI Teams
What query optimisation, storage tiering, and cloud cost controls do for AI systems, why they matter when large-scale model training and feature computation drive unpredictable cloud spend, and how to implement cost reduction strategies in production AI data pipelines.
Data Poisoning
Attacks that corrupt training or fine-tuning data to embed backdoors, trigger unexpected behaviors, or degrade model performance in production.
Data Quality and Filtering
Systematic approaches to filtering synthetic data for quality, diversity, safety, and alignment - the layered pipeline that separates fine-tuned models that work from models that regress.
Data Quality and Validation for ML
Why data quality is the number-one cause of ML failures in production - Great Expectations, data contracts, PSI distribution monitoring, and pipeline quality gates.
Data Serialization and Schemas
Why serialization format is an architectural decision - JSON vs Protocol Buffers vs Avro, schema evolution strategies, and how Confluent Schema Registry prevents breaking production pipelines.
Data Structures for ML Systems
Data structures for ML infrastructure - trie for tokenizers, HNSW for vector search, inverted index for retrieval, LSM trees for feature stores, and product quantization for memory-efficient vector storage.
Data Systems for ML - The Foundation Layer
The complete ML data stack - from raw storage through feature engineering to model training and serving, including data lakes, warehouses, lakehouses, and temporal joins.
Data Versioning with Delta Lake
ACID transactions, time travel, schema evolution, and training data versioning with Delta Lake - building reproducible ML pipelines on object storage.
Data-Efficient Non-Gaussian Semi-Nonparametric Density Estimation for Nonlinear Dynamical Systems
Accurate representation of non-Gaussian distributions of quantities of interest in nonlinear dynamical systems is critical for estimation, control, and...
Databricks
Databricks Lakehouse, Unity Catalog, MLflow integration, and AutoML.
Databricks for MLOps
Master the Databricks Lakehouse platform for ML - Delta Lake, Unity Catalog, Feature Store, MLflow Model Registry, Model Serving, and Spark-scale feature pipelines for production ML.
Dataclasses - Code Generation, Immutability, and Production Patterns
Master Python's @dataclass decorator at engineering depth - what it generates, field() and default_factory, frozen=True for immutability, __post_init__ for validation, ClassVar vs InitVar, inheritance with dataclasses, ordering, and production patterns in FastAPI and config systems.
Dataset Curation for Fine-Tuning
How to build high-quality fine-tuning datasets - sourcing, deduplication, quality filtering, LLM-as-judge scoring, and a complete curation pipeline. Why 5K curated examples beat 500K raw ones.
Dataset Lineage and Management
Tracking dataset provenance, preventing train/val/test leakage, stratified splitting, dataset registries, and discovering the CV team's 12% accuracy inflation from augmentation leakage.
DBSCAN and Density-Based Clustering
Master DBSCAN, OPTICS, HDBSCAN, and Mean Shift - density-based clustering algorithms that discover arbitrarily shaped clusters, handle varying densities, and identify anomalies without specifying the number of clusters.
dbt Advanced Patterns for ML Teams
Advanced dbt techniques for large-scale ML pipelines - snapshots for SCD2, point-in-time correct features, slim CI, dbt-utils macros, and production deployment patterns.
dbt for ML Feature Preparation
How dbt brings lineage, testing, documentation, and version control to SQL-based ML data pipelines, replacing fragile cron-driven script chains.
DDIM and Accelerated Diffusion Sampling
How DDIM reduces 1000-step DDPM sampling to 10-50 steps via a non-Markovian process, the eta parameter, DDIM inversion for image editing, and DPM-Solver as the current production standard.
DDPMs - The Mathematical Foundation of Diffusion Models
The complete mathematical derivation of Denoising Diffusion Probabilistic Models - forward process, reverse process, ELBO objective, noise schedule comparison, U-Net architecture, and why predicting noise works better than predicting clean images.
Debate and Critique Patterns
How LLMs critiquing each other improves quality: verifier/critic patterns, multi-agent debate, ensemble approaches, and convergence detection.
Decentralized Proximal Stochastic Gradient Langevin Dynamics
We propose Decentralized Proximal Stochastic Gradient Langevin Dynamics (DE-PSGLD), a decentralized Markov chain Monte Carlo (MCMC) algorithm for sampli...
Decentralized Ranking Aggregation: Gossip Algorithms for Borda and Copeland Consensus
The concept of ranking aggregation plays a central role in preference analysis, and numerous algorithms for calculating median rankings, often originati...
Decorators - Wrapping Callables at Engineering Depth
Master Python decorators at full engineering depth - functools.wraps, decorator factories with three-level nesting, class-based decorators, stacking order, production patterns (timing, retry, caching, rate limiting), and how FastAPI/Flask route decorators work under the hood.
Decoupled Descent: Exact Test Error Tracking Via Approximate Message Passing
In modern parametric model training, full-batch gradient descent (and its variants) suffers due to progressively stronger biasing towards the exact real...
Deep Autocorrelation Modeling for Time-Series Forecasting: Progress and Prospects
Autocorrelation is a defining characteristic of time-series data, where each observation is statistically dependent on its predecessors. In the context...
Deep ensemble graph neural networks for probabilistic cosmic-ray direction and energy reconstruction in autonomous radio arrays
Using advanced machine learning techniques, we developed a method for reconstructing precisely the arrival direction and energy of ultra-high-energy cos...
Deep Q-Networks (DQN)
Scale Q-learning to high-dimensional inputs with neural networks. Learn the DQN architecture, experience replay, target networks, Double DQN, Dueling DQN, Prioritized Experience Replay, and Rainbow. Full PyTorch implementation included.
DeepSeek MoE Architecture
DeepSeek's innovations in mixture of experts - fine-grained experts, shared experts, DeepSeek-V2 and V3, multi-token prediction, and the reported ~$5.6M GPU cost of DeepSeek-V3's final training run.
DeepSeek-R1 - Open Source Reasoning
How DeepSeek built an open-weights reasoning model using pure RL with GRPO, the R1-Zero experiment, distillation into smaller models, and what open-source reasoning means for the research community.
DEFault++: Automated Fault Detection, Categorization, and Diagnosis for Transformer Architectures
Transformer models are widely deployed in critical AI applications, yet faults in their attention mechanisms, projections, and other internal components...
Defending Quantum Classifiers against Adversarial Perturbations through Quantum Autoencoders
Machine learning models can learn from data samples to carry out various tasks efficiently. When data samples are adversarially manipulated, such as by...
Delta Lake
Delta Lake on Databricks, merge operations, and Change Data Capture.
Delta Lake and Iceberg for ML
Delta Lake as ML data infrastructure - ACID transactions, time travel, schema evolution, Delta + MLflow integration, OPTIMIZE/Z-ordering, and handling schema changes without breaking pipelines.
Demand Forecasting Systems
Hierarchical time series forecasting at retail scale - classical methods, gradient boosting, deep learning with TFT, and the engineering behind forecasting millions of SKUs in real time.
DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling.
Published at ACL 2025.
Demystifying When Pruning Works via Representation Hierarchies
Network pruning, which removes less important parameters or architectures, is often expected to improve efficiency while preserving performance. However...
DeonticBench: A Benchmark for Reasoning over Rules
Reasoning with complex, context-specific rules remains challenging for large language models (LLMs). In legal and policy settings, this manifests as deo...
Dependency Injection - Decoupling Components
Master dependency injection in Python from manual constructor injection to DI containers and FastAPI Depends, with testing strategies and architectural trade-offs.
Dependency Management and Packaging
Master Python packaging from pyproject.toml and uv to Docker layer caching, private registries, and the CUDA version compatibility matrix that determines whether your ML environment actually works.
Deploying Quantized Models in Production
End-to-end guide for production deployment of quantized LLMs - format selection, serving stack configuration, latency SLAs, A/B testing, quality monitoring, and rollback strategy.
Descriptors - The Protocol That Powers Python's Object Model
Master the descriptor protocol - __get__, __set__, __delete__, data vs non-data descriptors, the complete attribute lookup algorithm, and how property, classmethod, staticmethod, and bound methods work under the hood.
Design Experiments to Compare Multi-armed Bandit Algorithms
Online platforms routinely compare multi-armed bandit algorithms, such as UCB and Thompson Sampling, to select the best-performing policy. Unlike standa...
Design Patterns in Python - Idiomatic Implementations for Production Code
Master the most important design patterns in idiomatic Python, most of them drawn from the GoF catalog - Singleton, Factory, Abstract Factory, Strategy, Observer, Decorator, Registry, and Builder. For each - intent, Pythonic implementation, and real framework usage.
Designing a Content Moderation System
End-to-end design of a large-scale content moderation system - covering multi-modal ML pipelines, human review integration, active learning, adversarial robustness, and platform-scale architecture.
Designing a Fraud Detection System at Scale
End-to-end design of a real-time fraud detection system - covering feature engineering, imbalanced learning, streaming scoring, delayed labels, and graph-based fraud ring detection.
Designing a Recommendation System at Scale
End-to-end design of a recommendation system serving billions of items to millions of users - covering two-stage architecture, candidate generation, ranking, cold start, and serving at scale.
Designing a Search Ranking System
End-to-end design of a production search ranking system - covering query understanding, BM25 + dense retrieval, Learning to Rank, semantic reranking, and A/B testing metrics.
Detecting and Suppressing Reward Hacking with Gradient Fingerprints
Reinforcement learning with verifiable rewards (RLVR) typically optimizes for outcome rewards without imposing constraints on intermediate reasoning. Th...
DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation
Recent advances in video generative models enable the synthesis of realistic human-object interaction videos across a wide range of scenarios and object...
DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo
Achieving human-level manipulation requires dexterous robotic hands capable of complex object interactions. Advancing such capabilities further demands...
DGX and HGX System Design
NVIDIA DGX H100 and HGX reference designs - 8-GPU NVLink mesh, NVSwitch fabric, PCIe host bridge, ConnectX InfiniBand, power and cooling requirements, DGX SuperPOD scale-out, and topology-aware NCCL configuration for maximum distributed training throughput.
Different Time, Different Language: Revisiting the Bias Against Non-Native Speakers in GPT Detectors.
Published at EACL 2026.
Differentiable Zero-One Loss via Hypersimplex Projections
Recent advances in machine learning have emphasized the integration of structured optimization components into end-to-end differentiable models, enablin...
DiffNR: Diffusion-Enhanced Neural Representation Optimization for Sparse-View 3D Tomographic Reconstruction
Neural representations (NRs), such as neural fields and 3D Gaussians, effectively model volumetric data in computed tomography (CT) but suffer from seve...
Diffusion Model as a Generalist Segmentation Learner
Diffusion models are primarily trained for image synthesis, yet their denoising trajectories encode rich, spatially aligned visual priors. In this paper...
Diffusion Models
How denoising diffusion models learn to reverse a Gaussian noise process to generate high-quality images from text prompts.
Diffusion Models Beyond Images - Audio, Video, 3D, Molecules, Text
How the diffusion framework generalizes across modalities - from waveform audio synthesis to protein structure prediction, video generation, 3D scene creation, time series, and text - with the architectural changes each domain requires.
Digital Twins and Simulation
Learn how digital twins combine physics-based simulation with machine learning to create virtual replicas of manufacturing systems for prediction, optimization, and what-if analysis.
DiningBench: A Hierarchical Multi-view Benchmark for Perception and Reasoning in the Dietary Domain
Recent advancements in Vision-Language Models (VLMs) have revolutionized general visual understanding. However, their application in the food domain rem...
Direct Bayesian Additive Regression Trees for Conditional Average Treatment Effects in Regression Discontinuity Designs
Regression discontinuity designs (RDD) are widely used for causal inference. In many empirical applications, treatment effects vary substantially with c...
Direct Preference Optimisation - RLHF Without the RL
DPO: how Rafailov et al. (2023) showed that the KL-constrained RLHF objective has a closed-form optimal policy, so preferences can be optimized directly - no separate reward model, no PPO, just supervised training on preference pairs.
Disassembly with dis - Reading CPython Bytecode
Master Python bytecode disassembly with the dis module at engineering depth - reading disassembly output, key opcodes explained, value stack evolution, comparing equivalent Python patterns at the instruction level, and practical performance insights.
Dissecting Quantization Error: A Concentration-Alignment Perspective
Quantization can drastically increase the efficiency of large language and vision models, but typically incurs an accuracy drop. Recently, function-pres...
Distillation Datasets
Building distillation datasets: capturing frontier model knowledge, reasoning traces, and calibration into training data for smaller, efficient models - from Orca to Phi.
Distributed Training Strategies
Master data parallelism (DDP, FSDP), tensor parallelism, pipeline parallelism, 3D parallelism, gradient accumulation, all-reduce communication, and bandwidth requirements for training large models.
Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems
Claude Code is an agentic coding tool that can run shell commands, edit files, and call external services on behalf of the user. This study describes it...
Diverse Dictionary Learning
Given only observational data X = g(Z), where both the latent variables Z and the generating process g are unknown, recovering Z is ill-posed without ad...
DIVINE : Coordinating Multimodal Disentangled Representations for Oro-Facial Neurological Disorder Assessment.
Published at EACL 2026.
DMax: Aggressive Parallel Decoding for dLLMs
We present DMax, a new paradigm for efficient diffusion language models (dLLMs). It mitigates error accumulation in parallel decoding, enabling aggressi...
DNS, Service Discovery, and Consul
Master DNS and service discovery for distributed ML systems - DNS resolution chains, Kubernetes CoreDNS, Consul service mesh, etcd coordination, and how ML serving clusters register and find model endpoints dynamically.
Do AI Coding Agents Log Like Humans? An Empirical Study
Software logging is essential for maintaining and debugging complex systems, yet it remains unclear how AI coding agents handle this non-functional requ...
Do Audio-Visual Large Language Models Really See and Hear?
Audio-Visual Large Language Models (AVLLMs) are emerging as unified interfaces to multimodal perception. We present the first mechanistic interpretabili...
Do Sparse Autoencoders Capture Concept Manifolds?
Sparse autoencoders (SAEs) are widely used to extract interpretable features from neural network representations, often under the implicit assumption th...
Do Thought Streams Matter? Evaluating Reasoning in Gemini Vision-Language Models for Video Scene Understanding
We benchmark how internal reasoning traces, which we call thought streams, affect video scene understanding in vision-language models. Using four config...
Docker and Containerized Local Inference
Running LLMs in Docker containers for reproducibility and deployment portability. NVIDIA Container Toolkit, Ollama and vLLM Docker images, multi-stage builds, and Docker Compose for a full local AI stack.
Docker Compose for ML Development
Build a complete local ML development environment with Docker Compose - training, serving, feature store, and monitoring all running with a single command.
Docker for ML
Learn Docker fundamentals from an ML perspective - why containers matter, how to write effective Dockerfiles, and how to manage ML model files in containers.
Document Chunking Strategies
Master the art and science of splitting documents into chunks that maximize retrieval precision - the most underestimated decision in RAG system design.
Document Ingestion and Chunking
Master every chunking strategy from fixed-size to semantic and structure-aware splitting. Learn how to parse PDFs, DOCX, and HTML, enrich metadata, evaluate chunk quality, and build a production-grade ingestion pipeline.
Document Review at Scale
e-Discovery, technology-assisted review (TAR), predictive coding, and building ML systems that process millions of documents for legal discovery in weeks instead of years.
Does Generative AI speak Nigerian-Pidgin?: Issues about Representativeness and Bias for Multilingualism in LLMs.
Published at NAACL 2025.
Does RAG Introduce Unfairness in LLMs? Evaluating Fairness in Retrieval-Augmented Generation Systems.
Published at COLING 2025.
Does Synthetic Layered Design Data Benefit Layered Design Decomposition?
Recent advances in image generation have made it easy to produce high-quality images. However, these outputs are inherently flattened, entangling foregr...
Domain-Specific Latent Representations Improve the Fidelity of Diffusion-Based Medical Image Super-Resolution
Latent diffusion models for medical image super-resolution universally inherit variational autoencoders designed for natural photographs. We show that t...
Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG
Retrieval-Augmented Generation (RAG) grounds LLM responses in external evidence but treats the model as a passive consumer of search results: it never s...
DPO and Modern Alignment Techniques
Direct Preference Optimization and its successors - how DPO eliminates the need for a separate reward model and RL training, plus IPO, KTO, SimPO, and ORPO.
DPO: Direct Preference Optimization
Master DPO - the elegant insight that you can optimize LLMs for human preferences without training a reward model or running RL, derived directly from the optimal RLHF policy.
DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data
Edge-scale deep research agents based on small language models are attractive for real-world deployment due to their advantages in cost, latency, and pr...
DR³-Eval: Towards Realistic and Reproducible Deep Research Evaluation
Deep Research Agents (DRAs) aim to solve complex, long-horizon research tasks involving planning, retrieval, multimodal understanding, and report genera...
Driving Chinese Spelling Correction from a Fine-Grained Perspective.
Published at COLING 2025.
Dropout and Regularization
Complete guide to dropout mechanics and inverted scaling, L1 vs L2 regularization and weight decay math, Monte Carlo Dropout for uncertainty, Batch Normalization as an implicit regularizer, label smoothing cross-entropy derivation, DropConnect and DropPath variants, and a production-quality regularized training loop in PyTorch.
Drug Discovery with AI
How AI accelerates pharmaceutical research - AlphaFold protein structure prediction, graph neural networks for molecular property prediction, generative chemistry, and virtual screening for drug candidates.
DSBD: Dual-Aligned Structural Basis Distillation for Graph Domain Adaptation
Graph domain adaptation (GDA) aims to transfer knowledge from a labeled source graph to an unlabeled target graph under distribution shifts. However, ex...
Dual Debiasing for Noisy In-Context Learning for Text Generation.
Published at ACL 2025.
Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks
Multimodal web agents that process both screenshots and accessibility trees are increasingly deployed to interact with web interfaces, yet their dual-st...
Dual-View Training for Instruction-Following Information Retrieval
Instruction-following information retrieval (IF-IR) studies retrieval systems that must not only find documents relevant to a query, but also obey expli...
Dunder Methods - Python's Protocol System at Engineering Depth
Master Python's dunder (double-underscore) method system - comparison protocols, arithmetic operators, container protocols, context managers, callable objects, and attribute access hooks. Learn how Python's syntax maps to method calls.
DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios
Real-world data visualization (DV) requires native environmental grounding, cross-platform evolution, and proactive intent alignment. Yet, existing benc...
DVC: Data Version Control
DVC in production - pointer files, remote storage, pipeline definitions (dvc.yaml), caching, dvc repro, CI/CD integration, and versioning 500GB datasets without bloating git.
dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model
Evaluating robotics policies across thousands of environments and thousands of tasks is infeasible with existing approaches. This motivates the need for...
Dynamic Class Creation - Building Classes at Runtime
Master the type() three-argument form, the full class creation pipeline, code generation with exec and compile, namedtuple internals, __prepare__, and building DSLs that generate Python classes at runtime.
Dynamic Pricing Models
Price elasticity estimation, competitor-aware pricing, markdown optimization for seasonal goods, causal inference for pricing decisions, and the ML systems behind Amazon's real-time repricing engine.
Dynamic Programming for ML
Dynamic programming patterns in ML - edit distance for NLP evaluation, Viterbi decoding for sequence labeling, CTC for speech recognition, dynamic time warping, beam search, Bellman equations in reinforcement learning, and DP in autoregressive generation.
Dynamic Programming for RL
Policy evaluation, policy iteration, and value iteration - solving MDPs exactly when you know the environment model. Master the theoretical foundation that all model-free RL approximates.
EASE: Federated Multimodal Unlearning via Entanglement-Aware Anchor Closure
Federated Multimodal Learning (FML) trains multimodal models across decentralized clients while keeping their image-text pairs private. However, joint e...
EasyDistill: A Comprehensive Toolkit for Effective Knowledge Distillation of Large Language Models.
Published at EMNLP 2025.
EasyVideoR1: Easier RL for Video Understanding
Reinforcement learning from verifiable rewards (RLVR) has demonstrated remarkable effectiveness in improving the reasoning capabilities of large languag...
EB-RANSAC: Random Sample Consensus based on Energy-Based Model
Random sample consensus (RANSAC), which is based on a repetitive sampling from a given dataset, is one of the most popular robust estimation methods. In...
ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion
Chest X-ray report generation (CXR-RG) has the potential to substantially alleviate radiologists' workload. However, conventional autoregressive vision-...
Edge AI in Manufacturing
Learn how to deploy AI models on industrial edge hardware using TensorRT quantization, ONNX Runtime, OpenVINO, MQTT-based edge-cloud architectures, and fleet management for hundreds of edge devices.
Edge and Mobile Inference
Running neural networks on devices with 5-15W power budgets - mobile NPUs, Apple Neural Engine, Qualcomm Hexagon, deployment frameworks, and LLMs on-device with llama.cpp and MLX.
Edge ML Deployment
Deploying ML models to smartphones, IoT devices, and embedded systems - model compression, edge runtimes, OTA updates, federated learning, and real-world examples.
EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model
We propose EditCrafter, a high-resolution image editing method that operates without tuning, leveraging pretrained text-to-image (T2I) diffusion models...
EDU-CIRCUIT-HW: Evaluating Multimodal Large Language Models on Real-World University-Level STEM Student Handwritten Solutions
Multimodal Large Language Models (MLLMs) hold significant promise for revolutionizing traditional education and reducing teachers' workload. However, ac...
Effective sample size approximations as entropy measures
In this work, we analyze alternative effective sample size (ESS) metrics for importance sampling algorithms, and discuss a possible extended range of ap...
Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers.
Published at EMNLP 2025.
Efficient Discovery of Approximate Causal Abstractions via Neural Mechanism Sparsification
Neural networks are hypothesized to implement interpretable causal mechanisms, yet verifying this requires finding a causal abstraction -- a simpler, hi...
Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing
Multivector retrieval models achieve state-of-the-art effectiveness through fine-grained token-level representations, but their deployment incurs substa...
Efficient Refusal Ablation in LLM through Optimal Transport
Safety-aligned language models refuse harmful requests through learned refusal behaviors encoded in their internal representations. Recent activation-ba...
Efficient RL Training for LLMs with Experience Replay
While Experience Replay - the practice of storing rollouts and reusing them multiple times during training - is a foundational technique in general RL,...
Efficient Targeted Maximum Likelihood Estimators for Two-Phase Design Problems
In a typical two-phase design, a random sample is drawn from the target population in phase 1, during which only a subset of variables is collected. In...
Efficient Training on Multiple Consumer GPUs with RoundPipe
Fine-tuning Large Language Models (LLMs) on consumer-grade GPUs is highly cost-effective, yet constrained by limited GPU memory and slow PCIe interconne...
Eliciting Medical Reasoning with Knowledge-enhanced Data Synthesis: A Semi-Supervised Reinforcement Learning Approach
While large language models hold promise for complex medical applications, their development is hindered by the scarcity of high-quality reasoning data....
ELT: Elastic Looped Transformers for Visual Generation
We introduce Elastic Looped Transformers (ELT), a highly parameter-efficient class of visual generative models based on a recurrent transformer architec...
Embedding Models - The Landscape
A comprehensive survey of the embedding model ecosystem - SBERT, contrastive learning, SimCSE, E5, BGE, GTE, OpenAI, Voyage AI, Cohere, and the MTEB leaderboard.
Embedding Models Deep Dive
Master embedding model selection for retrieval - MTEB benchmarks, model families, Matryoshka embeddings, bi-encoders vs cross-encoders, and fine-tuning strategies.
Embedding Models in Production
How to choose, deploy, and manage embedding models at scale - including versioning, caching, batching, and migration strategies for production RAG systems.
Embedding Quantization
Reducing embedding storage and search costs - float32 to float16, int8, and binary quantization, Hamming distance search, the rescoring trick, and implementation with FAISS and Qdrant.
Embedding Spaces
How token embeddings form a dense vector space that captures semantic meaning - geometry, anisotropy, weight tying, and visualization.
Embedding Stores
Storing and serving dense embeddings at scale for real-time recommendation and search.
Embeddings in Production
Build, deploy, and operate production-grade embedding pipelines - caching, incremental indexing, staleness management, vector DB selection, and cost optimization at scale.
Emergent Compositional Communication for Latent World Properties
Can multi-agent communication pressure extract discrete, compositional representations of invisible physical properties from frozen video features? We s...
EMO: Pretraining Mixture of Experts for Emergent Modularity
Large language models are typically deployed as monolithic systems, requiring the full model even when applications need only a narrow subset of capabil...
Empathy Prediction from Diverse Perspectives
Published at ACL 2025.
Encapsulation and Data Hiding - Properties, Name Mangling, and Descriptors
Master Python's encapsulation model - single vs double underscore conventions, name mangling mechanics, @property for controlled access, validation in setters, __slots__, and the descriptor protocol that powers @property, @classmethod, and @staticmethod internally.
Encoder vs Decoder vs Encoder-Decoder
Comparing encoder-only, decoder-only, and encoder-decoder transformer architectures - when to use each and why decoder-only won.
Encoder-Free Human Motion Understanding via Structured Motion Descriptions
The world knowledge and reasoning capabilities of text-based large language models (LLMs) are advancing rapidly, yet current approaches to human motion...
Enhanced Privacy and Communication Efficiency in Non-IID Federated Learning with Adaptive Quantization and Differential Privacy
Federated learning (FL) is a distributed machine learning method where multiple devices collaboratively train a model under the management of a central...
Enhancing AI and Dynamical Subseasonal Forecasts with Probabilistic Bias Correction
Decision-makers rely on weather forecasts to plant crops, manage wildfires, allocate water and energy, and prepare for weather extremes. Today, such for...
Enhancing Authorship Attribution with Synthetic Paintings
Attributing authorship to paintings is a historically complex task, and one of its main challenges is the limited availability of real artworks for trai...
Enhancing Hyperspace Analogue to Language (HAL) Representations via Attention-Based Pooling for Text Classification
The Hyperspace Analogue to Language (HAL) model relies on global word co-occurrence matrices to construct distributional semantic representations. While...
Enhancing Open-Domain Task-Solving Capability of LLMs via Autonomous Tool Integration from GitHub
Published at ACL 2025.
Enhancing Reliability in Community Question Answering with an Expert-Oriented RAG System
Published at EACL 2026.
Enhancing Robustness of Federated Learning via Server Learning
This paper explores the use of server learning for enhancing the robustness of federated learning against malicious attacks even when clients' training...
Environment Parity
Solve the dev/staging/prod parity problem for ML - feature skew, infrastructure differences, data drift, and environment promotion pipelines that prevent production surprises.
Envisioning the Future, One Step at a Time
Accurately anticipating how complex, diverse scenes will evolve requires models that represent uncertainty, simulate along extended interaction chains,...
Episodic Memory with Vector Store
Implement agent episodic memory using vector databases: storing, retrieving, consolidating, and forgetting past experiences at scale.
EquiformerV3: Scaling Efficient, Expressive, and General SE(3)-Equivariant Graph Attention Transformers
As SE(3)-equivariant graph neural networks mature as a core tool for 3D atomistic modeling, improving their efficiency, expressivity, and physical consi...
ESARBench: A Benchmark for Agentic UAV Embodied Search and Rescue
The rapid advancement of Multimodal Large Language Models (MLLMs) has empowered Unmanned Aerial Vehicle (UAV) with exceptional capabilities in spatial r...
Escalation and Handoff Patterns
Designing AI systems that know when to stop and hand off to humans - confidence thresholds, sentiment detection, topic-based routing, context transfer, and escalation orchestration.
Escape dynamics and implicit bias of one-pass SGD in overparameterized quadratic networks
We analyze the one-pass stochastic gradient descent dynamics of a two-layer neural network with quadratic activations in a teacher--student framework. I...
Ethics and AI in Education
Learn FERPA compliance, algorithmic bias in educational AI, surveillance concerns, data minimization, transparency requirements, and responsible deployment of AI in learning environments.
EU AI Act and Global AI Regulation
The EU AI Act, US executive orders, UK AI policy, China AI regulations, and practical compliance implications for AI engineers building and deploying language models.
Evaluating Embedding Models
MTEB benchmark deep dive, nDCG@10, Recall@K, MRR, MAP, building domain-specific evaluation sets, running MTEB locally, and avoiding the contamination problem.
Evaluating Fine-Tuned Models
Evaluation strategies for fine-tuned LLMs - held-out test sets, LLM-as-judge evaluation, perplexity measurement, task-specific benchmarks, and avoiding evaluation pitfalls.
Evaluating Generative Models - FID, IS, Precision/Recall, Human Evaluation
A complete guide to evaluating generative models - from the mathematics of FID and Inception Score to Precision/Recall manifolds, CLIP-based metrics, DINO similarity, human preference studies, metric gaming, and building production evaluation pipelines.
Evaluating Reasoning Models
The benchmark landscape for reasoning models - AIME, MATH-500, Codeforces, ARC-AGI, GPQA Diamond, process vs. outcome evaluation, and contamination concerns.
Evaluating the Progression of Large Language Model Capabilities for Small-Molecule Drug Design
Large Language Models (LLMs) have the potential to accelerate small molecule drug design due to their ability to reason about information from diverse s...
Evaluating the Quality of ML Explanations - Faithfulness, Robustness, and Human Studies
How to measure whether an ML explanation is actually good - faithfulness metrics, the ROAR benchmark, sanity checks, human evaluation studies, and a complete quantitative evaluation pipeline.
Evaluating Zero-Shot and One-Shot Adaptation of Small Language Models in Leader-Follower Interaction
Leader-follower interaction is an important paradigm in human-robot interaction (HRI). Yet, assigning roles in real time remains challenging for resourc...
Evaluation of Deontic Conditional Reasoning in Large Language Models: The Case of Wason's Selection Task
Published at EACL 2026.
Evaluation-Driven Development
Building AI systems test-first - write evals before writing prompts. The EDD loop, eval strategies, golden dataset construction, LLM-as-judge calibration, and a full EvalSuite implementation ready for CI integration.
Evaluation-driven Scaling for Scientific Discovery
Language models are increasingly used in scientific discovery to generate hypotheses, propose candidate solutions, implement systems, and iteratively re...
Event Sourcing for ML Systems
Learn how event sourcing enables auditable, reproducible ML systems - covering the event log, Kafka as an event store, temporal queries, and the projection pattern.
Event-Driven Architecture for ML
Event sourcing and CQRS patterns for ML systems - event-driven state management, Kafka Streams for ML pipelines, event schema design, dead letter queues, and event replay for debugging.
Event-Driven ML Architecture
Designing ML systems around events - event sourcing, CQRS for feature stores, the outbox pattern, and how LinkedIn's unified messaging platform drives ML at scale.
Event-Driven Temporal Graph Networks for Asynchronous Multi-Agent Cyber Defense in NetForge_RL
The transition of Multi-Agent Reinforcement Learning (MARL) policies from simulated cyber wargames to operational Security Operations Centers (SOCs) is...
Evol-Instruct
Evol-Instruct: systematically evolving instruction datasets to create complex, diverse training data that produces stronger instruction-following models - the technique behind WizardLM and WizardCoder.
EvoMaster: A Foundational Agent Framework for Building Evolving Autonomous Scientific Agents at Scale
The convergence of large language models and agents is catalyzing a new era of scientific discovery: Agentic Science. While the scientific method is inh...
EXAONE 4.5 Technical Report
This technical report introduces EXAONE 4.5, the first open-weight vision language model released by LG AI Research. EXAONE 4.5 is architected by integr...
Experience Transfer for Multimodal LLM Agents in Minecraft Game
Multimodal LLM agents operating in complex game environments must continually reuse past experience to solve new tasks efficiently. In this work, we pro...
Experiment Tracking
Design and govern ML experiment tracking at scale - from MLflow architecture to organizing 50 data scientists' experiments without chaos.
Experimentation and A/B Testing for ML Systems
How to design statistically rigorous experiments for ML systems - Bayesian vs frequentist A/B tests, network interference, interleaving, switchback experiments, and guardrail metrics.
Experimentation Platforms
Build and operate ML experimentation infrastructure - assignment services, metric computation pipelines, analysis tools, and the engineering required to scale from 3 to 30 experiments per month.
Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts
Mixture-of-Experts (MoE) has become the dominant architecture for scaling large language models: frontier models routinely decouple total parameters fro...
Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models
Diffusion language models (DLMs) enable parallel, non-autoregressive text generation, yet existing DLM mixture-of-experts (MoE) models inherit token-cho...
Explainability in Production
Serving model explanations alongside predictions - SHAP for production, Anchors for rule-based explanations, explanation as a service, debugging production failures with explanations, and regulatory compliance.
Explainability in Production ML Systems - Monitoring, Latency, and Compliance
How to operationalize ML explainability at scale - latency budgets, caching strategies, drift monitoring, compliance audit trails, and production architecture patterns for regulated industries.
Explainable cluster analysis: a bagging approach
A major limitation of clustering approaches is their lack of explainability: methods rarely provide insight into which features drive the grouping of si...
Explainable Disentangled Representation Learning for Generalizable Authorship Attribution in the Era of Generative AI
Learning robust representations of authorial style is crucial for authorship attribution and AI-generated text detection. However, existing methods ofte...
Explainable Load Forecasting with Covariate-Informed Time Series Foundation Models
Time Series Foundation Models (TSFMs) have recently emerged as general-purpose forecasting models and show considerable potential for applications in en...
Exploiting Subgradient Sparsity in Max-Plus Neural Networks
Deep Neural Networks are powerful tools for solving machine learning problems, but their training often involves dense and costly parameter updates. In...
Exploration and Exploitation Errors Are Measurable for Language Model Agents
Language Model (LM) agents are increasingly used in complex open-ended decision-making tasks, from AI coding to physical AI. A core requirement in these...
Exploration Hacking: Can LLMs Learn to Resist RL Training?
Reinforcement learning (RL) has become essential to the post-training of large language models (LLMs) for reasoning, agentic capabilities and alignment....
Exploring Spatial Intelligence from a Generative Perspective
Spatial intelligence is essential for multimodal large language models, yet current benchmarks largely assess it only from an understanding perspective....
Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
Few-step generation has been a long-standing goal, with recent one-step generation methods exemplified by MeanFlow achieving remarkable results. Existin...
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
Large language model (LLM) agents are increasingly built less by changing model weights than by reorganizing the runtime around them. Capabilities that...
FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models
Published at ACL 2025.
FactReview: Evidence-Grounded Reviews with Literature Positioning and Execution-Based Claim Verification
Peer review in machine learning is under growing pressure from rising submission volume and limited reviewer time. Most LLM-based reviewing systems read...
Factuality and Hallucination Evaluation
Measuring hallucination rates in open-source LLMs - TruthfulQA, FActScore, RAGAs factuality, entity verification, and building automated hallucination detection pipelines for production RAG systems.
Fairness under Graph Uncertainty: Achieving Interventional Fairness with Partially Known Causal Graphs over Clusters of Variables
Algorithmic decisions about individuals require predictions that are not only accurate but also fair with respect to sensitive attributes such as gender...
Faithful GRPO: Improving Visual Spatial Reasoning in Multimodal Language Models via Constrained Policy Optimization
Multimodal reasoning models (MRMs) trained with reinforcement learning with verifiable rewards (RLVR) show improved accuracy on visual reasoning benchma...
FashionChameleon: Towards Real-Time and Interactive Human-Garment Video Customization
Human-centric video customization, particularly at the garment level, has shown significant commercial value. However, existing approaches cannot suppor...
Fast Byte Latent Transformer
Recent byte-level language models (LMs) match the performance of token-level models without relying on subword vocabularies, yet their utility is limite...
Fast Spatial Memory with Elastic Test-Time Training
Large Chunk Test-Time Training (LaCT) has shown strong performance on long-context 3D reconstruction, but its fully plastic inference-time updates remai...
FastAPI - Type-Driven APIs with Automatic Validation and Docs
Master FastAPI at engineering depth - ASGI foundations, Pydantic validation, dependency injection, middleware, response models, background tasks, testing, and router organisation for production APIs.
Fault Tolerance in Large Cluster Training
Why fault tolerance is critical at scale, how to design checkpointing strategies, detect stragglers, handle spot preemptions, and recover from failures without restarting multi-week training runs.
FaultXformer: A Transformer-Encoder Based Fault Classification and Location Identification model in PMU-Integrated Active Electrical Distribution System
Accurate fault detection and localization in electrical distribution systems is crucial, especially with the increasing integration of distributed energ...
Feature Consistency
Ensuring identical features between training (offline) and serving (online).
Feature Engineering at Scale
How to redesign feature engineering pipelines for distributed compute when a 10 GB solution fails at 500 GB.
Feature Engineering at Scale - The 80% of ML Work That Determines 80% of Results
How to build feature pipelines that work identically in training and serving - feature stores, point-in-time joins, crossing, embedding lookup, and avoiding training-serving skew.
Feature Importance and SHAP
Master all three feature importance types, TreeSHAP for exact Shapley values, SHAP interaction values, feature selection with SHAP, data leakage detection, fairness analysis, and production importance drift monitoring.
Feature Importance Methods - Beyond SHAP
Permutation importance, impurity-based importance, partial dependence plots, ALE, H-statistics, Sobol indices, and production monitoring - the complete toolkit for understanding which features drive your model's decisions, and when each method lies to you.
Feature Monitoring
Detecting feature drift, staleness, and coverage gaps in production.
Feature Monitoring in Production
Monitoring features after deployment - PSI, KS tests, freshness monitoring, completeness tracking, and proving to a regulator that no feature's PSI exceeded 0.1 (10%).
Feature Platform
Build a shared feature platform that eliminates cross-team feature duplication, ensures training-serving consistency, and serves fresh features at millisecond latency.
Feature Selection and Importance
Reducing 500 features to 50 without losing model performance - filter, wrapper, and embedded methods, SHAP-based selection, and leakage detection.
Feature Store Architecture
How feature stores solve training-serving skew with a dual-store architecture - offline store for training, online store for serving, and point-in-time correct retrieval.
Feature Stores in Production
Architecture and operations of feature stores - offline and online layers, point-in-time joins, and avoiding the training-serving skew that costs you accuracy.
Feature Validation and Testing
Ensuring feature quality through schema validation, unit tests, integration tests, and monitoring - catching the NaN bug before it degrades your model for 3 weeks.
Federated Learning in Healthcare
Training ML models across hospital systems without sharing patient data - FedAvg algorithm, differential privacy, non-IID data challenges, NVIDIA FLARE, and practical multi-hospital federated learning with Flower.
Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
Reconstructing 3D representations from 2D inputs is a fundamental task in computer vision and graphics, serving as a cornerstone for understanding and i...
Feed-Forward Layers
The role of position-wise feed-forward networks in transformers - from the basic FFN to SwiGLU and Mixture of Experts.
Feedback Collection for LLM Systems
Build production-grade feedback collection systems for AI products - explicit signals, implicit behavioral signals, data schemas, bias mitigation, and closed-loop improvement pipelines.
Feedback Loops and Data Flywheels
How recommendation systems create self-reinforcing feedback loops, how to detect them, and how inverse propensity weighting and exploration strategies break them to enable unbiased learning.
Feedback Loops and the Data Flywheel - How ML Systems Compound Over Time
A deep dive into feedback loop design, concept drift detection, retraining strategies, and building data flywheels that make ML systems continuously improve in production.
Few-Shot Learning and Chain-of-Thought Prompting
Master few-shot example selection, chain-of-thought reasoning, self-consistency decoding, and when to use each technique for reliable LLM outputs.
Few-Shot Prompting
Master in-context learning by providing carefully selected examples that demonstrate the exact behavior you want - without any model fine-tuning.
File Systems and IO Patterns
Master Linux file systems for ML workloads - VFS, ext4/XFS, page cache, direct I/O, mmap, io_uring, and how to tune I/O for maximum training data throughput and checkpoint speed.
FileGram: Grounding Agent Personalization in File-System Behavioral Traces
Coworking AI agents operating within local file systems are rapidly emerging as a paradigm in human-AI interaction; however, effective personalization r...
Fine-Tuning Cost and ROI Analysis
Making the business case for LLM fine-tuning - calculating GPU compute costs, estimating break-even against API pricing, and deciding when fine-tuning beats prompt engineering on ROI.
Fine-Tuning Diffusion Models - DreamBooth, LoRA, Textual Inversion, ControlNet
How to teach Stable Diffusion new concepts with as few as 5-20 images - covering Textual Inversion, DreamBooth, LoRA, ControlNet, and IP-Adapter with full training code, hyperparameter guidance, and evaluation strategies.
Fine-Tuning Embedding Models for Your Domain
Contrastive fine-tuning with triplet loss, hard negative mining, in-batch negatives, synthetic data generation, TSDAE, GPL, and a full worked example on domain adaptation.
Fine-Tuning Hyperparameter Search
Systematic hyperparameter optimization for LLM fine-tuning - learning rate, batch size, epochs, LoRA rank, warmup schedules, and efficient search strategies with Optuna and WandB sweeps.
Fine-Tuning Ops
Operationalize LLM fine-tuning at scale - data pipelines, LoRA adapter management, adapter registries, and serving 50 customer-specific adapters efficiently.
Fine-Tuning Pipelines
End-to-end fine-tuning pipeline engineering - from data collection and curation to training, evaluation, and deployment. When to fine-tune vs RAG vs prompt engineering, and how to build the pipeline that makes it repeatable and production-safe.
Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models
Transformer-based large language models exhibit in-context learning, enabling adaptation to downstream tasks via few-shot prompting with demonstrations....
Finite Difference Flow Optimization for RL Post-Training of Text-to-Image Models
Reinforcement learning (RL) has become a standard technique for post-training diffusion-based image synthesis models, as it enables learning from reward...
FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On
Given a person and a garment image, virtual try-on (VTO) aims to synthesize a realistic image of the person wearing the garment, while preserving their...
Five Pillars of Data Observability for ML Systems
Why freshness, distribution, volume, schema, and lineage tracking matter for AI systems, where data drift and pipeline failures can silently corrupt model inputs and degrade predictions, and how to instrument these five pillars in production AI data pipelines.
Fixed-Budget Constrained Best Arm Identification in Grouped Bandits
We study fixed budget constrained best-arm identification in grouped bandits, where each arm consists of multiple independent attributes with stochastic...
FL-MHSM: Spatially-adaptive Fusion and Ensemble Learning for Flood-Landslide Multi-Hazard Susceptibility Mapping at Regional Scale
Existing multi-hazard susceptibility mapping (MHSM) studies often rely on spatially uniform models, treat hazards independently, and provide limited rep...
Flash Attention Kernel Deep Dive
How FlashAttention rewrites the attention mechanism to never materialize the N x N matrix in HBM, the online softmax tiling algorithm, IO complexity analysis, and FlashAttention 2 and 3 improvements.
Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization
Group Relative Policy Optimization has emerged as essential for aligning video diffusion models with human preferences, but faces a critical computation...
FlashOptim: Optimizers for Memory Efficient Training
Standard mixed-precision training of neural networks requires many bytes of accelerator memory for each model parameter. These bytes reflect not just th...
FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption
Long-context large language models (LLMs), for example Gemini-3.1-Pro and Qwen-3.5, are widely used to empower many real-world applications, such as retr...
Flask - Building REST APIs the Right Way
Master Flask at engineering depth - application factory pattern, request context proxies, routing, Blueprints, error handlers, testing with test_client, configuration management, and the extension ecosystem for building production-grade REST APIs.
FlexiTac: A Low-Cost, Open-Source, Scalable Tactile Sensing Solution for Robotic Systems
We present FlexiTac, a low-cost, open-source, and scalable piezoresistive tactile sensing solution designed for robotic end-effectors. FlexiTac is a pra...
Flow Matching is Adaptive to Manifold Structures
Flow matching has emerged as a simulation-free alternative to diffusion-based generative modeling, producing samples by solving an ODE whose time-depend...
Flow-OPD: On-Policy Distillation for Flow Matching Models
Existing Flow Matching (FM) text-to-image models suffer from two critical bottlenecks under multi-task alignment: the reward sparsity induced by scalar-...
FlowAnchor: Stabilizing the Editing Signal for Inversion-Free Video Editing
We propose FlowAnchor, a training-free framework for stable and efficient inversion-free, flow-based video editing. Inversion-free editing methods have...
FlowInOne: Unifying Multimodal Generation as Image-in, Image-out Flow Matching
Multimodal generation has long been dominated by text-driven pipelines where language dictates vision but cannot reason or create within it. We challeng...
Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference
The quadratic computational complexity of standard attention mechanisms presents a severe scalability bottleneck for LLMs in long-context scenarios. Whi...
Forcing-KV: Hybrid KV Cache Compression for Efficient Autoregressive Video Diffusion Models
Autoregressive (AR) video diffusion models adopt a streaming generation framework, enabling long-horizon video generation with real-time responsiveness,...
FoReco and FoRecoML: A Unified Toolbox for Forecast Reconciliation in R
Forecast reconciliation has become key to improving the accuracy and coherence of forecasts for linearly constrained multiple time series, such as hiera...
Forge-UGC: FX optimization and register-graph engine for universal graph compiler
We present Forge-UGC (FX Optimization and Register-Graph Engine for Universal Graph Compilation), a four-phase compiler for transformer deployment on he...
FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios
The manufacturing sector is increasingly adopting Multimodal Large Language Models (MLLMs) to transition from simple perception to autonomous execution,...
Foundational CS for ML Engineers
The computer science foundations that make ML engineers dangerous - CPU and GPU architecture, operating systems, compilers, memory management, networking, algorithms, and systems programming.
Four Types of Agent Memory
Cognitive science meets AI engineering: working, episodic, semantic, and procedural memory implemented in production agent systems.
FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling
Reinforcement-Learning-based post-training has recently emerged as a promising paradigm for aligning text-to-image diffusion models with human preferenc...
FPGAs for AI Inference
How FPGAs enable sub-microsecond AI inference - reconfigurable logic, HLS programming, Xilinx Vitis AI, quantization strategies, and when FPGAs beat GPUs for latency-critical deployments.
Fractals made Practical: Denoising Diffusion as Partitioned Iterated Function Systems
What is a diffusion model actually doing when it turns noise into a photograph? We show that the deterministic DDIM reverse chain operates as a Partitio...
Frame Theoretical Derivation of Three Factor Learning Rule for Oja's Subspace Rule
We show that the error-gated Hebbian rule for PCA (EGHR-PCA), a three-factor learning rule equivalent to Oja's subspace rule under Gaussian inputs, can...
Framework Comparison
Comprehensive comparison of LangGraph, CrewAI, AutoGen, LlamaIndex, and raw API across 12 production dimensions - with decision flowchart and real case studies.
Framing ML Problems - Turning Business Goals into Training Objectives
Learn how to translate ambiguous business goals into precise ML objectives - the most critical and most overlooked skill in ML system design.
Frankenmodels and Limitations of Model Merging
Layer grafting, depth upscaling, Solar 10.7B, and the fundamental limits of what model merging can and cannot achieve.
Fraud Detection Systems
Real-time payment fraud detection at Stripe scale - rule-based baselines, graph fraud detection, session-level features, adversarial robustness, and false positive cost analysis.
Free Geometry: Refining 3D Reconstruction from Longer Versions of Itself
Feed-forward 3D reconstruction models are efficient but rigid: once trained, they perform inference in a zero-shot manner and cannot adapt to the test s...
From Context to Skills: Can Language Models Learn from Context Skillfully?
Many real-world tasks require language models (LMs) to reason over complex contexts that exceed their parametric knowledge. This calls for context learn...
From Feedback to Checklists: Grounded Evaluation of AI-Generated Clinical Notes
Published at EMNLP 2025.
From Long Videos to Engaging Clips: A Human-Inspired Video Editing Framework with Multimodal Narrative Understanding
Published at EMNLP 2025.
From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering
Existing tampering detection benchmarks largely rely on object masks, which severely misalign with the true edit signal: many pixels inside a mask are u...
From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space
While reinforcement learning with verifiable rewards (RLVR) significantly enhances LLM reasoning by optimizing the conditional distribution P(y|x), its...
From Paper to Structured JSON: An Agentic AI Workflow for Compliant BMR Digital Transformation
Published at EACL 2026.
From Reasoning to Agentic: Credit Assignment in Reinforcement Learning for Large Language Models
Reinforcement learning (RL) for large language models (LLMs) increasingly relies on sparse, outcome-level rewards -- yet determining which actions withi...
From Shallow Bayesian Neural Networks to Gaussian Processes: General Convergence, Identifiability and Scalable Inference
In this work, we study scaling limits of shallow Bayesian neural networks (BNNs) via their connection to Gaussian processes (GPs), with an emphasis on s...
From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms
Large Language Model (LLM)-based agents have fundamentally reshaped artificial intelligence by integrating external tools and planning capabilities. Whi...
FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale
Many real-world coding challenges are open-ended and admit no known optimal solution. Yet, recent progress in LLM coding has focused on well-defined tas...
Full Fine-Tuning vs PEFT
Decision framework for choosing between full fine-tuning and parameter-efficient methods like LoRA and QLoRA - covering compute requirements, quality ceilings, catastrophic forgetting, and when each approach wins.
Full Fine-Tuning vs PEFT: Decision Framework
A practical decision framework for choosing between full fine-tuning, LoRA, QLoRA, prompt tuning, and other PEFT methods based on your model size, data, and quality requirements.
GADFA: Generator-Assisted Decision-Focused Approach for Opinion Expressing Timing Identification.
GADFA: Generator-Assisted Decision-Focused Approach... - published at COLING 2025.
GAIA Benchmark
GAIA tests general-purpose agents on real-world tasks requiring web search, file reading, code execution, and multi-step reasoning. Learn the task structure, scoring, SOTA analysis, and how to build GAIA-style evaluations.
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents
Towards an embodied generalist for real-world interaction, Multimodal Large Language Model (MLLM) agents still suffer from challenging latency, sparse f...
Garbage Collection - Generational GC, Cycle Detection, and Memory Leak Diagnosis
Master CPython's cyclic garbage collector at engineering depth - generational collection, three generations, cycle detection algorithm, gc module API, __del__ and PEP 442, gc.freeze() for fork, gc.get_referrers() for leak diagnosis, and common memory leak patterns.
Garbage Collection Algorithms
How Python's reference counting and generational garbage collector work, why GC pauses hurt ML serving latency, and how to tune or disable GC for performance-critical workloads.
Gaussian Processes - Non-Parametric Bayesian Regression with Calibrated Uncertainty
Gaussian processes provide a full distribution over functions with principled uncertainty estimates - how they work, kernel engineering, and when to use them over neural networks.
GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers
The autonomous discovery of bugs remains a significant challenge in modern software development. Compared to code generation, the complexity of dynamic...
General Bayesian Policy Learning
This study proposes the General Bayes framework for policy learning. We consider decision problems in which a decision-maker chooses an action from an a...
General Multimodal Protein Design Enables DNA-Encoding of Chemistry
Evolution is an extraordinary engine for enzymatic diversity, yet the chemistry it has explored remains a narrow slice of what DNA can encode. Deep gene...
General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks
Contemporary large language models (LLMs) have demonstrated remarkable reasoning capabilities, particularly in specialized domains like mathematics and...
Generalization and Scaling Laws for Mixture-of-Experts Transformers
We develop a theory of generalization and scaling for Mixture-of-Experts (MoE) Transformers that cleanly separates active per-input capacity from...
Generalization Properties of Score-matching Diffusion Models for Intrinsically Low-dimensional Data
Despite the remarkable empirical success of score-based diffusion models, their statistical guarantees remain underdeveloped. Existing analyses often pr...
Generalized Linear Models
Understand the GLM framework - link functions, exponential family distributions, Poisson regression for count data, Gamma regression for positive continuous targets, IRLS algorithm, overdispersion, and deviance-based model comparison.
Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning
Reinforcement learning (RL) has become a central post-training tool for improving the reasoning abilities of large language models (LLMs). In these syst...
Generating DDPM-based Samples from Tilted Distributions
Given $n$ independent samples from a $d$-dimensional probability distribution, our aim is to generate diffusion-based samples from a distribution obtain...
Generating Multi-Aspect Queries for Conversational Search.
Generating Multi-Aspect Queries for Conversational S... - published at EACL 2026.
Generating Statistical Charts with Validation-Driven LLM Workflows
Generating diverse, readable statistical charts from tabular data remains challenging for LLMs, as many failures become apparent after rendering and are...
Generative Adversarial Networks - From the Original GAN to StyleGAN
The complete story of GANs - from Goodfellow's 2014 minimax formulation to DCGAN, Wasserstein GAN, Progressive GAN, and StyleGAN2 - including training instabilities, theoretical foundations, and why diffusion models eventually surpassed them.
Generative Modeling with Orbit-Space Particle Flow Matching
We present Orbit-Space Geometric Probability Paths (OGPP), a particle-native flow-matching framework for generative modeling of particle systems. OGPP i...
Generative Models Overview - VAEs, GANs, Flow Models, and Diffusion
A unified view of generative modeling approaches - how VAEs, GANs, normalizing flows, energy-based models, and diffusion models each define a different way to learn a distribution, with trade-offs in quality, diversity, training stability, and likelihood.
Generative Quantum-inspired Kolmogorov-Arnold Eigensolver
High-performance computing (HPC) is increasingly important for scalable quantum chemistry workflows that couple classical generative models, quantum cir...
Generative Refinement Networks for Visual Synthesis
While diffusion models dominate the field of visual generation, they are computationally inefficient, applying a uniform computational effort regardless...
Generators and yield - Suspended Execution at Engineering Depth
Understand Python generators and yield at engineering depth - frame suspension, the generator state machine, send() and the coroutine protocol, yield from, throw() and close(), memory-efficient pipelines, and the foundation of async/await.
GenericAgent: A Token-Efficient Self-Evolving LLM Agent via Contextual Information Density Maximization (V1.0)
Long-horizon large language model (LLM) agents are fundamentally limited by context. As interactions become longer, tool descriptions, retrieved memorie...
Generics and TypeVar
Master generic programming in Python with TypeVar, Generic base class, bound and constrained type variables, covariance vs contravariance vs invariance, and real-world patterns from FastAPI and SQLAlchemy.
GenLCA: 3D Diffusion for Full-Body Avatars from In-the-Wild Videos
We present GenLCA, a diffusion-based generative model for generating and editing photorealistic full-body avatars from text and image inputs. The genera...
Genomics and Protein Folding
AI for genomics and protein science - AlphaFold 2 architecture, variant calling, polygenic risk scores, DNA language models, and practical protein structure prediction with ESMFold.
GeoChemAD: Benchmarking Unsupervised Geochemical Anomaly Detection for Mineral Exploration
Geochemical anomaly detection plays a critical role in mineral exploration as deviations from regional geochemical baselines may indicate mineralization...
Geometric coherence of single-cell CRISPR perturbations reveals regulatory architecture and predicts cellular stress
Genome engineering has achieved remarkable sequence-level precision, yet predicting the transcriptomic state that a cell will occupy after perturbation...
Geometric Context Transformer for Streaming 3D Reconstruction
Streaming 3D reconstruction aims to recover 3D information, such as camera poses and point clouds, from a video stream, which necessitates geometric acc...
Geometric regularization of autoencoders via observed stochastic dynamics
Stochastic dynamical systems with slow or metastable behavior evolve, on long time scales, on an unknown low-dimensional manifold in high-dimensional am...
GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs
We address the challenge of knowledge composition in Vision-Language Models (VLMs), where accumulating expertise across multiple domains or tasks typica...
GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification
Large language models are typically post-trained using supervised fine-tuning (SFT) and reinforcement learning (RL), yet effectively unifying efficient...
GitHub Actions for ML
Build a complete ML CI pipeline in GitHub Actions that triggers training only when training data or model code changes - not on every commit.
GitLab CI for ML
Build an enterprise-grade ML CI/CD pipeline in GitLab CI - from data commit to production deployment with DAG pipelines, GPU runners, and environments.
GitOps for ML
Apply GitOps principles to ML infrastructure - Flux CD, ArgoCD, image update automation, secrets management, and PR-gated model deployments with Argo Rollouts.
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environmen...
Global Interpretability via Automated Preprocessing: A Framework Inspired by Psychiatric Questionnaires
Psychiatric questionnaires are highly context sensitive and often only weakly predict subsequent symptom severity, which makes the prognostic relationsh...
Global Optimality for Constrained Exploration via Penalty Regularization
Efficient exploration is a central problem in reinforcement learning and is often formalized as maximizing the entropy of the state-action occupancy mea...
Global, Shared, and Register Memory
Master the five CUDA memory spaces - registers, shared memory, L1/L2 cache, and global memory - with real latency numbers, tiled matrix multiply, and the patterns that separate 8% bandwidth utilization from 85%.
GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens
The efficient spatial allocation of primitives serves as the foundation of 3D Gaussian Splatting, as it directly dictates the synergy between representa...
GlotOCR Bench: OCR Models Still Struggle Beyond a Handful of Unicode Scripts
Optical character recognition (OCR) has advanced rapidly with the rise of vision-language models, yet evaluation has remained concentrated on a small cl...
GNNs for Recommender Systems
How LightGCN, PinSage, and NGCF use graph neural networks on user-item interaction graphs to capture multi-hop collaborative filtering signals at billion-scale.
GO-GenZip: Goal-Oriented Generative Sampling and Hybrid Compression
Current network data telemetry pipelines consist of massive streams of fine-grained Key Performance Indicators (KPIs) from multiple distributed sources...
Goal-Driven Data Story, Narrations and Explanations.
Goal-Driven Data Story, Narrations and Explanations. - published at NAACL 2025.
Google BigQuery
BigQuery architecture, ML built-in functions, and BigQuery ML.
Google TPU Architecture
Deep dive into Google's Tensor Processing Units - systolic array design, XLA compilation, TPU pod topology, and how to write high-performance JAX programs that avoid recompilation traps.
Google Vertex AI for MLOps
Master the complete Google Vertex AI platform for end-to-end ML workflows - Pipelines, Training, Prediction, Feature Store, Model Registry, Experiments, and production deployment on GCP.
GPTQ In Depth
A deep technical walkthrough of the GPTQ algorithm - Optimal Brain Surgeon derivation, layer-by-layer quantization, group quantization, actorder, and practical deployment with AutoGPTQ and vLLM.
GPTQ: Post-Training Quantization
GPTQ explained from first principles - how Hessian-based error compensation quantizes 175B models to 4-bit in hours, the role of calibration data, group size, activation reordering, and how to deploy GPTQ models in production with vLLM and AutoGPTQ.
GPU Architecture for ML Engineers
Understand CUDA cores vs Tensor Cores, GPU memory hierarchy, FLOPS vs memory bandwidth, the roofline model, warp execution, and NVLink - the hardware knowledge that drives ML optimization.
GPU Cluster Networking
InfiniBand vs RoCE vs Ethernet for GPU cluster communication, fat-tree and rail-optimized topologies, GPUDirect RDMA, SHARP in-network aggregation, and diagnosing collective communication bottlenecks in production ML clusters.
GPU Containers
Build and run GPU-enabled containers for ML - covering NVIDIA Container Toolkit, CUDA compatibility, Kubernetes GPU scheduling, and debugging GPU access.
GPU Cost Optimization
Systematically reduce GPU infrastructure costs with spot instances, GPU sharing via MPS and MIG, right-sizing, reserved instances, efficient batching, utilization monitoring, and GPU marketplace strategies.
GPU Inference vs Training Requirements
Why inference and training have fundamentally different GPU hardware requirements, covering compute vs memory-bandwidth bottlenecks, the prefill/decode split, and how to select the right GPU for serving.
GPU Memory Hierarchy Deep Dive
Complete GPU memory hierarchy - registers, L1/shared memory, L2 cache, and HBM - capacity, bandwidth, latency at each level, and how data flows through the hierarchy during kernel execution.
GPU Memory Management
Master VRAM capacity planning, activation checkpointing, mixed precision training, ZeRO optimizer stages, CPU offloading, and OOM debugging for production ML workloads.
GPU Scheduling in Kubernetes
GPU resource management in Kubernetes - NVIDIA device plugin, MIG, time-slicing, node affinity, GPU quotas per namespace, and DCGM monitoring for ML clusters.
GPU vs CPU Architecture
Why GPUs dominate deep learning - SIMT execution model, throughput vs latency optimization, the fundamental design tradeoffs between CPU and GPU silicon.
Gradient Boosting From Scratch
Understand gradient boosting from first principles - additive models, functional gradient descent, pseudo-residuals for any loss function, shrinkage, stochastic boosting, and bias-variance tradeoffs versus Random Forest.
Gradient Boosting within a Single Attention Layer
Transformer attention computes a single softmax-weighted average over values -- a one-pass estimate that cannot correct its own errors. We introduce...
Gradient Checkpointing and Rematerialization
Activation checkpointing to reduce training memory usage, sublinear memory algorithm, selective checkpointing strategies, and implementation in PyTorch and JAX.
Gradient Descent From Scratch
Implement gradient descent for linear regression from first principles - derive the gradient, analyze the loss landscape, understand learning rate via Lipschitz constants, implement momentum, gradient clipping, and convergence analysis via condition number.
Gradient Flow Polarizes Softmax Outputs towards Low-Entropy Solutions
Understanding the intricate non-convex training dynamics of softmax-based models is crucial for explaining the empirical success of transformers. In thi...
Gradient Regularized Newton Boosting Trees with Global Convergence
Gradient Boosting Decision Trees (GBDTs) dominate tabular machine learning, with modern implementations like XGBoost, LightGBM, and CatBoost being based...
GRAM: Generative Recommendation via Semantic-aware Multi-granular Late Fusion.
GRAM: Generative Recommendation via Semantic-aware M... - published at ACL 2025.
Graph Algorithms and GNNs
Master graph representations, classical graph algorithms, and graph neural networks - from BFS/DFS and PageRank to GCN, GraphSAGE, and GAT with PyTorch Geometric.
Graph Attention Networks
GAT - learning which neighbors matter via attention over graph edges. Multi-head attention, GATv2's dynamic attention, heterophilic graphs, and training on Cora with PyTorch Geometric.
Graph Convolutional Networks
GCN derivation from spectral graph theory to efficient spatial message passing. Symmetric normalization, renormalization trick, over-smoothing, and training on Cora with PyG.
Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills
Skill usage has become a core component of modern agent systems and can substantially improve agents' ability to complete complex tasks. In real-world s...
Graph RAG
Master Microsoft's Graph RAG - build knowledge graphs from documents, use community detection for global queries, and understand when graph structure beats flat vector search.
Graph Representation for ML
Node embeddings from shallow methods to GNNs - DeepWalk, Node2Vec, LINE, spectral embeddings, manual features, and their fundamental limitations. How to featurize nodes, edges, and graphs.
Graph-Based Chain-of-Thought Pruning for Reducing Redundant Reflections in Reasoning LLMs
Extending CoT through RL has been widely used to enhance the reasoning capabilities of LLMs. However, due to the sparsity of reward signals, it can also...
Graph-Informed Adversarial Modeling: Infimal Subadditivity of Interpolative Divergences
We study adversarial learning when the target distribution factorizes according to a known Bayesian network. For interpolative divergences, including $(...
GraphSAGE and Inductive Learning
GraphSAGE - sample and aggregate for inductive GNNs that generalize to unseen nodes. Neighbor sampling, mini-batch training, unsupervised learning, and PinSage for billion-scale recommendations.
Grid2Matrix: Revealing Digital Agnosia in Vision-Language Models
Vision-Language Models (VLMs) excel on many multimodal reasoning benchmarks, but these evaluations often do not require an exhaustive readout of the ima...
Groq LPU Architecture
How Groq's Language Processing Unit eliminates the memory bottleneck for LLM inference by keeping model weights in on-chip SRAM and using deterministic compiler-scheduled execution.
gRPC and Protocol Buffers
Learn gRPC and Protocol Buffers for high-performance ML inference APIs - from protobuf wire format to bidirectional streaming, interceptors, health checks, and production deployment patterns.
GTA-2: Benchmarking General Tool Agents from Atomic Tool-Use to Open-Ended Workflows
The development of general-purpose agents requires a shift from executing simple instructions to completing complex, real-world productivity workflows....
Guardrails and Safety Systems
Build layered defense-in-depth safety systems for LLM applications - input filtering, toxicity detection, PII redaction, prompt injection defense, output validation, and human review escalation.
GUI Automation with Vision
Vision-based GUI automation for desktop applications - coordinate grounding, UI element detection, OCR integration, state tracking, and building a desktop automation agent.
H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasoning on Tables.
H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasonin... - published at NAACL 2025.
Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting
Training embodied AI agents depends critically on the visual fidelity of simulation environments and the ability to model dynamic humans. Current simula...
Hallucination Risk in Legal AI
Why LLM hallucination can amount to malpractice in legal contexts, grounding strategies, citation verification pipelines, and architecture patterns for trustworthy legal AI.
Hallucinations Undermine Trust; Metacognition is a Way Forward
Despite significant strides in factual reliability, errors -- often termed hallucinations -- remain a major concern for generative AI, especially as LLM...
Handling LLM Latency
Perceived latency, progressive rendering, streaming, prompt caching, and UX patterns for making slow AI responses feel fast.
Hardware Acceleration Beyond GPU
FPGA, ASIC, TPU systolic arrays, neuromorphic chips, photonic computing, and processing-in-memory for ML - when to use each, economic analysis, and the emerging hardware landscape beyond NVIDIA GPUs.
Hardware and Silicon for AI
GPU architecture, CUDA programming, custom silicon, kernel optimization, memory systems, and distributed training hardware - the layer below the framework that determines what is actually possible.
Hardware Performance Counters
Master hardware performance counters, the PMU, and Linux perf to diagnose CPU bottlenecks, optimize cache behavior, and profile ML workloads with surgical precision.
Hardware Requirements and Selection
How to select hardware for running LLMs locally - VRAM and RAM requirements by model size, GPU tier comparison, Apple Silicon analysis, CPU-only inference feasibility, and a practical hardware selection matrix.
Hash Tables and Bloom Filters
Deep dive into hash table internals, consistent hashing for distributed ML, Bloom filters for training data deduplication, MinHash LSH for near-duplicate detection, and fingerprinting for dataset versioning.
HBM and GDDR Memory Technologies
High Bandwidth Memory vs GDDR6X - how 3D stacking with through-silicon vias enables HBM3 to deliver 3.35 TB/s on H100, why GDDR6X tops out at roughly 1 TB/s, the economics of each, and how memory bandwidth constrains LLM inference throughput.
HDP: A Lightweight Cryptographic Protocol for Human Delegation Provenance in Agentic AI Systems
Agentic AI systems increasingly execute consequential actions on behalf of human principals, delegating tasks through multi-step chains of autonomous ag...
HDR Video Generation via Latent Alignment with Logarithmic Encoding
High dynamic range (HDR) imagery offers a rich and faithful representation of scene radiance, but remains challenging for generative models due to its m...
Healthcare AI GYM for Medical Agents
Clinical reasoning demands multi-step interactions -- gathering patient history, ordering tests, interpreting results, and making safe treatment decisio...
Heap and Stack Memory
Learn how stack frames, heap allocation, and Python's memory model work under the hood - from C struct padding to pymalloc arenas, with production debugging techniques.
Heavy-Tailed and Long-Range Dependent Noise in Stochastic Approximation: A Finite-Time Analysis
Stochastic approximation (SA) is a fundamental iterative framework with broad applications in reinforcement learning and optimization. Classical analyse...
HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness
Recent advances in agentic harness with orchestration frameworks that coordinate multiple agents with memory, skills, and tool use have achieved remarka...
Helm for ML Deployments
Helm charts for ML applications - chart anatomy, parameterizing ML deployments, environment values files, lifecycle hooks for model validation, and umbrella charts for multi-component stacks.
Heterogeneous Scientific Foundation Model Collaboration
Agentic large language model systems have demonstrated strong capabilities. However, their reliance on language as the universal interface fundamentally...
Hexagonal Architecture (Ports and Adapters)
Implement Hexagonal Architecture in Python using Protocol-based ports, swappable adapters, and clear boundaries between application logic and external systems.
Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation
Retrieval-augmented generation (RAG) enhances large language models with external knowledge, and tree-based RAG organizes documents into hierarchical in...
Hierarchical Clustering
Agglomerative and divisive hierarchical clustering - linkage criteria, dendrograms, cophenetic correlation, and production-scale strategies for discovering multi-scale data structure.
Hierarchical Industrial Demand Forecasting with Temporal and Uncertainty Explanations
Hierarchical time-series forecasting is essential for demand prediction across various industries. While machine learning models have obtained significa...
Hierarchical Inference and Closure Learning via Adaptive Surrogates for ODEs and PDEs
Inverse problems are the task of calibrating models to match data. They play a pivotal role in diverse engineering applications by allowing practitioner...
Hierarchical Kernel Transformer: Multi-Scale Attention with an Information-Theoretic Approximation Analysis
The Hierarchical Kernel Transformer (HKT) is a multi-scale attention mechanism that processes sequences at L resolution levels via trainable causal down...
Hierarchical Planning with Latent World Models
Model predictive control (MPC) with learned world models has emerged as a promising paradigm for embodied control, particularly for its ability to gener...
Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling
Recent large language models have shifted SVG generation from differentiable rendering optimization to autoregressive program synthesis. However, existi...
High-dimensional Adaptive MCMC with Reduced Computational Complexity
We propose an adaptive MCMC method that learns a linear preconditioner which is dense in its off-diagonal elements but sparse in its parametrisation. Du...
High-dimensional Many-to-many-to-many Mediation Analysis
We study high-dimensional mediation analysis in which exposures, mediators, and outcomes are all multivariate, and both exposures and mediators may be h...
HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?
Frontier coding agents solve complex tasks when given complete context but collapse when specifications are incomplete or ambiguous. The bottleneck is n...
Histopathology Image Normalization via Latent Manifold Compaction
Batch effects arising from technical variations in histopathology staining protocols, scanners, and acquisition pipelines pose a persistent challenge fo...
HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System
While end-to-end Vision-Language-Action (VLA) models offer a promising paradigm for robotic manipulation, fine-tuning them on narrow control data often...
HMS-BERT: Hybrid Multi-Task Self-Training for Multilingual and Multi-Label Cyberbullying Detection
Cyberbullying on social media is inherently multilingual and multi-faceted, where abusive behaviors often overlap across multiple categories. Existing m...
How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models
This paper localizes the policy routing mechanism in alignment-trained language models. An intermediate-layer attention gate reads detected content and...
How Coding Agents Work
Deep dive into coding agent architecture: how agents navigate codebases, plan edits, execute changes, and iterate using test feedback.
How Credible Is an Answer From Retrieval-Augmented LLMs? Investigation and Evaluation With Multi-Hop QA.
How Credible Is an Answer From Retrieval-Augmented L... - published at COLING 2025.
How Fast Should a Model Commit to Supervision? Training Reasoning Models on the Tsallis Loss Continuum
Adapting reasoning models to new tasks during post-training with only output-level supervision stalls under reinforcement learning from verifiable rewar...
How Python Works Internally
A deep dive into CPython's architecture - from source code to bytecode execution, the GIL, memory management, and the Python object model that every serious Python engineer should understand.
How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data
A widely adopted strategy for model enhancement is to use synthetic data generated by a stronger model for supervised fine-tuning (SFT). However, for em...
How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings
Agent skills, which are reusable, domain-specific knowledge artifacts, have become a popular mechanism for extending LLM-based agents, yet formally benc...
HP-Edit: A Human-Preference Post-Training Framework for Image Editing
Common image editing tasks typically adopt powerful generative diffusion models as the leading paradigm for real-world content editing. Meanwhile, altho...
HSG: Hyperbolic Scene Graph
Scene graph representations enable structured visual understanding by modeling objects and their relationships, and have been widely used for multiview...
HTTP Deep Dive - What Actually Travels Over the Wire
Master HTTP/1.1 at the byte level - request/response wire format, method semantics, status code families, critical headers, connection pooling, the requests and httpx libraries, HTTP/2 multiplexing, and why every production client needs explicit timeouts.
HTTP/3 and QUIC
Understand HTTP/3 and QUIC - how QUIC solves TCP head-of-line blocking with UDP-based multiplexing, 0-RTT connection establishment, TLS 1.3 integration, and what it means for ML inference serving latency.
HuggingFace Ecosystem
Use the HuggingFace ecosystem end-to-end - transformers, datasets, Trainer API, PEFT/LoRA for efficient fine-tuning, the Hub for sharing models, and tokenizer internals.
HuggingFace Hub and Model Cards
Master the HuggingFace Hub as your primary interface for finding, evaluating, and deploying open-source models. Learn to read model cards, use the Hub API, and navigate 800k+ models efficiently.
Human Evaluation
Design rigorous human evaluation studies for LLMs - from annotation protocols to inter-annotator agreement to Chatbot Arena methodology.
Human Evaluation for Agents
When and how to run human evaluation for agentic systems - annotator selection, rubric design, inter-annotator agreement, crowdsourcing quality control, and closing the feedback loop.
Human Feedback Collection
Collecting preference data, thumbs ratings, and corrections for RLHF pipelines - preference interface design, feedback quality controls, DPO data formats, and Elo-based model ranking.
Human Oversight Mechanisms
Design human oversight that is meaningful, not performative - risk-based interruption, async approval queues, audit trails, and graduated autonomy.
HumanNet: Scaling Human-centric Video Learning to One Million Hours
Progress in embodied intelligence increasingly depends on scalable data infrastructure. While vision and language have scaled with internet corpora, lea...
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents
We introduce HY-Embodied-0.5, a family of foundation models specifically designed for real-world embodied agents. To bridge the gap between general Visi...
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
We introduce HY-World 2.0, a multi-modal world model framework that advances our prior project HY-World 1.0. HY-World 2.0 accommodates diverse input mod...
Hybrid Architectures - Jamba and Beyond
How combining attention and Mamba layers creates models that outperform pure architectures - Jamba's design, the attention-to-Mamba ratio, MoE integration, and the emerging hybrid landscape.
Hybrid Graphs for Table-and-Text based Question Answering using LLMs.
Hybrid Graphs for Table-and-Text based Question Answ... - published at NAACL 2025.
Hybrid Policy Distillation for LLMs
Knowledge distillation (KD) is a powerful paradigm for compressing large language models (LLMs), whose effectiveness depends on intertwined choices of d...
Hybrid Search - Dense and Sparse Retrieval
Combine BM25 keyword search with dense vector search using SPLADE, Reciprocal Rank Fusion, and learned sparse models to build retrieval systems that beat pure semantic search.
Hybrid Search and Reranking
How to combine BM25 sparse retrieval with dense vector search using Reciprocal Rank Fusion, and how to apply cross-encoder reranking for precision that neither method achieves alone.
Hybrid Search: Dense and Sparse
Combine BM25 sparse retrieval with dense vector search for best-of-both-worlds performance - understand SPLADE, fusion methods, and when hybrid beats pure dense.
HyCOP: Hybrid Composition Operators for Interpretable Learning of PDEs
We introduce HyCOP, a modular framework that learns parametric PDE solution operators by composing simple modules (advection, diffusion, learned closure...
Hyper Input Convex Neural Networks for Shape Constrained Learning and Optimal Transport
We introduce Hyper Input Convex Neural Networks (HyCNNs), a novel neural network architecture designed for learning convex functions. HyCNNs combine the...
HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents
Existing multimodal search agents process target entities sequentially, issuing one tool call per entity and accumulating redundant interaction rounds w...
HyperFitS -- Hypernetwork Fitting Spectra for metabolic quantification of ${}^1$H MR spectroscopic imaging
Purpose: Proton magnetic resonance spectroscopic imaging ($^1$H MRSI) enables the mapping of whole-brain metabolite concentrations in vivo. However, a...
Hyperparameter Optimization
Systematic HPO - grid search, random search, Bayesian optimization with Optuna, Hyperband/ASHA pruning, and multi-objective optimization for production ML.
Hypothesis Testing over Observable Regimes in Singular Models
Hypothesis testing in singular statistical models is often regarded as inherently problematic due to non-identifiability and degeneracy of the Fisher in...
I know you are different! Towards Persona Driven Knowledge-infused Dialogue Assistant.
Published at EACL 2026.
IaC for ML Teams
Why ML teams need Infrastructure as Code - reproducible environments, audit trails, cost control, and eliminating the manual infrastructure chaos that breaks ML at scale.
IaC Patterns for ML Platforms
Production IaC patterns for ML platform engineering - golden paths, blue-green infrastructure, self-destructing experiment environments, OPA policies, GPU quota management, and the internal developer platform model.
IceCache: Memory-efficient KV-cache Management for Long-Sequence LLMs
Key-Value (KV) cache plays a crucial role in accelerating inference in large language models (LLMs) by storing intermediate attention states and avoidin...
Idempotency and Retries
Making LLM-powered workflows robust with idempotency keys, smart retries, distributed deduplication, workflow state persistence, and failure-tolerant pipeline design for production AI systems.
Identifying Causal Effects Using a Single Proxy Variable
Unobserved confounding is a key challenge when estimating causal effects from a treatment on an outcome in scientific applications. In this work, we ass...
Image Generators are Generalist Vision Learners
Recent works show that image and video generators exhibit zero-shot visual understanding behaviors, in a way reminiscent of how LLMs develop emergent ca...
Immutability Strategies - Tuples, Frozen Dataclasses, and Value Objects
Master Python's immutability toolkit at engineering depth - mutable vs immutable types, shallow vs deep immutability, namedtuple, frozen dataclasses, frozenset, MappingProxyType, and the replace/copy pattern for functional state updates. Covers DDD value objects and Redux-style state in Python.
ImplicitMemBench: Measuring Unconscious Behavioral Adaptation in Large Language Models
Existing memory benchmarks for LLM agents evaluate explicit recall of facts, yet overlook implicit memory where experience becomes automated behavior wi...
Import Hooks and the Import System - Intercepting Module Loading
Master Python's import machinery - sys.meta_path finders, loaders, ModuleSpec, lazy imports, AST transformation on import, circular imports, and importlib.metadata for plugin discovery.
Improved Scaling Laws via Weak-to-Strong Generalization in Random Feature Ridge Regression
It is increasingly common in machine learning to use learned models to label data and then employ such data to train more capable models. The phenomenon...
Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment
With the increasing accessibility and utilization of multilingual documents, Cross-Lingual Information Retrieval (CLIR) has emerged as an important rese...
In-Context Working Memory
Managing the context window as working memory: token budgeting, sliding windows, summarization, and the lost-in-the-middle problem.
In-Place Test-Time Training
The static "train then deploy" paradigm fundamentally limits Large Language Models (LLMs) from dynamically adapting their weights in response to contin...
Industrial IoT and ML
Learn how to build IIoT data pipelines connecting industrial protocols (OPC-UA, MQTT, Modbus) to time-series databases, Kafka, and ML inference systems for manufacturing intelligence.
Inference Cost Optimization
Reduce LLM inference costs by 60–80% through quantization, intelligent batching, right-sizing, and autoscaling - turning an $80K/month bill into $20K.
Inference Cost Optimization
The economics of LLM inference serving - cost per million tokens, GPU utilization, continuous batching, speculative decoding, KV cache management, and building production systems under $1 per million tokens.
Inference Cost Optimization
Learn how to systematically reduce LLM inference costs using model selection, quantization, caching, request routing, prompt compression, and infrastructure strategies.
Inference Cost Optimization
Reducing ML serving costs at scale - quantization ROI, batching economics, instance right-sizing, caching strategies, and LLM cost-per-token analysis.
Inference Optimization for MoE Models
Production techniques for serving MoE models efficiently - expert caching, CPU offloading, vLLM support, tensor vs. expert parallelism, batch size sensitivity, and quantization strategies.
Inference Scaling
Horizontal and vertical scaling for ML inference - autoscaling policies, KEDA with custom GPU metrics, spot instances, global load balancing, and handling traffic spikes.
Inferential Mechanics Part 1: Causal Mechanistic Theories of Machine Learning in Chemical Biology with Implications
Machine learning techniques are now routinely encountered in research laboratories across the globe. Impressive progress has been made through ML and AI...
InfiniteScienceGym: An Unbounded, Procedurally-Generated Benchmark for Scientific Analysis
Large language models are emerging as scientific assistants, but evaluating their ability to reason from empirical data remains challenging. Benchmarks...
Influence Malleability in Linearized Attention: Dual Implications of Non-Convergent NTK Dynamics
Understanding the theoretical foundations of attention mechanisms remains challenging due to their complex, non-linear dynamics. This work reveals a fun...
Information Gain, Gini Impurity, and Entropy
A deep dive into how decision trees choose splits - Shannon entropy, information gain, Gini impurity, gain ratio, regression variance reduction, and the multi-valued feature bias every practitioner must understand.
Information Router for Mitigating Modality Dominance in Vision-Language Models
Vision Language models (VLMs) have demonstrated strong performance across a wide range of benchmarks, yet they often suffer from modality dominance, whe...
Information-geometric adaptive sampling for graph diffusion
Standard diffusion models for graph generation typically rely on uniform time-stepping, an approach that overlooks the non-homogeneous dynamics of distr...
Infrastructure as Code for ML
IaC for ML infrastructure - Terraform GPU clusters on AWS/GCP/Azure, Helm charts for model serving, Pulumi Python IaC, Ansible for GPU node setup, GitOps with ArgoCD, spot instance handling, and infrastructure cost optimization.
Infrastructure Monitoring for ML Systems
Monitoring the infrastructure layer of ML systems - CPU, GPU, memory, latency, the four monitoring layers, custom ML metrics with Prometheus, and building the observability foundation for model quality monitoring.
Inheritance - Single, Multiple, and Cooperative at Engineering Depth
Master Python inheritance at the engineering level - what inheritance actually does to namespaces, single and multiple inheritance, the MRO algorithm, cooperative super(), the fragile base class problem, isinstance/issubclass, and when inheritance is correct.
Initialisation Determines the Basin: Efficient Codebook Optimisation for Extreme LLM Quantization
Additive quantization enables extreme LLM compression with O(1) lookup-table dequantization, making it attractive for edge deployment. Yet at 2-bit prec...
InnerQ: Hardware-aware Tuning-free Quantization of KV Cache for Large Language Models
Reducing the hardware footprint of large language models (LLMs) during decoding is critical for efficient long-sequence generation. A key bottleneck is...
Input Validation and Sanitization
Use Pydantic validators as security boundaries - prevent SQL injection, XSS, path traversal, SSRF, and file upload attacks through structural input validation in FastAPI.
InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation
Text and faces are among the most perceptually salient and practically important patterns in visual generation, yet they remain challenging for autoregr...
INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling
Building world models with spatial consistency and real-time interactivity remains a fundamental challenge in computer vision. Current video generation...
Instruction Tuning
How instruction tuning transforms base LLMs into general-purpose assistants that can follow diverse instructions, reason step by step, and generalize to new tasks.
Instruction Tuning at Scale
How to instruction-tune open-source models at production scale - covering the FLAN insight, dataset construction principles, scaling laws for instruction data, multi-node training setup, and a complete pipeline for fine-tuning Llama 3 8B on a 2-node A100 cluster.
Instruction-Guided Poetry Generation in Arabic and Its Dialects
Poetry has long been a central art form for Arabic speakers, serving as a powerful medium of expression and cultural identity. While modern Arabic speak...
Instruction-Level Optimization
Master ILP, vectorized loads, loop unrolling, and instruction scheduling to extract maximum throughput from CUDA kernels - the techniques separating 31% from 78% peak utilization.
Instructor - Structured Outputs with Pydantic
A complete guide to Jason Liu's Instructor library - Pydantic-based structured extraction, automatic retry on validation failure, multi-provider support, streaming, and production extraction patterns.
Integrated electro-optic attention nonlinearities for transformers
Transformers have emerged as the dominant neural-network architecture, achieving state-of-the-art performance in language processing and computer vision...
Intel Gaudi and Habana Labs
Intel Gaudi AI accelerator architecture - Tensor Processor Cores, built-in RoCE scale-out networking, SynapseAI SDK, and price-performance positioning against NVIDIA H100 for LLM training.
Intellectual Property and AI
Patent analysis, prior art search, trademark similarity detection, and the ML systems that support patent prosecution, portfolio management, and IP litigation.
IntentGrasp: A Comprehensive Benchmark for Intent Understanding
Accurately understanding the intent behind speech, conversation, and writing is crucial to the development of helpful Large Language Model (LLM) assista...
IntentVLA: Short-Horizon Intent Modeling for Aliased Robot Manipulation
Robot imitation data are often multimodal: similar visual-language observations may be followed by different action chunks because human demonstrators a...
InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation?
With the advancement of multimodal large language models (MLLMs) and coding agents, website development has shifted from manual programming to agent...
Interleaving Experiments
Use interleaving to compare ranking models with 10-25x better sensitivity than A/B tests - the technique behind fast iteration at search and recommendation companies.
InterLV-Search: Benchmarking Interleaved Multimodal Agentic Search
Existing benchmarks for multimodal agentic search evaluate multimodal search and visual browsing, but visual evidence is either confined to the input or...
Interpretability vs Explainability - Clearing Up the Confusion
The difference between understanding how a model works (interpretability) and explaining a specific prediction (explainability) - and why that distinction shapes regulation, trust, and system design.
InTriage: Intelligent Telephone Triage in Pre-Hospital Emergency Care.
Published at EMNLP 2025.
Introspective Diffusion Language Models
Diffusion language models promise parallel generation, yet still lag behind autoregressive (AR) models in quality. We trace this gap to a failure of intr...
Invariance-Based Dynamic Regret Minimization
We consider stochastic non-stationary linear bandits where the linear parameter connecting contexts to the reward changes over time. Existing algorithms...
Inventory Optimization
Newsvendor problem, safety stock optimization, reorder point prediction, multi-echelon inventory, and ML-driven policies that balance stockouts against carrying costs at retail scale.
Inverse Contextual Bandits without Rewards: Learning from a Non-Stationary Learner via Suffix Imitation
We study the Inverse Contextual Bandit (ICB) problem, in which a learner seeks to optimize a policy while an observer, who cannot access the learner's r...
Inversion-Free Natural Gradient Descent on Riemannian Manifolds
The natural gradient method is widely used in statistical optimization, but its standard formulation assumes a Euclidean parameter space. This paper pro...
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models.
Published at NAACL 2025.
Is More Data Worth the Cost? Dataset Scaling Laws in a Tiny Attention-Only Decoder
Training Transformer language models is expensive, as performance typically improves with increasing dataset size and computational budget. Although sca...
Iterative Identification Closure: Amplifying Causal Identifiability in Linear SEMs
The Half-Trek Criterion (HTC) is the primary graphical tool for determining generic identifiability of causal effect coefficients in linear structural e...
Jailbreaks and Adversarial Prompts
How safety training gets bypassed - jailbreak taxonomy, GCG attacks, many-shot jailbreaking, prompt injection, defenses, and why the arms race is hard to win.
Jailbreaks and Bypasses
Taxonomy of jailbreak techniques, why they work, evaluation frameworks, and layered defense strategies for production LLM systems.
JIT Compilation and numba
Just-in-time compilation from first principles, numba's LLVM backend and type inference system, GPU kernels with numba CUDA, and when JIT compilation delivers real performance gains.
Joint-Centric Dual Contrastive Alignment with Structure-Preserving and Information-Balanced Regularization
We propose HILBERT (HIerarchical Long-sequence Balanced Embedding with Reciprocal contrastive Training), a cross-attentive multimodal framework for lear...
JSON Mode and Tool/Function Schemas
A complete guide to native JSON mode, OpenAI Structured Outputs, tool calling for structured data, Anthropic tool use, parallel tool calls, and schema design best practices.
JSON Serialization - Production-Grade Encoding and Decoding
Master JSON serialization in Python at engineering depth - custom encoders, datetime/Decimal/UUID handling, orjson and msgspec for high-throughput APIs, NDJSON streaming, content negotiation, and why float precision silently destroys financial data.
JumpLoRA: Sparse Adapters for Continual Learning in Large Language Models
Adapter-based methods have become a cost-effective approach to continual learning (CL) for Large Language Models (LLMs), by sequentially learning a low-...
JWT Authentication
Master stateless JWT authentication - token structure, signing algorithms, refresh token rotation, common pitfalls, and building production-grade FastAPI JWT middleware.
K-Means Clustering
Master K-means clustering - Lloyd's algorithm convergence proof, K-means++ initialization with D² weighting, silhouette analysis, elbow method, Mini-batch K-means for large datasets, and customer segmentation pipelines.
Kafka for ML Systems
Using Apache Kafka as the backbone of production ML systems - schema registry, CDC, exactly-once semantics, and dead letter queues.
Kafka Streams vs Apache Flink - The ML Pipeline Decision Guide
A comprehensive comparison of Kafka Streams, Faust, and Apache Flink for building real-time ML feature pipelines, with a production decision framework and working code examples.
Kernel Bypass and DPDK
Kernel bypass networking for ML clusters - DPDK architecture, RDMA and InfiniBand for GPU-to-GPU communication, NCCL's bypass path, io_uring, eBPF, and when these techniques matter for AllReduce latency.
Kernel Fusion Strategies
How kernel fusion eliminates HBM round-trips between chained GPU operations, how torch.compile and TorchInductor identify fusible patterns, and how to write manual fused kernels with Triton for maximum throughput.
Kernel Integrated $R^2$: A Measure of Dependence
We introduce kernel integrated $R^2$, a new measure of statistical dependence that combines the local normalization principle of the recently introduced...
KernelBench-X: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels
LLM-based Triton kernel generation has attracted significant interest, yet a fundamental empirical question remains unanswered: where does this capabili...
Kernelized Advantage Estimation: From Nonparametric Statistics to LLM Reasoning
Recent advances in large language models (LLMs) have increasingly relied on reinforcement learning (RL) to improve their reasoning capabilities. Three a...
KinDER: A Physical Reasoning Benchmark for Robot Learning and Planning
Robotic systems that interact with the physical world must reason about kinematic and dynamic constraints imposed by their own embodiment, their environ...
Knowledge Distillation for LLMs
Training smaller student models to match larger teacher models - soft labels, temperature scaling, intermediate representation matching, API-based distillation, and a complete production pipeline for task-specific deployment.
Knowledge Graph Embeddings
TransE, RotatE, CompGCN - embedding entities and relations in vector spaces to predict missing facts in knowledge graphs, enabling AI systems to reason about structured world knowledge.
Knowledge Tracing Models
Learn Bayesian Knowledge Tracing (BKT), Deep Knowledge Tracing (DKT), SAKT, and AKT - models that estimate student knowledge state over time from interaction sequences.
KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance
RLVR improves reasoning in large language models, but its effectiveness is often limited by severe reward sparsity on hard problems. Recent hint-based R...
KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation
Personalized mobile agents that infer user preferences and calibrate proactive assistance hold great promise as everyday digital assistants, yet existin...
Kolmogorov-Arnold causal generative models
Causal generative models provide a principled framework for answering observational, interventional, and counterfactual queries from observational data....
KServe and Kubernetes ML Operators
Custom Kubernetes operators for ML workflows - what operators enable, KServe for standardized model serving, Seldon Core, the Kubeflow Training Operator, Argo Workflows, and when to build vs. use existing operators.
Kubeflow Pipelines
Building, compiling, and running production ML pipelines on Kubernetes using Kubeflow Pipelines v2 with MLMD metadata tracking and automatic retraining triggers.
Kubernetes and Auto-Scaling for LLMs
Deploy LLMs on Kubernetes with GPU scheduling, HPA and KEDA for autoscaling, MIG partitioning on A100/H100, and Karpenter for on-demand GPU node provisioning.
Kubernetes for ML
Use Kubernetes as ML infrastructure - from GPU scheduling and device plugins to Kubeflow Pipelines and autoscaling - migrating ML workloads from VMs to K8s without disruption.
Kubernetes Fundamentals for ML Engineers
The minimum Kubernetes knowledge every ML engineer needs to be productive - pods, deployments, services, resource requests, GPU allocation, probes, and persistent volumes.
KV Cache
Learn how the key-value cache eliminates redundant attention computation in LLM inference, and how PagedAttention solves the memory fragmentation problem.
KV Cache Management and PagedAttention
How the KV cache works in transformer inference, why naive memory allocation wastes 60-70% of GPU memory, and how PagedAttention from vLLM solved fragmentation using virtual memory techniques from operating systems.
KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs
Large Language Models (LLMs) rely heavily on Key-Value (KV) caching to minimize inference latency. However, standard KV caches are context-dependent: re...
KWBench: Measuring Unprompted Problem Recognition in Knowledge Work
We introduce the first version of KWBench (Knowledge Work Bench), a benchmark for unprompted problem recognition in large language models: can an LLM id...
L2GTX: From Local to Global Time Series Explanations
Deep learning models achieve high accuracy in time series classification, yet understanding their class-level decision behaviour remains challenging. Ex...
Lakehouse Architecture for ML
Lakehouse architecture for ML systems - Delta Lake, Apache Iceberg, Apache Hudi, medallion architecture, query engines, and ML pipelines on the lakehouse.
Lakehouse for ML Workflows
Storing training datasets, experiment artifacts, and model outputs in a lakehouse.
Lakehouse Query Engines
Trino, DuckDB, Spark SQL - querying open table formats at scale.
Lambda and Kappa Architecture for ML Systems
Master Lambda and Kappa architecture - the two dominant patterns for building ML systems that handle both historical and real-time data at scale.
Lambda Expressions - Anonymous Functions at Engineering Depth
Understand Python lambda expressions at engineering depth - anonymous function objects, compile-time vs call-time evaluation, the loop-closure trap, late binding, the default-argument fix, and when lambda is and is not appropriate.
LangChain Deep Dive
A thorough guide to LangChain's core abstractions, LCEL composable pipelines, LangGraph stateful workflows, LangSmith observability, and when to use LangChain vs direct API calls.
LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling
Continuous diffusion has been the foundation of high-fidelity, controllable, and few-step generation of many data modalities such as images. However, in...
Langfuse - Open-Source LLM Observability
Master Langfuse for production LLM observability - self-hosted tracing, evaluation datasets, prompt management, cost attribution by feature, and full data sovereignty for regulated industries.
LangGraph
LangGraph: stateful graph-based multi-agent systems with checkpointing, human-in-the-loop, streaming, and the supervisor pattern - one of the most flexible agent frameworks available.
LangGraph for Stateful Agents
Graph-based stateful agent orchestration with LangGraph - StateGraph, typed state, nodes, conditional edges, checkpointing, and human-in-the-loop.
LangSmith Deep Dive
Master LangSmith for LLM observability - production tracing, dataset curation, evaluation pipelines, prompt versioning, annotation queues, and deployment gating for AI systems.
Language Modeling Objectives
Learn the training objectives that teach LLMs to understand language - causal language modeling, masked language modeling, cross-entropy loss, and perplexity.
Large deviation principles for convolutional Bayesian neural networks
While suitably scaled CNNs with Gaussian initialization are known to converge to Gaussian processes as the number of channels diverges, little is known...
Large Language Model Systems
Deploying Llama-3-70B for a 100K DAU application - vLLM serving, tensor parallelism, KV cache management, speculative decoding, LoRA serving, cost management, and RAG integration.
Large Language Models Align with the Human Brain during Creative Thinking
Creative thinking is a fundamental aspect of human cognition, and divergent thinking (the capacity to generate novel and varied ideas) is widely regarded...
Large Language Models Explore by Latent Distilling
Generating diverse responses is crucial for test-time scaling of large language models (LLMs), yet standard stochastic sampling mostly yields surface-le...
Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism
Large language models (LLMs) undergo alignment training to avoid harmful behaviors, yet the resulting safeguards remain brittle: jailbreaks routinely by...
Large-Scale Memory Optimization
Master the memory math behind training and serving large language models - from mixed precision and gradient checkpointing to ZeRO optimizer stages, KV cache management, and PagedAttention.
LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment
While the shortage of explicit action data limits Vision-Language-Action (VLA) models, human action videos offer a scalable yet unlabeled data source. A...
LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety
Large language models (LLMs) often demonstrate strong safety performance in high-resource languages, yet exhibit severe vulnerabilities when queried in...
Latency and Cost Tradeoffs
How to decompose LLM latency and cost, choose the right optimization strategies, and define SLOs that balance quality, speed, and budget.
Latency vs Throughput Trade-offs in ML Systems
Understanding the fundamental tension between latency and throughput in ML serving - Little's Law, tail latency, batching strategies, and caching for production ML systems.
Latent Diffusion Models - The Architecture Behind Stable Diffusion
How Rombach et al. moved diffusion from pixel space to a compressed latent space via KL-VAE with perceptual and adversarial losses, cross-attention conditioning, and the complete Stable Diffusion pipeline - enabling high-resolution generation on consumer GPUs.
Latent Preference Modeling for Cross-Session Personalized Tool Calling
Users often omit essential details in their requests to LLM-based agents, resulting in under-specified inputs for tool use. This poses a fundamental cha...
Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning
Latent reasoning offers a more efficient alternative to explicit reasoning by compressing intermediate reasoning into continuous representations and sub...
Layer Normalization and Residual Connections
How layer normalization and residual connections solve gradient flow in deep transformers and enable training of 100+ layer networks.
LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories
This paper focuses on the alignment of flow matching models with human preferences. A promising way is fine-tuning by directly backpropagating reward gr...
Learnability and Privacy Vulnerability are Entangled in a Few Critical Weights
Prior approaches for membership privacy preservation usually update or retrain all weights in neural networks, which is costly and can lead to unnecessa...
Learning Adaptive Reasoning Paths for Efficient Visual Reasoning
Visual reasoning models (VRMs) have recently shown strong cross-modal reasoning capabilities by integrating visual perception with language reasoning. H...
Learning Evidence Highlighting for Frozen LLMs
Large Language Models (LLMs) can reason well, yet often miss decisive evidence when it is buried in long, noisy contexts. We introduce HiLight, an Evide...
Learning interacting particle systems from unlabeled data
Learning the potentials of interacting particle systems is a fundamental task across various scientific disciplines. A major challenge is that unlabeled...
Learning Long-term Motion Embeddings for Efficient Kinematics Generation
Understanding and predicting motion is a fundamental component of visual intelligence. Although modern video models exhibit strong comprehension of scen...
Learning Rate Scheduling
Every major learning rate schedule - step decay, cosine annealing, SGDR warm restarts, linear warmup, 1cycle policy, LR finder - with full PyTorch implementations, the warmup mechanics for Adam, polynomial decay, and a complete selection guide.
Learning Rate Transfer in Normalized Transformers
The Normalized Transformer, or nGPT (arXiv:2410.01131), achieves impressive training speedups and does not require weight decay or learning rate warmup....
Learning the Helmholtz equation operator with DeepONet for non-parametric 2D geometries
This paper deals with solving the 2D Helmholtz equation on non-parametric domains, leveraging a physics-informed neural operator network based on the De...
Learning the Signature of Memorization in Autoregressive Language Models
All prior membership inference attacks for fine-tuned language models use hand-crafted heuristics (e.g., loss thresholding, Min-K%, reference calibrati...
Learning to Hint for Reinforcement Learning
Group Relative Policy Optimization (GRPO) is widely used for reinforcement learning with verifiable rewards, but it often suffers from advantage collaps...
Learning to Learn-at-Test-Time: Language Agents with Learnable Adaptation Policies
Test-Time Learning (TTL) enables language agents to iteratively refine their performance through repeated interactions with the environment at inference...
Learning to Rank - Teaching Models to Sort, Not Just Score
How pointwise, pairwise, and listwise ranking approaches train models to produce the optimal ordering of items for search and recommendation.
Learning to Reason with Insight for Informal Theorem Proving
Although most of the automated theorem-proving approaches depend on formal proof systems, informal theorem proving can align better with large language...
Learning to Retrieve from Agent Trajectories
Information retrieval (IR) systems have traditionally been designed and trained for human users, with learning-to-rank methods relying heavily on large-...
Learning Versatile Humanoid Manipulation with Touch Dreaming
Humanoid robots promise general-purpose assistance, yet real-world humanoid loco-manipulation remains challenging because it requires whole-body stabili...
Legal LLM Fine-Tuning
Domain adaptation of LLMs for legal tasks - LegalBench evaluation, instruction tuning on legal data, and building legal AI models that outperform general-purpose LLMs on specific tasks.
Legal Research Automation
Dense retrieval over case law, citation graph analysis, precedent finding, and building legal research AI that surfaces relevant authorities without hallucinating fake cases.
LEMUR: Robust Fine-Tuning for Multilingual Embedding Models for Retrieval.
Published at EACL 2026.
Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling
Tokens serve as the fundamental unit of computation in modern autoregressive models, and generation length directly influences both inference cost and r...
Less Detail, Better Answers: Degradation-Driven Prompting for VQA
Recent advancements in Vision-Language Models (VLMs) have significantly pushed the boundaries of Visual Question Answering (VQA). However, high-resolution...
Leveraging Language-based Representations for Better Solving Symbol-related Problems with Large Language Models.
Published at COLING 2025.
Leveraging LLM-GNN Integration for Open-World Question Answering over Knowledge Graphs.
Published at EACL 2026.
LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase Robustness in VLA Models
Vision-Language-Action (VLA) models achieve strong performance in robotic manipulation by leveraging pre-trained vision-language backbones. However, in...
Lighting-grounded Video Generation with Renderer-based Agent Reasoning
Diffusion models have achieved remarkable progress in video generation, but their controllability remains a major limitation. Key scene factors such as...
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation
On-policy distillation (OPD) has emerged as an efficient post-training paradigm for large language models. However, standard OPD requires a live teacher...
Lightning Unified Video Editing via In-Context Sparse Attention
Video editing has evolved toward In-Context Learning (ICL) paradigms, yet the resulting quadratic attention costs create a critical computational bottle...
LightThinker++: From Reasoning Compression to Memory Management
Large language models (LLMs) excel at complex reasoning, yet their efficiency is limited by the surging cognitive overhead of long thought traces. In th...
LIME - Local Interpretable Model-Agnostic Explanations
LIME explains any black-box classifier by fitting a local linear approximation around a specific prediction - the algorithm, variants, limitations, and when to use it vs SHAP.
Limitations of Attention at Scale
Why the quadratic complexity of self-attention creates real production bottlenecks - memory, latency, and cost - and why sparse attention approximations only partially solve the problem.
line_profiler and memory_profiler - Line-Level Analysis
Line-by-line time and memory profiling with line_profiler, memory_profiler, tracemalloc, and pympler - finding the exact lines that are slow or leak memory.
Linear Interpolation and Model Soup
How weight averaging of fine-tuned models produces better, more robust models than any individual fine-tune - and the task arithmetic framework for composing capabilities.
Linear Models, Variable Selection, Artificial Intelligence
Variable selection in linear regression models has been a problem since hypothesis testing began. Which variables to include or exclude from a model is...
Linear Regression Internals
Deep dive into linear regression - OLS derivation, normal equations, geometric interpretation as projection, Gauss-Markov theorem, residual diagnostics, Cook's distance, VIF, multicollinearity, and full NumPy implementation.
Linear-Core Surrogates: Smooth Loss Functions with Linear Rates for Classification and Structured Prediction
The choice of loss function in classification involves a fundamental trade-off: smooth losses (like Cross-Entropy) enable fast optimization rates but yi...
Linear-Time Global Visual Modeling without Explicit Attention
Existing research largely attributes the global sequence modeling capability of Transformers to the explicit computation of attention weights, a process...
Linking spatial biology and clinical histology via Haiku
Integrating molecular, morphological, and clinical data is essential for basic and translational biomedical research, yet systematic frameworks for join...
Linting and Formatting - Ruff, Black, isort, and mypy
Master Python code quality tooling at engineering depth - Ruff's rule categories, Black's opinionated formatting, isort profiles, mypy static type checking, pyproject.toml configuration, and how to wire all tools into a coherent developer workflow.
Linux Performance Tuning
Systematic Linux performance tuning for ML workloads - sysctl parameters, CPU governors, NUMA balancing, transparent huge pages, IRQ affinity, NIC tuning, and the GRUB kernel boot options that matter for training throughput and inference latency.
Linux Process Scheduling
Understand Linux CFS scheduler, nice values, CPU affinity, real-time scheduling, cgroups, NUMA, and how Kubernetes CPU throttling destroys ML training throughput - with concrete fixes.
Lipschitz bounds for integral kernels
Feature maps associated with positive definite kernels play a central role in kernel methods and learning theory, where regularity properties such as Li...
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
Reinforcement learning with verifiable rewards (RLVR) has become a standard approach for large language models (LLMs) post-training to incentivize reaso...
LiteLLM
Deploy LiteLLM as a universal LLM proxy supporting 100+ providers. Configure routing, load balancing, fallbacks, semantic caching, and cost tracking through a single OpenAI-compatible endpoint.
LiVeAction: a Lightweight, Versatile, and Asymmetric Neural Codec Design for Real-time Operation
Modern sensors generate rich, high-fidelity data, yet applications operating on wearable or remote sensing devices remain constrained by bandwidth and p...
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
We present LLaDA2.0-Uni, a unified discrete diffusion large language model (dLLM) that supports multimodal understanding and generation within a nativel...
LLaMA Family Architecture
A deep dive into Meta's LLaMA model family - from LLaMA 1 through LLaMA 3.3 - covering RoPE embeddings, SwiGLU activation, RMSNorm, grouped query attention, and when to choose each variant.
llama.cpp and GGUF Format
llama.cpp - Georgi Gerganov's C++ inference engine that runs quantized LLMs on CPUs and consumer GPUs. GGUF binary format, quantization types, performance tuning, and practical local inference.
LlamaIndex Architecture
LlamaIndex's document-centric agent framework - VectorStoreIndex, QueryEngine, FunctionCallingAgent, and the Workflow event-driven orchestration model.
LlamaIndex Deep Dive
A comprehensive guide to LlamaIndex's data-centric architecture - indices, query engines, workflows, multi-document agents, and how it compares to LangChain for RAG applications.
LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics
Comprehensive understanding of time series remains a significant challenge for Large Language Models (LLMs). Current research is hindered by fragmented...
LLM as Agent Judge
Using LLMs to evaluate other agents' trajectories and outputs at scale - rubric design, pairwise comparison, bias mitigation, calibration, and escalation logic.
LLM as Data Generator
Use frontier LLMs to generate high-quality instruction-following, reasoning, and preference datasets - sampling strategies, diversity maximization, and quality vs. quantity tradeoffs.
LLM CI/CD
CI/CD pipelines for LLM applications - handling non-deterministic outputs with LLM-judge gates, canary deployments with quality monitoring, automated rollback triggers, and full GitHub Actions implementation.
LLM Evaluation Pipelines
Build automated evaluation pipelines for LLM systems - LLM-as-judge, RAGAS for RAG systems, trajectory evaluation for agents, regression testing, and eval dataset curation.
LLM Gateway and Routing
Design and operate an LLM gateway - unified API, model routing, circuit breakers, budget enforcement, and fallback chains - using LiteLLM and custom routing logic.
LLM Product Architecture
The three fundamental LLM product patterns - chat, workflow automation, and autonomous agents - and how to design the production service graph for each.
LLM Safety From Within: Detecting Harmful Content with Internal Representations
Guard models are widely used to detect harmful content in user prompts and LLM responses. However, state-of-the-art guard models rely solely on terminal...
LLM-as-Judge
Build calibrated, bias-corrected LLM judges that approximate human judgment at scale - pointwise scoring, pairwise comparison, bias mitigation, and ensemble techniques.
LLM-as-Judge
Use powerful LLMs to automatically evaluate other models - with position bias mitigation, CoT judging, and cost analysis.
LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models
Published at NAACL 2025.
LLM-Powered Product Architecture
End-to-end design of a production LLM-powered product - covering the serving stack, prompt management, RAG architecture, multi-LLM routing, streaming, cost management, and observability.
LLMInit: A Free Lunch from Large Language Models for Selective Initialization of Recommendation
Published at EMNLP 2025.
LLMOps Platforms
Comprehensive guide to LLMOps platforms - LangSmith, Langfuse, W&B Weave, Arize Phoenix, Helicone, and PromptLayer. When to build vs buy, integration patterns, abstraction layers, and production-grade Python examples using the Anthropic SDK.
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
Test-time scaling (TTS) has become an effective approach for improving large language model performance by allocating additional computation during infe...
LLVM and MLIR
LLVM compiler infrastructure and MLIR multi-level IR for ML - how they power PyTorch, JAX, TensorFlow, Triton, and IREE, with SSA form, optimization passes, dialect design, and practical code generation for ML workloads.
LM Studio and GUI Tools
LM Studio, Jan.ai, GPT4All, and Open WebUI for running LLMs locally - model discovery, hardware acceleration, local server mode, OpenAI-compatible APIs, and building a complete local AI development workspace.
LMQL and Guidance - Programmatic LLM Control
How Microsoft Guidance and LMQL extend structured generation to full programmatic control - interleaving generation with code, SQL-like constraints, token healing, and when each tool wins over Outlines and Instructor.
Load Balancing Across Providers
Distribute LLM traffic across multiple API keys and providers using round-robin, weighted, least-connections, and latency-based routing to scale throughput beyond single-key limits.
Load Balancing and Request Routing
Load balancing strategies for LLM serving - prefix-aware routing for KV cache reuse, least-connections for variable-cost requests, model routing, circuit breakers, and building a production gateway.
LoBoost: Fast Model-Native Local Conformal Prediction for Gradient-Boosted Trees
Gradient-boosted decision trees are among the strongest off-the-shelf predictors for tabular regression, but point predictions alone do not quantify unc...
Locally Confident, Globally Stuck: The Quality-Exploration Dilemma in Diffusion Language Models
Diffusion large language models (dLLMs) theoretically permit token decoding in arbitrary order, a flexibility that could enable richer exploration of re...
Logging for ML Systems
Structured logging for ML systems - prediction logging for delayed evaluation, structured JSON logs, audit logs for regulated models, log aggregation with Loki and Elasticsearch, and tracing individual prediction failures.
Logistic Regression Deep Dive
Master logistic regression from first principles - sigmoid derivation, log-likelihood to cross-entropy, decision boundary geometry, softmax multiclass, probability calibration with ECE, class imbalance handling, and full NumPy implementation.
Loki: An Open-Source Tool for Fact Verification
Published at COLING 2025.
Long Context Pre-Training with Lighthouse Attention
Training causal transformers at extreme sequence lengths is bottlenecked by the quadratic time and memory of scaled dot-product attention (SDPA). In thi...
Long-Context Evaluation
Evaluating LLM long-context capability - the Needle in a Haystack test, RULER benchmark, lost-in-the-middle phenomenon, and measuring effective context utilization vs claimed context window size.
LongAct: Harnessing Intrinsic Activation Patterns for Long-Context Reinforcement Learning
Reinforcement Learning (RL) has emerged as a critical driver for enhancing the reasoning capabilities of Large Language Models (LLMs). While recent adva...
LoopCTR: Unlocking the Loop Scaling Power for Click-Through Rate Prediction
Scaling Transformer-based click-through rate (CTR) models by stacking more parameters brings growing computational and storage overhead, creating a wide...
LoRA for Efficient Fine-Tuning
LoRA and QLoRA: fine-tune 70B models on a single GPU by freezing the base model and training only small low-rank adapter matrices - the technique that democratized LLM customization.
LoRA Mathematics and Implementation
Learn how LoRA (Low-Rank Adaptation) decomposes weight updates into low-rank matrices, why this works mathematically, and how to implement it from scratch in PyTorch and with HuggingFace PEFT.
LoRA: Low-Rank Adaptation
Master LoRA - the parameter-efficient fine-tuning method that cuts GPT-3's trainable parameters by roughly 10,000x while matching full fine-tuning quality, making LLM fine-tuning feasible on a single GPU.
Lost in the Middle - How LLMs Use Long Contexts
The empirical finding that LLMs reliably recall information at the beginning and end of long contexts but miss information in the middle, and strategies to mitigate this U-shaped performance degradation.
Low-degree Lower bounds for clustering in moderate dimension
We study the fundamental problem of clustering $n$ points into $K$ groups drawn from a mixture of isotropic Gaussians in $\mathbb{R}^d$. Specifically, w...
Low-Latency Feature Serving
Redis, Cassandra, and in-memory stores for sub-millisecond feature retrieval.
Low-Latency Inference Patterns
Engineering ML predictions under 10ms p99 - hardware choices, model optimization, batching strategies, pre-computation, memory layout, and real production targets.
Low-Latency Optimization
Engineering for ultra-low latency inference - NUMA awareness, CPU affinity, memory pre-allocation, lock-free data structures, cache line optimization, zero-copy inference, CUDA streams, and kernel profiling.
Low-Rank Compression of Pretrained Models via Randomized Subspace Iteration
The massive scale of pretrained models has made efficient compression essential for practical deployment. Low-rank decomposition based on the singular v...
Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration
Recently, scaling reinforcement learning with verifiable rewards (RLVR) for large language models (LLMs) has emerged as an effective training paradigm f...
Low-Resource Guidance for Controllable Latent Audio Diffusion
Generative audio requires fine-grained controllable outputs, yet most existing methods require model retraining on specific controls or inference-time c...
LPM 1.0: Video-based Character Performance Model
Performance, the externalization of intent, emotion, and personality through visual, vocal, and temporal behavior, is what makes a character alive. Lear...
LSTM and GRU Deep Dive
Master Long Short-Term Memory and Gated Recurrent Units - the architectures that solved vanishing gradients and powered a decade of sequence modeling breakthroughs.
Lyra 2.0: Explorable Generative 3D Worlds
Recent advances in video generation enable a new paradigm for 3D scene creation: generating camera-controlled videos that simulate scene walkthroughs, t...
M-CaStLe: Uncovering Local Causal Structures in Multivariate Space-Time Gridded Data
Causal graph discovery for space-time systems is challenging in high-dimensional gridded data, which often has many more grid cells than temporal observ...
Machine Learning for Health (ML4H) 2024
Published at ML4H@NeurIPS 2024.
Machine Learning for Health, ML4H@NeurIPS 2024, Vancouver, Canada, 15-16 December 2024
Published at ML4H@NeurIPS 2024.
Make Your LVLM KV Cache More Lightweight
Key-Value (KV) cache has become a de facto component of modern Large Vision-Language Models (LVLMs) for inference. While it enhances decoding efficiency...
Mamba - Selective State Space Models
How Mamba's input-dependent SSM parameters, hardware-aware parallel scan, and selective gating mechanism achieved linear-time sequence modeling competitive with transformers.
Mamba vs Transformer - When Each Wins
A rigorous benchmark comparison: perplexity, throughput, recall tasks, in-context learning, and the fundamental trade-off between compressed state and full context access.
ManifoldGD: Training-Free Hierarchical Manifold Guidance for Diffusion-Based Dataset Distillation
In recent times, large datasets hinder efficient model training while also containing redundant concepts. Dataset distillation aims to synthesize compac...
map, filter, reduce - Lazy Iteration and the Pipeline Model
Understand Python's map, filter, and reduce at engineering depth - lazy iterators, pipeline composition, functools.reduce and left-fold semantics, performance trade-offs, and when to prefer list comprehensions.
Mapping the Phase Diagram of the Vicsek Model with Machine Learning
In this study, we use machine learning to classify and interpolate the phase structure of the Vicsek flocking model across the three-dimensional paramet...
MARBLE: Multi-Aspect Reward Balance for Diffusion RL
Reinforcement learning fine-tuning has become the dominant approach for aligning diffusion models with human preferences. However, assessing images is i...
MARCO: Navigating the Unseen Space of Semantic Correspondence
Recent advances in semantic correspondence rely on dual-encoder architectures, combining DINOv2 with diffusion backbones. While accurate, these billion-...
MARS: Enabling Autoregressive Models Multi-Token Generation
Autoregressive (AR) language models generate text one token at a time, even when consecutive tokens are highly predictable given earlier context. We int...
Masked by Consensus: Disentangling Privileged Knowledge in LLM Correctness
Humans use introspection to evaluate their understanding through private internal states inaccessible to external observers. We investigate whether larg...
Masked Language Modeling and BERT
Understand how BERT learns bidirectional language representations using masked language modeling, its architecture, and how to fine-tune it for downstream tasks.
MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval
Mathematical problem solving remains a challenging test of reasoning for large language and multimodal models, yet existing benchmarks are limited in si...
Matrix Factorization - Discovering Hidden Taste Dimensions
Master matrix factorization for recommendations - SVD, Funk SVD, SGD and ALS optimization, biases, regularization, and implicit feedback with BPR. The technique at the core of the Netflix Prize-winning ensemble.
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory
With the advancement of interactive video generation, diffusion models have increasingly demonstrated their potential as world models. However, existing...
Matryoshka Representation Learning (MRL)
Nested embeddings where any prefix of dimensions is informative - training MRL, adaptive retrieval, 10x FLOP reduction, and how OpenAI's text-embedding-3 uses MRL internally.
Maximum Likelihood Estimation
Understand MLE from first principles - derive OLS from Gaussian noise, cross-entropy from Bernoulli, Fisher information, Cramér-Rao bound, and the deep connection between MLE and empirical risk minimization.
McMining: Automated Discovery of Misconceptions in Student Code
Published at EACL 2026.
MCP Architecture - Client-Server
Deep dive into MCP's client-server architecture - Host, Client, and Server roles; stdio and HTTP+SSE transport layers; JSON-RPC 2.0 message format; initialization handshake; capability negotiation; and full lifecycle.
MCP Ecosystem and Servers
The growing MCP ecosystem - official Anthropic servers, community landscape, MCP registries, evaluating third-party servers, IDE integrations, and patterns for building ecosystem vs. team-specific servers.
MCP Security and Permissions
Security model of the Model Context Protocol - attack surfaces including tool poisoning, resource injection, and confused deputy attacks, plus permission scoping, transport security, and a production security checklist.
MCP Tools, Resources, and Prompts
Deep dive into MCP's three primitives - Tools (callable functions), Resources (readable data), and Prompts (reusable templates) - with complete Python implementations of each.
MCP vs Function Calling
Deep architectural comparison of MCP and function calling - where each operates, when to use each, the decision matrix, hybrid patterns, and how to migrate from function calling to MCP.
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models
Published at EMNLP 2025.
MDN: Parallelizing Stepwise Momentum for Delta Linear Attention
Linear Attention (LA) offers a promising paradigm for scaling large language models (LLMs) to long sequences by avoiding the quadratic complexity of sel...
MDP and the RL Framework
Master Markov Decision Processes - the mathematical foundation of all reinforcement learning. Understand states, actions, rewards, value functions, the Bellman equations, and how real-world systems are modeled as MDPs.
Mean Estimation from Coarse Data: Characterizations and Efficient Algorithms
Coarse data arise when learners observe only partial information about samples; namely, a set containing the sample rather than its exact value. This oc...
MeanFlow Meets Control: Scaling Sampled-Data Control for Swarms
Steering large-scale swarms in only a few control updates is challenging because real systems operate in sampled-data form: control inputs are updated i...
Measuring AI Product Quality
Build a production-grade quality measurement system for AI products using explicit feedback, implicit behavioral signals, LLM-as-judge, and composite scoring.
Measuring Faithfulness Depends on How You Measure: Classifier Sensitivity in LLM Chain-of-Thought Evaluation
Recent work on chain-of-thought (CoT) faithfulness reports single aggregate numbers (e.g., DeepSeek-R1 acknowledges hints 39% of the time), implying tha...
Measuring HITL Effectiveness
End-to-end metrics for human-in-the-loop systems - false positive/negative rates, confidence calibration, inter-rater reliability, reviewer performance tracking, ROI computation, and system-level effectiveness dashboards.
MedConclusion: A Benchmark for Biomedical Conclusion Generation from Structured Abstracts
Large language models (LLMs) are widely explored for reasoning-intensive research tasks, yet resources for testing whether they can infer scientific con...
MedGemma 1.5 Technical Report
We introduce MedGemma 1.5 4B, the latest model in the MedGemma collection. MedGemma 1.5 expands on MedGemma 1 by integrating additional capabilities: hi...
Medical Imaging AI
Deep learning for radiology and pathology - CNN architectures, DICOM pipelines, transfer learning from ImageNet to medical domains, and clinical deployment considerations including FDA clearance.
MedSkillAudit: A Domain-Specific Audit Framework for Medical Research Agent Skills
Background: Agent skills are increasingly deployed as modular, reusable capability units in AI agent systems. Medical research agent skills require safe...
MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping
In this paper, we introduce MegaStyle, a novel and scalable data curation pipeline that constructs an intra-style consistent, inter-style diverse and hi...
MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU
We present MegaTrain, a memory-centric system that efficiently trains 100B+ parameter large language models at full precision on a single GPU. Unlike tr...
Membership Inference
Determining whether specific data was used in model training - privacy risks, attack techniques, and defenses for production ML systems.
Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory
Large language model (LLM) agents are fundamentally bottlenecked by finite context windows on long-horizon tasks. As trajectories grow, retaining tool o...
MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory
Long-term agent memory is increasingly multimodal, yet existing evaluations rarely test whether agents preserve the visual evidence needed for later rea...
MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models
Memory is essential for large vision-language models (LVLMs) to handle long, multimodal interactions, with two method directions providing this capabili...
Memory Allocators for ML
How glibc malloc, jemalloc, tcmalloc, and PyTorch's CUDA caching allocator work - with production techniques for eliminating memory fragmentation in ML training and serving.
Memory Bandwidth Roofline Analysis
Learn to apply the Roofline model to diagnose whether GPU kernels are memory-bound or compute-bound, calculate arithmetic intensity, and use roofline plots to guide real optimization decisions.
Memory Caching: RNNs with Growing Memory
Transformers have been established as the de-facto backbones for most recent advances in sequence modeling, mainly due to their growing memory capacity...
Memory Capacity Planning for LLMs
How to compute exact GPU memory requirements for LLM training and inference - model weights, optimizer states, activations, KV cache - and how to plan GPU cluster configurations for target models.
Memory Coalescing and Bank Conflicts
Master the two most impactful memory access patterns in CUDA - global memory coalescing and shared memory bank conflicts. Understand why identical computation with transposed access can be 8x slower, and how to fix both problems with layout changes and padding.
Memory Compression and Summarization
How to keep agents functional across days-long tasks by compressing memory intelligently - preserving what matters, discarding what does not.
Memory Hierarchy and Cache Design
Learn how CPU cache hierarchy works - L1/L2/L3 structure, associativity, eviction policies, MESI coherence, NUMA topology, and how to write cache-friendly code that runs 10x to 100x faster for ML workloads.
Memory Hierarchy in GPUs
Registers, L1/L2 cache, shared memory, and HBM - GPU memory hierarchy latency numbers, bandwidth characteristics, and how to write code that uses each level effectively.
Memory Intelligence Agent
Deep research agents (DRAs) integrate LLM reasoning with external tools. Memory systems enable DRAs to leverage historical experiences, which are essent...
Memory Models and Concurrency
Hardware memory models, memory barriers, atomic operations, lock-free data structures, and how memory ordering affects concurrent ML data pipelines and distributed training implementations.
Memory Optimization - Fitting More in Less
Reduce Python memory usage with __slots__, weakref, array module, struct.pack, memory-mapped files, object pooling, and the flyweight pattern for processing millions of records.
Memory Profiling - tracemalloc, memory_profiler, objgraph, and pympler
Profile and debug Python memory usage at engineering depth - sys.getsizeof shallow vs deep size, tracemalloc snapshots and leak detection, memory_profiler line-by-line analysis, objgraph retention paths, pympler recursive sizing, and practical workflows for diagnosing real-world memory leaks.
Memory Profiling and Debugging
A systematic toolkit for finding and fixing memory leaks in Python ML systems - from tracemalloc snapshots to GPU memory debugging, DataLoader leaks, and long-running service monitoring.
Memory Safety and Rust
Understand memory safety bugs in C/C++, how Rust's ownership model eliminates them at compile time, and why Rust is becoming the language of choice for high-performance ML infrastructure components.
Memory Systems: Short-Term and Long-Term
Designing memory systems for LLM agents - from in-context working memory to episodic retrieval, semantic knowledge bases, and procedural memory.
Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents
Memory-based self-evolution has emerged as a promising paradigm for coding agents. However, existing approaches typically restrict memory utilization to...
MergeKit - The Practical Toolkit
How to use arcee-ai/mergekit to merge language models with YAML configuration, CPU-compatible layer-by-layer processing, and automated HuggingFace Hub upload.
Merging and Model Soup Techniques
Combining multiple fine-tuned models without retraining - LoRA adapter merging, SLERP, TIES-merging, DARE, and MergeKit for production model merging that unlocks capabilities no single training run achieves.
Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values
We propose a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF). Unlike semi-ba...
MERRIN: A Benchmark for Multimodal Evidence Retrieval and Reasoning in Noisy Web Environments
Motivated by the underspecified, multi-hop nature of search queries and the multimodal, heterogeneous, and often conflicting nature of real-world web re...
Message Passing Neural Networks
MPNN - the unified framework showing GCN, GraphSAGE, and GAT are special cases of a single message-passing paradigm with a fundamental 1-WL expressivity ceiling.
Message Queues and Kafka
Master Apache Kafka for ML data pipelines - topics, partitions, consumer groups, exactly-once semantics, real-time feature computation, prediction logging, and production patterns for ML platforms.
Meta-CoT: Enhancing Granularity and Generalization in Image Editing
Unified multi-modal understanding/generative models have shown improved image editing performance by incorporating fine-grained understanding into their...
Meta-learning In-Context Enables Training-Free Cross Subject Brain Decoding
Visual decoding from brain signals is a key challenge at the intersection of computer vision and neuroscience, requiring methods that bridge neural repr...
Meta-Reasoning Improves Tool Use in Large Language Models
Published at NAACL 2025.
Metaclasses - The Class of Classes
Understand type as the metaclass of all classes, the full class creation pipeline, __new__, __init__, __call__ on metaclasses, __prepare__, metaclass inheritance and conflicts, and real-world usage in Django, SQLAlchemy, and ABC.
Metadata Filtering with Vector Search
Master pre-filtering vs post-filtering, the ACORN algorithm for filtered HNSW, namespace sharding for multi-tenancy, payload index design, and performance impact of filters in vector databases.
Metaflow
Building scalable, reproducible ML workflows with Netflix's Metaflow - the flow-step model, cloud compute with @batch and @kubernetes, and Cards for documentation.
MiA-Signature: Approximating Global Activation for Long-Context Understanding
A growing body of work in cognitive science suggests that reportable conscious access is associated with global ignition over distributed memory systems...
Micro Language Models Enable Instant Responses
Edge devices such as smartwatches and smart glasses cannot continuously run even the smallest 100M-1B parameter language models due to power and compute...
Microservices for ML Systems
Learn when and how to decompose ML systems into microservices - covering feature services, model services, service mesh, gRPC, and circuit breakers.
Microservices vs Monolith - Making the Right Choice
Navigate the monolith-to-microservices spectrum with Python - bounded contexts, communication patterns, the modular monolith, and practical decision frameworks.
Middleware - Wrapping Every Request and Response
Master middleware at engineering depth - WSGI vs ASGI middleware, the onion model, request ID propagation, timing, structured logging, CORS, rate limiting with Redis, JWT authentication, and when to use middleware vs dependency injection.
Mimic Intent, Not Just Trajectories
While imitation learning (IL) has achieved impressive success in dexterous manipulation through generative modeling and pretraining, state-of-the-art ap...
Mind the Gap: Structure-Aware Consistency in Preference Learning
Preference learning has become the foundation of aligning Large Language Models (LLMs) with human intent. Popular methods, such as Direct Preference Opt...
Mind's Eye: A Benchmark of Visual Abstraction, Transformation and Composition for Multimodal LLMs
Multimodal large language models (MLLMs) have achieved impressive progress on vision language benchmarks, yet their capacity for visual cognitive and vi...
MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale
Current document parsing methods compete primarily on model architecture innovation, while systematic engineering of training data remains underexplored...
MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction
Recent progress in multimodal large language models (MLLMs) has brought AI capabilities from static offline data processing to real-time streaming inter...
Minimax Generalized Cross-Entropy
Loss functions play a central role in supervised classification. Cross-entropy (CE) is widely used, whereas the mean absolute error (MAE) loss can offer...
MinShap: A Modified Shapley Value Approach for Feature Selection
Feature selection is a classical problem in statistics and machine learning, and it continues to remain an extremely challenging problem especially in t...
Mirror in the Model: Ad Banner Image Generation via Reflective Multi-LLM and Multi-modal Agents.
Mirror in the Model: Ad Banner Image Generation via... - published at EMNLP 2025.
MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference
DeepSeek Sparse Attention (DSA) sets the state of the art for fine-grained inference-time sparse attention by introducing a learned token-wise indexer t...
Mistral and Mixtral Architecture
Mistral 7B's sliding window attention and grouped query attention innovations, and Mixtral 8x7B's Mixture of Experts design - sparse routing, expert selection, and why MoE delivers 70B quality at 13B active parameter cost.
Mitigating Copy Bias in In-Context Learning through Neuron Pruning.
Mitigating Copy Bias in In-Context Learning through... - published at EACL 2026.
Mitigating Multimodal Hallucination via Phase-wise Self-reward
Large Vision-Language Models (LVLMs) still struggle with vision hallucination, where generated responses are inconsistent with the visual input. Existin...
Mixed Precision and Quantization Kernels
Learn how to write correct and fast kernels for FP16, BF16, FP8, INT8, and INT4 quantized models - including the pipeline mistakes that make INT8 slower than FP16.
MixFlow: Mixed Source Distributions Improve Rectified Flows
Diffusion models and their variations, such as rectified flows, generate diverse and high-quality images, but they are still hindered by slow iterative...
Mixtral 8x7B - Architecture Deep Dive
Mistral AI's Mixtral 8x7B architecture - 8 experts with top-2 routing, sliding window attention, multilingual training, performance vs. Llama 2 70B, and serving requirements.
Mixture of Experts Architecture
The architecture of sparse MoE models - how expert networks replace dense FFN layers, top-k routing, and how parameter count relates to active compute.
ML Cost Models
Learn to build a complete ML cost model - from compute and storage to hidden data transfer costs - so your team never gets blindsided by a $300K quarterly cloud bill.
ML Deployment Patterns - From Jupyter Notebook to Production at Scale
A comprehensive guide to ML deployment strategies, serving architectures, optimization techniques, and model registry practices for shipping models safely at scale.
ML Infrastructure Cost Model
Understanding what drives ML costs - building a cost-per-request model for your ML system from scratch, and computing unit economics the CTO will believe.
ML Pipeline Orchestration Concepts
Understand the fundamental concepts behind ML pipeline orchestration - DAGs, dependency management, idempotency, and why cron jobs are a silent disaster for production ML.
ML Platform Design
Designing an internal ML platform for a team of 50 data scientists - feature stores, experiment tracking, model registry, serving infrastructure, and platform adoption strategies.
ML Platform Design
Learn how to design internal ML platforms that enable data scientists and engineers to train, deploy, and monitor models efficiently - covering platform components, build vs buy, and real-world case studies.
ML ROI and Business Cases
Build iron-clad ROI cases for ML investments - from quantifying recommendation system value to attributing A/B test results to long-term business outcomes.
MLflow Deep Dive
Production MLflow setup for teams - tracking server architecture, autologging, custom logging, model registry, nested runs for HPO, and scaling to 500+ experiments per week.
MLflow Model Registry in Production
Learn how to use the MLflow Model Registry to manage model versions, stages, approval workflows, and webhooks for production ML teams.
MLOps Platform Architecture
Understand the MLOps maturity model from Level 0 to Level 3, design the components of a complete ML platform, and build a realistic 12-month roadmap from ad-hoc to automated.
MLOps vs DevOps
How MLOps extends DevOps principles to handle the unique challenges of data, model quality, and concept drift that traditional software CI/CD cannot address.
MLX for Apple Silicon
Apple's MLX framework for running and fine-tuning LLMs on M-series chips - unified memory architecture, lazy evaluation, mlx-lm for inference, LoRA fine-tuning, and benchmarking against llama.cpp.
MM-JudgeBias: A Benchmark for Evaluating Compositional Biases in MLLM-as-a-Judge
Multimodal Large Language Models (MLLMs) have been increasingly used as automatic evaluators-a paradigm known as MLLM-as-a-Judge. However, their reliabi...
MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation
The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webp...
MMCORE: MultiModal COnnection with Representation Aligned Latent Embeddings
We present MMCORE, a unified framework designed for multimodal image generation and editing. MMCORE leverages a pre-trained Vision-Language Model (VLM)...
MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Selection and Adaptive Control
MLLMs have been successfully applied to multimodal embedding tasks, yet their generative reasoning capabilities remain underutilized. Directly incorpora...
MMSkills: Towards Multimodal Skills for General Visual Agents
Reusable skills have become a core substrate for improving agent capabilities, yet most existing skill packages encode reusable behavior primarily as te...
MNAFT: modality neuron-aware fine-tuning of multimodal large language models for image translation
Multimodal large language models (MLLMs) have shown impressive capabilities, yet they often struggle to effectively capture the fine-grained textual inf...
Mobile GUI Agent Privacy Personalization with Trajectory Induced Preference Optimization
Mobile GUI agents powered by Multimodal Large Language Models (MLLMs) can execute complex tasks on mobile devices. Despite this progress, most existing...
Mobile GUI Agents under Real-world Threats: Are We There Yet?
Recent years have witnessed a rapid development of mobile GUI agents powered by large language models (LLMs), which can autonomously execute diverse dev...
Mocking - Patch Where the Name Is Used, Not Where It Is Defined
Master Python mocking at engineering depth - the golden patching rule, Mock vs MagicMock, patch as decorator and context manager, autospec, side_effect, AsyncMock, pytest-mock, and the typo that silently passes your tests.
Modality Collapse as Mismatched Decoding: Information-Theoretic Limits of Multimodal LLMs
Multimodal LLMs can process speech and images, but they cannot hear a speaker's voice or see an object's texture. We show this is not a failure of encod...
Mode Seeking meets Mean Seeking for Fast Long Video Generation
Scaling video generation from seconds to minutes faces a critical bottleneck: while short-video data is abundant and high-fidelity, coherent long-form d...
Model Agreement via Anchoring
Numerous lines of work aim to control model disagreement -- the extent to which two machine learning models disagree in their predictions. We adop...
Model Capability Dominates: Inference-Time Optimization Lessons from AIMO 3
Majority voting over multiple LLM attempts improves mathematical reasoning, but correlated errors limit the effective sample size. A natural fix is to a...
Model Cards and Documentation
How to write, automate, and maintain model cards that document model capabilities, limitations, training data, fairness evaluations, and regulatory compliance.
Model Compilation and Optimization
Compiler-level optimizations for ML inference - TensorRT, torch.compile, ONNX export, kernel fusion, layer fusion, XLA, and profiling bottlenecks.
Model Efficiency Economics
Analyze the accuracy-cost Pareto frontier to determine when model improvements are economically justified - and how to build the business case that the current model is already cost-optimal.
Model Evaluation Gates
Design automated model quality gates that block promotion when a model fails on demographic subgroups - not just on aggregate metrics.
Model Extraction
Querying a model API to reconstruct its weights, replicate its behavior, or steal proprietary training data through systematic probing.
Model Fallback and Retry
Design resilient LLM clients with configurable fallback chains, exponential backoff with jitter, and circuit breakers that handle provider failures gracefully without any user-facing impact.
Model Licensing and Compliance
Open-source model licenses are not all the same. Learn how Apache 2.0, LLaMA Community, RAIL, and custom licenses differ - what you can and cannot do in production, and how to build a compliance workflow.
Model Monitoring Platform
Build production model monitoring infrastructure that catches data drift, prediction drift, and concept drift - detecting model degradation within 24 hours instead of two months.
Model Performance Monitoring
Monitoring model quality in production - the ground truth delay problem, proxy metrics, shadow evaluation, cohort-based monitoring, SLOs for model quality, and detecting degradation before it hurts the business.
Model Quantization for Production Inference
How quantization reduces model size and inference latency - from FP32 to INT8 to INT4 - covering PTQ, QAT, GPTQ, AWQ, and GGUF with accuracy tradeoffs.
Model Registry and Versioning
Design a model registry that enables 3-minute rollbacks, full model lineage, and controlled staging-to-production promotion - turning model lifecycle management from a manual process into a reliable system.
Model Registry Concepts
Understand what a model registry is, why it exists, and how it brings order to the chaos of managing ML models in production.
Model Rollback Strategies
Designing fast, reliable model rollback procedures for when production models degrade - covering registry-based rollback, infrastructure rollback, and automated rollback controllers.
Model Selection and Parameter Estimation of Multi-dimensional Gaussian Mixture Model
In this paper, we study the problem of learning multi-dimensional Gaussian Mixture Models (GMMs), with a specific focus on model order selection and eff...
Model Selection Strategy - Choosing the Right Model for the Right Problem
A systematic framework for selecting model families, managing complexity budgets, tuning hyperparameters, and knowing when AutoML helps versus hurts.
Model Staging and Promotion
How to safely gate model promotion through staging, production, and archiving with automated checks and human approval workflows.
Model Versioning and Canary Releases
Managing model versions in production LLM serving - semantic versioning for models, canary deployments, A/B testing, shadow mode evaluation, rollback procedures, and blue-green model deployments.
Model Versioning Strategies
Design versioning schemes for ML models that support safe rollbacks, A/B testing, champion/challenger management, and backward compatibility.
Modeling Multiple Support Strategies within a Single Turn for Emotional Support Conversations
Emotional Support Conversation (ESC) aims to assist individuals experiencing distress by generating empathetic and supportive dialogue. While prior work...
Modeling Sparse and Bursty Vulnerability Sightings: Forecasting Under Data Constraints
Understanding and anticipating vulnerability-related activity is a major challenge in cyber threat intelligence. This work investigates whether vulnerab...
Modern Alignment Techniques
Survey the post-RLHF alignment landscape - RLAIF, Constitutional AI, rejection sampling fine-tuning, iterative DPO, process reward models, and the open questions shaping the next generation of aligned models.
MoDora: Tree-Based Semi-Structured Document Analysis System
Semi-structured documents integrate diverse interleaved data elements (e.g., tables, charts, hierarchical paragraphs) arranged in various and often irre...
Module 01: Agentic Foundations
Master the foundational concepts of AI agents - what they are, how they reason, how they act, and when to use them.
Module 01: LLMOps - Overview
An overview of LLMOps - the engineering discipline for building, shipping, and operating production LLM applications reliably and at scale.
Module 01: Systems Foundations
Master the foundational principles of AI system design - from requirements gathering to distributed systems theory applied to machine learning.
Module 01: Transformer Architecture
A complete guide to the transformer architecture - the foundation of every modern large language model.
Module 02 - Functional Programming Overview
Master Python's functional programming model at engineering depth - lambdas, map/filter/reduce, generators, iterators, decorators, closures, pure functions, immutability, functools, and partial application and currying.
Module 02: AI Observability - Overview
An overview of AI observability - tracing, quality metrics, feedback collection, and alerting for production LLM applications.
Module 02: Experiment Tracking
Systematic tracking of ML experiments - hyperparameters, metrics, artifacts, and models - so your team can reproduce results, compare runs, and ship better models faster.
Module 03 - Python Internals Overview
Understand CPython's implementation details at engineering depth - bytecode, the eval loop, the GIL, reference counting, garbage collection, memory profiling, sys/inspect, and the import system.
Module 03: Computer Use Agents
How AI agents see, understand, and interact with graphical interfaces - browsers, desktops, and GUIs - using vision models and action executors.
Module 03: Data Versioning
Versioning datasets as first-class artifacts - DVC, Delta Lake, dataset lineage, data contracts, and managing ML datasets at scale.
Module 03: LLM Gateways
Learn how to build and operate a production LLM gateway - the unified infrastructure layer for routing, caching, cost control, and observability across every AI service your team runs.
Module 03: Model Serving
Production patterns for serving ML model predictions - from protocol choice and batching to quantization, compilation, caching, and autoscaling.
Module 03: Prompt Engineering
Master the art and science of communicating with large language models - from basic zero-shot instructions to automated prompt optimization with DSPy.
Module 04 - Testing and Quality Overview
Build production-grade test suites at engineering depth - unittest, pytest, mocking, TDD, code coverage, linting, and pre-commit hooks that enforce quality at every commit.
Module 04: Coding Agents
Coding agents are the most commercially successful form of agentic AI. Learn how GitHub Copilot, Cursor, Devin, and Claude Code work under the hood.
Module 04: RAG Systems
Master Retrieval-Augmented Generation - the dominant pattern for grounding LLMs in external knowledge at production scale.
Module 04: Real-Time ML Systems
Architecture patterns for real-time machine learning - from sub-10ms inference at scale to online learning, streaming inference pipelines, and ultra-low-latency optimization.
Module 04: Synthetic Data
Learn to generate, filter, and use synthetic training data at scale - from Self-Instruct bootstrapping to Evol-Instruct complexity evolution, distillation datasets, and RAG evaluation corpora.
Module 05: CI/CD for ML
Build CI/CD pipelines that catch ML-specific failures - not just broken code, but broken models.
Module 05: Long-Horizon Planning
How agents decompose complex multi-step tasks, plan across long horizons, recover from failures, and know when to ask for help.
Module 05: ML Architecture Patterns
A deep dive into the architectural patterns that power production ML systems - from Lambda/Kappa to multi-tenant platforms.
Module 06 - APIs and Web Basics
Master HTTP at the wire level, REST design principles, Flask, FastAPI, request/response lifecycle, middleware, JSON serialization, and Pydantic validation - the complete engineering foundation for building production web APIs in Python.
Module 06 - Security Engineering
Master security engineering in Python - cryptographic hashing, JWT authentication, OAuth 2.0, input validation, SQL injection prevention, secrets management, and secure coding patterns that protect production systems from real-world attacks.
Module 06: Agent Memory
How agents store, retrieve, and manage knowledge across interactions - working memory, episodic memory, semantic memory, procedural memory, and cross-session persistence.
Module 06: Case Studies
Real-world end-to-end case studies of production ML systems - recommendation, search, fraud, content moderation, ad click prediction, and LLM-powered products.
Module 06: Containerization
Master Docker and containers for ML - from Dockerfiles to GPU containers, image optimization, and Docker Compose for reproducible ML development environments.
Module 06: LLM Evaluation
A complete guide to evaluating large language models - from perplexity to production monitoring.
Module 07: LLM Inference & Optimization
Master the systems and techniques that make large language model inference fast, efficient, and cost-effective at production scale.
Module 07: Multi-Agent Systems
Orchestration, communication, parallelism, and real frameworks - from first principles to production multi-agent systems.
Module 07: Production AI Patterns
Battle-tested engineering patterns for deploying LLM applications at scale - context management, streaming, async calls, batching, retries, cost optimization, multi-tenancy, and AI product architecture.
Module 08: Agent Evaluation
Evaluation is the most underrated problem in agentic AI. Without it, you cannot improve, catch regressions, or build trust. This module covers trajectory scoring, benchmarks, LLM-as-judge, human evaluation, and production monitoring.
Module 08: AI Product Engineering
Design, build, and ship AI-powered products that users trust - streaming UX, latency management, error handling, rollout strategies, personalization, and quality measurement.
Module 08: Multimodal Models
Understanding how modern AI systems process images, audio, and text together - from CLIP to diffusion to production pipelines.
Module 09: Agent Safety
Risk taxonomy, minimal footprint, prompt injection defense, guardrails, human oversight, sandboxing, and responsible deployment.
Module 09: Human-in-the-Loop
Master human-in-the-loop AI systems - annotation pipelines, active learning, feedback collection, escalation patterns, and measuring HITL effectiveness.
Module 09: LLM System Design
Production architecture for AI-powered products - from prototype to reliable, scalable, cost-efficient systems.
Module 1 - MLOps Foundations
Understand what MLOps is, why it exists, and how to think about operationalizing machine learning systems in production.
Module 1: Computer Architecture for ML Engineers
CPU architecture, memory hierarchy, SIMD vectorization, NUMA, and hardware performance analysis - understanding the machine your ML code runs on.
Module 1: GPU Architecture
How GPUs work at the silicon level - streaming multiprocessors, tensor cores, memory hierarchy, and the roofline model that explains every ML performance optimization.
Module 1: The Open Source LLM Ecosystem
The open source LLM landscape - Llama, Mistral, Qwen, Gemma, Phi, model families, model cards, and a framework for choosing the right model for your task.
Module 10 - AI Platform Engineering
Build the internal platform that lets data scientists ship models to production in days, not months - covering MLOps architecture, experiment tracking, CI/CD for ML, and Kubernetes-native ML infrastructure.
Module 10: Agent Frameworks
LangChain, LangGraph, LlamaIndex, CrewAI, AutoGen, raw API - an honest comparison with production lessons.
Module 10: Cloud ML Platforms
Master AWS SageMaker, Google Vertex AI, Azure ML, Databricks, and cloud cost optimization strategies for production ML systems.
Module 10: Reasoning Models
How modern LLMs learn to think - test-time compute, chain-of-thought, process reward models, and the architectures behind o1, o3, and DeepSeek-R1.
Module 11 - A/B Testing and Experimentation
Learn how to design, run, and analyze experiments for ML systems - from statistical foundations to production experimentation platforms.
Module 11: Mixture of Experts
How sparse MoE models achieve massive capacity at lower compute cost - routing mechanisms, load balancing, Mixtral, and DeepSeek's innovations.
Module 12 - LLMOps Pipelines
Operationalize LLM-based systems - prompt management, evaluation pipelines, observability, RAG operations, and fine-tuning infrastructure.
Module 12: State Space Models
A complete map of State Space Models - from the quadratic attention bottleneck to Mamba's selective recurrence, hybrid architectures, and production deployment.
Module 13 - Infrastructure as Code for ML
Master Infrastructure as Code for ML systems - Terraform, Pulumi, GitOps, secret management, and cost optimization through declarative infrastructure.
Module 13: Structured Generation
A complete map of structured generation - from the reliability problem with free-text LLM output to constrained decoding, Outlines, Instructor, JSON mode, and production-grade extraction pipelines.
Module 14 - Feature Engineering
Feature engineering as an MLOps discipline - from raw data to production-grade feature pipelines, stores, and monitoring.
Module 14 Overview - Model Merging
How to combine multiple fine-tuned language models into a single, more capable model without any additional training.
Module 15 - Cost Management for ML
Financial operations for ML systems - understanding costs, optimizing training and inference, cloud FinOps, build vs. buy analysis, and cost attribution.
Module 15 Overview - Long Context Strategies
How modern LLMs handle extremely long inputs - from the fundamental O(n²) attention problem to RoPE scaling, context compression, and production engineering for 128K+ context windows.
Module 16 - Alignment and Safety
A complete guide to AI alignment, RLHF, Constitutional AI, DPO, red teaming, jailbreaks, safety evaluations, and the global regulatory landscape.
Module 17 - Embeddings Engineering
A complete guide to embeddings - models, evaluation (MTEB), fine-tuning, Matryoshka embeddings, quantization, multimodal embeddings, and production pipelines.
Module 2 - Data Infrastructure
A complete map of the Data Infrastructure module covering data lakes, Spark, Kafka, feature stores, data quality, Delta Lake, and lakehouse architecture for ML.
Module 2: AI in Healthcare
Building ML systems under HIPAA constraints and FDA regulation - medical imaging, clinical NLP, drug discovery, and patient outcome prediction.
Module 2: CUDA Programming
Write GPU kernels from scratch - thread hierarchy, memory spaces, coalescing, warp divergence, and profiling with Nsight - the foundation for understanding every ML framework under the hood.
Module 2: Model Context Protocol
A module map of the Model Context Protocol - from core concepts through architecture, primitives, building servers, security, ecosystem, and comparison with function calling.
Module 2: Operating Systems for ML
Virtual memory, process scheduling, huge pages, memory-mapped files, and OS-level tuning - the operating system layer that determines whether your ML workload runs fast or fights the kernel.
Module 2: Running Models Locally
llama.cpp, Ollama, and LM Studio - run any open source model on your own hardware, understand memory requirements, and set up a local development environment.
Module 3: AI in Legal
Contract analysis, legal research automation, compliance monitoring, and document review at scale - building AI where hallucination is malpractice and every output needs a citation.
Module 3: Compilers and Runtimes for ML
How compilers work, JIT compilation, MLIR, XLA, torch.compile, and TensorRT - understanding the compilation stack that turns your Python model into fast machine code.
Module 3: Custom Silicon for AI
TPUs, Trainium, Groq LPU, Cerebras WSE, Intel Gaudi, and Apple Silicon - how each architecture differs from GPUs and what workloads each wins on.
Module 3: LoRA and QLoRA Fine-Tuning
Fine-tune any open source model on your data without owning a data center - LoRA theory, QLoRA 4-bit training, hyperparameter selection, and getting a specialized model into production.
Module 3: Stream Processing for Real-Time AI
Eight lessons covering Apache Kafka, Apache Flink, stream processing patterns, real-time feature computation, and production reliability for ML systems that cannot tolerate batch latency.
Module 4 - Model Registry and Lifecycle
Master the model registry - the system that brings order, traceability, and governance to every model your team ships to production.
Module 4: AI in Retail
Demand forecasting, personalization at scale, dynamic pricing, inventory optimization, and supply chain AI - the ML systems behind recommendations and prices.
Module 4: Kernel Optimization
FlashAttention, Triton, operator fusion, torch.compile, and XLA - making neural network operations faster by understanding what the hardware actually does with your compute.
Module 4: Memory Management for ML
Stack and heap allocation, Python memory model, GPU memory patterns, memory profiling, and zero-copy data transfer - debugging OOM errors and building memory-efficient pipelines.
Module 4: Quantization in Practice
GGUF, GPTQ, AWQ, and bitsandbytes - compress models to fit your hardware budget while understanding exactly what quality you are trading away and why.
Module 5: AI in Manufacturing
Predictive maintenance, computer vision for quality control, digital twins, and process optimization - deploying ML on the factory floor where downtime costs thousands per minute.
Module 5: Fine-Tuning Pipelines
Production fine-tuning with Axolotl - dataset formatting, multi-GPU training, DPO preference tuning, and managing adapter versions across model releases.
Module 5: LLM Agents - Overview
LLM agents as autonomous systems that reason, plan, and act using tools, memory, and multi-agent coordination.
Module 5: Memory Systems for AI
HBM, DRAM, cache hierarchies, KV cache management, PagedAttention, and quantization as memory compression - understanding memory is understanding why LLM inference costs what it costs.
Module 5: Networking for Distributed AI
TCP/IP fundamentals, RDMA, AllReduce algorithms, gRPC for model serving, and network bottlenecks in distributed training - the networking layer that determines whether your training job scales.
Module 6 - AI Security
Comprehensive coverage of AI security threats, attack vectors, and defenses for production AI systems.
Module 6: AI in EdTech
Adaptive learning systems, AI-powered assessment, knowledge tracing, and personalized tutoring - building educational AI that actually improves learning outcomes.
Module 6: Algorithms for ML Engineers
Algorithmic complexity in the context of ML - hash maps for embeddings, approximate nearest neighbor data structures, sampling at scale, and the algorithmic foundations of attention.
Module 6: Distributed Training Hardware
NVLink, InfiniBand, AllReduce algorithms, network topology, fault tolerance, and the hardware that makes training at thousands of GPUs possible.
Module 6: Evaluating Open Models
Build eval suites that give real signal - benchmark contamination, domain-specific evaluation, LLM-as-judge for open models, and regression testing after fine-tuning.
Module 7 - ML Pipeline Orchestration
Master the tools and patterns for orchestrating reliable, production-grade ML pipelines using Airflow, Prefect, Kubeflow, ZenML, and beyond.
Module 7 - Vector Database Engineering
Master vector similarity search, ANN algorithms, embedding pipelines, hybrid search, and production vector database deployment.
Module 7: Inference Hardware
Hardware selection for inference workloads - cost-per-token analysis, batching tradeoffs, edge hardware, speculative decoding implications, and building a complete inference stack.
Module 7: Production Deployment of Open Models
vLLM, Text Generation Inference, multi-adapter serving, autoscaling, and cost analysis - deploying open source models at production scale.
Module 7: Systems Programming for ML Engineers
C++ basics for ML engineers, Python C extensions, Cython, Pybind11, and writing custom PyTorch operators - bridging the gap between Python ML code and high-performance native implementations.
Module 8 - GPU and TPU Infrastructure
Master GPU architecture, memory management, distributed training, fault-tolerant clusters, TPU workloads, inference hardware, and cost optimization for ML infrastructure.
Module 8 - Kubernetes for ML
A complete guide to running machine learning workloads on Kubernetes, from fundamentals to GPU scheduling, training jobs, model serving, Helm, and multi-tenant clusters.
Module 9 - Cost & FinOps for AI
Master AI infrastructure economics - from cost modeling to FinOps culture - so you can build powerful systems without burning your budget.
Module 9 - Monitoring and Observability
Complete ML monitoring and observability - data drift detection, model performance monitoring, Prometheus/Grafana for ML, distributed tracing, alerting, and production monitoring tools like EvidentlyAI and NannyML.
MolmoAct2: Action Reasoning Models for Real-world Deployment
Vision-Language-Action (VLA) models aim to provide a single generalist controller for robots, but today's systems fall short on the criteria that matter...
MolmoWeb: Open Visual Web Agent and Open Data for the Open Web
Web agents--autonomous systems that navigate and execute tasks on the web on behalf of users--have the potential to transform how people interact with t...
Moment Matters: Mean and Variance Causal Graph Discovery from Heteroscedastic Observational Data
Heteroscedasticity -- where the variance of a variable changes with other variables -- is pervasive in real data, and elucidating why it arises from the...
Monitoring and Debugging Fine-Tuning
How to monitor LLM fine-tuning runs and debug failures - tracking loss curves, gradient norms, GPU utilization, MFU, and diagnosing NaN loss, overfitting, and OOM errors in LoRA and full fine-tuning.
Monitoring LLM Services
Production observability for LLM serving systems - GPU metrics, TTFT, inter-token latency, vLLM Prometheus integration, distributed tracing, alerting, and Grafana dashboards.
Monitoring ML Serving in Production
Production monitoring for ML serving - inference latency histograms, GPU metrics, throughput monitoring, error rates, distributed tracing with OpenTelemetry, and drift detection.
Monte Carlo and Observability Platforms
Monte Carlo, Bigeye, and Soda - managed platforms for data observability.
Monte Carlo Tree Search for LLM Reasoning
Adapting MCTS to language model reasoning - selection, expansion, simulation, backpropagation over reasoning steps, AlphaCode 2, Tree-of-Thought, and production trade-offs.
MoRight: Motion Control Done Right
Generating motion-controlled videos--where user-specified actions drive physically plausible scene dynamics under freely chosen viewpoints--demands two...
Motion-Aware Caching for Efficient Autoregressive Video Generation
Autoregressive video generation paradigms offer theoretical promise for long video synthesis, yet their practical deployment is hindered by the computat...
MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation
Recent Speech-to-Speech Translation (S2ST) systems achieve strong semantic accuracy yet consistently strip away non-verbal vocalizations (NVs), such as...
MpoxVLM: A Vision-Language Model for Diagnosing Skin Lesions from Mpox Virus Infection
In the aftermath of the COVID-19 pandemic and amid accelerating climate change, emerging infectious diseases, particularly those arising from zoonotic s...
MRO - Method Resolution Order and the C3 Linearisation Algorithm
Understand Python's Method Resolution Order at engineering depth - the diamond problem, C3 linearisation step by step, how super() traverses the MRO (not just "calls parent"), mixin patterns that depend on MRO, Django/Flask examples, and MRO failure cases.
MT-PingEval: Evaluating Multi-Turn Collaboration with Private Information Games
We present a scalable methodology for evaluating language models in multi-turn interactions, using a suite of collaborative games that require effective...
MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models
Full-Duplex Speech Language Models (FD-SLMs) enable real-time, overlapping conversational interactions, offering a more dynamic user experience compared...
MULSUM: A Multimodal Summarization System with Vis-Aligner and Diversity-Aware Image Selection
Published at EACL 2026.
Multi-Agent Architectures
Building systems where multiple specialized LLM agents collaborate through orchestrator-worker, pipeline, and peer-to-peer patterns using LangGraph and CrewAI.
Multi-Armed Bandits
Use multi-armed bandit algorithms to adaptively allocate traffic during experiments - learning faster than A/B tests while reducing regret.
Multi-Cloud Data Strategies for AI Workloads
What multi-cloud data architectures do for AI systems, when vendor lock-in and data gravity risks threaten the portability of ML training and serving infrastructure, and how to design resilient multi-cloud strategies for production AI data pipelines.
Multi-GPU Training Architectures
Master data parallelism, tensor parallelism, pipeline parallelism, and 3D parallelism for large-scale model training - with communication volume math, PyTorch DDP vs FSDP, and Megatron-LM weight splitting strategies.
Multi-Head Attention
How multi-head attention enables transformers to jointly attend to information from multiple representation subspaces simultaneously.
Multi-Model Serving
How to serve hundreds of models efficiently - model multiplexing, ensembles in production, A/B testing infrastructure, shadow mode, canary deployments, and multi-tenant GPU resource isolation.
Multi-Model Serving Architecture
Serving multiple LLMs from shared infrastructure - model routing, MIG partitioning, dynamic loading, LiteLLM proxy, cost optimization through bin-packing, and autoscaling per model in production.
Multi-Task Learning Systems
How production ML systems share representations across multiple objectives simultaneously - covering hard vs soft parameter sharing, loss balancing, gradient conflicts, and negative transfer detection.
Multi-Task Pre-Finetuning of Lightweight Transformer Encoders for Text Classification and NER
Published at EMNLP 2025.
Multi-Tenant AI Systems
Isolating context, costs, and data across tenants in multi-tenant AI products.
Multi-Tenant ML Platforms
Learn how to design ML platforms that safely serve multiple teams from shared GPU infrastructure - covering Kubernetes isolation, fair scheduling, data isolation, cost attribution, and quota management.
Multi-User Large Language Model Agents
Large language models (LLMs) and LLM-based agents are increasingly deployed as assistants in planning and decision making, yet most existing systems are...
Multicore and NUMA Architecture
Learn how multicore CPUs and NUMA topology affect ML workload performance - cache coherence overhead, CPU affinity, NUMA-aware memory allocation, hyperthreading, and configuring PyTorch DataLoader for optimal hardware utilization.
Multilingual Self-Taught Faithfulness Evaluators
Published at EACL 2026.
Multimodal Embeddings
CLIP, SigLIP, ImageBind, ColPali, and CLAP - embedding images, text, audio, and documents in shared vector spaces for cross-modal search and zero-shot classification.
Multimodal Open Source Models
How open-source vision-language models work - from CLIP vision encoders and projection layers to LLaVA, InternVL2, and LLaMA 3.2 Vision - and how to deploy them for document understanding, OCR, and visual reasoning in production.
Multimodal RAG
How to build retrieval-augmented generation systems that can retrieve and reason over images, PDFs with figures, slides, and mixed-media documents.
Multiplication in Multimodal LLMs: Computation with Text, Image, and Audio Inputs
Multimodal LLMs can accurately perceive numerical content across modalities yet fail to perform exact multi-digit multiplication when the identical unde...
Multivariate Spatio-Temporal Neural Hawkes Processes
We propose a Multivariate Spatio-Temporal Neural Hawkes Process for modeling complex multivariate event data with spatio-temporal dynamics. The proposed...
MultiWorld: Scalable Multi-Agent Multi-View Video World Models
Video world models have achieved remarkable success in simulating environmental dynamics in response to actions by users or agents. They are modeled as...
MuViT: Multi-Resolution Vision Transformers for Learning Across Scales in Microscopy
Modern microscopy routinely produces gigapixel images that contain structures across multiple spatial scales, from fine cellular morphology to broader t...
MXNorm: Reusing MXFP block scales for efficient tensor normalisation
Matrix multiplication performance has long been the major bottleneck to scaling deep learning workloads, which has stimulated the design of new accelera...
Narrative Media Framing in Political Discourse
Published at ACL 2025.
Narrative-Driven Paper-to-Slide Generation via ArcDeck
We introduce ArcDeck, a multi-agent framework that formulates paper-to-slide generation as a structured narrative reconstruction task. Unlike existing m...
NCCL and Collective Communication
Deep dive into NCCL internals - the five collective operations, ring-allreduce algorithm, tree-reduce for small tensors, algorithm selection heuristics, tuning environment variables, and diagnosing collective hangs in production GPU clusters.
Near-Future Policy Optimization
Reinforcement learning with verifiable rewards (RLVR) has become a core post-training recipe. Introducing suitable off-policy trajectories into on-polic...
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
We describe the pre-training, post-training, and quantization of Nemotron 3 Super, a 120 billion (active 12 billion) parameter hybrid Mamba-Attention Mi...
Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning
Published at EACL 2026.
Network Debugging for Distributed Training
Master distributed training network debugging - NCCL error diagnosis, AllReduce communication patterns, bandwidth testing with iperf3 and nccl-tests, RDMA diagnostics, and profiler-based timeline analysis for PyTorch DDP.
Network Security for ML Platforms
Comprehensive network security for ML infrastructure - mTLS service authentication, Kubernetes network policies, eBPF with Cilium, secrets management with Vault, zero-trust networking, and ML-specific threats including model theft and prompt injection.
Neural Collaborative Filtering - Beyond the Dot Product
How deep learning revolutionized recommendations by replacing the linear dot product with learnable nonlinear interactions between users and items.
Neural Computers
We propose a new frontier: Neural Computers (NCs) -- an emerging machine form that unifies computation, memory, and I/O in a learned runtime state. Unli...
Neural Diffusion Intensity Models for Point Process Data
Cox processes model overdispersed point process data via a latent stochastic intensity, but both nonparametric estimation of the intensity model and pos...
Neural Operators Can Discover Functional Clusters
Operator learning is reshaping scientific computing by amortizing inference across infinite families of problems. While neural operators (NOs) are incre...
Neuro-Symbolic ODE Discovery with Latent Grammar Flow
Understanding natural and engineered systems often relies on symbolic formulations, such as differential equations, which provide interpretability and t...
NLP for Educational Content
Learn readability scoring, educational NER, automatic summarization, curriculum alignment, concept map generation, and question difficulty estimation for educational NLP pipelines.
NOBLE: Accelerating Transformers with Nonlinear Low-Rank Branches
We introduce NOBLE (Nonlinear lOw-rank Branch for Linear Enhancement), an architectural augmentation that adds nonlinear low-rank branches to transforme...
Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration
Reinforcement learning with verifiable rewards, particularly Group Relative Policy Optimization (GRPO), has significantly advanced the reasoning capabil...
NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search
Monte Carlo Tree Search (MCTS) scales poorly in cooperative multi-agent domains because expansion must consider an exponentially large set of joint acti...
NormAL LoRA: What is the perfect size?
Published at EMNLP 2025.
Normalizing Trajectory Models
Diffusion-based models decompose sampling into many small Gaussian denoising steps -- an assumption that breaks down when generation is compressed to a...
Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models
Recent advances in masked diffusion language models (MDLMs) narrow the quality gap to autoregressive LMs, but their sampling remains expensive because g...
Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR
Reinforcement learning with verifiable rewards (RLVR) has emerged as a scalable paradigm for improving the reasoning capabilities of large language mode...
Numerical and Categorical Features
Systematic feature engineering for tabular data - transformations, encoding, imputation, and selection that lifted AUC from 0.71 to 0.84.
NumPy for ML
Master NumPy for machine learning - broadcasting, vectorization, linear algebra, memory layout, einsum, and the performance patterns every ML engineer needs.
OAuth 2.0 and OIDC
Implement OAuth 2.0 authorization code flow with PKCE, OpenID Connect ID tokens, Keycloak integration, and delegated authorization in FastAPI with authlib.
Object Detection: YOLO and R-CNN
Two-stage and one-stage object detection architectures - from sliding windows and R-CNN to Faster R-CNN, YOLO v8, FPN, anchor boxes, NMS, IoU, and mAP - with full PyTorch implementations.
Observability and Logging
Observability for ML systems - structured logging with structlog, distributed tracing with OpenTelemetry, Prometheus metrics for inference servers, Grafana dashboards, ML-specific alerting, and production profiling.
Observability for LLM Apps
Build production observability for LLM applications - distributed tracing, quality metrics, cost attribution, prompt versioning, and drift detection using LangSmith, Langfuse, and Helicone.
Observable Performance Does Not Fully Reflect System Organization: A Multi-Level Analysis of Gait Dynamics Under Occlusal Constraint
In biomechanical systems, observable performance is often used as a proxy for underlying system organization. However, this assumption implicitly presum...
Observationally Informed Adaptive Causal Experimental Design
Randomized Controlled Trials (RCTs) represent the gold standard for causal inference yet remain a scarce resource. While large-scale observational data...
OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models
AI agents are expected to perform professional work across hundreds of occupational domains (from emergency department triage to nuclear reactor safety...
Occupancy and Thread Block Tuning
How GPU occupancy works, what limits it, and how to tune thread block size and register usage to maximize SM utilization without falling into the 100% occupancy trap.
OceanPile: A Large-Scale Multimodal Ocean Corpus for Foundation Models
The vast and underexplored ocean plays a critical role in regulating global climate and supporting marine biodiversity, yet artificial intelligence has...
Odysseus Navigates the Sirens' Song: Dynamic Focus Decoding for Factual and Diverse Open-Ended Text Generation
Published at ACL 2025.
Offline vs Online Evaluation - Why Your AUC Goes Up But Revenue Goes Down
A deep dive into offline and online evaluation strategies, A/B testing fundamentals, sample size calculation, interleaving, and the root causes of the offline-online metric gap.
Offline vs. Online Evaluation
Design an evaluation strategy that bridges static datasets and production signals - A/B testing, shadow evaluation, implicit signals, and the evaluation flywheel.
Ollama and Local Model Management
Ollama - Docker-like CLI for running and managing local LLMs. Modelfile format, REST API, OpenAI-compatible endpoints, Python integration, and building a complete local AI stack.
OmniHumanoid: Streaming Cross-Embodiment Video Generation with Paired-Free Adaptation
Cross-embodiment video generation aims to transfer motions across different humanoid embodiments, such as human-to-robot and robot-to-robot, enabling sc...
OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering
To extend the reinforcement learning post-training paradigm to omni-modal models for concurrently bolstering video-audio understanding and collaborative...
OmniScript: Towards Audio-Visual Script Generation for Long-Form Cinematic Video
Current multimodal large language models (MLLMs) have demonstrated remarkable capabilities in short-form video understanding, yet translating long-form...
OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation
In this work, we study Human-Object Interaction Video Generation (HOIVG), which aims to synthesize high-quality human-object interaction videos conditio...
On Semiotic-Grounded Interpretive Evaluation of Generative Art
Interpretation is essential to deciphering the language of art: audiences communicate with artists by recovering meaning from visual artifacts. However,...
On the Global Photometric Alignment for Low-Level Vision
Supervised low-level vision models rely on pixel-wise losses against paired references, yet paired training sets exhibit per-pair photometric inconsiste...
On the Reliability of Computer Use Agents
Computer-use agents have rapidly improved on real-world tasks such as web navigation, desktop automation, and software interaction, in some cases surpas...
On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability
Decoder-only large language models (LLMs) are increasingly replacing BERT-style architectures as the backbone for dense retrieval, achieving substantial...
On the Step Length Confounding in LLM Reasoning Data Selection
Large reasoning models have recently demonstrated strong performance on complex tasks that require long chain-of-thought reasoning, through supervised f...
One-Shot Generative Flows: Existence and Obstructions
We study dynamic measure transport for generative modelling in the setting of a stochastic process $X_\bullet$ whose marginals interpolate between a sou...
ONE-SHOT: Compositional Human-Environment Video Synthesis via Spatial-Decoupled Motion Injection and Hybrid Context Integration
Recent advances in Video Foundation Models (VFMs) have revolutionized human-centric video synthesis, yet fine-grained and independent editing of subject...
OneHOI: Unifying Human-Object Interaction Generation and Editing
Human-Object Interaction (HOI) modelling captures how humans act upon and relate to objects, typically expressed as <person, action, object> triplets. E...
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
Chain-of-Thought (CoT) reasoning has become a powerful driver of trajectory prediction in VLA-based autonomous driving, yet its autoregressive nature im...
Online Controlled Experiments
Design valid ML experiments by choosing the right randomization unit, handling network effects, detecting novelty, and managing holdout sets.
Online Feature Computation for Model Serving
How to compute ML features at request time without blowing your latency budget - caching strategies, vectorized computation, and production patterns.
Online Learning
Continuous learning in production - online learning vs mini-batch, concept drift adaptation, Vowpal Wabbit, streaming gradient descent, bandit algorithms, and preventing catastrophic forgetting.
Online Quantile Regression for Nonparametric Additive Models
This paper introduces a projected functional gradient descent algorithm (P-FGD) for training nonparametric additive quantile regression models in online...
Online vs Offline Features
The fundamental split between pre-computed offline and real-time online features.
Open LLM Leaderboard and Benchmarks
Understanding the HuggingFace Open LLM Leaderboard, what each benchmark actually measures, how contamination distorts scores, and how to use leaderboard numbers to make real deployment decisions.
Open Political Corpora: Structuring, Searching, and Analyzing Political Text Collections with PoliCorp
Published at EMNLP 2025.
OpenAI Embeddings and API-Based Embedding Services
text-embedding-3, Matryoshka training, Voyage AI, Cohere Embed, cost analysis, batch processing patterns, and when to choose API vs self-hosted embeddings.
OpenAI o1 and o3 - Architecture and Training
What we know about OpenAI's o1 and o3 reasoning models - hidden chain-of-thought, reinforcement learning from process rewards, compute budget tokens, and ARC-AGI results.
OpenAI Swarm
OpenAI's experimental multi-agent framework: agents, handoffs, context variables, and the triage pattern. What it gets right and wrong.
OpenGame: Open Agentic Coding for Games
Game development sits at the intersection of creative design and intricate software engineering, demanding the joint orchestration of game engines, real...
OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis
Mobile agents powered by vision-language models have demonstrated impressive capabilities in automating mobile tasks, with recent leading models achievi...
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
Deep search has become a crucial capability for frontier multimodal agents, enabling models to solve complex questions through active search, evidence v...
OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
Deep search capabilities have become an indispensable competency for frontier Large Language Model (LLM) agents, yet their development remains dominated...
OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence
Spatial understanding is a fundamental cornerstone of human-level intelligence. Nonetheless, current research predominantly focuses on domain-specific d...
OpenTelemetry for AI Systems
Apply OpenTelemetry to AI and LLM applications - GenAI semantic conventions, auto-instrumentation, OTel Collector routing, sampling strategies, context propagation through async queues, and multi-backend production setups.
OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks
Group Relative Policy Optimization (GRPO) has emerged as the de facto Reinforcement Learning (RL) objective driving recent advancements in Multimodal La...
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models
World models have garnered significant attention as a promising research direction in artificial intelligence, yet a clear and unified definition remain...
Optimal Spatio-Temporal Decoupling for Bayesian Conformal Prediction
Online Conformal Prediction (CP) struggles to balance temporal adaptability and structural stability. Feedback-driven methods (e.g., Adaptive Conformal...
Optimization Algorithms Deep Dive
Optimization algorithms in depth - SGD, momentum, Nesterov, AdaGrad, RMSProp, Adam derivation, AdamW, learning rate schedules, second-order methods, convergence theory, and why Adam beats SGD for transformers.
Optimized Deferral for Imbalanced Settings
Learning algorithms can be significantly improved by routing complex or uncertain inputs to specialized experts, balancing accuracy with computational c...
Optimizers: Adam, SGD, RMSProp
Complete optimizer guide - SGD momentum, Nesterov, AdaGrad, RMSProp, Adam bias correction derivation, AdamW decoupled weight decay, LAMB, Lion, AMSGrad - with NumPy Adam from scratch, PyTorch implementations, and the SGD vs Adam generalization debate.
Optimizing ML Docker Images
Reduce ML Docker images from 8GB to under 1.5GB using multi-stage builds, slim bases, BuildKit cache mounts, and image scanning.
Orchard: An Open-Source Agentic Modeling Framework
Agentic modeling aims to transform LLMs into autonomous agents capable of solving complex tasks through planning, reasoning, tool use, and multi-turn in...
Orchestration Patterns for End-to-End ML Pipelines
What dynamic DAGs, sensors, and fan-out/fan-in patterns do for AI systems, when ML workflows require data-aware scheduling and conditional branching across training and serving stages, and how to apply these patterns in production AI data pipelines.
Orchestrator-Subagent Pattern
The most reliable multi-agent pattern: one orchestrator plans, subagents execute. Deep dive into task decomposition, assignment strategies, and production-grade implementation.
OT on the Map: Quantifying Domain Shifts in Geographic Space
In computer vision and machine learning for geographic data, out-of-domain generalization is a pervasive challenge, arising from uneven global data cove...
Out-of-distribution transfer of PDE foundation models to material dynamics under extreme loading
Most PDE foundation models are pretrained and fine-tuned on fluid-centric benchmarks. Their utility under extreme-loading material dynamics remains uncl...
Outlines - Grammar-Constrained Generation
A complete guide to the Outlines library - Pydantic schema to FSM, regex constraints, JSON schema constraints, vLLM integration, and production deployment patterns with guaranteed output conformance.
Overload and Type Narrowing
Use @overload for multiple function signatures and TypeGuard, TypeIs, assert_never, and pattern matching for exhaustive type narrowing in Python.
Overview
Overview of cloud data platforms for AI and ML workloads.
Overview
Module overview for Pipeline Orchestration - turning ad-hoc scripts into reliable, observable, recoverable production data pipelines.
Overview
Overview of real-time feature engineering for low-latency ML systems.
p1: Better Prompt Optimization with Fewer Prompts
Prompt optimization improves language models without updating their weights by searching for a better system prompt, but its effectiveness varies widely...
Packaging and Environments - Module Overview
Master Python packaging and environments at full engineering depth - virtual environments, pip and lockfiles, pyproject.toml, Poetry, semantic versioning, and publishing to PyPI for production-grade projects.
Packaging Projects - Overview
Overview of hands-on projects for Module 05 - Packaging and Environments. Build, test, version, and publish a real Python utility package from scratch.
PAGER: Bridging the Semantic-Execution Gap in Point-Precise Geometric GUI Control
Large vision-language models have significantly advanced GUI agents, enabling executable interaction across web, mobile, and desktop interfaces. Yet the...
Pandas for ML
Pandas for machine learning engineers - DataFrame operations, missing data, groupby feature aggregation, time series, memory optimization, and building leakage-free feature matrices.
Panoptic Pairwise Distortion Graph
In this work, we introduce a new perspective on comparative image assessment by representing an image pair as a structured composition of its regions. I...
Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework
The rapid growth of scientific literature has made it increasingly difficult for researchers to efficiently discover, evaluate, and synthesize relevant...
Paper Espresso: From Paper Overload to Research Insight
The accelerating pace of scientific publishing makes it increasingly difficult for researchers to stay current. We present Paper Espresso, an open-sourc...
Parallel Agent Execution
Running agents concurrently with asyncio, worker pools, DAG-based scheduling, rate limiting, and cost/speed tradeoffs in parallel multi-agent systems.
Parameter-Efficient Multi-View Proficiency Estimation: From Discriminative Classification to Generative Feedback
Estimating how well a person performs an action, rather than which action is performed, is central to coaching, rehabilitation, and talent identificatio...
ParamMem: Augmenting Language Agents with Parametric Reflective Memory
Self-reflection enables language agents to iteratively refine solutions, yet often produces repetitive outputs that limit reasoning performance. Recent...
ParamSpec and Concatenate
Solve the decorator typing problem with ParamSpec and Concatenate - preserve callable signatures through wrappers, type retry/logging decorators, and apply patterns from FastAPI middleware.
Parcae: Scaling Laws For Stable Looped Language Models
Traditional fixed-depth architectures scale quality by increasing training FLOPs, typically through increased parameterization, at the expense of a high...
Partial Application and Currying - functools.partial, operator, and Function Pipelines
Master partial application and currying at engineering depth - functools.partial internals, inspecting partial objects, the distinction between partial application and currying, implementing currying in Python, the operator module as curried-style operations, function composition with reduce, and real-world usage in Django ORM, sorted(), and data pipelines.
Partition Function Estimation under Bounded f-Divergence
We study the statistical complexity of estimating partition functions given sample access to a proposal distribution and an unnormalized density ratio f...
Patient Outcome Prediction
Building clinical prediction models for hospital readmission, ICU mortality, and sepsis onset - feature engineering from EHR data, LSTM models for vital sign time series, survival analysis, calibration, and deployment in clinical workflows.
PatRe: A Full-Stage Office Action and Rebuttal Generation Benchmark for Patent Examination
Patent examination is a complex, multi-stage process requiring both technical expertise and legal reasoning, increasingly challenged by rising applicati...
PCA Dimensionality Reduction
Principal Component Analysis via eigendecomposition and SVD - covariance geometry, reconstruction error, Kernel PCA, Incremental PCA, whitening, and production use for preprocessing and anomaly detection.
PCIe and NVLink Interconnects
Host-to-device PCIe bandwidth, GPU-to-GPU NVLink and NVSwitch, the interconnect hierarchy in multi-GPU systems, and how interconnect bandwidth shapes model parallelism strategies.
PCIe and NVLink Interconnects
Understand PCIe bandwidth limitations for CPU-GPU data transfer, NVLink for high-speed GPU-to-GPU communication, NVSwitch topology in DGX systems, and how to design systems that avoid interconnect bottlenecks in multi-GPU AI training.
pEBR: A Probabilistic Approach to Embedding Based Retrieval
Published at EMNLP 2025.
Perceptron and MLP
From the McCulloch-Pitts neuron to multi-layer perceptrons - the mathematical foundations of deep learning, XOR proof, universal approximation, forward pass mechanics, depth vs width theory, and full NumPy and PyTorch implementations.
Perceptual Flow Network for Visually Grounded Reasoning
Despite the success of Large-Vision Language Models (LVLMs), general optimization objectives (e.g., standard MLE) fail to constrain visual trajectories,...
Perplexity and Language Model Metrics
Understand perplexity, cross-entropy, bits per byte, and when intrinsic metrics mislead you about model quality.
Persona-SQ: A Personalized Suggested Question Generation Framework For Real-world Documents
Published at NAACL 2025.
PersonalAI: A Systematic Comparison of Knowledge Graph Storage and Retrieval Approaches for Personalized LLM agents
Personalizing language models by effectively incorporating user interaction history remains a central challenge in the development of adaptive AI system...
Personalisation and Memory
User preference learning, conversation memory architecture, and personalised AI experiences that persist across sessions.
Personalization at Scale
Two-tower retrieval models, real-time feature serving, ANN search, and the full ML architecture that powers personalized recommendations for hundreds of millions of retail users.
Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization
Pluralistic alignment has emerged as a critical frontier in the development of Large Language Models (LLMs), with reward models (RMs) serving as a centr...
Personalized Tutoring AI
Learn how to build AI tutoring systems using Socratic dialogue, LLM-based hint generation, worked example fading, affective state detection, and multi-session context management.
Personalizing Text-to-Image Generation to Individual Taste
Modern text-to-image (T2I) models generate high-fidelity visuals but remain indifferent to individual user preferences. While existing reward models opt...
PersonaVLM: Long-Term Personalized Multimodal LLMs
Multimodal Large Language Models (MLLMs) serve as daily assistants for millions. However, their ability to generate responses aligned with individual pr...
Phantom: Physics-Infused Video Generation via Joint Modeling of Visual and Latent Physical Dynamics
Recent advances in generative video modeling, driven by large-scale datasets and powerful architectures, have yielded remarkable visual realism. However...
Phase transitions in Doi-Onsager, Noisy Transformer, and other multimodal models
We study phase transitions for repulsive-attractive mean-field free energies on the circle. For a $\frac{1}{n+1}$-periodic interaction whose Fourier coe...
Phi and Small Language Models
Microsoft Phi model family - textbook quality data hypothesis, how 1-4B models can match much larger ones on reasoning tasks, and the design principles behind efficient small language models.
Phoenix by Arize - LLM Observability with Embedding Analysis
Master Arize Phoenix for open-source LLM observability - UMAP embedding visualization, drift detection, RAG coverage gap analysis, OpenTelemetry-native tracing, and LLM evaluation pipelines in production.
PhyCo: Learning Controllable Physical Priors for Generative Motion
Modern video diffusion models excel at appearance synthesis but still struggle with physical consistency: objects drift, collisions lack realistic rebou...
PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation
Generating realistic human motion is a central yet unsolved challenge in video generation. While reinforcement learning (RL)-based post-training has dri...
PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World
Synthesizing physics-grounded 3D assets is a critical bottleneck for interactive virtual worlds and embodied AI. Existing methods predominantly focus on...
PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments
We introduce PhysicianBench, a benchmark for evaluating LLM agents on physician tasks grounded in real clinical setting within electronic health record...
Physics Informed Viscous Value Representations
Offline goal-conditioned reinforcement learning (GCRL) learns goal-conditioned policies from static pre-collected datasets. However, accurate value esti...
PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization
Recent progress in text-conditioned human motion generation has been largely driven by diffusion models trained on large-scale human motion data. Buildi...
PianoCoRe: Combined and Refined Piano MIDI Dataset
Symbolic music datasets with matched scores and performances are essential for many music information retrieval (MIR) tasks. Yet, existing resources oft...
pip and requirements - Dependency Management in Practice
Master pip and requirements files at full engineering depth - dependency resolution, version specifiers, pip-tools lockfiles, layered requirements, hash verification, supply-chain security, and private package indexes for production workflows.
Planning and Reasoning
How LLM agents handle complex multi-step tasks through plan-and-execute, hierarchical planning, self-reflection, and LangGraph-based workflows.
PlayCoder: Making LLM-Generated GUI Code Playable
Large language models (LLMs) have achieved strong results in code generation, but their ability to generate GUI applications, especially games, remains...
Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind
As large language models (LLMs) become the engine behind conversational systems, their ability to reason about the intentions and states of their dialog...
PledgeTracker: A System for Monitoring the Fulfilment of Pledges
Published at EMNLP 2025.
Plug-and-Play Diffusion Meets ADMM: Dual-Variable Coupling for Robust Medical Image Reconstruction
Plug-and-Play diffusion prior (PnPDP) frameworks have emerged as a powerful paradigm for solving imaging inverse problems by treating pretrained generat...
Plugin Systems - Building Extensible Applications
Build extensible Python applications with entry_points, importlib.metadata, stevedore, __init_subclass__, and plugin lifecycle management.
PLUME: Latent Reasoning Based Universal Multimodal Embedding
Universal multimodal embedding (UME) maps heterogeneous inputs into a shared retrieval space with a single model. Recent approaches improve UME by gener...
Pods, Deployments, and Services - Deep Dive
Master the three core Kubernetes workload primitives for ML engineers - stateless serving with Deployments, traffic routing with Services, and advanced pod patterns for ML.
POEMetric: The Last Stanza of Humanity
Large Language Models (LLMs) can compose poetry, but how far are they from human poets? In this paper, we introduce POEMetric, the first comprehensive f...
Poetry - Dependency Management and Packaging Done Right
Master Poetry at engineering depth - lockfile mechanics, version constraints, dependency groups, virtualenv management, publishing, and CI integration for reproducible Python builds.
Point-in-Time Correctness
Time-travel queries, point-in-time joins, and preventing data leakage.
PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models
While Vision-Language Models (VLMs) have achieved remarkable progress in static visual understanding, their deployment in complex 3D embodied environmen...
PokeRL: Reinforcement Learning for Pokemon Red
Pokemon Red is a long-horizon JRPG with sparse rewards, partial observability, and quirky control mechanics that make it a challenging benchmark for rei...
Policy Gradient Methods
Directly optimize policies with gradient ascent - REINFORCE derivation, the log-derivative trick, variance reduction with baselines, actor-critic, A2C/A3C, and entropy regularization. The foundation for PPO and RLHF.
Policy-Aware Design of Large-Scale Factorial Experiments
Digital firms routinely run many online experiments on shared user populations. When product decisions are compositional, such as combinations of interf...
Polyglot Teachers: Evaluating Language Models for Multilingual Synthetic Data Generation
Synthesizing supervised finetuning (SFT) data from language models (LMs) to teach smaller models multilingual tasks has become increasingly common. Howe...
Polynomial Features and Kernel Methods
Extend linear models to nonlinear patterns - polynomial basis expansion, curse of dimensionality, Mercer's theorem for valid kernels, RBF kernel via infinite-dimensional feature space, kernel ridge regression dual form, Nyström and random Fourier features for scalability.
Pooling, Strides, and Padding
Why spatial downsampling exists, how max pooling and strided convolutions compare, how padding controls output dimensions, receptive field growth, dilated convolutions, transposed convolutions, and when to use each - with PyTorch examples.
Portkey
Use Portkey as a managed LLM gateway with built-in observability, virtual keys, guardrails, request tracing, feedback collection, and automated fallbacks across Claude, GPT-4o, and 250+ providers.
POS-ISP: Pipeline Optimization at the Sequence Level for Task-aware ISP
Recent work has explored optimizing image signal processing (ISP) pipelines for various tasks by composing predefined modules and adapting them to task-...
Position-Aware Depth Decay Decoding (D³): Boosting Large Language Model Inference Efficiency
Published at ACL 2025.
Position: agentic AI orchestration should be Bayes-consistent
LLMs excel at predictive tasks and complex reasoning tasks, but many high-value deployments rely on decisions under uncertainty, for example, which tool...
Positional Encoding
How positional encodings inject sequence order information into transformers - from sinusoidal to RoPE.
Post-Training Quantization Methods
A practical guide to PTQ methods for LLMs - GPTQ, AWQ, SmoothQuant, bitsandbytes, GGUF, and HQQ compared by accuracy, speed, memory, and production use case.
Power one sequential tests exist for weakly compact $\mathscr P$ against $\mathscr P^c$
Suppose we observe data from a distribution $P$ and we wish to test the composite null hypothesis that $P\in\mathscr P$ against a composite alternative...
PR3DICTR: A modular AI framework for medical 3D image-based detection and outcome prediction
Three-dimensional medical image data and computer-aided decision making, particularly using deep learning, are becoming increasingly important in the me...
Praxy Voice: Voice-Prompt Recovery + BUPS for Commercial-Class Indic TTS from a Frozen Non-Indic Base at Zero Commercial-Training-Data Cost
Commercial TTS systems produce near-native Indic audio, but the best open-source bases (Chatterbox, Indic Parler-TTS, IndicF5) trail them on measured ph...
Pre-Commit Hooks - Automate Quality Gates Before Every Commit
Master the pre-commit framework at engineering depth - Git hook mechanics, .pre-commit-config.yaml structure, building production hook pipelines with ruff, black, mypy, detect-secrets, and pytest, CI integration, team adoption strategy, and hook performance tuning.
Precise Debugging Benchmark: Is Your Model Debugging or Regenerating?
Unlike code completion, debugging requires localizing faults and applying targeted edits. We observe that frontier LLMs often regenerate correct but ove...
Predicting integers from continuous parameters
We study the problem of predicting numeric labels that are constrained to the integers or to a subrange of the integers. For example, the number of up-v...
Prediction-powered Inference by Mixture of Experts
The rapidly expanding artificial intelligence (AI) industry has produced diverse yet powerful prediction tools, each with its own network architecture,...
Predictive Coding Graphs are a Superset of Feedforward Neural Networks
Predictive coding graphs (PCGs) are a recently introduced generalization to predictive coding networks, a neuroscience-inspired probabilistic latent var...
Predictive Maintenance with AI
Learn how AI systems predict equipment failures before they happen using sensor data, feature engineering, anomaly detection, and remaining useful life prediction.
Prefect
Building and deploying production ML workflows using Prefect 2.x/3.x - flows, tasks, deployments, work pools, and observability.
Prefect and Modern Orchestration
Prefect orchestration deep dive - flows, tasks, deployments, work pools, automations, and a direct comparison with Apache Airflow.
Prescriptive Scaling Laws for Data Constrained Training
Training compute is increasingly outpacing the availability of high-quality data. This shifts the central challenge from optimal compute allocation to e...
Pretraining at Scale
The infrastructure, parallelism strategies, memory optimizations, and training data choices required to pretrain large language models on thousands of GPUs.
PRIM-cipal components analysis
Supervised No Free Lunch Theorems (NFLTs) are well studied, yet unsupervised NFLTs remain underexplored. For elliptical distributions, we prove that the...
Prior-Aligned Data Cleaning for Tabular Foundation Models
Tabular Foundation Models (TFMs) achieve state-of-the-art zero-shot accuracy on small tabular datasets by meta-learning over synthetic data-generating p...
PRISM: LLM-Guided Semantic Clustering for High-Precision Topics
In this paper, we propose Precision-Informed Semantic Modeling (PRISM), a structured topic modeling framework combining the benefits of rich representat...
Privacy and Air-Gapped Deployment
Deploying LLMs in air-gapped environments without internet access - pre-downloading models, offline HuggingFace usage, regulatory compliance, and architecture for privacy-critical AI.
Privacy and Ethics in Synthetic Data
Copyright exposure, memorization risks, differential privacy, bias auditing, terms-of-service compliance, and the governance processes required for defensible synthetic data pipelines.
PRL-Bench: A Comprehensive Benchmark Evaluating LLMs' Capabilities in Frontier Physics Research
The paradigm of agentic science requires AI systems to conduct robust reasoning and engage in long-horizon, autonomous exploration. However, current sci...
Probabilistic Joint and Individual Variation Explained (ProJIVE) for Data Integration
Collecting multiple types of data on the same set of subjects is common in modern scientific applications, including genomics, metabolomics, and neuroim...
Probing the Geometry of Diffusion Models with the String Method
Understanding the geometry of learned distributions is fundamental to improving and interpreting diffusion models, yet systematic tools for exploring th...
Probing Visual Planning in Image Editing Models
Visual planning represents a crucial facet of human intelligence, especially in tasks that require complex spatial reasoning and navigation. Yet, in mac...
Problem-Solving Logic Guided Curriculum In-Context Learning for LLMs Complex Reasoning
Published at ACL 2025.
Procedural Memory and Learned Skills
How agents store and reuse successful action sequences: skill formation, retrieval, composition, and refinement from execution feedback.
Proceedings of Bridging Neurons and Symbols for Natural Language Processing and Knowledge Graphs Reasoning @ COLING 2025
Published at COLING 2025.
Proceedings of Context and Meaning: Navigating Disagreements in NLP Annotation
Published at COLING 2025.
Proceedings of the 5th Celtic Language Technology Workshop
Published at COLING 2025.
Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)
Published at COLING 2025.
Process Optimization with Reinforcement Learning
Learn how to formulate manufacturing process control as an MDP, design safe reward functions, use offline RL from historical data, and deploy RL policies in production industrial settings.
Process Reward Models (PRMs)
How process reward models provide step-level supervision for reasoning - the Lightman et al. 2023 paper, Math-Shepherd, using PRMs for search, and their limitations.
Processes, Threads, and Coroutines
Learn how processes, threads, and coroutines work at the OS level, and how to choose the right concurrency model for ML workloads - data loading, inference, and async API calls.
Production Agent Monitoring
Monitoring agents in production - task completion metrics, distributed tracing, anomaly detection, alerting, and the production improvement flywheel.
Production Async Architecture
Build production-grade async systems with error handling strategies, graceful shutdown, health checks, backpressure, async testing with pytest-asyncio, and structured logging.
Production Lessons
12 hard-won lessons from deploying agentic systems at scale - each with a war story, a principle, and a code pattern you can use today.
Production Monitoring for LLMs
Build a comprehensive production monitoring stack for LLMs - latency, cost, quality drift, safety, and observability platforms compared.
Production Multimodal Systems
Build and operate multimodal AI pipelines at production scale - image preprocessing, cost control, VLM hallucination mitigation, caching, security, and observability for vision-language workloads.
Production Patterns
Case studies in real-time feature engineering from Uber, Twitter, and LinkedIn.
Profiling Python and C Code
Master the complete profiling toolkit - cProfile, line_profiler, py-spy, Scalene, Valgrind, and PyTorch Profiler - to find and eliminate bottlenecks in Python and ML training code.
Profiling Strategy - Measure Before You Optimize
Amdahl's law, the profiling workflow, identifying hotspots, benchmarking methodology with timeit, performance budgets, and the discipline of measuring before optimizing.
Profiling with Nsight
Learn how to use Nsight Systems and Nsight Compute to find GPU performance bottlenecks, read roofline charts, interpret warp stall reasons, and use the PyTorch profiler to guide real optimization decisions.
Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora
Reliably transferring specialized human knowledge from text into large language models remains a fundamental challenge in artificial intelligence. Fine-...
Project 01 - Publish an Internal Utility Package
Build, test, version, and publish pyutils-engineersofai - a typed Python utility library with src/ layout, hatchling build backend, full pytest coverage, CHANGELOG, and GitLab CI pipeline that publishes on v* tags.
Prometheus and Grafana for ML
Building production ML observability infrastructure - Prometheus architecture, custom ML metrics, PromQL for ML, Grafana dashboard design for model serving, and scaling with Thanos for long-term storage.
Prompt Debugging Methodology
Systematic methodology for diagnosing and fixing prompt failures - isolation, ablation, root cause analysis, and building a regression test suite.
Prompt Design Fundamentals
Master the first principles of prompt engineering - clarity, specificity, task framing, structural markers, and the systematic principles behind effective LLM instructions.
Prompt Injection
How prompt injection attacks work, why they are the most critical AI vulnerability in production, and how to defend against them with layered mitigations.
Prompt Injection and Security
Understand how prompt injection attacks work, why they're hard to defend against, and how to build LLM systems that are resistant to manipulation.
Prompt Injection Defense
Understand prompt injection attack taxonomy, detection strategies, defense layers, and sanitization techniques for production LLM systems.
Prompt Management
Treat prompts as production artifacts - versioning, registry design, testing frameworks, A/B testing prompts, automated optimization with DSPy, and prompt governance.
Prompt Optimization and DSPy
Move beyond manual prompt engineering to automated, evaluation-driven optimization - using APE, OPRO, and DSPy to build LLM pipelines that improve themselves.
Prompt Relay: Inference-Time Temporal Control for Multi-Event Video Generation
Video diffusion models have achieved remarkable progress in generating high-quality videos. However, these models struggle to represent the temporal suc...
Prompt Templates and Composition
Build maintainable, production-grade prompt systems with Jinja2 templates, variable injection, modular composition, and reusable prompt libraries.
Prompt UX Patterns
Prompt scaffolding, slash commands, context transparency, and mode switching in production AI interfaces.
Prompt Versioning
Treating prompts as first-class code artifacts - versioning, branching, review gates, A/B testing, and rollback for production LLM prompts. Build a complete prompt registry from scratch.
Prompt Versioning and Management
Treat prompts as code - semantic versioning, A/B testing, rollback strategies, and prompt registries for production LLM systems.
PromptLab: A Collaborative Platform for Prompt Engineering and Dataset Curation
Published at EACL 2026.
Protecting Language Models Against Unauthorized Distillation through Trace Rewriting
Knowledge distillation is a widely adopted technique for transferring capabilities from LLMs to smaller, more efficient student models. However, unautho...
Protocol and Structural Subtyping
Master typing.Protocol for structural subtyping in Python - define interfaces based on behavior, compose protocols, make duck typing type-safe, and apply patterns from Django and file-like objects.
Proximal Policy Optimisation - The Algorithm That Runs ChatGPT's RLHF
PPO: the dominant policy gradient algorithm - how clipping the probability ratio prevents destructive policy updates while maintaining the efficiency of on-policy learning.
Pruning and Depth Control
How to prevent decision tree overfitting through pre-pruning parameters, cost-complexity post-pruning, weakest-link pruning, MDL principle, and production-grade tuning strategies.
Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models
Unified multimodal models (UMMs) were designed to combine the reasoning ability of large language models (LLMs) with the generation capability of vision...
PSP: An Interpretable Per-Dimension Accent Benchmark for Indic Text-to-Speech
Standard text-to-speech (TTS) evaluation measures intelligibility (WER, CER) and overall naturalness (MOS, UTMOS) but does not quantify accent. A synthe...
PTOPOFL: Privacy-Preserving Personalised Federated Learning via Persistent Homology
Federated learning (FL) faces two structural tensions: gradient sharing enables data-reconstruction attacks, while non-IID client distributions degrade...
Publishing Packages - From Source to PyPI
Master Python package publishing at engineering depth - sdist vs wheel formats, build backends, TestPyPI workflow, twine and Poetry publishing, API tokens, private registries, and automated CI/CD release pipelines.
Pulumi for ML
Write ML infrastructure in real Python - Pulumi's code-first approach, component resources, Automation API, and testing with pytest for reproducible ML platforms.
Pure Functions - Testability, Memoisation, and the Functional Core Pattern
Master pure functions at engineering depth - functions that always return the same output for the same inputs and have no side effects. Covers referential transparency, how to identify and eliminate side effects, the functional core / imperative shell architecture, and why purity unlocks testability, caching, and thread safety.
pyproject.toml - The Modern Python Project Standard
Master pyproject.toml at full engineering depth - PEP 517/518/621 build system specification, build backends, the full project table, optional dependencies, entry points, tool configuration, src layout, dynamic versioning, and building distribution artifacts.
pytest - The Industry-Standard Test Framework
Master pytest at full engineering depth - assertion rewriting via AST transformation, fixtures with scope, conftest.py, parametrize, monkeypatch, capsys, built-in marks, essential plugins, and pyproject.toml configuration for production test suites.
PyTorch DataLoaders and Datasets
Build custom PyTorch Datasets and high-performance DataLoaders - batching, num_workers, pin_memory, samplers, WebDataset for streaming, custom collate_fn, and profiling.
PyTorch Foundations
PyTorch fundamentals for ML engineers - tensors, autograd, nn.Module, device management, reproducibility, mixed precision training, and the computation graph that makes debugging natural.
PyTorch Training Loop
Write production-grade PyTorch training loops - learning rate scheduling, gradient accumulation, mixed precision, checkpointing, early stopping, and debugging.
Q-Learning and SARSA
Model-free temporal difference learning - Q-learning for off-policy control and SARSA for on-policy control. Understand TD vs MC vs DP, convergence conditions, eligibility traces, Double Q-learning, and implement Q-tables in NumPy.
Q-Zoom: Query-Aware Adaptive Perception for Efficient Multimodal Large Language Models
MLLMs require high-resolution visual inputs for fine-grained tasks like document understanding and dense scene perception. However, current global resol...
QEIL v2: Heterogeneous Computing for Edge Intelligence via Roofline-Derived Pareto-Optimal Energy Modeling and Multi-Objective Orchestration
Deploying large language models (LLMs) on heterogeneous edge devices demands frameworks that jointly optimize energy efficiency, inference quality, and...
QiMeng-PRepair: Precise Code Repair via Edit-Aware Reward Optimization
Large Language Models (LLMs) achieve strong program repair performance but often suffer from over-editing, where excessive modifications overwrite corre...
QLoRA: 4-Bit Fine-Tuning
Learn how QLoRA combines 4-bit NF4 quantization, double quantization, and paged optimizers to fine-tune 65B parameter models on a single GPU - covering the math, implementation, and production engineering.
QLoRA: Quantized Low-Rank Adaptation
Learn how QLoRA combines 4-bit quantization with LoRA to fine-tune 65B parameter models on a single consumer GPU, using NF4 quantization, double quantization, and paged optimizers.
Quality Metrics in Production LLM Systems
Define, measure, and operationalize quality metrics for production LLM applications - faithfulness, answer relevance, hallucination rate, coherence, toxicity, BLEU vs LLM-as-judge, SLO definitions, and async evaluation pipelines.
Qualixar OS: A Universal Operating System for AI Agent Orchestration
We present Qualixar OS, the first application-layer operating system for universal AI agent orchestration. Unlike kernel-level approaches (AIOS) or sing...
QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation
Large Language Models (LLMs) are increasingly used for code generation, yet quantum code generation is still evaluated mostly within single frameworks,...
Quantization Benchmarking
How to rigorously evaluate quantization quality using perplexity, downstream task accuracy, latency, and memory metrics - and build a complete benchmarking pipeline comparing FP16 vs GPTQ vs AWQ vs NF4.
Quantization Deep Dive
INT8, INT4, NF4, FP8, and block-wise quantization explained from first principles - how floating point becomes integer, what accuracy you lose, and how to tune quantization for production LLM inference.
Quantization Error Debugging
How to diagnose and fix quantization quality degradation - symptoms, root causes, diagnostic tools, and systematic fixes for INT4/INT8 quantized LLMs.
Quantization for Vision Models
How to quantize CNN and ViT vision models and vision-language models - handling batch norm sensitivity, attention outliers, and the strategy of quantizing the LLM backbone while keeping the vision encoder in FP16.
Quantization Hardware Tradeoffs
How INT8, INT4, FP8, and NF4 quantization change memory bandwidth utilization, Tensor Core throughput, and inference latency on real GPUs, including hardware support matrices and production calibration strategies.
Quantization-Aware Training
When post-training quantization is not enough - how QAT simulates quantization noise during training so models learn to be robust to it, covering the straight-through estimator, QLoRA, and BitNet.
Quantization: INT8 and INT4
Master LLM quantization techniques - from LLM.int8() to GPTQ and AWQ - to run large models on commodity hardware without unacceptable quality loss.
Quantum Diffusion Models: Score Reversal Is Not Free in Gaussian Dynamics
Diffusion-based generative modeling suggests reversing a noising semigroup by adding a score drift. For continuous-variable Gaussian Markov dynamics, co...
Quantum Interval Bound Propagation for Certified Training of Quantum Neural Networks
Quantum machine learning is a promising field for efficiently learning features of a dataset to perform a specified task, such as classification. Interv...
Query Transformation and HyDE
Master query transformation techniques - HyDE, multi-query retrieval, step-back prompting, query decomposition, and routing - to solve the vocabulary mismatch problem that breaks naive RAG systems in production.
Quotient-Based Posterior Analysis for Euclidean Latent Space Models
Latent space models are widely used in statistical network analysis and are often fit by Markov chain Monte Carlo. However, posterior summaries of laten...
Qwen, DeepSeek, and International Models
Alibaba Qwen and DeepSeek architectural innovations - MLA attention, DeepSeekMoE, multi-token prediction, and how Chinese labs are advancing open-source LLM research.
Qwen3.5-Omni Technical Report
In this work, we present Qwen3.5-Omni, the latest advancement in the Qwen-Omni model family. Representing a significant evolution over its predecessor,...
R3PM-Net: Real-time, Robust, Real-world Point Matching Network
Accurate Point Cloud Registration (PCR) is an important task in 3D data processing, involving the estimation of a rigid transformation between two point...
RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework
High-level autonomous driving requires motion planners capable of modeling multimodal future uncertainties while remaining robust in closed-loop interac...
Rad-Flamingo: A Multimodal Prompt driven Radiology Report Generation Framework with Patient-Centric Explanations
Published at EACL 2026.
RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography
Vision-language models (VLM) have markedly advanced AI-driven interpretation and reporting of complex medical imaging, such as computed tomography (CT)....
RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments
We present RADIO-ViPE (Reduce All Domains Into One -- Video Pose Engine), an online semantic SLAM system that enables geometry-aware open-vocabulary gro...
Radiology AI in Production
Deploying radiology AI into clinical workflows - PACS integration, DICOM processing, FDA clearance, worklist prioritization, and monitoring for distribution shift in live hospital environments.
RAG Evaluation
Build rigorous RAG evaluation with RAGAS, TruLens, LLM-as-judge, golden datasets, and production monitoring - measure faithfulness, relevance, and groundedness.
RAG Evaluation and RAGAS
Build a continuous RAG evaluation pipeline using the RAGAS framework - faithfulness, answer relevance, context precision, and context recall - with full production implementations using the Anthropic SDK and automated regression detection.
RAG Evaluation Metrics
Evaluate RAG systems with precision - the RAG triad, RAGAS framework, golden datasets, and retrieval metrics for production pipelines.
RAG Pipeline Ops
Operate RAG pipelines in production - index refresh strategies, chunk strategy updates, embedding drift detection, vector database monitoring, and quality tracking.
RAG System Design
How to design Retrieval Augmented Generation systems for production - from naive RAG to advanced pipelines with chunking strategies, hybrid search, reranking, and RAG evaluation.
RAG vs Long Context - When to Use Each
A rigorous cost, latency, and accuracy comparison of retrieval-augmented generation versus long-context stuffing, with decision frameworks for production use cases.
RAG-Specific Evaluation
Master the full evaluation stack for Retrieval-Augmented Generation systems - covering RAGAS metrics, hallucination type classification, citation accuracy, retrieval precision/recall/nDCG, and production-grade benchmarking with complete Python implementations.
RAGEN-2: Reasoning Collapse in Agentic RL
RL training of multi-turn LLM agents is inherently unstable, and reasoning quality directly determines task performance. Entropy is widely used to track...
RaguTeam at SemEval-2026 Task 8: Meno and Friends in a Judge-Orchestrated LLM Ensemble for Faithful Multi-Turn Response Generation
We present our winning system for Task B (generation with reference passages) in SemEval-2026 Task 8: MTRAGEval. Our method is a heterogeneous ensemble...
Random Forests
Master Random Forests from first principles - bagging variance reduction math, feature randomization, OOB error estimation, Extra-Trees, bias-variance decomposition, MDI vs permutation importance, and production deployment patterns.
Randomized Algorithms and Sketching
Randomized algorithms in ML - reservoir sampling for streaming data, Johnson-Lindenstrauss projections, Count-Min Sketch, HyperLogLog, randomized SVD, and locality-sensitive hashing for approximate nearest neighbor search.
Randomized Subspace Nesterov Accelerated Gradient
Randomized-subspace methods reduce the cost of first-order optimization by using only low-dimensional projected-gradient information, a feature that is...
Rate Limiting and Cost Control
Controlling costs and preventing abuse in LLM API serving - token-based rate limiting, Redis token buckets, tenant isolation, cost attribution, budget alerts, and abuse detection.
Rate Limiting and Quotas
Protect your LLM infrastructure from abuse and cost overruns with token bucket rate limiting and sliding window quotas per user, team, and feature - enforced at the gateway before any tokens are consumed.
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
Most reward models for visual generation reduce rich human judgments to a single unexplained score, discarding the reasoning that underlies preference....
RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO
Causal autoregressive video diffusion models support real-time streaming generation by extrapolating future chunks from previously generated content. Di...
RAViT: Resolution-Adaptive Vision Transformer
Vision transformers have recently made a breakthrough in computer vision showing excellent performance in terms of precision for numerous applications....
Raw API Agent Patterns
Building production agents with just the Anthropic SDK - the agentic loop, tool handling, context management, cost tracking, and a complete 200-line implementation.
Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories
Recovering camera parameters from images and rendering scenes from novel viewpoints have long been treated as separate tasks in computer vision and grap...
RDP LoRA: Geometry-Driven Identification for Parameter-Efficient Adaptation in Large Language Models
Fine-tuning Large Language Models (LLMs) remains structurally uncertain despite parameter-efficient methods such as Low-Rank Adaptation (LoRA), as the l...
ReAct Agent Pattern
Building LLM agents that interleave reasoning traces and actions in a ReAct loop to solve multi-step tasks with tool grounding.
ReAct Pattern
Learn how to build LLM agents that reason and act by interleaving thought and tool calls - the architectural pattern behind every modern AI assistant.
ReactiveGWM: Steering NPC in Reactive Game World Models
Current game world models simulate environments from a subjective, player-centric perspective. However, by treating the Non-Player Character (NPC) merel...
Real-Time Aggregations
Windowed aggregations, sessionisation, and user behaviour features in real time.
Real-Time Feature Computation for ML Inference
How to build streaming feature pipelines that compute fresh ML features at production scale, including dual-store architecture, training-serving skew prevention, and hot key mitigation.
Real-Time Feature Engineering at Scale
Computing ML features from raw events within milliseconds - Redis patterns, sliding window aggregations, session detection, and Uber's Michelangelo real-time pipeline.
Real-Time Inference Design
Architecture for ML inference at 1M QPS with sub-10ms SLA - synchronous vs async real-time, circuit breakers, fallback models, and timeout budget management.
Real-Time Surrogate Modeling for Personalized Blood Flow Prediction and Hemodynamic Analysis
Cardiovascular modeling has rapidly advanced over the past few decades due to the rising needs for health tracking and early detection of cardiovascular...
REAM: Merging Improves Pruning of Experts in LLMs
Mixture-of-Experts (MoE) large language models (LLMs) are among the top-performing architectures. The largest models, often with hundreds of billions of...
Reasoning and Math Evaluation
Evaluating LLM mathematical and logical reasoning - GSM8K, MATH, AIME benchmarks, chain-of-thought evaluation, process reward models, self-consistency voting, and measuring multi-step reasoning quality.
Reasoning Knowledge Filter for Logical Table-to-Text Generation.
Reasoning Knowledge Filter for Logical Table-to-Text... - published at COLING 2025.
Reasoning-Enhanced Domain-Adaptive Pretraining of Multimodal Large Language Models for Short Video Content Governance.
Reasoning-Enhanced Domain-Adaptive Pretraining of Mu... - published at EMNLP 2025.
RecaLLM: Addressing the Lost-in-Thought Phenomenon with Explicit In-Context Retrieval
We propose RecaLLM, a set of reasoning language models post-trained to make effective use of long-context information. In-context retrieval, which ident...
RECIPE-TKG: From Sparse History to Structured Reasoning for LLM-based Temporal Knowledge Graph Completion.
RECIPE-TKG: From Sparse History to Structured Reason... - published at EACL 2026.
Recommendation Systems at Scale
End-to-end system design for YouTube-scale video recommendation - candidate generation, multi-stage ranking, post-processing for diversity, cold start, and session modeling.
ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video
Reconstructing non-rigid objects with physical plausibility remains a significant challenge. Existing approaches leverage differentiable rendering for p...
Recovering Hidden Reward in Diffusion-Based Policies
This paper introduces EnergyFlow, a framework that unifies generative action modeling with inverse reinforcement learning by parameterizing a scalar ene...
Recursive Maximum Likelihood Estimation for Interacting Particle Systems using Virtual Particles
We study recursive maximum likelihood estimation for stochastic interacting particle systems based on continuous observation of a single particle. In th...
Recursive Multi-Agent Systems
Recursive or looped language models have recently emerged as a new scaling axis by iteratively refining the same model computation over latent states to...
Red Queen: Exposing Latent Multi-Turn Risks in Large Language Models.
Red Queen: Exposing Latent Multi-Turn Risks in Large... - published at ACL 2025.
Red Teaming AI Systems
Systematic adversarial testing of AI systems - methodology, automated red teaming, documentation, and building a continuous red team program.
Red Teaming LLMs
Systematic adversarial evaluation of language models - manual red teaming, automated red teaming with LLMs, failure taxonomies, and building a production red team process.
RedOne: Revealing Domain-specific LLM Post-Training in Social Networking Services.
RedOne: Revealing Domain-specific LLM Post-Training... - published at EMNLP 2025.
Reference Counting - How CPython Manages Memory at the C Level
Master CPython's reference counting mechanism at engineering depth - ob_refcnt, sys.getrefcount, ctypes raw refcount access, tp_dealloc, reference cycles, weakref module, and why del x does not necessarily destroy an object immediately.
RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details
We introduce region-specific image refinement as a dedicated problem setting: given an input image and a user-specified region (e.g., a scribble mask or...
Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models
Unified multimodal models (UMMs) integrate visual understanding and generation within a single framework. For text-to-image (T2I) tasks, this unified ca...
ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving
We introduce ReflectDrive-2, a masked discrete diffusion planner with separate action expert for autonomous driving that represents plans as discrete tr...
Reflective Context Learning: Studying the Optimization Primitives of Context Space
Generally capable agents must learn from experience in ways that generalize across tasks and environments. The fundamental problems of learning, includi...
Registering Source Tokens to Target Language Spaces in Multilingual Neural Machine Translation.
Registering Source Tokens to Target Language Spaces... - published at ACL 2025.
Regression Testing for Prompts
Build a production-grade regression testing system for LLM prompts - covering test case design, LLM-as-judge pass/fail evaluation, flaky test detection, caching, differential testing, and CI gates that block regressions before they reach users.
Regular Fourier Features for Nonstationary Gaussian Processes
Simulating a Gaussian process requires sampling from a high-dimensional Gaussian distribution, which scales cubically with the number of sample location...
Regularity of Solutions to Beckmann's Parametric Optimal Transport
Beckmann's problem in optimal transport minimizes the total squared flux in a continuous transport problem from a source to a target distribution. In th...
Regularization - L1, L2, and ElasticNet
Master regularization from first principles - bias-variance decomposition, L2 Bayesian interpretation as Gaussian prior, L1 sparsity via subdifferential geometry, elastic net path algorithms, coordinate descent for LASSO, and cross-validation for lambda selection.
Regularized Online RLHF with Generalized Bilinear Preferences
We consider the problem of contextual online RLHF with general preferences, where the goal is to identify the Nash Equilibrium. We adopt the Generalized...
ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis
Human video generation remains challenging due to the difficulty of jointly modeling human appearance, motion, and camera viewpoint under limited multi-...
Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting.
Reinforcement Learning for Aligning Large Language M... - published at NAACL 2025.
Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces
As large language model (LLM) agents evolve from isolated tool users into coordinated teams, reinforcement learning (RL) must optimize not only individu...
Reinforcement Learning via Value Gradient Flow
We study behavior-regularized reinforcement learning (RL), where regularization toward a reference distribution (the dataset in offline RL or the base m...
Reinforcement Learning with Markov Risk Measures and Multipattern Risk Approximation
For a risk-averse finite-horizon Markov Decision Problem, we introduce a special class of Markov coherent risk measures, called mini-batch measures. We...
RemoteZero: Geospatial Reasoning with Zero Human Annotations
Geospatial reasoning requires models to resolve complex spatial semantics and user intent into precise target locations for Earth observation. Recent pr...
Representation and String Methods - __repr__, __str__, __format__ at Engineering Depth
Master Python's string representation protocol - __repr__ vs __str__, the eval() contract, __format__ for custom f-string specs, __bytes__, the !r !s !a conversion flags, and how great repr() transforms production debugging.
Representation Learning for Spatiotemporal Physical Systems
Machine learning approaches to spatiotemporal physical systems have primarily focused on next-frame prediction, with the goal of learning an accurate em...
Representations Before Pixels: Semantics-Guided Hierarchical Video Prediction
Accurate future video prediction requires both high visual fidelity and consistent scene semantics, particularly in complex dynamic environments such as...
Representing the Under-Represented: Cultural and Core Capability Benchmarks for Developing Thai Large Language Models.
Representing the Under-Represented: Cultural and Cor... - published at COLING 2025.
Reproducibility and Auditability in ML Systems
Learn how to build fully reproducible ML systems - covering the reproducibility stack, DVC, MLflow, Docker, seed management, GDPR compliance, and financial model audits.
Reproducibility in ML
Learn the four layers of ML reproducibility - environment, data, code, and model - and how to achieve each in practice with Docker, DVC, MLflow, and seed management.
Request-Response Lifecycle - Every Step From Client to Handler and Back
Trace an HTTP request through its full 15+ step lifecycle - DNS, TCP, TLS, load balancer, reverse proxy, ASGI server, middleware, routing, validation, handler, serialisation, and response - with production debugging techniques.
Requirements and Constraints for ML Systems
How to gather, prioritize, and translate business requirements into technical specifications for ML systems - including latency budgets, SLOs, and ML-specific constraints.
Reranking
Master the two-stage retrieval-reranking architecture - cross-encoders, ColBERT, LLM-as-reranker, Reciprocal Rank Fusion, and production latency budgets.
Responsible Agentic AI
Safety principles, EU AI Act compliance, accountability chains, bias, privacy, red-teaming, and building a safety review process for autonomous agent systems.
Responsible AI and Ethics - Building Systems That Don't Cause Harm
Fairness metrics, bias detection, privacy-preserving ML, model auditing, and the regulatory frameworks every ML engineer must understand.
ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning
Reinforcement Learning with Verifiable Rewards (RLVR) enhances reasoning of Large Language Models (LLMs) but usually exhibits limited generation diversi...
REST Principles - Designing APIs That Don't Break Clients
Master REST at engineering depth - Roy Fielding's six constraints, uniform interface, URL design, HTTP method semantics, status codes, pagination patterns, versioning strategies, RFC 7807 error format, and the Richardson Maturity Model.
REST vs gRPC for ML Model Serving
A production engineer's guide to choosing between REST and gRPC for ML APIs - protocol mechanics, performance trade-offs, and when each wins.
Retail Data Engineering
POS data streams, customer data platform architecture, real-time feature computation with Flink, medallion data lake architecture for retail, privacy compliance, and event streaming pipelines for retail ML.
Rethinking Forward Processes for Score-Based Data Assimilation in High Dimensions
Data assimilation is the process of estimating the time-evolving state of a dynamical system by integrating model predictions and noisy observations. It...
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability
A prevailing narrative in LLM post-training holds that supervised finetuning (SFT) memorizes while reinforcement learning (RL) generalizes. We revisit t...
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
On-policy distillation (OPD) has become a core technique in the post-training of large language models, yet its training dynamics remain poorly understo...
Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems
Reasoning-intensive retrieval aims to surface evidence that supports downstream reasoning rather than merely matching topical similarity. This capabilit...
Rethinking the Diffusion Model from a Langevin Perspective
Diffusion models are often introduced from multiple perspectives, such as VAEs, score matching, or flow matching, accompanied by dense and technically d...
Retrieval Algorithms and ANN
Master the approximate nearest neighbor algorithms powering vector search - HNSW, IVF, IVF-PQ, ScaNN, and DiskANN with parameter tuning and recall-latency trade-offs.
Review Queues and Tooling
Building production review interfaces, priority queues, audit trails, reviewer dashboards, and HITL tooling - from Redis-backed queue management to Label Studio integration.
RevieWeaver: Weaving Together Review Insights by Leveraging LLMs and Semantic Similarity.
RevieWeaver: Weaving Together Review Insights by Lev... - published at NAACL 2025.
Revisiting a Pain in the Neck: A Semantic Reasoning Benchmark for Language Models
We present SemanticQA, an evaluation suite designed to assess language models (LMs) in semantic phrase processing tasks. The benchmark consolidates exis...
Revisiting Gene Ontology Knowledge Discovery with Hierarchical Feature Selection and Virtual Study Group of AI Agents
Large language models have achieved great success in multiple challenging tasks, and their capacity can be further boosted by the emerging agentic AI te...
Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges
Reinforcement Learning from Human Feedback (RLHF) and related alignment paradigms have become central to steering large language models (LLMs) and multi...
RewardFlow: Generate Images by Optimizing What You Reward
We introduce RewardFlow, an inversion-free framework that steers pretrained diffusion and flow-matching models at inference time through multi-reward La...
RewardUQ: A Unified Framework for Uncertainty-Aware Reward Models
Reward models are central to aligning large language models (LLMs) with human preferences. Yet most approaches rely on pointwise reward estimates that o...
RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation.
RichRAG: Crafting Rich Responses for Multi-faceted Q... - published at COLING 2025.
River-LLM: Large Language Model Seamless Exit Based on KV Share
Large Language Models (LLMs) have demonstrated exceptional performance across diverse domains but are increasingly constrained by high inference latency...
RL for AI Agents - Teaching Models to Act in the World
How RL enables autonomous AI agents: ReAct, tool use, MCTS planning, AlphaCode, SWE-bench, and the emerging agent-RL paradigm powering Claude, GPT-4o, and Gemini.
RL from Human Feedback - How ChatGPT Learned to Be Helpful
The complete RLHF pipeline: supervised fine-tuning, reward model training from human preferences, and PPO fine-tuning - the technique behind InstructGPT, ChatGPT, and Claude.
RL in Production - Where Theory Meets Reality
Engineering challenges of deploying RL: offline RL, reward shaping, safe RL, exploration in production, and real-world case studies from DeepMind, Google, and Netflix.
RLDX-1 Technical Report
While Vision-Language-Action models (VLAs) have shown remarkable progress toward human-like generalist robotic policies through the versatile intelligen...
RLHF and DPO for Open Models
Learn how to align open-source language models with human preferences using RLHF and the simpler, more stable Direct Preference Optimization (DPO) approach with TRL.
RLHF Deep Dive
A complete technical walkthrough of Reinforcement Learning from Human Feedback - the three-phase pipeline, reward models, PPO, KL penalty, and the limitations that led to newer approaches.
RLHF: Reinforcement Learning from Human Feedback
Understand how RLHF aligns LLMs with human preferences through three phases - SFT, reward model training, and PPO - and why it produced InstructGPT's surprising result that smaller aligned models beat larger unaligned ones.
RNNs and the Vanishing Gradient Problem
How recurrent neural networks process sequential data through shared hidden states, and why vanishing gradients cripple their ability to learn long-range dependencies.
RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots
Recent advances in robot learning have accelerated progress toward generalist robots that can perform everyday tasks in human environments. Yet it remai...
Robust Reasoning Benchmark
While Large Language Models (LLMs) achieve high performance on standard mathematical benchmarks, their underlying reasoning processes remain highly over...
Robust support vector model based on bounded asymmetric elastic net loss for binary classification
In this paper, we propose a novel bounded asymmetric elastic net ($L_{baen}$) loss function and combine it with the support vector machine (SVM), result...
Robust Unscented Kalman Filtering via Recurrent Meta-Adaptation of Sigma-Point Weights
The Unscented Kalman Filter (UKF) is a ubiquitous tool for nonlinear state estimation; however, its performance is limited by the static parameterizatio...
Robustness of Agentic AI Systems via Adversarially-Aligned Jacobian Regularization
As Large Language Models (LLMs) transition into autonomous multi-agent ecosystems, robust minimax training becomes essential yet remains prone to instab...
Roofline Model and Bottleneck Analysis
Arithmetic intensity, roofline model construction, identifying compute vs memory-bound operations, and using the roofline to guide optimization decisions.
RoPE and ALiBi - Positional Encoding for Long Context
How Rotary Position Embedding encodes relative positions through complex-plane rotations, why ALiBi achieves length extrapolation with linear biases, and why RoPE became the dominant approach for long-context models.
ROSE: An Intent-Centered Evaluation Metric for NL2SQL
Execution Accuracy (EX), the widely used metric for evaluating the effectiveness of Natural Language to SQL (NL2SQL) solutions, is becoming increasingly...
ROSE: Retrieval-Oriented Segmentation Enhancement
Existing segmentation models based on multimodal large language models (MLLMs), such as LISA, often struggle with novel or emerging entities due to thei...
RouteProfile: Elucidating the Design Space of LLM Profiles for Routing
As the large language model (LLM) ecosystem expands, individual models exhibit varying capabilities across queries, benchmarks, and domains, motivating...
Router Mechanisms - How Tokens Get Assigned to Experts
The algorithms that decide which experts process which tokens - linear routing, expert choice, auxiliary load balancing loss, noisy top-k gating, and the Switch Transformer approach.
RTSM: Knowledge Distillation with Diverse Signals for Efficient Real-Time Semantic Matching in E-Commerce.
RTSM: Knowledge Distillation with Diverse Signals fo... - published at NAACL 2025.
RunAgent: Interpreting Natural-Language Plans with Constraint-Guided Execution
Humans solve problems by executing targeted plans, yet large language models (LLMs) remain unreliable for structured workflow execution. We propose RunA...
Running Vector Databases in Production
Master monitoring, capacity planning, index building strategy, warm-up, disaster recovery, index versioning, gradual rollout, and cost optimization for production vector database operations.
Runtime Type Checking
Validate data at system boundaries using get_type_hints, isinstance limitations, beartype, typeguard, and Pydantic's runtime validation model, and build a custom runtime validator.
Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification.
Safe: Enhancing Mathematical Reasoning in Large Lang... - published at ACL 2025.
SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning
Safety guarantees are a prerequisite to the deployment of reinforcement learning (RL) agents in safety-critical tasks. Often, deployment environments ex...
Safety and Bias Evaluation
Evaluate LLMs for harmful outputs, social bias, hallucination, and jailbreak vulnerability - including red teaming methodology and production monitoring.
Safety and Bias Evaluation
Evaluating open-source models for safety and bias before production deployment - red-teaming, toxicity measurement, demographic bias benchmarks, jailbreak robustness, and building end-to-end safety evaluation pipelines.
Safety and Sandboxing
Safety architecture for computer use agents - threat models, prompt injection, Docker sandboxing, action confirmation gates, logging, and anomaly detection.
SAHOO: Safeguarded Alignment for High-Order Optimization Objectives in Recursive Self-Improvement
Recursive self-improvement is moving from theory to practice: modern systems can critique, revise, and evaluate their own outputs, yet iterative self-mo...
Saliency Maps for Vision - What Your CNN Is Actually Seeing
Gradient-based saliency, GradCAM, SmoothGrad, Guided Backpropagation, and Integrated Gradients for explaining computer vision models - with practical code and honest limitations.
Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model
We study the sample complexity of learning an $\varepsilon$-optimal policy in the Stochastic Shortest Path (SSP) problem. We first derive sample complexity bounds...
Sampling from Constrained Gibbs Measures: with Applications to High-Dimensional Bayesian Inference
This paper considers a non-standard problem of generating samples from a low-temperature Gibbs distribution with constrained support, when some o...
Sampling Strategies: Temperature, Top-K, Top-P
Master the sampling algorithms that control LLM output diversity - from greedy decoding to nucleus sampling - and learn when to use each in production.
Sandboxing Agent Environments
Contain the blast radius of any agent failure - process isolation, Docker security hardening, network policy, E2B cloud sandboxes, and escape vector prevention.
SAVGO: Learning State-Action Value Geometry with Cosine Similarity for Continuous Control
While representation and similarity learning have improved the sample efficiency of Reinforcement Learning (RL), they are rarely used to shape policy up...
SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution
Social intelligence, the ability to navigate complex interpersonal interactions, presents a fundamental challenge for language agents. Training such age...
Scalable Evaluation of the Realism of Synthetic Environmental Augmentations in Images
Evaluation of AI systems often requires synthetic test cases, particularly for rare or safety-critical conditions that are difficult to observe in opera...
Scalable Learning of Multivariate Distributions via Coresets
Efficient and scalable non-parametric or semi-parametric regression analysis and density estimation are of crucial importance to the fields of statistic...
Scaling Continual Learning to 300+ Tasks with Bi-Level Routing Mixture-of-Experts
Continual learning, especially class-incremental learning (CIL), on the basis of a pre-trained model (PTM) has garnered substantial research interest in...
Scaling Laws
Empirical power-law relationships between LLM performance and compute, data, and parameters - from Kaplan (2020) to Chinchilla (2022) and beyond.
Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments
Large-scale commercial search systems optimize for relevance to drive successful sessions that help users find what they are looking for. To maximize re...
Scaling Teams or Scaling Time? Memory Enabled Lifelong Learning in LLM Multi-Agent Systems
Large language model (LLM) multi-agent systems can scale along two distinct dimensions: by increasing the number of agents and by improving through accu...
Scaling Test-Time Compute for Agentic Coding
Test-time scaling has become a powerful way to improve large language models. However, existing methods are best suited to short, bounded outputs that c...
Scaling Vector Databases to Billions of Vectors
Architect horizontal sharding, replication, consistent hashing, hot-cold tiering, distributed HNSW, geographic distribution, and backup strategies for production vector databases at billion-vector scale.
ScheMatiQ: From Research Question to Structured Data through Interactive Schema Discovery
Many disciplines pose natural-language research questions over large document collections whose answers typically require structured evidence, tradition...
SciClaims: An End-to-End Generative System for Biomedical Claim Analysis.
SciClaims: An End-to-End Generative System for Biome... - published at EMNLP 2025.
Scientific Graphics Program Synthesis via Dual Self-Consistency Reinforcement Learning
Graphics Program Synthesis is pivotal for interpreting and editing visual data, effectively facilitating the reverse-engineering of static visuals into...
Scikit-Learn Pipelines
Build production-grade scikit-learn Pipelines - ColumnTransformer, custom transformers, caching, cross-validation without leakage, hyperparameter search, and model serialization.
SciLT: Long-Tailed Classification in Scientific Image Domains
Long-tailed recognition has benefited from foundation models and fine-tuning paradigms, yet existing studies and benchmarks are mainly confined to natur...
SciPredict: Can LLMs Predict the Outcomes of Scientific Experiments in Natural Sciences?
Accelerating scientific discovery requires the identification of which experiments would yield the best outcomes before committing resources to costly p...
SCOPE: Scene-Contextualized Incremental Few-Shot 3D Segmentation
Incremental Few-Shot (IFS) segmentation aims to learn new categories over time from only a few annotations. Although widely studied in 2D, it remains un...
SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting
On-policy reinforcement learning has become the dominant paradigm for reasoning alignment in large language models, yet its sparse, outcome-level reward...
Score-Based Generative Models - Diffusion Through the Lens of Score Matching
How Song and Ermon's score matching framework unifies DDPM and enables stochastic differential equations for continuous-time diffusion - the mathematical theory behind modern diffusion models, from score functions and Langevin dynamics through denoising score matching and the SDE unification.
Script-Agnosticism and its Impact on Language Identification for Dravidian Languages.
Script-Agnosticism and its Impact on Language Identi... - published at NAACL 2025.
sDPO: Don't Use Your Data All at Once.
sDPO: Don't Use Your Data All at Once. - published at COLING 2025.
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages.
SeaLLMs 3: Open Foundation and Chat Multilingual Lar... - published at NAACL 2025.
Search and Retrieval Systems
Redesigning an Elasticsearch-only search system with neural search - from BM25 baseline through dense retrieval, learning to rank, query understanding, and search quality evaluation.
Secrets Management
Manage secrets securely with python-dotenv, Pydantic SecretStr, AWS Secrets Manager, HashiCorp Vault, git-secrets, and production credential rotation strategies.
Secure Coding Patterns
Apply defense in depth, least privilege, CORS, rate limiting, CSP headers, dependency auditing with pip-audit, and static analysis with bandit to harden FastAPI applications.
Securing RAG Systems
Attack surfaces unique to RAG architectures - document poisoning, retrieval hijacking, indirect prompt injection, embedding collision, cross-tenant leakage, and defense-in-depth strategies for production RAG deployments.
Seedance 2.0: Advancing Video Generation for World Complexity
Seedance 2.0 is a new native multi-modal audio-video generation model, officially released in China in early February 2026. Compared with its predecesso...
Seeing Fast and Slow: Learning the Flow of Time in Videos
How can we tell whether a video has been sped up or slowed down? How can we generate videos at different speeds? Although videos have been central to mo...
Seeing Through Touch: Tactile-Driven Visual Localization of Material Regions
We address the problem of tactile localization, where the goal is to identify image regions that share the same material properties as a tactile input....
SELDON: Supernova Explosions Learned by Deep ODE Networks
The discovery rate of optical transients will explode to 10 million public alerts per night once the Vera C. Rubin Observatory's Legacy Survey of Space...
Selecting GPUs for Training vs Inference
H100 vs A100 vs L40S vs RTX 4090 vs A10G - a practical decision framework for matching GPU specifications to training and inference workload requirements.
Selecting Target Modules and Rank
Which layers to apply LoRA to and what rank to use - two of the most impactful fine-tuning decisions. Covers attention vs FFN targeting, rank selection from r=4 to r=64, RSLoRA, DoRA, LoRA+, and ablation strategies.
Self-Adversarial One Step Generation via Condition Shifting
The push for efficient text to image synthesis has moved the field toward one step sampling, yet existing methods still face a three way tradeoff among...
Self-Attention Mechanism
How self-attention computes query, key, and value interactions to capture long-range dependencies between tokens.
Self-Distillation Zero: Self-Revision Turns Binary Rewards into Dense Supervision
Current post-training methods in verifiable settings fall into two categories. Reinforcement learning (RLVR) relies on binary rewards, which are broadly...
Self-Distilled Agentic Reinforcement Learning
Reinforcement learning (RL) has emerged as a central paradigm for post-training LLM agents, yet its trajectory-level reward signal provides only coarse...
Self-Distilled RLVR
On-policy distillation (OPD) has become a popular training paradigm in the LLM community. This paradigm selects a larger model as the teacher to provide...
Self-Evolving LLM Memory Extraction Across Heterogeneous Tasks
As LLM-based assistants become persistent and personalized, they must extract and retain useful information from past conversations as memory. However,...
Self-Execution Simulation Improves Coding Models
A promising research direction in enabling LLMs to generate consistently correct code involves addressing their inability to properly estimate program e...
Self-Instruct
How the Self-Instruct paper bootstrapped instruction-following datasets from a tiny seed set using GPT-3, enabling the Alpaca era of aligned models - and how to implement it today.
Self-Service ML Platform
Build ML platforms that data scientists actually use - applying product thinking to internal tooling, from user research and notebook-to-production workflows to adoption metrics and guardrails.
Self-Sovereign Agent
We investigate the emerging prospect of self-sovereign agents -- AI systems that can economically sustain and extend their own operation without human i...
Sema Code: Decoupling AI Coding Agents into Programmable, Embeddable Infrastructure
AI coding agents have become central to developer workflows, yet every existing solution locks its reasoning capabilities within a specific delivery for...
SemaClaw: A Step Towards General-Purpose Personal AI Agents through Harness Engineering
The rise of OpenClaw in early 2026 marks the moment when millions of users began deploying personal AI agents into their daily lives, delegating tasks r...
Semantic Caching
Return cached LLM responses for semantically similar queries using embedding-based vector similarity. Cut costs 40–60% by never paying for the same question twice regardless of how it is phrased.
Semantic Memory and Knowledge Graphs
Structured world knowledge for agents: building and querying knowledge graphs with entity extraction, relationship traversal, and hybrid vector+graph retrieval.
Semantic Richness or Geometric Reasoning? The Fragility of VLM's Visual Invariance
This work investigates the fundamental fragility of state-of-the-art Vision-Language Models (VLMs) under basic geometric transformations. While modern V...
Semantic Segmentation
Pixel-wise classification with FCN, U-Net, DeepLab atrous convolutions, encoder-decoder architectures, instance segmentation with Mask R-CNN, and full PyTorch U-Net implementation.
Semantic Token Clustering for Efficient Uncertainty Quantification in Large Language Models
Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks. However, the truthfulness of their outputs is not guarantee...
Semantic Versioning - The Contract Behind Every Version Number
Master Semantic Versioning at engineering depth - MAJOR.MINOR.PATCH definitions, breaking change classification, Python version specifiers, pre-release ordering, CalVer, changelog discipline, and Git tagging for releases.
Semantics-Aware Caching for Concept Learning
Concept learning is a form of supervised machine learning that operates on knowledge bases in description logics. State-of-the-art concept learners ofte...
Semi-automatic Sequential Sentence Classification in the Discourse Analysis Tool Suite.
Semi-automatic Sequential Sentence Classification in... - published at NAACL 2025.
Semi-Supervised Generative Learning via Latent Space Distribution Matching
We introduce Latent Space Distribution Matching (LSDM), a novel framework for semi-supervised generative modeling of conditional distributions. LSDM ope...
SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching
Diffusion models achieve state-of-the-art video generation quality, but their inference remains expensive due to the large number of sequential denoisin...
Sens-Merging: Sensitivity-Guided Parameter Balancing for Merging Large Language Models.
Sens-Merging: Sensitivity-Guided Parameter Balancing... - published at ACL 2025.
Sentiment Analysis of German Sign Language Fairy Tales
We present a dataset and a model for sentiment analysis of German sign language (DGS) fairy tales. First, we perform sentiment analysis for three levels...
Seq2Seq and Encoder-Decoder Architectures
How encoder-decoder networks with attention solve variable-length sequence-to-sequence problems - from machine translation to summarization and code generation.
Sequential Inference for Gaussian Processes: A Signal Processing Perspective
The proliferation of capable and efficient machine learning (ML) models marks one of the strongest methodological shifts in signal processing (SP) in it...
Serialization and Data Formats
Master serialization formats for ML systems - Protocol Buffers, Apache Arrow, safetensors, Parquet, HDF5, MessagePack, and pickle - with performance benchmarks, security considerations, and schema evolution strategies.
Service Mesh and Load Balancing
Master service mesh architecture and load balancing for ML serving - Istio, Envoy, traffic management, mTLS, canary deployments, circuit breaking, and Kubernetes networking for production AI systems.
Service Mesh for ML Serving
Use Istio service mesh to manage traffic routing across multiple ML model versions - canary deployments, A/B testing, circuit breakers, and telemetry.
Serving Architectures: REST vs gRPC vs WebSocket
How to choose the right serving protocol for ML models - REST, gRPC, and WebSocket compared across latency, throughput, streaming, and operational complexity.
Sessa: Selective State Space Attention
Modern sequence modeling is dominated by two families: Transformers, whose self-attention can access arbitrary elements of the visible sequence, and str...
SEVerA: Verified Synthesis of Self-Evolving Agents
Recent advances have shown the effectiveness of self-evolving LLM agents on tasks such as program repair and scientific discovery. In this paradigm, a p...
Shadow Deployment for Safe Model Releases
How to validate new ML models on real production traffic without affecting users - traffic mirroring, prediction comparison, and graduation criteria.
Shadow Mode Testing
Run new ML models against live production traffic without affecting users - catching silent failures, latency regressions, and behavioral differences before go-live.
ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning
Parameter-efficient fine-tuning (PEFT) reduces the training cost of full-parameter fine-tuning for large language models (LLMs) by training only a small...
SHAP Values - The Unified Theory of Feature Importance
Shapley values from cooperative game theory are the unique attribution of feature contributions to a model's prediction that satisfies the efficiency, symmetry, dummy, and additivity axioms, and SHAP makes them computationally tractable.
SHARE: Social-Humanities AI for Research and Education
This intermediate technical report introduces the SHARE family of base models and the MIRROR user interface. The SHARE models are the first causal langu...
Sharp Convergence Rates for Masked Diffusion Models
Discrete diffusion models have achieved strong empirical performance in text and other symbolic domains, with masked (absorbing-rate) variants emerging...
Sharp description of local minima in the loss landscape of high-dimensional two-layer ReLU neural networks
We study the population loss landscape of two-layer ReLU networks of the form $\sum_{k=1}^K \mathrm{ReLU}(w_k^\top x)$ in a realisable teacher-student s...
Shell Scripting for ML Workflows
Bash scripting for ML engineers - automating training launches, multi-node coordination, GPU monitoring, checkpoint management, parallel data downloads, and writing robust production-grade shell scripts.
Signals and IPC for ML
Unix signals, graceful shutdown patterns, shared memory, pipes, Unix domain sockets, and ZeroMQ for building reliable multi-process ML training and serving systems.
Significance and Stability Analysis of Gene-Environment Interaction using RGxEStat
Genotype-by-Environment (GxE) interactions influence the performance of genotypes across diverse environments, reducing the predictability of phenotypes...
Sim-to-Real Transfer for Muscle-Actuated Robots via Generalized Actuator Networks
Tendon drives paired with soft muscle actuation enable faster and safer robots while potentially accelerating skill acquisition. Still, these systems ar...
SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds
Robotic manipulation with deformable objects represents a data-intensive regime in embodied learning, where shape, contact, and topology co-evolve in wa...
SIMD and Vectorization
Learn how SIMD instruction sets (SSE, AVX2, AVX-512) let a single CPU instruction operate on 4 to 16 single-precision floats at once, why NumPy and PyTorch use them by default, and how to write code that compilers can auto-vectorize.
SimpliHuMoN: Simplifying Human Motion Prediction
Human motion prediction combines the tasks of trajectory forecasting and human pose prediction. For each of the two tasks, specialized models have been...
Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation
Data attribution and valuation are critical for understanding data-model synergy for Large Language Models (LLMs), yet existing gradient-based methods s...
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning
A persistent skill library allows language model agents to reuse successful strategies across tasks. Maintaining such a library requires three coupled c...
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver
Large language model (LLM) agents such as OpenClaw rely on reusable skills to perform complex tasks, yet these skills remain largely static after deploy...
SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents
As the capability frontier of autonomous agents continues to expand, they are increasingly able to complete specialized tasks through plug-and-play exte...
SkillLearnBench: Benchmarking Continual Learning Methods for Agent Skill Generation on Real-World Tasks
Skills have become the de facto way to enable LLM agents to perform complex real-world tasks with customized instructions, workflows, and tools, but how...
SkillOS: Learning Skill Curation for Self-Evolving Agents
LLM-based agents are increasingly deployed to handle streaming tasks, yet they often remain one-off problem solvers that fail to learn from past interac...
Skills-Coach: A Self-Evolving Skill Optimizer via Training-Free GRPO
We introduce Skills-Coach, a novel automated framework designed to significantly enhance the self-evolution of skills within Large Language Model (LLM)-...
SkillX: Automatically Constructing Skill Knowledge Bases for Agents
Learning from experience is critical for building capable large language model (LLM) agents, yet prevailing self-evolving paradigms remain inefficient:...
SkVM: Compiling Skills for Efficient Execution Everywhere
LLM agents increasingly adopt skills as a reusable unit of composition. While skills are shared across diverse agent platforms, current systems treat th...
SlackAgents: Scalable Collaboration of AI Agents in Workspaces.
SlackAgents: Scalable Collaboration of AI Agents in... - published at EMNLP 2025.
SLERP - Spherical Linear Interpolation
How spherical linear interpolation provides smoother, geometrically correct blending between two model weight configurations than simple linear averaging.
Small Vision-Language Models are Smart Compressors for Long Video Understanding
Adapting Multimodal Large Language Models (MLLMs) for hour-long videos is bottlenecked by context limits. Dense visual streams saturate token budgets an...
SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing
Traditional photographic image editing typically requires users to possess sufficient aesthetic understanding to provide appropriate instructions for ad...
Snowflake for ML
Snowflake architecture, Snowpark, and ML feature serving from Snowflake.
SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs.
SoftCoT: Soft Chain-of-Thought for Efficient Reasoni... - published at ACL 2025.
SOLID Principles in Python - Engineering Patterns for Maintainable Code
Master all five SOLID principles with Python-specific implementations - SRP with module-level decomposition, OCP with typing.Protocol, LSP violations and their consequences, ISP with small focused ABCs, and DIP with constructor injection. Production code examples for each.
Solving Physics Olympiad via Reinforcement Learning on Physics Simulators
We have witnessed remarkable advances in LLM reasoning capabilities with the advent of DeepSeek-R1. However, much of this progress has been fueled by th...
Sorting and Search for ML
Sorting algorithms and search techniques for ML engineers - from timsort internals and top-k selection to binary search for hyperparameter tuning, FAISS IVF indexes, and beam search with priority queues.
SOTAlign: Semi-Supervised Alignment of Unimodal Vision and Language Models via Optimal Transport
The Platonic Representation Hypothesis posits that neural networks trained on different modalities converge toward a shared statistical model of the wor...
Spark for ML Pipelines
Building production ML feature pipelines with PySpark - window functions, Pandas UDFs, MLlib Pipelines, point-in-time joins, and Delta Lake integration.
Sparkle: Realizing Lively Instruction-Guided Video Background Replacement via Decoupled Guidance
In recent years, open-source efforts like Senorita-2M have propelled video editing toward natural language instruction. However, current publicly availa...
Sparse vs Dense Models - Trade-offs
Why MoE gives more capacity per FLOP than dense models, the memory vs. compute trade-off, training efficiency, inference complexity, and when to choose each architecture.
SPASM: Stable Persona-driven Agent Simulation for Multi-turn Dialogue Generation
Large language models are increasingly deployed in multi-turn settings such as tutoring, support, and counseling, where reliability depends on preservin...
Spatial Competence Benchmark
Spatial competence is the quality of maintaining a consistent internal representation of an environment and using it to infer discrete structure and pla...
SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing
Image spatial editing performs geometry-driven transformations, allowing precise control over object layout and camera viewpoints. Current models are in...
SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments
Spatial reasoning over three-dimensional scenes is a core capability for embodied intelligence, yet continuous model improvement remains bottlenecked by...
Spec Kit Agents: Context-Grounded Agentic Workflows
Spec-driven development (SDD) with AI coding agents provides a structured workflow, but agents often remain 'context blind' in large, evolving repositor...
Specialized Inference Hardware
Compare AWS Inferentia/Trainium, NVIDIA L4/L40S, edge inference hardware (Jetson, Apple Neural Engine), hardware-specific quantization, and cost-performance tradeoffs for production AI inference.
Spectral Alignment in Forward-Backward Representations via Temporal Abstraction
Forward-backward (FB) representations provide a powerful framework for learning the successor representation (SR) in continuous spaces by enforcing a lo...
Speculative Decoding
How speculative decoding uses a small draft model to generate candidate tokens verified by the large target model in a single forward pass, achieving 2-3x inference speedups without changing output distribution.
Speculative Decoding
Learn how speculative decoding uses a small draft model to generate tokens that a large target model verifies in parallel, achieving 2-3x speedup with no quality loss.
Speculative Decoding for Autoregressive Video Generation
Autoregressive video diffusion is emerging as a promising paradigm for streaming video synthesis, with step distillation serving as the primary means of...
SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding
Speculative Decoding (SD) has emerged as a critical technique for accelerating Large Language Model (LLM) inference. Unlike deterministic system optimiz...
SplAttN: Bridging 2D and 3D with Gaussian Soft Splatting and Attention for Point Cloud Completion
Although multi-modal learning has advanced point cloud completion, the theoretical mechanisms remain unclear. Recent works attribute success to the conn...
SpotSound: Enhancing Large Audio-Language Models with Fine-Grained Temporal Grounding
Large Audio-Language Models (ALMs) have recently demonstrated remarkable capabilities in holistic audio understanding, yet they remain unreliable for te...
SPPCSO: Adaptive Penalized Estimation Method for High-Dimensional Correlated Data
With the rise of high-dimensional correlated data, multicollinearity poses a significant challenge to model stability, often leading to unstable estimat...
SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks
Proximal Policy Optimization (PPO) is central to aligning Large Language Models (LLMs) in reasoning tasks with verifiable rewards. However, standard tok...
SPRITE: From Static Mockups to Engine-Ready Game UI
Game UI implementation requires translating stylized mockups into interactive engine entities. However, current 'Screenshot-to-Code' tools often struggl...
Spurious Predictability in Financial Machine Learning
Adaptive specification search generates statistically significant backtests even under martingale-difference nulls. We introduce a falsification audit t...
SQL at Scale for ML Feature Engineering
Writing production SQL for 10-billion-row datasets - partition pruning, window functions, approximate aggregates, BigQuery optimization, DuckDB, and Spark SQL patterns for ML feature preparation.
SQL Injection Prevention
Prevent SQL injection through parameterized queries, SQLAlchemy best practices, ORM safety limits, raw SQL auditing, and defense against UNION, blind, and second-order injection.
Squeez: Task-Conditioned Tool-Output Pruning for Coding Agents
Coding agents repeatedly consume long tool observations even though only a small fraction of each observation matters for the next step. We study task-c...
Stable and Steerable Sparse Autoencoders with Weight Regularization
Sparse autoencoders (SAEs) are widely used to extract human-interpretable features from neural network activations, but their learned features can vary...
StableI2I: Spotting Unintended Changes in Image-to-Image Transition
In most real-world image-to-image (I2I) scenarios, existing evaluations primarily focus on instruction following and the perceptual quality or aesthetic...
Stacking and Blending
Master stacking and blending ensemble techniques - out-of-fold meta-learning, data leakage prevention, model diversity, snapshot ensembling, temporal ensembling, Kaggle competition patterns, and production deployment tradeoffs.
STARFlow2: Bridging Language Models and Normalizing Flows for Unified Multimodal Generation
Deep generative models have advanced rapidly across text and vision, motivating unified multimodal systems that can understand, reason over, and generat...
Stargazer: A Scalable Model-Fitting Benchmark Environment for AI Agents under Astrophysical Constraints
The rise of autonomous AI agents suggests that dynamic benchmark environments with built-in feedback on scientifically grounded tasks are needed to eval...
State estimations and noise identifications with intermittent corrupted observations via Bayesian variational inference
This paper focuses on the state estimation problem in distributed sensor networks, where intermittent packet dropouts, corrupted observations, and unkno...
State Space Model Foundations
How control theory's state space models became a competitive sequence modeling architecture - continuous-time SSMs, the S4 paper, HiPPO initialization, and the convolutional/recurrent duality.
StateSMix: Online Lossless Compression via Mamba State Space Models and Sparse N-gram Context Mixing
We present StateSMix, a fully self-contained lossless compressor that couples an online-trained Mamba-style State Space Model (SSM) with sparse n-gram c...
Static Analysis and Type Systems
Build type-safe ML codebases using Python type hints, mypy strict mode, pydantic v2 validation, Protocol types, jaxtyping tensor shape annotations, and ruff for fast linting.
Static Analysis in Practice
Configure mypy and pyright for strict mode, gradual typing adoption, type stubs, py.typed markers, CI integration, and strategies for migrating untyped codebases.
Statistical Foundations for A/B Testing
Learn the statistical machinery behind A/B testing - null hypotheses, p-values, power, sample size calculation, and the mistakes that invalidate ML experiments.
Statistical Inference for Score Decompositions
We introduce inference methods for score decompositions, which partition scoring functions for predictive assessment into three interpretable components...
Steering Visual Generation in Unified Multimodal Models with Understanding Supervision
Unified multimodal models are envisioned to bridge the gap between understanding and generation. Yet, to achieve competitive performance, state-of-the-a...
Step-level Optimization for Efficient Computer-use Agents
Computer-use agents provide a promising path toward general software automation because they can interact directly with arbitrary graphical user interfa...
Stochastic and Mini-Batch Gradient Descent
Master SGD and mini-batch gradient descent - gradient noise as implicit regularization, convergence proof sketch with decreasing lr, batch size vs generalization, linear scaling rule, cyclic LR, full PyTorch DataLoader training, and distributed SGD.
Storage Formats for ML Training Data
Why Parquet, Avro, ORC, and Delta Lake exist, how columnar storage enables fast ML pipelines, and how to tune storage formats for maximum throughput and minimum cost.
Storage Hierarchy: SSD and NVMe
Deep dive into SSD and NVMe storage architecture for ML workloads - NAND flash physics, NVMe protocol, io_uring async I/O, memory-mapped datasets, and designing storage systems for large-scale training.
Storage IO for Training Pipelines
How storage IO bottlenecks GPU utilization in ML training, NVMe and distributed filesystem characteristics, data loading patterns with WebDataset and DALI, prefetching strategies, and designing checkpointing that does not stall your cluster.
Strait: Perceiving Priority and Interference in ML Inference Serving
Machine learning (ML) inference serving systems host deep neural network (DNN) models and schedule incoming inference requests across deployed GPUs. How...
StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction
Large language models (LLMs) are increasingly used as interactive agents, but optimizing them for long-horizon decision making remains difficult because...
Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play
Games offer a compelling paradigm for developing general reasoning capabilities in language models, as they naturally demand strategic planning, probabi...
Stream Processing for ML Systems
Continuous feature computation on unbounded data streams using Apache Flink - windowing, watermarks, state management, and production ML feature pipelines.
Stream Processing Patterns for ML Pipelines
Seven production design patterns for streaming ML pipelines, including stream enrichment, stream-stream joins, CDC to feature store, streaming inference, feedback loops, and exactly-once end-to-end processing.
Stream Processing with Kafka for Real-Time ML
How Apache Kafka and Flink enable real-time ML features - topics, consumer groups, exactly-once semantics, streaming feature computation, and architecture patterns.
Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation
Distillation-based acceleration has become foundational for making autoregressive streaming video diffusion models practical, with distribution matching...
Stream-T1: Test-Time Scaling for Streaming Video Generation
While Test-Time Scaling (TTS) offers a promising direction to enhance video generation without the surging costs of training, current test-time video ge...
Stream-to-Feature Pipelines
Computing features from event streams with Kafka and Flink.
Streaming Concepts - Why Batch Fails for Real-Time ML
The fundamental theory of stream processing - event time, processing time, watermarks, windowing, delivery semantics, and backpressure - through the lens of ML systems that cannot afford batch latency.
Streaming Inference
Running ML inference on data streams - Kafka integration, Flink ML, stateful stream processing, windowed feature aggregations, exactly-once inference, and time semantics.
Streaming Multiprocessors
The SM is the fundamental execution unit of every NVIDIA GPU - warp schedulers, register files, shared memory, occupancy, and how thread block configuration determines performance.
Streaming Pipeline Reliability for ML Systems
How to build streaming ML pipelines that survive failures, handle schema changes, implement dead letter queues, replay events, and monitor themselves - so your fraud model never runs on 3-hour-old features again.
Streaming Responses
Implementing and optimizing streaming for real-time LLM response delivery - SSE, chunking strategies, backpressure, tool use streaming, and production patterns for perceived performance.
Streaming Structured Inference with Flash-SemiCRF
Semi-Markov Conditional Random Fields (semi-CRFs) assign labels to segments of a sequence rather than to individual positions, enabling exact inference...
Streaming UX for LLMs
Server-sent events, streaming tokens, TTFT optimization, and building responsive AI chat interfaces that feel instant even under production load.
Strips as Tokens: Artist Mesh Generation with Native UV Segmentation
Recent advancements in autoregressive transformers have demonstrated remarkable potential for generating artist-quality meshes. However, the token order...
Structural Graph Probing of Vision-Language Models
Vision-language models (VLMs) achieve strong multimodal performance, yet how computation is organized across populations of neurons remains poorly under...
Structural interpretability in SVMs with truncated orthogonal polynomial kernels
We study post-training interpretability for Support Vector Machines (SVMs) built from truncated orthogonal polynomial kernels. Since the associated repr...
Structure-Preserving Multi-View Embedding Using Gromov-Wasserstein Optimal Transport
Multi-view data analysis seeks to integrate multiple representations of the same samples in order to recover a coherent low-dimensional structure. Class...
Structured Causal Video Reasoning via Multi-Objective Alignment
Human understanding of video dynamics is typically grounded in a structured mental representation of entities, actions, and temporal relations, rather t...
Structured Concurrency with TaskGroup
Master asyncio.TaskGroup for safe concurrent execution, understand why gather() can leave sibling tasks running after one fails, handle ExceptionGroups, and implement the nursery pattern.
Structured Distillation of Web Agent Capabilities Enables Generalization
Frontier LLMs can navigate complex websites, but their cost and reliance on third-party APIs make local deployment impractical. We introduce Agent-as-An...
Structured Generation in Production
Production-grade architecture for structured generation pipelines - reliability stacks, schema versioning, monitoring, async batching, caching, edge case handling, and complete reference implementations.
Structured Output and JSON Mode
Reliably extract structured data from LLMs using JSON mode, function calling, Pydantic validation, and constrained decoding - the backbone of production LLM pipelines.
Structured Pruning
Remove entire attention heads, MLP neurons, and transformer layers to achieve real hardware latency improvements - with production-grade code for Taylor importance, angular distance layer scoring, iterative recovery, and combined compression pipelines.
Structured Tender Entities Extraction from Complex Tables with Few-short Learning.
Structured Tender Entities Extraction from Complex T... - published at COLING 2025.
Student Performance Prediction
Learn how to build early warning systems for at-risk students, predict dropout and grades, audit for fairness, and design interventions using ML on LMS engagement data.
StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition
Creative face stylization aims to render portraits in diverse visual idioms such as cartoons, sketches, and paintings while retaining recognizable ident...
Stylistic-STORM (ST-STORM) : Perceiving the Semantic Nature of Appearance
One of the dominant paradigms in self-supervised learning (SSL), illustrated by MoCo or DINO, aims to produce robust representations by capturing featur...
SuperLocalMemory V3.3: The Living Brain -- Biologically-Inspired Forgetting, Cognitive Quantization, and Multi-Channel Retrieval for Zero-LLM Agent Memory Systems
AI coding agents operate in a paradox: they possess vast parametric knowledge yet cannot remember a conversation from an hour ago. Existing memory syste...
Supervised Fine-Tuning
Learn how to adapt pretrained LLMs to specific tasks through supervised fine-tuning - data preparation, hyperparameters, catastrophic forgetting, and evaluation.
Supply Chain AI
Lead time prediction, supplier risk scoring, demand sensing, disruption detection, route optimization, and the ML systems that build resilient and efficient retail supply chains.
Supply Chain Optimization with AI
Learn how AI transforms supply chain management through probabilistic demand forecasting, supplier risk scoring, inventory optimization, disruption detection, and vehicle routing.
SurvHTE-Bench: A Benchmark for Heterogeneous Treatment Effect Estimation in Survival Analysis
Estimating heterogeneous treatment effects (HTEs) from right-censored survival data is critical in high-stakes applications such as precision medicine a...
SVGS: Enhancing Gaussian Splatting Using Primitives with Spatially Varying Colors
Gaussian Splatting demonstrates impressive results in multi-view reconstruction based on Gaussian explicit representations. However, the current Gaussia...
SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context
Prior representative ReAct-style approaches in autonomous Software Engineering (SWE) typically lack the explicit System-2 reasoning required for deep an...
SWE-bench and Evaluation
How to evaluate coding agents: SWE-bench, SWE-bench Verified, SOTA numbers, failure modes, and building your own evaluation harness.
SWE-bench Verified
SWE-bench Verified is the gold standard for evaluating coding agents on real GitHub issues. Learn the evaluation methodology, Docker harness, failure mode taxonomy, and how to interpret benchmark scores.
SWE-chat: Coding Agent Interactions From Real Users in the Wild
AI coding agents are being adopted at scale, yet we lack empirical evidence on how people actually use them and how much of their output is useful in pr...
SWE-WebDevBench: Evaluating Coding Agent Application Platforms as Virtual Software Agencies
The emergence of 'vibe coding' platforms, where users describe applications in natural language and AI agents autonomously generate full-stack software,...
SwiftI2V: Efficient High-Resolution Image-to-Video Generation via Conditional Segment-wise Generation
High-resolution image-to-video (I2V) generation aims to synthesize realistic temporal dynamics while preserving fine-grained appearance details of the i...
Switch-KD: Visual-Switch Knowledge Distillation for Vision-Language Models
Vision-Language Models (VLMs) have shown remarkable capabilities in joint vision-language understanding, but their large scale poses significant challen...
Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility
AI agents that interact with their environments through tools enable powerful applications, but in high-stakes business settings, unintended actions can...
SymptomAI: Towards a Conversational AI Agent for Everyday Symptom Assessment
Language models excel at diagnostic assessments on curated medical case-studies and vignettes, performing on par with, or better than, clinical profess...
Synchronous vs Asynchronous Inference
When to use synchronous versus asynchronous inference patterns for ML systems - queue architectures, streaming, timeout handling, and production trade-offs.
Synthetic Computers at Scale for Long-Horizon Productivity Simulation
Realistic long-horizon productivity work is strongly conditioned on user-specific computer environments, where much of the work context is stored and or...
Synthetic Data and Self-Improvement
Generating high-quality synthetic training data with LLMs using Evol-Instruct, Self-Instruct, Constitutional AI, rejection sampling, and self-play techniques to build data flywheels without expensive human annotation.
Synthetic Data for RAG
Generating question-answer pairs, evaluation datasets, and retrieval test cases from documents to build, evaluate, and systematically improve RAG systems.
Synthetic data in cryptocurrencies using generative models
Data plays a fundamental role in consolidating markets, services, and products in the digital financial ecosystem. However, the use of real data, especi...
Synthetic Monitoring Environments for Reinforcement Learning
Reinforcement Learning (RL) lacks benchmarks that enable precise, white-box diagnostics of agent behavior. Current environments often entangle complexit...
Synthetic Sandbox for Training Machine Learning Engineering Agents
As large language model agents advance beyond software engineering (SWE) tasks toward machine learning engineering (MLE), verifying agent behavior becom...
sys and inspect - Runtime Introspection at Engineering Depth
Master the sys and inspect modules at engineering depth - sys.argv, sys.path, sys.modules cache, sys.settrace, sys._getframe, inspect.signature with all parameter kinds, inspect.getsource, inspect.stack, and how FastAPI, pytest, and click use these modules to build their core features.
System Calls and Linux API
Learn how Linux system calls underpin every ML workload - from dataset loading with mmap to epoll-based inference servers, seccomp sandboxing, and io_uring async I/O.
System Prompts and Context Design
Master the architecture of LLM conversations - how to design system prompts, manage context windows, and build production-grade context management systems.
System Prompts and Personas
Design production-grade system prompts and AI personas - the 6-component anatomy, dynamic context injection, behavioral constraints, tone configuration, and persona stability testing.
t-SNE and UMAP
Non-linear dimensionality reduction with t-SNE and UMAP - crowding problem, KL divergence optimization, perplexity, Barnes-Hut approximation, UMAP topological foundations, and production-safe usage.
T^2PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning
Recent progress in multi-turn reinforcement learning (RL) has significantly improved reasoning LLMs' performances on complex interactive tasks. Despite...
TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding
Foundation models have established unified representations for natural language processing, yet this paradigm remains largely unexplored for tabular dat...
TableCoder: Table Extraction from Text via Reliable Code Generation
Published at ACL 2025.
Tadabur: A Large-Scale Quran Audio Dataset
Despite growing interest in Quranic data research, existing Quran datasets remain limited in both scale and diversity. To address this gap, we present T...
TAIHRI: Task-Aware 3D Human Keypoints Localization for Close-Range Human-Robot Interaction
Accurate 3D human keypoints localization is a critical technology enabling robots to achieve natural and safe physical interaction with users. Conventio...
Takeuchi's Information Criteria as Generalization Measures for DNNs Close to NTK Regime
Generalization measures have been studied extensively in the machine learning community to better characterize generalization gaps. However, establishin...
Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation
Modern optimizers like Adam and Muon are central to training large language models, but their reliance on first- and second-order momenta introduces sig...
Target Policy Optimization
In RL, given a prompt, we sample a group of completions from a model and score them. Two questions follow: which completions should gain probability mas...
Target-Oriented Pretraining Data Selection via Neuron-Activated Graph
Everyday tasks come with a target, and pretraining models around this target is what turns them into experts. In this paper, we study target-oriented la...
Task-Specific Evaluation Design
Building evaluation suites tailored to your production use case - test set curation, annotation, metric selection, LLM-as-judge, and automated scoring pipelines that actually predict deployment quality.
TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders
We propose TC-AE, a ViT-based architecture for deep compression autoencoders. Existing methods commonly increase the channel number of latent representa...
TCDA: Thread-Constrained Discourse-Aware Modeling for Conversational Sentiment Quadruple Analysis
Conversational Aspect-based Sentiment Quadruple Analysis (DiaASQ) needs to capture the complex interrelationships in multiple rounds of dialogues. Exist...
TCP/IP Fundamentals for ML
Master the networking layer that underpins every distributed training run and ML serving system - from TCP handshakes to jumbo frames and congestion control algorithms used in modern GPU clusters.
TDD Principles - Write the Test First, Let Failure Guide the Design
Master Test-Driven Development at engineering depth - the Red-Green-Refactor cycle, the three laws of TDD, worked BankAccount example, test naming, the test pyramid, London vs Detroit schools, and when TDD surfaces design problems before production.
Technical Debt in ML Systems
The seven categories of hidden technical debt unique to machine learning systems - entanglement, hidden feedback loops, pipeline jungles, configuration debt, and how to detect and remediate them.
TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications
Published at EMNLP 2025.
Tell Me What To Learn: Generalizing Neural Memory to be Controllable in Natural Language
Modern machine learning models are deployed in diverse, non-stationary environments where they must continually adapt to new tasks and evolving knowledg...
TEMPO: Scaling Test-time Training for Large Reasoning Models
Test-time training (TTT) adapts model parameters on unlabeled test instances during inference time, which continuously extends capabilities beyond the r...
Temporal Convolutional Networks (TCNs)
Master Temporal Convolutional Networks - causal and dilated convolutions, receptive field math, residual blocks, and when TCNs outperform LSTMs and Transformers in production sequence modeling.
Temporal Data Requirement for Predicting Unplanned Hospital Readmissions
With the proliferation of Electronic Health Records (EHRs), a critical challenge in building predictive models is determining the optimal historical dat...
Temporal Features for Real-Time ML
Engineering time-based features for real-time ML - recency-weighted features, session features, sliding window aggregations, point-in-time joins, temporal leakage prevention, and clock skew in distributed systems.
Temporally Extended Mixture-of-Experts Models
Mixture-of-Experts models, now popular for scaling capacity at fixed inference speed, switch experts at nearly every token. Once a model outgrows availa...
Tensor and Pipeline Parallelism
Learn how tensor parallelism splits weight matrices across GPUs and pipeline parallelism splits model layers, enabling inference and training of models too large for a single GPU.
Tensor Core Programming
Program NVIDIA Tensor Cores directly using the WMMA API, MMA PTX instructions, Triton tl.dot(), and CUTLASS - understand activation requirements, shape constraints, and how to diagnose zero Tensor Core utilization.
Tensor Cores and Mixed Precision
How tensor cores accelerate matrix multiply, BF16 vs FP16 vs FP8 vs TF32, mixed precision training implementation, and the performance impact of precision choices.
TensorRT and Inference Optimization
NVIDIA TensorRT compilation pipeline, layer fusion, precision calibration, kernel auto-tuning, and deploying optimized inference engines for production LLM and computer vision workloads.
Terminal Wrench: A Dataset of 331 Reward-Hackable Environments and 3,632 Exploit Trajectories
We release Terminal Wrench, a subset of 331 terminal-agent benchmark environments, copied from the popular open benchmarks that are demonstrably reward-...
Terraform for ML Infrastructure
Build complete ML platforms with Terraform - GPU clusters, MLflow, EKS, feature stores, and model registries using production-grade HCL modules.
Terraform Fundamentals
Master Terraform core concepts - providers, resources, state management, modules, and the plan/apply lifecycle for building reproducible ML infrastructure.
Test-Driven Agent Loops
The most powerful technique for coding agents: use test output as the ground truth feedback signal. TDD loops, pytest integration, output parsing, and backtracking.
Test-Time Adaptation for EEG Foundation Models: A Systematic Study under Real-World Distribution Shifts
Electroencephalography (EEG) foundation models have shown strong potential for learning generalizable representations from large-scale neural data, yet...
Test-Time Compute - Scaling at Inference
The paradigm shift from training-time scaling to inference-time scaling - best-of-N sampling, majority voting, and how spending more compute at inference improves reasoning quality.
Testing and Monitoring Pipelines
Unit testing DAGs, SLA monitoring, and alerting on pipeline failures.
Testing Data Pipelines for ML Correctness
How to test batch ML data pipelines with unit tests, integration tests, and data quality checks - catching label leakage, schema drift, and idempotency bugs before they corrupt your models.
Testing Full Mediation of Treatment Effects and the Identifiability of Causal Mechanisms
In causal analysis, understanding the causal mechanisms through which an intervention or treatment affects an outcome is often of central interest. We p...
Testing ML Code
Build a practical ML test suite from zero - covering the full pyramid from unit tests through model validation without testing everything.
Text Features for ML
Turning text into ML features - from TF-IDF baselines to embedding-based representations that improved e-commerce search NDCG by 18%.
Text-Attributed Graph Learning with Coupled Augmentations
Published at COLING 2025.
TextLDM: Language Modeling with Continuous Latent Diffusion
Diffusion Transformers (DiT) trained with flow matching in a VAE latent space have unified visual generation across images and videos. A natural next st...
TGI and Alternative Serving Frameworks
Compare HuggingFace TGI, Ollama, LiteLLM, Triton Inference Server, and llama.cpp for LLM deployment - feature analysis, performance benchmarks, and when to use each framework.
The Y-Combinator for LLMs: Solving Long-Context Rot with λ-Calculus
LLMs are increasingly used as general-purpose reasoners, but long inputs remain bottlenecked by a fixed context window. Recursive Language Models (RLMs)...
The 12-Factor App - Building Deployable Python Apps
Apply the 12-Factor App methodology to Python applications with FastAPI, Docker, and PostgreSQL - covering all 12 factors with production-ready code examples.
The Agent Loop: Observe, Think, Act
Master the Observe-Think-Act loop that drives every AI agent - from the detailed mechanics of each phase to error handling, backtracking, and token management.
The Alignment Problem
Why making AI systems do what we actually want is harder than it looks - the specification problem, Goodhart's Law, reward hacking, and outer vs inner alignment.
The Bernstein-von Mises theorem for Bayesian one-pass online learning
Bayesian online learning provides a coherent framework for sequential inference. However, its theoretical understanding remains limited, particularly in...
The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents
Computer-use agents (CUAs) can now autonomously complete complex tasks in real digital environments, but when misled, they can also be used to automate...
The Challenge of Attention at Long Contexts
Why attention is O(n²) in memory and compute, how the KV cache grows with context length, and how FlashAttention solves the IO bottleneck without changing the algorithm.
The Cognitive Penalty: Ablating System 1 and System 2 Reasoning in Edge-Native SLMs for Decentralized Consensus
Decentralized Autonomous Organizations (DAOs) are inclined to explore Small Language Models (SLMs) as edge-native constitutional firewalls to vet proposals...
The Cold Start Problem - When Your Recommender Knows Nothing
How to recommend to new users and new items when collaborative filtering has no interaction history - the cold start problem and its production solutions.
The Compression Gap: Why Discrete Tokenization Limits Vision-Language-Action Model Scaling
Scaling Vision-Language-Action (VLA) models by upgrading the vision encoder is expected to improve downstream manipulation performance, as it does in vi...
The Continuity Layer: Why Intelligence Needs an Architecture for What It Carries Forward
The most important architectural problem in AI is not the size of the model but the absence of a layer that carries forward what the model has come to u...
The Data Engineering Landscape for AI Teams
What data engineers actually do in AI organizations, how data flows from raw sources to model serving, and when the data layer becomes the bottleneck for machine learning.
The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning
The viability of chain-of-thought (CoT) monitoring hinges on models being unable to reason effectively in their latent representations. Yet little is kn...
The First Token Knows: Single-Decode Confidence for Hallucination Detection
Self-consistency detects hallucinations by generating multiple sampled answers to a question and measuring agreement, but this requires repeated decodin...
The functools Module - lru_cache, partial, reduce, singledispatch and More
Master the entire functools module at engineering depth - LRU cache internals and eviction, wraps, partial and partialmethod, reduce with operator, total_ordering, cached_property, singledispatch and singledispatchmethod, thread safety considerations, and real-world usage patterns.
The Geometric Alignment Tax: Tokenization vs. Continuous Geometry in Scientific Foundation Models
Foundation models for biology and physics optimize predictive accuracy, but their internal representations systematically fail to preserve the continuou...
The Geometric Canary: Predicting Steerability and Detecting Drift via Representational Stability
Reliable deployment of language models requires two capabilities that appear distinct but share a common geometric foundation: predicting whether a mode...
The GIL Explained - What It Is, What It Isn't, and How to Work Around It
Master Python's Global Interpreter Lock at engineering depth - what the GIL protects, why counter += 1 is not atomic, the switch interval, I/O vs CPU-bound threading, multiprocessing, C extensions that release the GIL, and Python 3.13 free-threaded mode.
The Granularity Axis: A Micro-to-Macro Latent Direction for Social Roles in Language Models
Large language models (LLMs) are routinely prompted to take on social roles ranging from individuals to institutions, yet it remains unclear whether the...
The Harder Path: Last Iterate Convergence for Uncoupled Learning in Zero-Sum Games with Bandit Feedback
We study the problem of learning in zero-sum matrix games with repeated play and bandit feedback. Specifically, we focus on developing uncoupled algorit...
The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation
On-policy distillation (OPD) is an increasingly important paradigm for post-training language models. However, we identify a pervasive Scaling Law of Mi...
The Invalsi Benchmarks: measuring the Linguistic and Mathematical understanding of Large Language Models in Italian
Published at COLING 2025.
The Iterator Protocol - How Python's for Loop Really Works
Master Python's iterator protocol at engineering depth - __iter__, __next__, StopIteration, the iterable vs iterator distinction, for-loop desugaring, iter() with sentinel, next() with default, and lazy pipelines with itertools.
The Last Human-Written Paper: Agent-Native Research Artifacts
Scientific publication compresses a branching, iterative research process into a linear narrative, discarding the majority of what was discovered along...
The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units
Published at NAACL 2025.
The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment
We investigate whether post-trained capabilities can be transferred across models without retraining, with a focus on transfer across different model sc...
The ML Lifecycle
The complete end-to-end lifecycle of a machine learning model, from problem definition through deployment, monitoring, and eventual retirement - with feedback loops, governance, and retraining triggers.
The ML System Design Framework
A structured 4-step framework for approaching ML system design interviews and real production projects - from requirements to deep dive.
The MLOps Lifecycle
Understand the end-to-end MLOps lifecycle, maturity levels 0–3, the nine components of production ML, and why ML deployment is categorically different from software deployment.
The monotonicity of the Franz-Parisi potential is equivalent with Low-degree MMSE lower bounds
Over the last decades, two distinct approaches have been instrumental to our understanding of the computational complexity of statistical estimation. Th...
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping
Despite the success of reinforcement learning for large language models, a common failure mode is reduced sampling diversity, where the policy repeatedl...
The Probabilistic Perspective on ML - Learning as Bayesian Inference
How Bayesian inference unifies all of machine learning under one framework: prior beliefs, observed evidence, and posterior distributions over model parameters.
The Python Import System - importlib, Finders, Loaders, and Import Hooks
Master the Python import system at engineering depth - sys.modules cache, import resolution order, finders and loaders, importlib.import_module, relative vs absolute imports, __init__.py, __all__, circular imports, custom import hooks, and importlib.reload.
The ReAct Pattern
Master the ReAct (Reasoning + Acting) pattern - the 2022 breakthrough that grounds LLM reasoning in real observations and reduces hallucination in agents.
The Role of Handling Attributive Nouns in Improving Chinese-To-English Machine Translation
Published at COLING 2025.
The Scaling Properties of Implicit Deductive Reasoning in Transformers
We investigate the scaling properties of implicit deductive reasoning over Horn clauses in depth-bounded Transformers. By systematically decorrelating p...
The Second Tiny Papers Track at ICLR 2024, Tiny Papers @ ICLR 2024, Vienna, Austria, May 11, 2024
Published at Tiny Papers @ ICLR 2024.
The Stability of Online Algorithms in Performative Prediction
The use of algorithmic predictions in decision-making leads to a feedback loop where the models we deploy actively influence the data distributions we s...
The TTS-STT Flywheel: Synthetic Entity-Dense Audio Closes the Indic ASR Gap Where Commercial and Open-Source Systems Fail
Niche-domain Indic ASR (digit strings, currency amounts, addresses, brand names, English/Indic code-mixing) is under-served by both open-source SOTA and...
Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring
Reward models (RMs) have become an indispensable fixture of the language model (LM) post-training playbook, enabling policy alignment and test-time scal...
Thermodynamic Response Functions in Singular Bayesian Models
Singular statistical models (including mixtures, matrix factorization, and neural networks) violate regular asymptotics due to parameter non-identifiabili...
Theta-regularized Kriging: Modelling and Algorithms
To obtain more accurate model parameters and improve prediction accuracy, we proposed a regularized Kriging model that penalizes the hyperparameter thet...
Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning
Humans paint images incrementally: they plan a global layout, sketch a coarse draft, inspect, and refine details, and most importantly, each step is gro...
Think, then Score: Decoupled Reasoning and Scoring for Video Reward Modeling
Recent advances in generative video models are increasingly driven by post-training and test-time scaling, both of which critically depend on the qualit...
Thinking with DistilQwen: A Tale of Four Distilled Reasoning and Reward Model Series
Published at EMNLP 2025.
ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement
We introduce ThinkTwice, a simple two-phase framework that jointly optimizes LLMs to solve reasoning problems and refine the answers, based on Group Rel...
Thread Blocks, Warps, and Grids
Master the CUDA thread hierarchy - threads, warps, blocks, and grids - how they map to physical hardware, how to calculate global thread indices for 1D, 2D, and 3D problems, and how to choose block dimensions for maximum SM occupancy.
Three-Phase Transformer
We present Three-Phase Transformer (3PT), a residual-stream structural prior for decoder-only Transformers on a standard SwiGLU + RMSNorm + RoPE + GQA b...
TIDE: Every Layer Knows the Token Beneath the Context
We revisit a universally accepted but under-examined design choice in every modern LLM: a token index is looked up once at the input embedding layer and...
TIES Merging - Resolving Sign Conflicts
How TIES-Merging eliminates task interference by trimming small deltas, electing signs by majority vote, and merging only aligned parameters.
Tiling and Shared Memory Optimization
How tiled matrix multiply reduces HBM traffic by reusing data in shared memory, optimal tile size selection, double buffering with cp.async, and applying the tiling pattern to attention and convolution.
Time is Not a Label: Continuous Phase Rotation for Temporal Knowledge Graphs and Agentic Memory
Structured memory representations such as knowledge graphs are central to autonomous agents and other long-lived systems. However, most existing approac...
Time Series Forecasting Patterns
Master the core patterns, classical methods, and deep learning approaches for time series forecasting - including the most critical mistake practitioners make with train/test splits.
Time Series Foundation Models as Strong Baselines in Transportation Forecasting: A Large-Scale Benchmark Analysis
Accurate forecasting of transportation dynamics is essential for urban mobility and infrastructure planning. Although recent work has achieved strong pe...
Time-Series Features
Feature engineering for temporal data - lag features, rolling statistics, Fourier seasonality, and preventing temporal leakage that destroys production forecasts.
TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale
Real-time detection and mitigation of technical anomalies are critical for large-scale cloud-native services, where even minutes of downtime can result...
TIP: Token Importance in On-Policy Distillation
On-policy knowledge distillation (OPD) trains a student on its own rollouts under token-level supervision from a teacher. Not all token positions matter...
TIPA: Typologically Informed Parameter Aggregation
Published at EACL 2026.
Token Cost Monitoring
Monitor and control LLM API costs in production - cost-per-request dashboards, budget alerts, token efficiency optimization, cost attribution by feature and user, and anomaly detection.
Tokenization Deep Dive
How tokenizers convert raw text to token IDs - BPE from scratch, WordPiece, byte-level BPE, and the surprising ways tokenization breaks models.
Tool Use and Function Calling
Master how AI agents call tools - from JSON schema definitions to parallel execution, error handling, and the tool design principles that make agents reliable.
Tool Use and Function Calling
Enabling LLMs to invoke external tools and APIs through structured function calling, covering JSON schema design, Anthropic vs OpenAI formats, parallel tool calls, and production safety.
Tool Use for Coding
Complete coding agent tool set: file operations, bash execution, search, git integration, LSP queries - full implementations with safety and error handling.
TopBench: A Benchmark for Implicit Prediction and Reasoning over Tabular Question Answering
Large Language Models (LLMs) have advanced Table Question Answering, where most queries can be answered by extracting information or simple aggregation....
torch.compile and XLA
Deep dive into PyTorch's torch.compile architecture - TorchDynamo graph capture, AOTAutograd, TorchInductor code generation, XLA for TPU/GPU, and when compiler-based optimization delivers real ML performance gains.
TorchUMM: A Unified Multimodal Model Codebase for Evaluation, Analysis, and Post-training
Recent advances in unified multimodal models (UMMs) have led to a proliferation of architectures capable of understanding, generating, and editing acros...
Toward Autonomous Long-Horizon Engineering for ML Research
Autonomous AI research has advanced rapidly, but long-horizon ML research engineering remains difficult: agents must sustain coherent progress across ta...
Toward Generative Quantum Utility via Correlation-Complexity Map
We propose a Correlation-Complexity Map as a practical diagnostic tool for determining when real-world data distributions are structurally aligned with...
Toward World Models for Epidemiology
World models have emerged as a unifying paradigm for learning latent dynamics, simulating counterfactual futures, and supporting planning under uncertai...
Towards Autonomous Mechanistic Reasoning in Virtual Cells
Large language models (LLMs) have recently gained significant attention as a promising approach to accelerate scientific discovery. However, their appli...
Towards Faithful Multimodal Concept Bottleneck Models
Concept Bottleneck Models (CBMs) are interpretable models that route predictions through a layer of human-interpretable concepts. While widely studied i...
Towards Long-horizon Agentic Multimodal Search
Multimodal deep search agents have shown great potential in solving complex tasks by iteratively collecting textual and visual evidence. However, managi...
Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces
The emergence of Large Language Models (LLMs) has illuminated the potential for a general-purpose user simulator. However, existing benchmarks remain co...
TPU Architecture and Use
Deep dive into Google TPU v4/v5 architecture, systolic arrays, XLA compilation, TPU pods, JAX programming model, cost comparison with GPUs, and when TPUs outperform GPU clusters.
TRACE: Capability-Targeted Agentic Training
Large Language Models (LLMs) deployed in agentic environments must exercise multiple capabilities across different task instances, where a capability is...
Traceprop: End-to-End Provenance-Guided Data Attribution for Auditable ML
The first unified system connecting raw source files through preprocessing and training to individual predictions, with gradient-based attribution and provenance-guided machine unlearning. Sub-1% lineage overhead, 266x faster than TRAK on CPU, and unlearning that exceeds the retrain-from-scratch gold standard.
TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification
Every call to an LLM classification endpoint produces a labeled input-output pair already retained in production logs. These pairs constitute a free, gr...
Tracing LLM Applications
What tracing means for LLM apps - capturing every prompt, completion, latency, cost, and error in queryable traces. Why traditional APM fails for AI, OpenTelemetry GenAI semantic conventions, and a complete production-grade tracer implementation.
Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs
Post-training data plays a pivotal role in shaping the capabilities of Large Language Models (LLMs), yet datasets are often treated as isolated artifact...
Training a Student Expert via Semi-Supervised Foundation Model Distillation
Foundation models deliver strong perception but are often too computationally heavy to deploy, and adapting them typically requires costly annotations....
Training Cost Optimization
Reduce model training costs by 60–80% through spot instances, gradient checkpointing, mixed precision, and compute-optimal training - without sacrificing accuracy.
Training Cost Optimization
Reducing ML training costs systematically - spot instances, mixed precision, gradient checkpointing, compute-optimal training (Chinchilla), and distributed training overhead.
Training Data Preparation for Fine-Tuning
Building high-quality data pipelines for LoRA fine-tuning - chat templates, instruction masking, deduplication, quality filtering, synthetic data generation, and dataset formats that actually produce good models.
Training Dynamics and Debugging
Systematic debugging toolkit for neural network training - loss landscape geometry and flat minima, gradient flow analysis with per-layer norm plots, learning rate finder algorithm, cyclical LR and warmup schedules, gradient clipping strategies, NaN detection hooks, TensorBoard and W&B integration patterns, and a complete pre-training checklist with runnable code.
Training Infrastructure at Scale
Build fault-tolerant GPU training clusters with InfiniBand, NCCL collective operations, Slurm and Kubernetes job scheduling, elastic training, and automatic checkpointing for multi-day training runs.
Training Jobs on Kubernetes
Running ML training on Kubernetes - Jobs, CronJobs, PyTorchJob and TFJob with the Training Operator, fault tolerance, checkpoint-based recovery, spot node handling, and distributed training patterns.
Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration
Most agents today "self-evolve" by following rewards and rules defined by humans. However, this process remains fundamentally dependent on external su...
Training MoE Models
How to train Mixture of Experts models at scale - expert parallelism, capacity factors, token dropping, load imbalance, training instability, and the GShard approach to 600B parameters.
Trajectory Evaluation
Evaluating the full action sequence, not just the final output - trajectory metrics, automatic scoring, and comparing agent versions.
Transfer Learning for Meta-analysis Under Covariate Shift
Randomized controlled trials often do not represent the populations where decisions are made, and covariate shift across studies can invalidate standard...
Tree-of-Thought Prompting
Explore multiple reasoning paths simultaneously using Tree-of-Thought - the technique that enables LLMs to backtrack, evaluate alternatives, and solve problems that defeat linear chain-of-thought.
TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration
While Large Language Models (LLMs) have empowered AI research agents to perform isolated scientific tasks, automating complex, real-world workflows, suc...
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression
Extended reasoning in large language models (LLMs) creates severe KV cache memory bottlenecks. Leading KV cache compression methods estimate KV importan...
Triton for Custom Kernels
Write production GPU kernels in Python with OpenAI Triton - learn the tile-based programming model, core primitives, and how to implement softmax, layer norm, GEMM, and custom attention kernels that match CUDA performance.
Triton Inference Server and TorchServe
Production-grade ML serving frameworks - NVIDIA Triton's dynamic batching and multi-backend support, TorchServe's PyTorch-native serving, and when to use each.
Trojan horse hunt in deep forecasting models: Insights from the European Space Agency competition
Forecasting plays a crucial role in modern safety-critical applications, such as space operations. However, the increasing use of deep forecasting model...
Trust but Verify: Introducing DAVinCI - A Framework for Dual Attribution and Verification in Claim Inference for Language Models
Large Language Models (LLMs) have demonstrated remarkable fluency and versatility across a wide range of NLP tasks, yet they remain prone to factual ina...
TRUSTEVAL: A Dynamic Evaluation Toolkit on Trustworthiness of Generative Foundation Models
Published at NAACL 2025.
Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items
Recent advances in image generation and editing have opened new opportunities for virtual try-on. However, existing methods still struggle to meet compl...
TT4D: A Pipeline and Dataset for Table Tennis 4D Reconstruction From Monocular Videos
We present TT4D, a large-scale, high-fidelity table tennis dataset. It provides 140+ hours of reconstructed singles and doubles gameplay from monocular...
Tunable Soft Equivariance with Guarantees
Equivariance is a fundamental property in computer vision models, yet strict equivariance is rarely satisfied in real-world data, which can limit a mode...
Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization
The rise of autonomous GUI agents has triggered adversarial countermeasures from digital platforms, yet existing research prioritizes utility and robust...
Turning Drift into Constraint: Robust Reasoning Alignment in Non-Stationary Environments
This paper identifies a critical yet underexplored challenge in reasoning alignment from multiple multi-modal large language models (MLLMs): In non-stat...
Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
Diffusion large language models (dLLMs) offer parallel decoding and bidirectional context, but state-of-the-art dLLMs require billions of parameters for...
Turning Trust to Transactions: Tracking Affiliate Marketing and FTC Compliance in YouTube's Influencer Economy
YouTube has evolved into a powerful platform where creators monetize their influence through affiliate marketing, raising concerns about transparen...
Two-Time-Scale Learning Dynamics: A Population View of Neural Network Training
Population-based learning paradigms, including evolutionary strategies, Population-Based Training (PBT), and recent model-merging methods, combine fast...
Two-Tower Models
How dual encoder architectures power billion-scale recommendation and search by separating user and item representations and querying them with approximate nearest neighbor search.
Two-Tower Models - The Architecture Powering Google, TikTok, and YouTube
How two-tower neural networks enable billion-scale retrieval by learning separate user and item towers that can be precomputed for ultra-fast inference.
Type-Checked Compliance: Deterministic Guardrails for Agentic Financial Systems Using Lean 4 Theorem Proving
The rapid evolution of autonomous, agentic artificial intelligence within financial services has introduced an existential architectural crisis: large l...
U-Cast: A Surprisingly Simple and Efficient Frontier Probabilistic AI Weather Forecaster
AI-based weather forecasting now rivals traditional physics-based ensembles, but state-of-the-art (SOTA) models rely on specialized architectures and ma...
UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models
Uniform Discrete Diffusion Model (UDM) has recently emerged as a promising paradigm for discrete generative modeling; however, its integration with rein...
UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization
MLLM-based GUI agents have demonstrated strong capabilities in complex user interface interaction tasks. However, long-horizon scenarios remain challeng...
UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding
GUI grounding, which localizes interface elements from screenshots given natural language queries, remains challenging for small icons and dense layouts...
Uncertainty Quantification - Knowing What Your Model Doesn't Know
Calibration, reliability diagrams, Expected Calibration Error, temperature scaling, and the full toolkit for quantifying and correcting uncertainty in production ML models.
Uncertainty Quantification for Multimodal Large Language Models with Incoherence-adjusted Semantic Volume
Despite their capabilities, Multimodal Large Language Models (MLLMs) may produce plausible but erroneous outputs, hindering reliable deployment. Accurat...
Uncertainty Quantification Via the Posterior Predictive Variance
We use the law of total variance to generate multiple expansions for the posterior predictive variance. These expansions are sums of terms involving con...
Uncovering Physical Drivers of Dark Matter Halo Structures with Auxiliary-Variable-Guided Generative Models
Deep generative models (DGMs) compress high-dimensional data but often entangle distinct physical factors in their latent spaces. We present an auxiliar...
Understanding and Enforcing Weight Disentanglement in Task Arithmetic
Task arithmetic provides an efficient, training-free way to edit pre-trained models, yet lacks a fundamental theoretical explanation for its success. Th...
Understanding the Role of Hallucination in Reinforcement Post-Training of Multimodal Reasoning Models
The recent success of reinforcement learning (RL) in large reasoning models has inspired the growing adoption of RL for post-training Multimodal Large L...
Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator
Unified multimodal models integrating visual understanding and generation face a fundamental challenge: visual generation incurs substantially higher co...
UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards
Retrieval-Augmented Generation (RAG) extends Large Vision-Language Models (LVLMs) with external visual knowledge. However, existing visual RAG systems t...
Unified Memory and Memory Pooling
How CUDA Unified Memory works under the hood, when it helps versus hurts performance, and how PyTorch's caching allocator and memory pools eliminate allocation overhead in production ML systems.
Uniform-Correct Policy Optimization: Breaking RLVR's Indifference to Diversity
Reinforcement Learning with Verifiable Rewards (RLVR) has achieved substantial gains in single-attempt accuracy (Pass@1) on reasoning tasks, yet often s...
Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing
Reinforcement learning with verifiable rewards (RLVR) has become a standard paradigm for post-training large language models. While Group Relative Polic...
UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection
In recent years, significant progress has been made in both image generation and generated image detection. Despite their rapid, yet largely independent...
UniMesh: Unifying 3D Mesh Understanding and Generation
Recent advances in 3D vision have led to specialized models for either 3D understanding (e.g., shape classification, segmentation, reconstruction) or 3D...
UniPool: A Globally Shared Expert Pool for Mixture-of-Experts
Modern Mixture-of-Experts (MoE) architectures allocate expert capacity through a rigid per-layer rule: each transformer layer owns a separate expert set...
UniPrefill: Universal Long-Context Prefill Acceleration via Block-wise Dynamic Sparsification
As large language models (LLMs) continue to advance rapidly, they are becoming increasingly capable while simultaneously demanding ever-longer context l...
UniSD: Towards a Unified Self-Distillation Framework for Large Language Models
Self-distillation (SD) offers a promising path for adapting large language models (LLMs) without relying on stronger external teachers. However, SD in a...
UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling
Scaling humanoid foundation models is bottlenecked by the scarcity of robotic data. While massive egocentric human data offers a scalable alternative, b...
unittest - The Standard Library Test Framework
Master Python's unittest framework at engineering depth - TestCase lifecycle, all assertion methods, assertRaises as context manager, setUp/tearDown/setUpClass, unittest.mock.patch, TestSuite, skip decorators, and subtests for parametrised testing.
Universal Approximation Theorem
The Universal Approximation Theorem rigorously explained - Cybenko 1989, Hornik 1991, Leshno 1993, depth separation (Telgarsky 2015/2016), Barron's theorem, NTK, Lottery Ticket Hypothesis, double descent, and NumPy demonstrations of approximation quality vs width.
UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
Recent progress has shown that video diffusion models (VDMs) can be repurposed for diverse multimodal graphics tasks. However, existing methods often tr...
Unlocking Dense Metric Depth Estimation in VLMs
Vision-Language Models (VLMs) excel at 2D tasks such as grounding and captioning, yet remain limited in 3D understanding. A key limitation is their text...
Unstructured Pruning
Weight-level sparsity, the Lottery Ticket Hypothesis, SparseGPT, Wanda, and 2:4 structured sparsity - why unstructured pruning is theoretically elegant but practically limited for LLMs.
Unsupervised Continual Learning for Amortized Bayesian Inference
Amortized Bayesian Inference (ABI) enables efficient posterior estimation using generative neural networks trained on simulated data, but often suffers...
Validation with Pydantic - Production Request and Response Models
Master Pydantic v2 at engineering depth - BaseModel, Field constraints, field and model validators, ORM mode, discriminated unions, partial updates for PATCH endpoints, JSON Schema generation, and the model_dump gotchas that silently corrupt production data.
Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision
We present Vanast, a unified framework that generates garment-transferred human animation videos directly from a single human image, garment images, and...
Var-JEPA: A Variational Formulation of the Joint-Embedding Predictive Architecture - Bridging Predictive and Generative Self-Supervised Learning
The Joint-Embedding Predictive Architecture (JEPA) is often seen as a non-generative alternative to likelihood-based self-supervised learning, emphasizi...
Variational Autoencoders
Master Variational Autoencoders - ELBO derivation, reparameterization trick, β-VAE disentanglement, VQ-VAE discrete latent spaces, conditional VAE, and PyTorch implementation for MNIST generation and anomaly detection.
Variational Autoencoders - Learning Latent Distributions with Evidence Lower Bound
VAEs combine variational inference with neural networks to learn a probabilistic latent space - enabling generation, interpolation, and disentanglement.
Variational Garrote for Sparse Inverse Problems
Sparse regularization plays a central role in solving inverse problems arising from incomplete or corrupted measurements. Different regularizers corresp...
VaSST: Variational Inference for Symbolic Regression using Soft Symbolic Trees
Symbolic regression has recently gained traction in AI-driven scientific discovery, aiming to recover explicit closed-form expressions from data that re...
VCRMNER: Visual Cue Refinement in Multimodal NER using CLIP Prompts
Published at COLING 2025.
VecMol: Vector-Field Representations for 3D Molecule Generation
Generative modeling of three-dimensional (3D) molecules is a fundamental yet challenging problem in drug discovery and materials science. Existing appro...
Vector Databases
Compare Pinecone, Qdrant, Weaviate, Milvus, Chroma, and pgvector - understand the engineering trade-offs and build a production vector store.
Vector Databases Compared - Pinecone, Weaviate, Qdrant, Chroma, pgvector
Systematic comparison of the major vector databases - architecture, managed vs self-hosted, hybrid search, filtering, update performance, consistency, and cost.
Vector Search in Practice
How approximate nearest neighbor search works, how to choose the right vector database, and how to build production-grade retrieval pipelines that stay fast at millions of documents.
Vector Similarity Search Fundamentals
Master cosine similarity, dot product, L2 distance, exact vs approximate search, the curse of dimensionality, and how to evaluate vector search quality with recall@K.
Vectorization with NumPy - Escaping Python's Loop Overhead
Understand why Python loops are slow, how NumPy's C-level loops bypass interpreter overhead, broadcasting rules, views vs copies, memory layout, ufuncs, and real-world data pipeline optimization.
VEFX-Bench: A Holistic Benchmark for Generic Video Editing and Visual Effects
As AI-assisted video creation becomes increasingly practical, instruction-guided video editing has become essential for refining generated or captured f...
VenusBench-Mobile: A Challenging and User-Centric Benchmark for Mobile GUI Agents with Capability Diagnostics
Existing online benchmarks for mobile GUI agents remain largely app-centric and task-homogeneous, failing to reflect the diversity and instability of re...
venv and virtualenv - Python Environment Isolation
Master Python virtual environments at full engineering depth - how venv works at the filesystem level, PATH manipulation, pyvenv.cfg, pyenv for Python version management, and why Docker containers are not a substitute for virtual environments.
Vero: An Open RL Recipe for General Visual Reasoning
What does it take to build a visual reasoner that works across charts, science, spatial understanding, and open-ended tasks? The strongest vision-langua...
Video Generation with Predictive Latents
Video Variational Autoencoder (VAE) enables latent video generative modeling by mapping the visual world into compact spatiotemporal latent spaces, impr...
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
With the rapid advancement of video understanding, existing benchmarks are becoming increasingly saturated, exposing a critical discrepancy between infl...
VideoFlexTok: Flexible-Length Coarse-to-Fine Video Tokenization
Visual tokenizers map high-dimensional raw pixels into a compressed representation for downstream modeling. Beyond compression, tokenizers dictate what...
ViMU: Benchmarking Video Metaphorical Understanding
Any new medium, once it emerges, is used for more than the transmission of overt content alone. The information it carries typically operates on two lev...
Virtual Memory and Page Faults
Understand virtual memory layout, page tables, TLB, huge pages, and page faults - and how these OS mechanisms directly affect PyTorch training, large model loading, and ML dataset memory mapping.
Vision-Language Models
How modern AI systems combine vision encoders with language models to understand and reason about images.
Vision-Language Models Struggle to Align Entities across Modalities
Published at ACL 2025.
Vista4D: Video Reshooting with 4D Point Clouds
We present Vista4D, a robust and flexible video reshooting framework that grounds the input video and target cameras in a 4D point cloud. Specifically,...
Visual Reasoning through Tool-supervised Reinforcement Learning
In this paper, we investigate the problem of how to effectively master tool-use to solve complex visual reasoning tasks for Multimodal Large Language Mo...
Visual Search and Product Discovery
Image embedding models for retail visual search, CLIP-based product discovery, FAISS similarity retrieval, multimodal search combining image and text, and the systems behind shop-the-look features.
ViVa: A Video-Generative Value Model for Robot Reinforcement Learning
Vision-language-action (VLA) models have advanced robot manipulation through large-scale pretraining, but real-world deployment remains challenging due...
VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation
Autonomous GUI agents face two fundamental challenges: early stopping, where agents prematurely declare success without verifiable evidence, and repetit...
vLLM and Inference Servers
Learn how production inference servers like vLLM, TGI, TensorRT-LLM, and Ollama combine PagedAttention, continuous batching, and optimized kernels to serve LLMs at scale.
vLLM Architecture and Deployment
Deploy open LLMs at production scale using vLLM - PagedAttention, continuous batching, tensor parallelism, and OpenAI-compatible serving for LLaMA 3 70B and beyond.
VoxMind: An End-to-End Agentic Spoken Dialogue System
Recent end-to-end spoken dialogue models enable natural interaction. However, as user demands become increasingly complex, models that rely solely on co...
VoxpopuliTTS: a large-scale multilingual TTS corpus for zero-shot speech generation
Published at COLING 2025.
Warp Divergence and Control Flow
How branch divergence serializes GPU warp execution, the cost of divergence, warp shuffle intrinsics, and concrete techniques for restructuring kernels to minimize divergence.
Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video
Camera-controlled video generation has made substantial progress, enabling generated videos to follow prescribed viewpoint trajectories. However, existi...
Watch Before You Answer: Learning from Visually Grounded Post-Training
It is critical for vision-language models (VLMs) to comprehensively understand visual, temporal, and textual cues. However, despite rapid progress in mu...
Watching the AI Watchdogs: A Fairness and Robustness Analysis of AI Safety Moderation Classifiers
Published at NAACL 2025.
WavAlign: Enhancing Intelligence and Expressiveness in Spoken Dialogue Models via Adaptive Hybrid Post-Training
End-to-end spoken dialogue models have garnered significant attention because they offer a higher potential ceiling in expressiveness and perceptual abi...
Web Scraping Agents
Agent-based web scraping - handling dynamic JavaScript rendering, login flows, multi-page pagination, structured data extraction, and anti-detection techniques.
WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models
Large language models are rapidly evolving into interactive coding agents capable of end-to-end web coding, yet existing benchmarks evaluate only narrow...
WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning
While Large Language Models (LLMs) excel at function-level code generation, project-level tasks such as generating functional and visually aesthetic mul...
Weight Initialization
Why weight initialization determines whether deep networks train or collapse - symmetry breaking failure, Xavier/Glorot derivation, He/Kaiming for ReLU, LSUV, orthogonal init, bias strategies, and full NumPy experiments measuring gradient flow across 10 layers.
Weights & Biases - The ML Experiment Tracking Standard
How W&B's experiment tracking, hyperparameter sweeps, model registry, and artifact management transform chaotic Jupyter notebooks into reproducible, collaborative ML workflows.
Weights & Biases Deep Dive
W&B for production ML teams - run tracking, sweeps, artifact versioning, collaborative reports, alerts, and how it compares to MLflow.
What are AI Agents?
Understand precisely what an AI agent is - the definition, the 5 key properties, the taxonomy, and why LLMs finally made agents practical.
What Are Embeddings and Why They Matter
The fundamental concept of embeddings - mapping meaning to geometric space, cosine similarity, Word2Vec, the king-queen analogy, and why dense retrieval replaced keyword search.
What do Language Models Learn and When? The Implicit Curriculum Hypothesis
Large language models (LLMs) can perform remarkably complex tasks, yet the fine-grained details of how these capabilities emerge during pretraining rema...
What Does Flow Matching Bring To TD Learning?
Recent work shows that flow matching can be effective for scalar Q-value function estimation in reinforcement learning (RL), but it remains unclear why...
What is LLMOps
LLMOps defined - the operational discipline for managing LLM-powered applications in production, why it differs from MLOps, and the full lifecycle every AI engineering team must master.
What is MCP?
The Model Context Protocol - announced by Anthropic in November 2024 - solves the N×M integration problem by giving AI systems a standard way to connect to any tool or data source.
What Makes an LLM a Good Optimizer? A Trajectory Analysis of LLM-Guided Evolutionary Search
Recent work has demonstrated the promise of orchestrating large language models (LLMs) within evolutionary and agentic optimization systems. However, th...
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning
Published at COLING 2025.
What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion
Tokenizers are a crucial component of latent diffusion models, as they define the latent space in which diffusion models operate. However, existing toke...
When Background Matters: Breaking Medical Vision Language Models by Transferable Attack
Vision-Language Models (VLMs) are increasingly used in clinical diagnostics, yet their robustness to adversarial attacks remains largely unexplored, pos...
When Can LLMs Learn to Reason with Weak Supervision?
Large language models have achieved significant reasoning improvements through reinforcement learning with verifiable rewards (RLVR). Yet as model capab...
When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels
Many deployments must compare candidate language models for safety before a labeled benchmark exists for the relevant language, sector, or regulatory re...
When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models
Text-to-video diffusion models have enabled open-ended video synthesis, but often struggle with generating the correct number of objects specified in a...
When One Modality Rules Them All: Backdoor Modality Collapse in Multimodal Diffusion Models
While diffusion models have revolutionized visual content generation, their rapid adoption has underscored the critical need to investigate vulnerabilit...
When Reasoning Models Hurt Behavioral Simulation: A Solver-Sampler Mismatch in Multi-Agent LLM Negotiation
Large language models are increasingly used as agents in social, economic, and policy simulations. A common assumption is that stronger reasoning should...
When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning
In single-stream autoregressive interfaces, the same tokens both update the model state and constitute an irreversible public commitment. This coupling...
When to Trust Imagination: Adaptive Action Execution for World Action Models
World Action Models (WAMs) have recently emerged as a promising paradigm for robotic manipulation by jointly predicting future visual observations and f...
When to Use a Framework
The framework vs raw API decision for agents - what abstractions cost, what they provide, and a decision tree based on your actual requirements.
When to Use Agents
A decision framework for when autonomous agents are appropriate vs. when simpler approaches are better - covering cost of agency, task classification, anti-patterns, and ROI analysis.
When to Use Reasoning Models in Production
A practical decision framework for routing tasks to reasoning models - task taxonomy, cost-benefit analysis, latency trade-offs, and hybrid routing architectures.
When to Use SSMs in Production
A practical deployment guide: use cases where SSMs win, the streaming inference pattern, model availability on HuggingFace, fine-tuning SSMs, and a forward-looking outlook.
When Your Model Stops Working: Anytime-Valid Calibration Monitoring
Practitioners monitoring deployed probabilistic models face a fundamental trap: any fixed-sample test applied repeatedly over an unbounded stream will e...
Where Do LLMs Compose Meaning? A Layerwise Analysis of Compositional Robustness
Published at EACL 2026.
Who Guards the Guardians? The Challenges of Evaluating Identifiability of Learned Representations
Identifiability in representation learning is commonly evaluated using standard metrics (e.g., MCC, DCI, R^2) on synthetic benchmarks with known ground-...
Who Prices Cognitive Labor in the Age of Agents? Compute-Anchored Wages
A natural intuition about the economics of AI agents is that, because agents can be replicated at very low marginal cost, agent labor may be supplied hi...
Why AI Evaluation Is Hard
Understanding the fundamental gap between software testing and AI evaluation - non-determinism, no oracle, emergent failures, and how to build a multi-layered evaluation strategy.
Why an LLM Gateway
The case for centralizing all LLM traffic through a single gateway layer - routing, cost control, fallbacks, and observability without rewriting application code.
Why Data Versioning
The case for treating datasets as first-class versioned artifacts - regulatory requirements, reproducibility, drift detection, and the approaches to versioning (full copy, delta, pointer).
Why Experiment Tracking
The business and technical case for tracking every ML experiment - what to track, why it matters, and what happens when you don't.
Why Graphs for ML
When tabular data fails - graph formalism, adjacency matrix, Laplacian, graph types, real-world datasets, the Weisfeiler-Lehman test, and why CNNs cannot handle graph-structured data.
Why Human-in-the-Loop Matters
Understand why full automation fails, where the alignment gap lives, what regulations demand, and how to design the right level of human oversight for any AI system.
Why Model Compression Matters
The memory wall, inference costs, edge deployment, and latency requirements that make model compression essential for production AI systems - with real cost math, a full compression taxonomy, and decision frameworks for choosing the right technique.
Why Model Merging Exists
The catastrophic forgetting problem, why naive ensembles are too expensive, and the surprising geometric insight that makes model merging possible.
Why Multi-Agent Systems?
The fundamental case for multi-agent: parallelization, specialization, and verification - and the honest cost of coordination overhead.
Why RAG and When Not To
Understand why LLMs hallucinate, what RAG actually solves, and the decision framework for choosing RAG vs fine-tuning vs prompt stuffing.
Why RAG Exists
Understand why Retrieval-Augmented Generation was invented, what problems it solves that fine-tuning and prompt stuffing cannot, and how to architect a minimal RAG pipeline from scratch.
Why Structured Output Matters in Production
The taxonomy of LLM output failures, why prompt-based JSON extraction breaks at scale, the production impact of 5% failure rates, and the spectrum of solutions from prompt engineering to constrained decoding.
Why Synthetic Data
Understand why synthetic data has become central to AI engineering - the labeled data bottleneck, privacy constraints, rare events, LLMs as generators, landmark case studies, and when synthetic beats real.
WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation
Large language and vision-language models increasingly power agents that act on a user's behalf through command-line interface (CLI) harnesses. However,...
WildDet3D: Scaling Promptable 3D Detection in the Wild
Understanding objects in 3D from a single image is a cornerstone of spatial intelligence. A key step toward this goal is monocular 3D object detection--...
WindowsWorld: A Process-Centric Benchmark of Autonomous GUI Agents in Professional Cross-Application Environments
While GUI agents have shown impressive capabilities in common computer-use tasks such as OSWorld, current benchmarks mainly focus on isolated and single...
Working with 128K+ Context Windows in Production
A complete production engineering guide for building applications with long-context LLMs - model selection, cost management, prompt structure, multi-turn conversation, and memory-augmented systems.
Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies
Workspace learning requires AI agents to identify, reason over, exploit, and update explicit and implicit dependencies among heterogeneous files in a wo...
World Properties without World Models: Recovering Spatial and Temporal Structure from Co-occurrence Statistics in Static Word Embeddings
Recent work interprets the linear recoverability of geographic and temporal variables from large language model (LLM) hidden states as evidence for worl...
World2Minecraft: Occupancy-Driven Simulated Scenes Construction
Embodied intelligence requires high-fidelity simulation environments to support perception and decision-making, yet existing platforms often suffer from...
WorldAct: Activating Monolithic 3D Worlds into Interactive-Ready Object-Centric Scenes
Recent 3D world modeling systems based on generative scene synthesis, such as Marble, can create coherent and explorable 3D environments, yet their outp...
WorldMark: A Unified Benchmark Suite for Interactive Video World Models
Interactive video generation models such as Genie, YUME, HY-World, and Matrix-Game are advancing rapidly, yet every model is evaluated on its own benchm...
Writing Your First CUDA Kernel
End-to-end walkthrough of writing a production-grade fused bias+GELU CUDA kernel, including kernel fusion principles, launch configuration, error checking, Triton alternative, and full benchmarks.
X2SAM: Any Segmentation in Images and Videos
Multimodal Large Language Models (MLLMs) have demonstrated strong image-level visual understanding and reasoning, yet their pixel-level perception acros...
XFED: Non-Collusive Model Poisoning Attack Against Byzantine-Robust Federated Classifiers
Model poisoning attacks pose a significant security threat to Federated Learning (FL). Most existing model poisoning attacks rely on collusion, requirin...
XL-SafetyBench: A Country-Grounded Cross-Cultural Benchmark for LLM Safety and Cultural Sensitivity
Current LLM safety benchmarks are predominantly English-centric and often rely on translation, failing to capture country-specific harms. Moreover, they...
You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass
We present a discriminative multimodal reward model that scores all candidate responses in a single forward pass. Conventional discriminative reward mod...
Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw
OpenClaw, the most widely deployed personal AI agent in early 2026, operates with full local system access and integrates with sensitive services such a...
ZenML
Building portable, stack-agnostic MLOps pipelines with ZenML - stacks, steps, materializers, and seamless local-to-cloud migration with MLflow and Vertex AI.
ZeRO and Memory Efficiency
DeepSpeed ZeRO stages 1/2/3 - sharding optimizer states, gradients, and parameters across data parallel workers to enable training models too large for single-GPU memory.
Zero-Copy and Data Transfer
How to eliminate unnecessary memory copies in ML data pipelines - from sendfile() and mmap() to NumPy views, PyTorch pinned memory, and Apache Arrow Flight for zero-copy data serving.
Zero-Shot Prompting
Learn how to elicit reliable behavior from LLMs using only instructions - no examples required - by mastering prompt anatomy, role personas, and format control.
Zero-shot World Models Are Developmentally Efficient Learners
Young children demonstrate early abilities to understand their physical world, estimating depth, motion, object coherence, interactions, and many other...
Zeroth-Order Stackelberg Control in Combinatorial Congestion Games
We study Stackelberg (leader-follower) tuning of network parameters (tolls, capacities, incentives) in combinatorial congestion games, where selfish us...
ZipMap: Linear-Time Stateful 3D Reconstruction with Test-Time Training
Feed-forward transformer models have driven rapid progress in 3D vision, but state-of-the-art methods such as VGGT and $π^3$ have a computational cost t...
ZO-SAM: Zero-Order Sharpness-Aware Minimization for Efficient Sparse Training
Deep learning models, despite their impressive achievements, suffer from high computational costs and memory requirements, limiting their usability in r...