"Taking Stock at FAccT": Using Participatory Design to Co-Create a Vision for the Fairness, Accountability and Transparency Community
As a relatively new forum, ACM FAccT has become a key space for activists and scholars to critically examine emerging AI and ML technologies. It brings...
$τ$-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge
Conversational agents are increasingly deployed in knowledge-intensive settings, where correct behavior depends on retrieving and applying domain-specif...
3DTCR: A Physics-Based Generative Framework for Vortex-Following 3D Reconstruction to Improve Tropical Cyclone Intensity Forecasting
Tropical cyclone (TC) intensity forecasting remains challenging as current numerical and AI-based weather models fail to satisfactorily represent extrem...
A 1/R Law for Kurtosis Contrast in Balanced Mixtures
Kurtosis-based Independent Component Analysis (ICA) weakens in wide, balanced mixtures. We prove a sharp redundancy law: for a standardized projection w...
A Constrained RL Approach for Cost-Efficient Delivery of Latency-Sensitive Applications
Next-generation networks aim to provide performance guarantees to real-time interactive services that require timely and cost-efficient packet delivery....
A Dataset is Worth 1 MB
A dataset server must often distribute the same large payload to many clients, incurring massive communication costs. Since clients frequently operate o...
A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
Large language models are beginning to show steganographic capabilities. Such capabilities could allow misaligned models to evade oversight mechanisms....
A Dirac-Frenkel-Onsager principle: Instantaneous residual minimization with gauge momentum for nonlinear parametrizations of PDE solutions
Dirac-Frenkel instantaneous residual minimization evolves nonlinear parametrizations of PDE solutions in time, but ill-conditioning can render the param...
A distributed semismooth Newton based augmented Lagrangian method for distributed optimization
This paper proposes a novel distributed semismooth Newton based augmented Lagrangian method for solving a class of optimization problems over networks,...
A Dual-Helix Governance Approach Towards Reliable Agentic AI for WebGIS Development
WebGIS development requires rigor, yet agentic AI frequently fails due to five large language model (LLM) limitations: context constraints, cross-sessio...
A Minimal Agent for Automated Theorem Proving
We propose a minimal agentic baseline that enables systematic comparison across different AI-based theorem prover architectures. This design implements...
A Mixed Diet Makes DINO An Omnivorous Vision Encoder
Pre-trained vision encoders like DINOv2 have demonstrated exceptional performance on unimodal tasks. However, we observe that their feature representati...
A multimodal slice discovery framework for systematic failure detection and explanation in medical image classification
Despite advances in machine learning-based medical image classifiers, the safety and reliability of these systems remain major concerns in practical set...
A Note on How to Remove the $\ln\ln T$ Term from the Squint Bound
In Orabona and Pál [2016], we introduced the shifted KT potentials, to remove the $\ln \ln T$ factor in the parameter-free learning with expert bound. I...
A Novel Computational Framework for Causal Inference: Tree-Based Discretization with ILP-Based Matching
Causal inference is essential for data-driven decision-making, as it aims to uncover causal relationships from observational data. However, identifying...
A novel hybrid approach for positive-valued DAG learning
Causal discovery from observational data remains a fundamental challenge in machine learning and statistics, particularly when variables represent inher...
A Predictive View on Streaming Hidden Markov Models
We develop a predictive-first optimisation framework for streaming hidden Markov models. Unlike classical approaches that prioritise full posterior reco...
A Proper Scoring Rule for Virtual Staining
Generative virtual staining (VS) models for high-throughput screening (HTS) can provide an estimated posterior distribution of possible biological featu...
A Quantitative Characterization of Forgetting in Post-Training
Continual post-training of generative models is widely used, yet a principled understanding of when and why forgetting occurs remains limited. We develo...
A recipe for scalable attention-based MLIPs: unlocking long-range accuracy with all-to-all node attention
Machine-learning interatomic potentials (MLIPs) have advanced rapidly, with many top models relying on strong physics-based inductive biases. However, a...
A Reference Architecture of Reinforcement Learning Frameworks
The surge in reinforcement learning (RL) applications gave rise to diverse supporting technology, such as RL frameworks. However, the architectural patt...
A Stein Identity for q-Gaussians with Bounded Support
Stein's identity is a fundamental tool in machine learning with applications in generative models, stochastic optimization, and other problems involving...
A Systematic Security Evaluation of OpenClaw and Its Variants
Tool-augmented AI agents substantially extend the practical capabilities of large language models, but they also introduce security risks that cannot be...
A theory of learning data statistics in diffusion models, from easy to hard
While diffusion models have emerged as a powerful class of generative models, their learning dynamics remain poorly understood. We address this issue fi...
A Tsetlin Machine-driven Intrusion Detection System for Next-Generation IoMT Security
The rapid adoption of the Internet of Medical Things (IoMT) is transforming healthcare by enabling seamless connectivity among medical devices, systems,...
A Two-Stage, Object-Centric Deep Learning Framework for Robust Exam Cheating Detection
Academic integrity continues to face the persistent challenge of examination cheating. Traditional invigilation relies on human observation, which is in...
A two-step sequential approach for hyperparameter selection in finite context models
Finite-context models (FCMs) are widely used for compressing symbolic sequences such as DNA, where predictive performance depends critically on the cont...
A unified perspective on fine-tuning and sampling with diffusion and flow models
We study the problem of training diffusion and flow generative models to sample from target distributions defined by an exponential tilting of a base de...
A Variational Estimator for $L_p$ Calibration Errors
Calibration, the problem of ensuring that predicted probabilities align with observed class frequencies, is a basic desideratum for reliable ML prediction.
Abductive Reasoning with Syllogistic Forms in Large Language Models
Research in AI using Large Language Models (LLMs) is rapidly evolving, and the comparison of their performance with human reasoning has become a key con...
Accurate and Efficient Hybrid-Ensemble Atmospheric Data Assimilation in Latent Space with Uncertainty Quantification
Data assimilation (DA) combines model forecasts and observations to estimate the optimal state of the atmosphere with its uncertainty, providing initial...
Accurate and Reliable Uncertainty Estimates for Deterministic Predictions: Extensions to Under and Overpredictions
Computational models support high-stakes decisions across engineering and science, and practitioners increasingly seek probabilistic predictions to quan...
Active Bipartite Ranking with Smooth Posterior Distributions
In this article, bipartite ranking, a statistical learning problem involved in many applications and widely studied in the passive context, is approache...
AdaCubic: An Adaptive Cubic Regularization Optimizer for Deep Learning
A novel regularization technique, AdaCubic, is proposed that adapts the weight of the cubic term. The heart of AdaCubic is an auxiliary optimization pro...
Adaptive Combinatorial Experimental Design: Pareto Optimality for Decision-Making and Inference
In this paper, we provide the first investigation into adaptive combinatorial experimental design, focusing on the trade-off between regret minimization...
Adaptive Conditional Forest Sampling for Spectral Risk Optimisation under Decision-Dependent Uncertainty
Minimising a spectral risk objective, defined as a convex combination of expected cost and Conditional Value-at-Risk (CVaR), is challenging when the unc...
Adaptive Greedy Frame Selection for Long Video Understanding
Large vision-language models (VLMs) are increasingly applied to long-video question answering, yet inference is often bottlenecked by the number of inp...
Adaptive multi-fidelity optimization with fast learning rates
In multi-fidelity optimization, biased approximations of varying costs of the target function are available. This paper studies the problem of optimizin...
Adaptive Querying with AI Persona Priors
We study adaptive querying for learning user-dependent quantities of interest, such as responses to held-out items and psychometric indicators, within t...
Affine-Scaled Attention: Towards Flexible and Stable Transformer Attention
Transformer attention is typically implemented using softmax normalization, which enforces attention weights with unit sum normalization. While effectiv...
AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning
While Multi-Agent Systems (MAS) excel in complex reasoning, they suffer from the cascading impact of erroneous information generated by individual parti...
Agnostic learning in (almost) optimal time via Gaussian surface area
The complexity of learning a concept class under Gaussian marginals in the difficult agnostic model is closely related to its $L_1$-approximability by l...
AI Agents Can Already Autonomously Perform Experimental High Energy Physics
Large language model-based AI agents are now able to autonomously execute substantial portions of a high energy physics (HEP) analysis pipeline with min...
AI-Assisted Unit Test Writing and Test-Driven Code Refactoring: A Case Study
Many software systems originate as prototypes or minimum viable products (MVPs), developed with an emphasis on delivery speed and responsiveness to chan...
AIFIND: Artifact-Aware Interpreting Fine-Grained Alignment for Incremental Face Forgery Detection
As forgery types continue to emerge consistently, Incremental Face Forgery Detection (IFFD) has become a crucial paradigm. However, existing methods typ...
Amortized Optimal Transport from Sliced Potentials
We propose a novel amortized optimization method for predicting optimal transport (OT) plans across multiple pairs of measures by leveraging Kantorovich...
An adaptive wavelet-based PINN for problems with localized high-magnitude source
In recent years, physics-informed neural networks (PINNs) have gained significant attention for solving differential equations, although they suffer fro...
An Agentic Multi-Agent Architecture for Cybersecurity Risk Management
Getting a real cybersecurity risk assessment for a small organization is expensive -- a NIST CSF-aligned engagement runs $15,000 on the low end, takes w...
An Efficient Unsupervised Federated Learning Approach for Anomaly Detection in Heterogeneous IoT Networks
Federated learning (FL) is an effective paradigm for distributed environments such as the Internet of Things (IoT), where data from diverse devices with...
An Empirical Study of SFT-DPO Interaction and Parameterization in Small Language Models
Direct Preference Optimization (DPO) is widely used after supervised fine-tuning (SFT) to align language models, yet empirical behavior under small back...
An Independent Safety Evaluation of Kimi K2.5
Kimi K2.5 is an open-weight LLM that rivals closed models across coding, multimodal, and agentic benchmarks, but was released without an accompanying sa...
An Open-Source, Open Data Approach to Activity Classification from Triaxial Accelerometry in an Ambulatory Setting
The accelerometer has become an almost ubiquitous device, providing enormous opportunities in healthcare monitoring beyond step counting or other averag...
ANTIC: Adaptive Neural Temporal In-situ Compressor
The persistent storage requirements for high-resolution, spatiotemporally evolving fields governed by large-scale and high-dimensional partial different...
ArgLLM-App: An Interactive System for Argumentative Reasoning with Large Language Models
Argumentative LLMs (ArgLLMs) are an existing approach leveraging Large Language Models (LLMs) and computational argumentation for decision-making, with...
ARGUS: Seeing the Influence of Narrative Features on Persuasion in Argumentative Texts
Can narratives make arguments more persuasive? And to this end, which narrative features matter most? Although stories are often seen as powerful tools...
Artificial Intelligence for Detecting Fetal Orofacial Clefts and Advancing Medical Education
Orofacial clefts are among the most common congenital craniofacial abnormalities, yet accurate prenatal detection remains challenging due to the scarcit...
ASMR-Bench: Auditing for Sabotage in ML Research
As AI systems are increasingly used to conduct research autonomously, misaligned systems could introduce subtle flaws that produce misleading results wh...
Assessing Deanonymization Risks with Stylometry-Assisted LLM Agent
The rapid advancement of large language models (LLMs) has enabled powerful authorship inference capabilities, raising growing concerns about unintended...
Asymptotic and Finite-Time Guarantees for Langevin-Based Temperature Annealing in InfoNCE
The InfoNCE loss in contrastive learning depends critically on a temperature parameter, yet its dynamics under fixed versus annealed schedules remain po...
AtManRL: Towards Faithful Reasoning via Differentiable Attention Saliency
Large language models (LLMs) increasingly rely on chain-of-thought (CoT) reasoning to solve complex tasks. Yet ensuring that the reasoning trace both co...
Augmented Lagrangian Multiplier Network for State-wise Safety in Reinforcement Learning
Safety is a primary challenge in real-world reinforcement learning (RL). Formulating safety requirements as state-wise constraints has become a prominen...
Automated Instruction Revision (AIR): A Structured Comparison of Task Adaptation Strategies for LLM
This paper studies Automated Instruction Revision (AIR), a rule-induction-based method for adapting large language models (LLMs) to downstream tasks usi...
BAGEL: Benchmarking Animal Knowledge Expertise in Language Models
Large language models have shown strong performance on broad-domain knowledge and reasoning benchmarks, but it remains unclear how well language models...
Balancing Fidelity, Utility, and Privacy in Synthetic Cardiac MRI Generation: A Comparative Study
Deep learning in cardiac MRI (CMR) is fundamentally constrained by both data scarcity and privacy regulations. This study systematically benchmarks thre...
Batch Normalization for Neural Networks on Complex Domains
Riemannian neural networks have proven effective in solving a variety of machine learning tasks. The key to their success lies in the development of pri...
Batched Kernelized Bandits: Refinements and Extensions
In this paper, we consider the problem of black-box optimization with noisy feedback revealed in batches, where the unknown function to optimize has a b...
Bayesian X-Learner: Calibrated Posterior Inference for Heterogeneous Treatment Effects under Heavy-Tailed Outcomes
Conditional Average Treatment Effect (CATE) estimation in practice demands three properties simultaneously: heterogeneous effects $τ(x)$, calibrated unc...
Behavior-dLDS: A decomposed linear dynamical systems model for neural activity partially constrained by behavior
Brain-wide recordings of large-scale networks of neurons now provide an unprecedented view into how the brain drives behavior. However, brain activity c...
BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation
Accurate evaluation is central to the large language model (LLM) ecosystem, guiding model selection and downstream adoption across diverse use cases. In...
Better Learning-Augmented Spanning Tree Algorithms via Metric Forest Completion
We present improved learning-augmented algorithms for finding an approximate minimum spanning tree (MST) for points in an arbitrary metric space. Our wo...
BEVLM: Distilling Semantic Knowledge from LLMs into Bird's-Eye View Representations
The integration of Large Language Models (LLMs) into autonomous driving has attracted growing interest for their strong reasoning and semantic understan...
Beyond Augmented-Action Surrogates for Multi-Expert Learning-to-Defer
Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed a...
Beyond Distribution Sharpening: The Importance of Task Rewards
Frontier models have demonstrated exceptional capabilities following the integration of task-reward-based reinforcement learning (RL) into their trainin...
Beyond Final Answers: CRYSTAL Benchmark for Transparent Multimodal Reasoning Evaluation
We introduce CRYSTAL (Clear Reasoning via Yielded Steps, Traceability and Logic), a diagnostic benchmark with 6,372 instan...
Beyond Gaussian Bottlenecks: Topologically Aligned Encoding of Vision-Transformer Feature Spaces
Modern visual world modeling systems increasingly rely on high-capacity architectures and large-scale data to produce plausible motion, yet they often f...
Beyond Mixtures and Products for Ensemble Aggregation: A Likelihood Perspective on Generalized Means
Density aggregation is a central problem in machine learning, for instance when combining predictions from a Deep Ensemble. The choice of aggregation re...
Beyond NNGP: Large Deviations and Feature Learning in Bayesian Neural Networks
We study wide Bayesian neural networks focusing on the rare but statistically dominant fluctuations that govern posterior concentration, beyond Gaussian...
Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD
It is currently difficult to distill discrete diffusion models. In contrast, continuous diffusion literature has many distillation methods th...
Beyond Surface Statistics: Robust Conformal Prediction for LLMs via Internal Representations
Large language models are increasingly deployed in settings where reliability matters, yet output-level uncertainty signals such as token probabilities,...
Beyond the Parameters: A Technical Survey of Contextual Enrichment in Large Language Models: From In-Context Prompting to Causal Retrieval-Augmented Generation
Large language models (LLMs) encode vast world knowledge in their parameters, yet they remain fundamentally limited by static knowledge, finite context...
Bitwise Systolic Array Architecture for Runtime-Reconfigurable Multi-precision Quantized Multiplication on Hardware Accelerators
Neural network accelerators have been widely applied to edge devices for complex tasks like object tracking, image recognition, etc. Previous works have...
BLISSNet: Deep Operator Learning for Fast and Accurate Flow Reconstruction from Sparse Sensor Measurements
Reconstructing fluid flows from sparse sensor measurements is a fundamental challenge in science and engineering. Widely separated measurements and comp...
Boosting deep Reinforcement Learning using pretraining with Logical Options
Deep reinforcement learning agents are often misaligned, as they over-exploit early reward signals. Recently, several symbolic approaches have addressed...
BoSS: A Best-of-Strategies Selector as an Oracle for Deep Active Learning
Active learning (AL) aims to reduce annotation costs while maximizing model performance by iteratively selecting valuable instances. While foundation mo...
Breaking the Tuning Barrier: Zero-Hyperparameters Yield Multi-Corner Analysis Via Learned Priors
Yield Multi-Corner Analysis validates circuits across 25+ Process-Voltage-Temperature corners, resulting in a combinatorial simulation cost of $O(K \times...
Budget-Sensitive Discovery Scoring: A Formally Verified Framework for Evaluating AI-Guided Scientific Selection
Scientific discovery increasingly relies on AI systems to select candidates for expensive experimental validation, yet no principled, budget-aware evalu...
Can Coding Agents Reproduce Findings in Computational Materials Science?
Large language models are increasingly deployed as autonomous coding agents and have achieved remarkably strong performance on software engineering benc...
Can LLMs Understand the Impact of Trauma? Costs and Benefits of LLMs Coding the Interviews of Firearm Violence Survivors
Firearm violence is a pressing public health issue, yet research into survivors' lived experiences remains underfunded and difficult to scale. Qualitati...
Case-Grounded Evidence Verification: A Framework for Constructing Evidence-Sensitive Supervision
Evidence-grounded reasoning requires more than attaching retrieved text to a prediction: a model should make decisions that depend on whether the provid...
Causal Cellular Context Transfer Learning (C3TL): An Efficient Architecture for Prediction of Unseen Perturbation Effects
Predicting the effects of chemical and genetic perturbations on quantitative cell states is a central challenge in computational biology, molecular medi...
Causal Interpretation of Neural Network Computations with Contribution Decomposition
Understanding how neural networks transform inputs into outputs is crucial for interpreting and manipulating their behavior. Most existing approaches an...
Causality Elicitation from Large Language Models
Large language models (LLMs) are trained on enormous amounts of data and encode knowledge in their parameters. We propose a pipeline to elicit causal re...
Certified and accurate computation of function space norms of deep neural networks
Neural network methods for PDEs require reliable error control in function space norms. However, trained neural networks can typically only be probed at...
Chain-of-Adaptation: Surgical Vision-Language Adaptation with Reinforcement Learning
Conventional fine-tuning on domain-specific datasets can inadvertently alter a model's pretrained multimodal priors, leading to reduced generalization....
Characterising LLM-Generated Competency Questions: a Cross-Domain Empirical Study using Open and Closed Models
Competency Questions (CQs) are a cornerstone of requirement elicitation in ontology engineering. CQs represent requirements as a set of natural language...
Characterization of Gaussian Universality Breakdown in High-Dimensional Empirical Risk Minimization
We study high-dimensional convex empirical risk minimization (ERM) under general non-Gaussian data designs. By heuristically extending the Convex Gaussi...
Chart-RL: Policy Optimization Reinforcement Learning for Enhanced Visual Reasoning in Chart Question Answering with Vision Language Models
The recent advancements in Vision Language Models (VLMs) have demonstrated progress toward true intelligence requiring robust reasoning capabilities. Be...
ChemGraph-XANES: An Agentic Framework for XANES Simulation and Analysis
Computational X-ray absorption near-edge structure (XANES) is widely used to probe local coordination environments, oxidation states, and electronic str...
Chunk-wise Attention Transducers for Fast and Accurate Streaming Speech-to-Text
We propose Chunk-wise Attention Transducer (CHAT), a novel extension to RNN-T models that processes audio in fixed-size chunks while employing cross-att...
Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows
LLM agents are expected to complete end-to-end units of work across software tools, business services, and local workspaces. Yet many agent benchmarks f...
Clean Architecture - Dependencies Point Inward
Implement Uncle Bob's Clean Architecture in Python with proper layering, the dependency rule, domain models, service layers, repositories, and framework boundaries.
CLoPA: Continual Low Parameter Adaptation of Interactive Segmentation for Medical Image Annotation
Interactive segmentation enables clinicians to guide annotation, but existing zero-shot models like nnInteractive fail to consistently reach expert-leve...
Clustering Astronomical Orbital Synthetic Data Using Advanced Feature Extraction and Dimensionality Reduction Techniques
The dynamics of Saturn's satellite system offer a rich framework for studying orbital stability and resonance interactions. Traditional methods for anal...
COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics
Activation steering methods enable inference-time control of large language model (LLM) behavior without retraining, but current approaches face a funda...
Collective Kernel EFT for Pre-activation ResNets
In finite-width deep neural networks, the empirical kernel $G$ evolves stochastically across layers. We develop a collective kernel effective field theo...
CoME: Empowering Channel-of-Mobile-Experts with Informative Hybrid-Capabilities Reasoning
Mobile Agents can autonomously execute user instructions, which requires hybrid-capabilities reasoning, including screen summary, subtask planning, acti...
Comparing Classical and Quantum Variational Classifiers on the XOR Problem
Quantum machine learning applies principles such as superposition and entanglement to data processing and optimization. Variational quantum models opera...
Competition-Aware CPC Forecasting with Near-Market Coverage
Cost-per-click (CPC) in paid search is a volatile auction outcome generated by a competitive landscape that is only partially observable from any single...
Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models
Compositional generalization, the ability to recognize familiar parts in novel contexts, is a defining property of intelligent systems. Although modern...
Computing Equilibrium beyond Unilateral Deviation
Most familiar equilibrium concepts, such as Nash and correlated equilibrium, guarantee only that no single player can improve their utility by deviating...
Conditioning Protein Generation via Hopfield Pattern Multiplicity
Protein sequence generation via stochastic attention produces plausible family members from small alignments without training, but treats all stored seq...
Configuration Management - Environment-Driven Apps
Externalize and validate application configuration with python-dotenv, pydantic-settings, secrets management, multi-environment configs, and the 12-factor config principle.
Conformalized Neural Networks for Federated Uncertainty Quantification under Dual Heterogeneity
Federated learning (FL) faces challenges in uncertainty quantification (UQ). Without reliable UQ, FL systems risk deploying overconfident models at unde...
Continuous Orthogonal Mode Decomposition: Haptic Signal Prediction in Tactile Internet
The Tactile Internet demands sub-millisecond latency and ultra-high reliability, as high latency or packet loss could lead to haptic control instability...
Controllable Reasoning Models Are Private Thinkers
AI agents powered by reasoning models require access to sensitive user data. However, their reasoning traces are difficult to control, which can result...
Coupled Control, Structured Memory, and Verifiable Action in Agentic AI (SCRAT -- Stochastic Control with Retrieval and Auditable Trajectories): A Comparative Perspective from Squirrel Locomotion and Scatter-Hoarding
Agentic AI is increasingly judged not by fluent output alone but by whether it can act, remember, and verify under partial observability, delay, and str...
Coverage-Aware Web Crawling for Domain-Specific Supplier Discovery via a Web--Knowledge--Web Pipeline
Identifying the full landscape of small and medium-sized enterprises (SMEs) in specialized industry sectors is critical for supply-chain resilience, yet...
Crab: A Semantics-Aware Checkpoint/Restore Runtime for Agent Sandboxes
Autonomous agents act through sandboxed containers and microVMs whose state spans filesystems, processes, and runtime artifacts. Checkpoint and restore...
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
GPU kernel optimization is fundamental to modern deep learning but remains a highly specialized task requiring deep hardware expertise. Despite strong p...
CXReasonAgent: Evidence-Grounded Diagnostic Reasoning Agent for Chest X-rays
Chest X-ray plays a central role in thoracic diagnosis, and its interpretation inherently requires multi-step, evidence-grounded reasoning. However, lar...
DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science
The fast-growing demands in using Large Language Models (LLMs) to tackle complex multi-step data science tasks create an emergent need for accurate benc...
Data Driven Optimization of GPU efficiency for Distributed LLM Adapter Serving
Large Language Model (LLM) adapters enable low-cost model specialization, but introduce complex caching and scheduling challenges in distributed serving...
Data Lake vs Warehouse vs Lakehouse for AI Workloads
What each storage architecture does for AI systems, when ML teams need both raw unstructured data and structured query access on the same platform, and how to choose and implement the right architecture in production AI data pipelines.
daVinci-Env: Open SWE Environment Synthesis at Scale
Training capable software engineering (SWE) agents demands large-scale, executable, and verifiable environments that provide dynamic feedback loops for...
Decentralized Proximal Stochastic Gradient Langevin Dynamics
We propose Decentralized Proximal Stochastic Gradient Langevin Dynamics (DE-PSGLD), a decentralized Markov chain Monte Carlo (MCMC) algorithm for sampli...
Decentralized Ranking Aggregation: Gossip Algorithms for Borda and Copeland Consensus
The concept of ranking aggregation plays a central role in preference analysis, and numerous algorithms for calculating median rankings, often originati...
Decoupled Descent: Exact Test Error Tracking Via Approximate Message Passing
In modern parametric model training, full-batch gradient descent (and its variants) suffers due to progressively stronger biasing towards the exact real...
Deep Autocorrelation Modeling for Time-Series Forecasting: Progress and Prospects
Autocorrelation is a defining characteristic of time-series data, where each observation is statistically dependent on its predecessors. In the context...
Deep ensemble graph neural networks for probabilistic cosmic-ray direction and energy reconstruction in autonomous radio arrays
Using advanced machine learning techniques, we developed a method for precisely reconstructing the arrival direction and energy of ultra-high-energy cos...
DEFault++: Automated Fault Detection, Categorization, and Diagnosis for Transformer Architectures
Transformer models are widely deployed in critical AI applications, yet faults in their attention mechanisms, projections, and other internal components...
Defending Quantum Classifiers against Adversarial Perturbations through Quantum Autoencoders
Machine learning models can learn from data samples to carry out various tasks efficiently. When data samples are adversarially manipulated, such as by...
Dependency Injection - Decoupling Components
Master dependency injection in Python from manual constructor injection to DI containers and FastAPI Depends, with testing strategies and architectural trade-offs.
Design Experiments to Compare Multi-armed Bandit Algorithms
Online platforms routinely compare multi-armed bandit algorithms, such as UCB and Thompson Sampling, to select the best-performing policy. Unlike standa...
Design-OS: A Specification-Driven Framework for Engineering System Design with a Control-Systems Design Case
Engineering system design -- whether mechatronic, control, or embedded -- often proceeds in an ad hoc manner, with requirements left implicit and tracea...
Detecting and Suppressing Reward Hacking with Gradient Fingerprints
Reinforcement learning with verifiable rewards (RLVR) typically optimizes for outcome rewards without imposing constraints on intermediate reasoning. Th...
Developing and evaluating a chatbot to support maternal health care
The ability to provide trustworthy maternal health information using phone-based chatbots can have a significant impact, particularly in low-resource se...
Developing the PsyCogMetrics AI Lab to Evaluate Large Language Models and Advance Cognitive Science -- A Three-Cycle Action Design Science Study
This study presents the development of the PsyCogMetrics AI Lab (psycogmetrics.ai), an integrated, cloud-based platform that operationalizes psychometri...
Differentiable Zero-One Loss via Hypersimplex Projections
Recent advances in machine learning have emphasized the integration of structured optimization components into end-to-end differentiable models, enablin...
Directed Social Regard: Surfacing Targeted Advocacy, Opposition, Aid, Harms, and Victimization in Online Media
The language in online platforms, influence operations, and political rhetoric frequently directs a mix of pro-social sentiment (e.g., advocacy, helpful...
Dissecting Quantization Error: A Concentration-Alignment Perspective
Quantization can drastically increase the efficiency of large language and vision models, but typically incurs an accuracy drop. Recently, function-pres...
Do Foundation Models Know Geometry? Probing Frozen Features for Continuous Physical Measurement
Vision-language models encode continuous geometry that their text pathway fails to express: a 6,000-parameter linear probe extracts hand joint angles at...
Do LLMs Benefit From Their Own Words?
Multi-turn interactions with large language models typically retain the assistant's own past responses in the conversation history. In this work, we rev...
Do Sparse Autoencoders Capture Concept Manifolds?
Sparse autoencoders (SAEs) are widely used to extract interpretable features from neural network representations, often under the implicit assumption th...
Domain-Adapted Retrieval for In-Context Annotation of Pedagogical Dialogue Acts
Automated annotation of pedagogical dialogue is a high-stakes task where LLMs often fail without sufficient domain grounding. We present a domain-adapte...
DSBD: Dual-Aligned Structural Basis Distillation for Graph Domain Adaptation
Graph domain adaptation (GDA) aims to transfer knowledge from a labeled source graph to an unlabeled target graph under distribution shifts. However, ex...
Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks
Multimodal web agents that process both screenshots and accessibility trees are increasingly deployed to interact with web interfaces, yet their dual-st...
E3-TIR: Enhanced Experience Exploitation for Tool-Integrated Reasoning
While Large Language Models (LLMs) have demonstrated significant potential in Tool-Integrated Reasoning (TIR), existing training paradigms face signific...
EASE: Federated Multimodal Unlearning via Entanglement-Aware Anchor Closure
Federated Multimodal Learning (FML) trains multimodal models across decentralized clients while keeping their image-text pairs private. However, joint e...
EB-RANSAC: Random Sample Consensus based on Energy-Based Model
Random sample consensus (RANSAC), which is based on a repetitive sampling from a given dataset, is one of the most popular robust estimation methods. In...
ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion
Chest X-ray report generation (CXR-RG) has the potential to substantially alleviate radiologists' workload. However, conventional autoregressive vision-...
Efficient Discovery of Approximate Causal Abstractions via Neural Mechanism Sparsification
Neural networks are hypothesized to implement interpretable causal mechanisms, yet verifying this requires finding a causal abstraction -- a simpler, hi...
Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing
Multivector retrieval models achieve state-of-the-art effectiveness through fine-grained token-level representations, but their deployment incurs substa...
Efficient Refusal Ablation in LLM through Optimal Transport
Safety-aligned language models refuse harmful requests through learned refusal behaviors encoded in their internal representations. Recent activation-ba...
Empowering Heterogeneous Graph Foundation Models via Decoupled Relation Alignment
While Graph Foundation Models (GFMs) have achieved remarkable success in homogeneous graphs, extending them to multi-domain heterogeneous graphs (MDHGs)...
Enhancing AI and Dynamical Subseasonal Forecasts with Probabilistic Bias Correction
Decision-makers rely on weather forecasts to plant crops, manage wildfires, allocate water and energy, and prepare for weather extremes. Today, such for...
Enhancing Authorship Attribution with Synthetic Paintings
Attributing authorship to paintings is a historically complex task, and one of its main challenges is the limited availability of real artworks for trai...
Enhancing Hyperspace Analogue to Language (HAL) Representations via Attention-Based Pooling for Text Classification
The Hyperspace Analogue to Language (HAL) model relies on global word co-occurrence matrices to construct distributional semantic representations. While...
Enhancing Robustness of Federated Learning via Server Learning
This paper explores the use of server learning for enhancing the robustness of federated learning against malicious attacks even when clients' training...
Envisioning the Future, One Step at a Time
Accurately anticipating how complex, diverse scenes will evolve requires models that represent uncertainty, simulate along extended interaction chains,...
ESG-Bench: Benchmarking Long-Context ESG Reports for Hallucination Mitigation
As corporate responsibility increasingly incorporates environmental, social, and governance (ESG) criteria, ESG reporting is becoming a legal requiremen...
Evaluating Stochasticity in Deep Research Agents
Deep Research Agents (DRAs) are promising agentic systems that gather and synthesize information to support research across domains such as financial de...
Evaluating the Progression of Large Language Model Capabilities for Small-Molecule Drug Design
Large Language Models (LLMs) have the potential to accelerate small molecule drug design due to their ability to reason about information from diverse s...
Evaluating Zero-Shot and One-Shot Adaptation of Small Language Models in Leader-Follower Interaction
Leader-follower interaction is an important paradigm in human-robot interaction (HRI). Yet, assigning roles in real time remains challenging for resourc...
Event-Driven Temporal Graph Networks for Asynchronous Multi-Agent Cyber Defense in NetForge_RL
The transition of Multi-Agent Reinforcement Learning (MARL) policies from simulated cyber wargames to operational Security Operations Centers (SOCs) is...
Evolving Jailbreaks: Automated Multi-Objective Long-Tail Attacks on Large Language Models
Large Language Models (LLMs) have been widely deployed, especially through free Web-based applications that expose them to diverse user-generated inputs...
Explainable cluster analysis: a bagging approach
A major limitation of clustering approaches is their lack of explainability: methods rarely provide insight into which features drive the grouping of si...
Explainable Load Forecasting with Covariate-Informed Time Series Foundation Models
Time Series Foundation Models (TSFMs) have recently emerged as general-purpose forecasting models and show considerable potential for applications in en...
Exploiting Subgradient Sparsity in Max-Plus Neural Networks
Deep Neural Networks are powerful tools for solving machine learning problems, but their training often involves dense and costly parameter updates. In...
Exploration Hacking: Can LLMs Learn to Resist RL Training?
Reinforcement learning (RL) has become essential to the post-training of large language models (LLMs) for reasoning, agentic capabilities and alignment....
Fairness under Graph Uncertainty: Achieving Interventional Fairness with Partially Known Causal Graphs over Clusters of Variables
Algorithmic decisions about individuals require predictions that are not only accurate but also fair with respect to sensitive attributes such as gender...
FaultXformer: A Transformer-Encoder Based Fault Classification and Location Identification model in PMU-Integrated Active Electrical Distribution System
Accurate fault detection and localization in electrical distribution systems is crucial, especially with the increasing integration of distributed energ...
Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models
Transformer-based large language models exhibit in-context learning, enabling adaptation to downstream tasks via few-shot prompting with demonstrations....
Finite Difference Flow Optimization for RL Post-Training of Text-to-Image Models
Reinforcement learning (RL) has become a standard technique for post-training diffusion-based image synthesis models, as it enables learning from reward...
Fixed-Budget Constrained Best Arm Identification in Grouped Bandits
We study fixed-budget constrained best-arm identification in grouped bandits, where each arm consists of multiple independent attributes with stochastic...
FL-MHSM: Spatially-adaptive Fusion and Ensemble Learning for Flood-Landslide Multi-Hazard Susceptibility Mapping at Regional Scale
Existing multi-hazard susceptibility mapping (MHSM) studies often rely on spatially uniform models, treat hazards independently, and provide limited rep...
FlashOptim: Optimizers for Memory Efficient Training
Standard mixed-precision training of neural networks requires many bytes of accelerator memory for each model parameter. These bytes reflect not just th...
FlexiTac: A Low-Cost, Open-Source, Scalable Tactile Sensing Solution for Robotic Systems
We present FlexiTac, a low-cost, open-source, and scalable piezoresistive tactile sensing solution designed for robotic end-effectors. FlexiTac is a pra...
Flow Matching is Adaptive to Manifold Structures
Flow matching has emerged as a simulation-free alternative to diffusion-based generative modeling, producing samples by solving an ODE whose time-depend...
Fly360: Omnidirectional Obstacle Avoidance within Drone View
Obstacle avoidance in unmanned aerial vehicles (UAVs), as a fundamental capability, has gained increasing attention with the growing focus on spatial in...
Fractals made Practical: Denoising Diffusion as Partitioned Iterated Function Systems
What is a diffusion model actually doing when it turns noise into a photograph? We show that the deterministic DDIM reverse chain operates as a Partitio...
From Benchmarking to Reasoning: A Dual-Aspect, Large-Scale Evaluation of LLMs on Vietnamese Legal Text
The complexity of Vietnam's legal texts presents a significant barrier to public access to justice. While Large Language Models offer a promising soluti...
From Experiments to Expertise: Scientific Knowledge Consolidation for AI-Driven Computational Research
While large language models (LLMs) have transformed AI agents into proficient executors of computational materials science, performing a hundred simulat...
From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering
Existing tampering detection benchmarks largely rely on object masks, which severely misalign with the true edit signal: many pixels inside a mask are u...
From Shallow Bayesian Neural Networks to Gaussian Processes: General Convergence, Identifiability and Scalable Inference
In this work, we study scaling limits of shallow Bayesian neural networks (BNNs) via their connection to Gaussian processes (GPs), with an emphasis on s...
General Bayesian Policy Learning
This study proposes the General Bayes framework for policy learning. We consider decision problems in which a decision-maker chooses an action from an a...
Generalization and Scaling Laws for Mixture-of-Experts Transformers
We develop a theory of generalization and scaling for Mixture-of-Experts (MoE) Transformers that cleanly separates active per-input capacity from...
Generalization Properties of Score-matching Diffusion Models for Intrinsically Low-dimensional Data
Despite the remarkable empirical success of score-based diffusion models, their statistical guarantees remain underdeveloped. Existing analyses often pr...
Generalized Rapid Action Value Estimation in Memory-Constrained Environments
Generalized Rapid Action Value Estimation (GRAVE) has been shown to be a strong variant within the Monte-Carlo Tree Search (MCTS) family of algorithms f...
Generating DDPM-based Samples from Tilted Distributions
Given $n$ independent samples from a $d$-dimensional probability distribution, our aim is to generate diffusion-based samples from a distribution obtain...
Generating Statistical Charts with Validation-Driven LLM Workflows
Generating diverse, readable statistical charts from tabular data remains challenging for LLMs, as many failures become apparent after rendering and are...
GeoChemAD: Benchmarking Unsupervised Geochemical Anomaly Detection for Mineral Exploration
Geochemical anomaly detection plays a critical role in mineral exploration as deviations from regional geochemical baselines may indicate mineralization...
GeoContra: From Fluent GIS Code to Verifiable Spatial Analysis with Geography-Grounded Repair
Reliable spatial analysis in GIScience requires preserving coordinate semantics, topology, units, and geographic plausibility. Current LLM-based GIS sys...
Geometric regularization of autoencoders via observed stochastic dynamics
Stochastic dynamical systems with slow or metastable behavior evolve, on long time scales, on an unknown low-dimensional manifold in high-dimensional am...
Geometry-Guided Camera Motion Understanding in VideoLLMs
Camera motion is a fundamental geometric signal that shapes visual perception and cinematic style, yet current video-capable vision-language models (Vid...
Global Interpretability via Automated Preprocessing: A Framework Inspired by Psychiatric Questionnaires
Psychiatric questionnaires are highly context sensitive and often only weakly predict subsequent symptom severity, which makes the prognostic relationsh...
Global Optimality for Constrained Exploration via Penalty Regularization
Efficient exploration is a central problem in reinforcement learning and is often formalized as maximizing the entropy of the state-action occupancy mea...
GO-GenZip: Goal-Oriented Generative Sampling and Hybrid Compression
Current network data telemetry pipelines consist of massive streams of fine-grained Key Performance Indicators (KPIs) from multiple distributed sources...
Gradient Boosting within a Single Attention Layer
Transformer attention computes a single softmax-weighted average over values -- a one-pass estimate that cannot correct its own errors. We introduce \em...
Gradient Flow Polarizes Softmax Outputs towards Low-Entropy Solutions
Understanding the intricate non-convex training dynamics of softmax-based models is crucial for explaining the empirical success of transformers. In thi...
Gradient Regularized Newton Boosting Trees with Global Convergence
Gradient Boosting Decision Trees (GBDTs) dominate tabular machine learning, with modern implementations like XGBoost, LightGBM, and CatBoost being based...
Graph-Informed Adversarial Modeling: Infimal Subadditivity of Interpolative Divergences
We study adversarial learning when the target distribution factorizes according to a known Bayesian network. For interpolative divergences, including $(...
Heavy-Tailed and Long-Range Dependent Noise in Stochastic Approximation: A Finite-Time Analysis
Stochastic approximation (SA) is a fundamental iterative framework with broad applications in reinforcement learning and optimization. Classical analyse...
Hexagonal Architecture (Ports and Adapters)
Implement Hexagonal Architecture in Python using Protocol-based ports, swappable adapters, and clear boundaries between application logic and external systems.
Hierarchical Industrial Demand Forecasting with Temporal and Uncertainty Explanations
Hierarchical time-series forecasting is essential for demand prediction across various industries. While machine learning models have obtained significa...
Hierarchical Inference and Closure Learning via Adaptive Surrogates for ODEs and PDEs
Inverse problems are the task of calibrating models to match data. They play a pivotal role in diverse engineering applications by allowing practitioner...
Hierarchical Kernel Transformer: Multi-Scale Attention with an Information-Theoretic Approximation Analysis
The Hierarchical Kernel Transformer (HKT) is a multi-scale attention mechanism that processes sequences at L resolution levels via trainable causal down...
Hierarchical Planning with Latent World Models
Model predictive control (MPC) with learned world models has emerged as a promising paradigm for embodied control, particularly for its ability to gener...
Histopathology Image Normalization via Latent Manifold Compaction
Batch effects arising from technical variations in histopathology staining protocols, scanners, and acquisition pipelines pose a persistent challenge fo...
HyCOP: Hybrid Composition Operators for Interpretable Learning of PDEs
We introduce HyCOP, a modular framework that learns parametric PDE solution operators by composing simple modules (advection, diffusion, learned closure...
Hyper Input Convex Neural Networks for Shape Constrained Learning and Optimal Transport
We introduce Hyper Input Convex Neural Networks (HyCNNs), a novel neural network architecture designed for learning convex functions. HyCNNs combine the...
HyperFitS -- Hypernetwork Fitting Spectra for metabolic quantification of ${}^1$H MR spectroscopic imaging
Purpose: Proton magnetic resonance spectroscopic imaging ($^1$H MRSI) enables the mapping of whole-brain metabolite concentrations in vivo. However, a...
Identifying Causal Effects Using a Single Proxy Variable
Unobserved confounding is a key challenge when estimating causal effects from a treatment on an outcome in scientific applications. In this work, we ass...
Improved Scaling Laws via Weak-to-Strong Generalization in Random Feature Ridge Regression
It is increasingly common in machine learning to use learned models to label data and then employ such data to train more capable models. The phenomenon...
Improving Generalization on Cybersecurity Tasks with Multi-Modal Contrastive Learning
The use of ML in cybersecurity has long been impaired by generalization issues: Models that work well in controlled scenarios fail to maintain performan...
InCoder-32B-Thinking: Industrial Code World Model for Thinking
Industrial software development across chip design, GPU optimization, and embedded systems lacks expert reasoning traces showing how engineers reason ab...
Inferential Mechanics Part 1: Causal Mechanistic Theories of Machine Learning in Chemical Biology with Implications
Machine learning techniques are now routinely encountered in research laboratories across the globe. Impressive progress has been made through ML and AI...
Influence Malleability in Linearized Attention: Dual Implications of Non-Convergent NTK Dynamics
Understanding the theoretical foundations of attention mechanisms remains challenging due to their complex, non-linear dynamics. This work reveals a fun...
Information Router for Mitigating Modality Dominance in Vision-Language Models
Vision-language models (VLMs) have demonstrated strong performance across a wide range of benchmarks, yet they often suffer from modality dominance, whe...
Information-geometric adaptive sampling for graph diffusion
Standard diffusion models for graph generation typically rely on uniform time-stepping, an approach that overlooks the non-homogeneous dynamics of distr...
InnerQ: Hardware-aware Tuning-free Quantization of KV Cache for Large Language Models
Reducing the hardware footprint of large language models (LLMs) during decoding is critical for efficient long-sequence generation. A key bottleneck is...
InpaintSLat: Inpainting Structured 3D Latents via Initial Noise Optimization
We present a training-free approach for controllable 3D inpainting based on initial noise optimization. In the structured 3D latent diffusion framework,...
Integrated electro-optic attention nonlinearities for transformers
Transformers have emerged as the dominant neural-network architecture, achieving state-of-the-art performance in language processing and computer vision...
Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists
Existing research infrastructure is fundamentally document-centric, providing citation links between papers but lacking explicit representations of meth...
Invariance-Based Dynamic Regret Minimization
We consider stochastic non-stationary linear bandits where the linear parameter connecting contexts to the reward changes over time. Existing algorithms...
Invariant Transformation and Resampling based Epistemic-Uncertainty Reduction
An artificial intelligence (AI) model can be viewed as a function that maps inputs to outputs in high-dimensional spaces. Once designed and well trained...
Inverse Contextual Bandits without Rewards: Learning from a Non-Stationary Learner via Suffix Imitation
We study the Inverse Contextual Bandit (ICB) problem, in which a learner seeks to optimize a policy while an observer, who cannot access the learner's r...
Inversion-Free Natural Gradient Descent on Riemannian Manifolds
The natural gradient method is widely used in statistical optimization, but its standard formulation assumes a Euclidean parameter space. This paper pro...
Is Human Annotation Necessary? Iterative MBR Distillation for Error Span Detection in Machine Translation
Error Span Detection (ESD) is a crucial subtask in Machine Translation (MT) evaluation, aiming to identify the location and severity of translation erro...
Is More Data Worth the Cost? Dataset Scaling Laws in a Tiny Attention-Only Decoder
Training Transformer language models is expensive, as performance typically improves with increasing dataset size and computational budget. Although sca...
Iterative Identification Closure: Amplifying Causal Identifiability in Linear SEMs
The Half-Trek Criterion (HTC) is the primary graphical tool for determining generic identifiability of causal effect coefficients in linear structural e...
Joint-Centric Dual Contrastive Alignment with Structure-Preserving and Information-Balanced Regularization
We propose HILBERT (HIerarchical Long-sequence Balanced Embedding with Reciprocal contrastive Training), a cross-attentive multimodal framework for lear...
JumpLoRA: Sparse Adapters for Continual Learning in Large Language Models
Adapter-based methods have become a cost-effective approach to continual learning (CL) for Large Language Models (LLMs), by sequentially learning a low-...
Kernel Integrated $R^2$: A Measure of Dependence
We introduce kernel integrated $R^2$, a new measure of statistical dependence that combines the local normalization principle of the recently introduced...
Kernelized Advantage Estimation: From Nonparametric Statistics to LLM Reasoning
Recent advances in large language models (LLMs) have increasingly relied on reinforcement learning (RL) to improve their reasoning capabilities. Three a...
Kolmogorov-Arnold causal generative models
Causal generative models provide a principled framework for answering observational, interventional, and counterfactual queries from observational data....
L2GTX: From Local to Global Time Series Explanations
Deep learning models achieve high accuracy in time series classification, yet understanding their class-level decision behaviour remains challenging. Ex...
Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism
Large language models (LLMs) undergo alignment training to avoid harmful behaviors, yet the resulting safeguards remain brittle: jailbreaks routinely by...
Latent Adversarial Detection: Adaptive Probing of LLM Activations for Multi-Turn Attack Detection
Multi-turn prompt injection follows a known attack path -- trust-building, pivoting, escalation -- but text-level defenses miss covert attacks where indivi...
Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning
Latent reasoning offers a more efficient alternative to explicit reasoning by compressing intermediate reasoning into continuous representations and sub...
Learnability and Privacy Vulnerability are Entangled in a Few Critical Weights
Prior approaches for membership privacy preservation usually update or retrain all weights in neural networks, which is costly and can lead to unnecessa...
Learning Dynamic Belief Graphs for Theory-of-mind Reasoning
Theory of Mind (ToM) reasoning with Large Language Models (LLMs) requires inferring how people's implicit, evolving beliefs shape what they seek and how...
Learning Flexible Job Shop Scheduling under Limited Buffers and Material Kitting Constraints
The Flexible Job Shop Scheduling Problem (FJSP) originates from real production lines, while some practical constraints are often ignored or idealized i...
Learning from Child-Directed Speech in Two-Language Scenarios: A French-English Case Study
Research on developmentally plausible language models has largely focused on English, leaving open questions about multilingual settings. We present a s...
Learning interacting particle systems from unlabeled data
Learning the potentials of interacting particle systems is a fundamental task across various scientific disciplines. A major challenge is that unlabeled...
Learning Rate Transfer in Normalized Transformers
The Normalized Transformer, or nGPT (arXiv:2410.01131), achieves impressive training speedups and does not require weight decay or learning rate warmup....
Learning the Helmholtz equation operator with DeepONet for non-parametric 2D geometries
This paper deals with solving the 2D Helmholtz equation on non-parametric domains, leveraging a physics-informed neural operator network based on the De...
Learning the Signature of Memorization in Autoregressive Language Models
All prior membership inference attacks for fine-tuned language models use hand-crafted heuristics (e.g., loss thresholding, Min-K%, reference calibrati...
Learning to Reason with Insight for Informal Theorem Proving
Although most of the automated theorem-proving approaches depend on formal proof systems, informal theorem proving can align better with large language...
LemmaBench: A Live, Research-Level Benchmark to Evaluate LLM Capabilities in Mathematics
We present a new approach for benchmarking Large Language Model (LLM) capabilities on research-level mathematics. Existing benchmarks largely rely on st...
Linear Models, Variable Selection, Artificial Intelligence
Variable selection in linear regression models has been a problem since hypothesis testing began. Which variables to include or exclude from a model is...
Linear-Core Surrogates: Smooth Loss Functions with Linear Rates for Classification and Structured Prediction
The choice of loss function in classification involves a fundamental trade-off: smooth losses (like Cross-Entropy) enable fast optimization rates but yi...
Lipschitz bounds for integral kernels
Feature maps associated with positive definite kernels play a central role in kernel methods and learning theory, where regularity properties such as Li...
LiveSense: A Real-Time Wi-Fi Sensing Platform for Range-Doppler on COTS Laptop
We present LiveSense, a cross-platform system that transforms a commercial off-the-shelf (COTS) Wi-Fi Network Interface Card (NIC) on a laptop into a centimet...
LLM as Clinical Graph Structure Refiner: Enhancing Representation Learning in EEG Seizure Diagnosis
Electroencephalogram (EEG) signals are vital for automated seizure detection, but their inherent noise makes robust representation learning challenging....
LLM Constitutional Multi-Agent Governance
Large Language Models (LLMs) can generate persuasive influence strategies that shift cooperative behavior in multi-agent populations, but a critical que...
LLM Novice Uplift on Dual-Use, In Silico Biology Tasks
Large language models (LLMs) perform increasingly well on biology benchmarks, but it remains unclear whether they uplift novice users -- i.e., enable hu...
LoASR-Bench: Evaluating Large Speech Language Models on Low-Resource Automatic Speech Recognition Across Language Families
Large language models (LLMs) have driven substantial advances in speech language models (SpeechLMs), yielding strong performance in automatic speech rec...
LoBoost: Fast Model-Native Local Conformal Prediction for Gradient-Boosted Trees
Gradient-boosted decision trees are among the strongest off-the-shelf predictors for tabular regression, but point predictions alone do not quantify unc...
Low-degree Lower bounds for clustering in moderate dimension
We study the fundamental problem of clustering $n$ points into $K$ groups drawn from a mixture of isotropic Gaussians in $\mathbb{R}^d$. Specifically, w...
Low-Rank Compression of Pretrained Models via Randomized Subspace Iteration
The massive scale of pretrained models has made efficient compression essential for practical deployment. Low-rank decomposition based on the singular v...
Low-Resource Guidance for Controllable Latent Audio Diffusion
Generative audio requires fine-grained controllable outputs, yet most existing methods require model retraining on specific controls or inference-time c...
LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation
Recent advances in diffusion models have significantly improved text-to-video generation, enabling personalized content creation with fine-grained contr...
M-CaStLe: Uncovering Local Causal Structures in Multivariate Space-Time Gridded Data
Causal graph discovery for space-time systems is challenging in high-dimensional gridded data, which often has many more grid cells than temporal observ...
Make It Hard to Hear, Easy to Learn: Long-Form Bengali ASR and Speaker Diarization via Extreme Augmentation and Perfect Alignment
Although Automatic Speech Recognition (ASR) in Bengali has seen significant progress, processing long-duration audio and performing robust speaker diari...
Make Your LVLM KV Cache More Lightweight
Key-Value (KV) cache has become a de facto component of modern Large Vision-Language Models (LVLMs) for inference. While it enhances decoding efficiency...
ManifoldGD: Training-Free Hierarchical Manifold Guidance for Diffusion-Based Dataset Distillation
In recent times, large datasets hinder efficient model training while also containing redundant concepts. Dataset distillation aims to synthesize compac...
Many-Tier Instruction Hierarchy in LLM Agents
Large language model agents receive instructions from many sources -- system messages, user prompts, tool outputs, and more -- each carrying different levels...
Mapping the Methodological Space of Classroom Interaction Research: Scale, Duration, and Modality in an Age of AI
Research on classroom interaction has long been divided between large-scale observation and in-depth ethnographic work. We propose a framework mapping t...
Mapping the Phase Diagram of the Vicsek Model with Machine Learning
In this study, we use machine learning to classify and interpolate the phase structure of the Vicsek flocking model across the three-dimensional paramet...
Mean Estimation from Coarse Data: Characterizations and Efficient Algorithms
Coarse data arise when learners observe only partial information about samples; namely, a set containing the sample rather than its exact value. This oc...
MeanFlow Meets Control: Scaling Sampled-Data Control for Swarms
Steering large-scale swarms in only a few control updates is challenging because real systems operate in sampled-data form: control inputs are updated i...
Measuring Faithfulness Depends on How You Measure: Classifier Sensitivity in LLM Chain-of-Thought Evaluation
Recent work on chain-of-thought (CoT) faithfulness reports single aggregate numbers (e.g., DeepSeek-R1 acknowledges hints 39% of the time), implying tha...
Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory
Large language model (LLM) agents are fundamentally bottlenecked by finite context windows on long-horizon tasks. As trajectories grow, retaining tool o...
Memory Caching: RNNs with Growing Memory
Transformers have been established as the de-facto backbones for most recent advances in sequence modeling, mainly due to their growing memory capacity...
Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values
We propose a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF). Unlike semi-ba...
Microservices vs Monolith - Making the Right Choice
Navigate the monolith-to-microservices spectrum with Python - bounded contexts, communication patterns, the modular monolith, and practical decision frameworks.
Mind the Gap: Structure-Aware Consistency in Preference Learning
Preference learning has become the foundation of aligning Large Language Models (LLMs) with human intent. Popular methods, such as Direct Preference Opt...
Minimax Generalized Cross-Entropy
Loss functions play a central role in supervised classification. Cross-entropy (CE) is widely used, whereas the mean absolute error (MAE) loss can offer...
MinShap: A Modified Shapley Value Approach for Feature Selection
Feature selection is a classical problem in statistics and machine learning, and it continues to remain an extremely challenging problem especially in t...
MM-StanceDet: Retrieval-Augmented Multi-modal Multi-agent Stance Detection
Multimodal Stance Detection (MSD) is crucial for understanding public discourse, yet effectively fusing text and image, especially with conflicting sign...
Modality Collapse as Mismatched Decoding: Information-Theoretic Limits of Multimodal LLMs
Multimodal LLMs can process speech and images, but they cannot hear a speaker's voice or see an object's texture. We show this is not a failure of encod...
Mode Seeking meets Mean Seeking for Fast Long Video Generation
Scaling video generation from seconds to minutes faces a critical bottleneck: while short-video data is abundant and high-fidelity, coherent long-form d...
Model Agreement via Anchoring
Numerous lines of work aim to control $\textit{model disagreement}$ -- the extent to which two machine learning models disagree in their predictions. We adop...
Model Selection and Parameter Estimation of Multi-dimensional Gaussian Mixture Model
In this paper, we study the problem of learning multi-dimensional Gaussian Mixture Models (GMMs), with a specific focus on model order selection and eff...
MoDora: Tree-Based Semi-Structured Document Analysis System
Semi-structured documents integrate diverse interleaved data elements (e.g., tables, charts, hierarchical paragraphs) arranged in various and often irre...
Modular Plugin System
Build an extensible CLI tool with plugin discovery, loading, and lifecycle management.
Module 01 - Design Patterns in Python Overview
GoF patterns, SOLID principles, DDD, and Hexagonal Architecture - enterprise design patterns implemented idiomatically in Python.
Module 01 - Object-Oriented Programming Overview
Master Python's object model at engineering depth - classes, instances, dunder methods, encapsulation, inheritance, MRO, composition, abstract base classes, dataclasses, SOLID principles, and production design patterns.
Module 02 - Microservices with Python Overview
FastAPI in depth, gRPC, event-driven architecture, service mesh patterns, and API contracts - building production Python microservices.
Module 05: Architecture & Systems Design - Complete Overview
Design production Python systems with clean architecture, hexagonal architecture, dependency injection, plugin systems, 12-factor methodology, and configuration management. The engineering patterns that separate scripts from systems.
Moment Matters: Mean and Variance Causal Graph Discovery from Heteroscedastic Observational Data
Heteroscedasticity -- where the variance of a variable changes with other variables -- is pervasive in real data, and elucidating why it arises from the...
MOO: A Multi-view Oriented Observations Dataset for Viewpoint Analysis in Cattle Re-Identification
Animal re-identification (ReID) faces critical challenges due to viewpoint variations, particularly in Aerial-Ground (AG-ReID) settings where models mus...
MovieTeller: Tool-augmented Movie Synopsis with ID Consistent Progressive Abstraction
With the explosive growth of digital entertainment, automated video summarization has become indispensable for applications such as content indexing, pe...
MT-PingEval: Evaluating Multi-Turn Collaboration with Private Information Games
We present a scalable methodology for evaluating language models in multi-turn interactions, using a suite of collaborative games that require effective...
Multimodal Optimal Transport for Unsupervised Temporal Segmentation in Surgical Robotics
Recognizing surgical phases and steps from video is a fundamental problem in computer-assisted interventions. Recent approaches increasingly rely on lar...
Multivariate Spatio-Temporal Neural Hawkes Processes
We propose a Multivariate Spatio-Temporal Neural Hawkes Process for modeling complex multivariate event data with spatio-temporal dynamics. The proposed...
MuViT: Multi-Resolution Vision Transformers for Learning Across Scales in Microscopy
Modern microscopy routinely produces gigapixel images that contain structures across multiple spatial scales, from fine cellular morphology to broader t...
MXNorm: Reusing MXFP block scales for efficient tensor normalisation
Matrix multiplication performance has long been the major bottleneck to scaling deep learning workloads, which has stimulated the design of new accelera...
Neural Diffusion Intensity Models for Point Process Data
Cox processes model overdispersed point process data via a latent stochastic intensity, but both nonparametric estimation of the intensity model and pos...
Neural Operators Can Discover Functional Clusters
Operator learning is reshaping scientific computing by amortizing inference across infinite families of problems. While neural operators (NOs) are incre...
Neuro-Symbolic ODE Discovery with Latent Grammar Flow
Understanding natural and engineered systems often relies on symbolic formulations, such as differential equations, which provide interpretability and t...
NOBLE: Accelerating Transformers with Nonlinear Low-Rank Branches
We introduce NOBLE (Nonlinear lOw-rank Branch for Linear Enhancement), an architectural augmentation that adds nonlinear low-rank branches to transforme...
NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search
Monte Carlo Tree Search (MCTS) scales poorly in cooperative multi-agent domains because expansion must consider an exponentially large set of joint acti...
Normativity and Productivism: Ableist Intelligence? A Degrowth Analysis of AI Sign Language Translation Tools for Deaf People
Sign languages, of any geographical or accentual variation, understandably face continuous scrutiny under the ever present popularity of verbal dictatio...
Observable Performance Does Not Fully Reflect System Organization: A Multi-Level Analysis of Gait Dynamics Under Occlusal Constraint
In biomechanical systems, observable performance is often used as a proxy for underlying system organization. However, this assumption implicitly presum...
Observationally Informed Adaptive Causal Experimental Design
Randomized Controlled Trials (RCTs) represent the gold standard for causal inference yet remain a scarce resource. While large-scale observational data...
ODEBrain: Continuous-Time EEG Graph for Modeling Dynamic Brain Networks
Modeling neural population dynamics is crucial for foundational neuroscientific research and various clinical applications. Conventional latent variable...
One-Shot Generative Flows: Existence and Obstructions
We study dynamic measure transport for generative modelling in the setting of a stochastic process $X_\bullet$ whose marginals interpolate between a sou...
Online Quantile Regression for Nonparametric Additive Models
This paper introduces a projected functional gradient descent algorithm (P-FGD) for training nonparametric additive quantile regression models in online...
Optimal Spatio-Temporal Decoupling for Bayesian Conformal Prediction
Online Conformal Prediction (CP) struggles to balance temporal adaptability and structural stability. Feedback-driven methods (e.g., Adaptive Conformal...
Optimized Deferral for Imbalanced Settings
Learning algorithms can be significantly improved by routing complex or uncertain inputs to specialized experts, balancing accuracy with computational c...
OT on the Map: Quantifying Domain Shifts in Geographic Space
In computer vision and machine learning for geographic data, out-of-domain generalization is a pervasive challenge, arising from uneven global data cove...
Out-of-distribution transfer of PDE foundation models to material dynamics under extreme loading
Most PDE foundation models are pretrained and fine-tuned on fluid-centric benchmarks. Their utility under extreme-loading material dynamics remains uncl...
ParamMem: Augmenting Language Agents with Parametric Reflective Memory
Self-reflection enables language agents to iteratively refine solutions, yet often produces repetitive outputs that limit reasoning performance. Recent...
Partition Function Estimation under Bounded f-Divergence
We study the statistical complexity of estimating partition functions given sample access to a proposal distribution and an unnormalized density ratio f...
Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs
While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a 'Visual Signal Dilution' p...
PhyCo: Learning Controllable Physical Priors for Generative Motion
Modern video diffusion models excel at appearance synthesis but still struggle with physical consistency: objects drift, collisions lack realistic rebou...
Physics Informed Viscous Value Representations
Offline goal-conditioned reinforcement learning (GCRL) learns goal-conditioned policies from static pre-collected datasets. However, accurate value esti...
PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization
Recent progress in text-conditioned human motion generation has been largely driven by diffusion models trained on large-scale human motion data. Buildi...
Plug-and-Play Diffusion Meets ADMM: Dual-Variable Coupling for Robust Medical Image Reconstruction
Plug-and-Play diffusion prior (PnPDP) frameworks have emerged as a powerful paradigm for solving imaging inverse problems by treating pretrained generat...
Plugin Systems - Building Extensible Applications
Build extensible Python applications with entry_points, importlib.metadata, stevedore, __init_subclass__, and plugin lifecycle management.
Policy-Aware Design of Large-Scale Factorial Experiments
Digital firms routinely run many online experiments on shared user populations. When product decisions are compositional, such as combinations of interf...
PONTE: Personalized Orchestration for Natural Language Trustworthy Explanations
Explainable Artificial Intelligence (XAI) seeks to enhance the transparency and accountability of machine learning systems, yet most methods follow a on...
Position: agentic AI orchestration should be Bayes-consistent
LLMs excel at predictive tasks and complex reasoning tasks, but many high-value deployments rely on decisions under uncertainty, for example, which tool...
PR3DICTR: A modular AI framework for medical 3D image-based detection and outcome prediction
Three-dimensional medical image data and computer-aided decision making, particularly using deep learning, are becoming increasingly important in the me...
Prediction-powered Inference by Mixture of Experts
The rapidly expanding artificial intelligence (AI) industry has produced diverse yet powerful prediction tools, each with its own network architecture,...
Predictive Coding Graphs are a Superset of Feedforward Neural Networks
Predictive coding graphs (PCGs) are a recently introduced generalization to predictive coding networks, a neuroscience-inspired probabilistic latent var...
Preference Packing: Efficient Preference Optimization for Large Language Models
Resource-efficient training optimization techniques are becoming increasingly important as the size of large language models (LLMs) continues to grow. I...
PRIM-cipal components analysis
Supervised No Free Lunch Theorems (NFLTs) are well studied, yet unsupervised NFLTs remain underexplored. For elliptical distributions, we prove that the...
PRISM: LLM-Guided Semantic Clustering for High-Precision Topics
In this paper, we propose Precision-Informed Semantic Modeling (PRISM), a structured topic modeling framework combining the benefits of rich representat...
PRISM: Pre-alignment via Black-box On-policy Distillation for Multimodal Reinforcement Learning
The standard post-training recipe for large multimodal models (LMMs) applies supervised fine-tuning (SFT) on curated demonstrations followed by reinforc...
Probabilistic Joint and Individual Variation Explained (ProJIVE) for Data Integration
Collecting multiple types of data on the same set of subjects is common in modern scientific applications, including genomics, metabolomics, and neuroim...
Probing the Geometry of Diffusion Models with the String Method
Understanding the geometry of learned distributions is fundamental to improving and interpreting diffusion models, yet systematic tools for exploring th...
Process Reward Agents for Steering Knowledge-Intensive Reasoning
Reasoning in knowledge-intensive domains remains challenging as intermediate steps are often not locally verifiable: unlike math or code, evaluating ste...
Production FastAPI Application
Build a FastAPI app with clean architecture, dependency injection, and proper configuration management.
Prosodic Boundary-Aware Streaming Generation for LLM-Based TTS with Streaming Text Input
Streaming TTS that receives streaming text is essential for interactive systems, yet this scheme faces two major challenges: unnatural prosody due to mi...
PTOPOFL: Privacy-Preserving Personalised Federated Learning via Persistent Homology
Federated learning (FL) faces two structural tensions: gradient sharing enables data-reconstruction attacks, while non-IID client distributions degrade...
Python Clean Architecture Practice Problems & Exercises
Solve 11 Python clean architecture problems (3 Easy, 4 Medium, 4 Hard). Practice dependency rule, domain model with hints, runnable code, and solutions.
Python Configuration Management Practice Problems & Exercises
Solve 11 Python configuration management problems. Covers pydantic settings, environment variables. Hints and solutions.
Python Dependency Injection Practice Problems & Exercises
Solve 11 Python dependency injection problems. Covers DI container, constructor injection, FastAPI dependency. Hints and solutions.
Python Hexagonal Architecture (Ports and Adapters): Practice Problems & Exercises
Solve 11 Python hexagonal architecture (ports and adapters) problems. Covers hexagonal architecture, ports and adapters, protocol port. Hints and solutions.
Python Microservices vs Monolith Practice Problems & Exercises
Solve 11 Python microservices vs monolith problems. Covers microservices vs monolith, bounded context, modular monolith. Hints and solutions.
Python Plugin Systems Practice Problems & Exercises
Solve 11 Python plugin systems problems (3 Easy, 4 Medium, 4 Hard). Practice plugin system, entry_points exercises with hints, runnable code, and solutions.
Python The 12-Factor App Practice Problems & Exercises
Solve 11 Python the 12-factor app problems (3 Easy, 4 Medium, 4 Hard). Practice 12-factor app, 12-factor methodology with hints, runnable code, and solutions.
Quantity Convergence, Quality Divergence: Disentangling Fluency and Accuracy in L2 Mandarin Prosody
While second language (L2) learners may acquire target syntactic word order, mapping this syntax onto appropriate prosodic structures remains a persiste...
Quantum Diffusion Models: Score Reversal Is Not Free in Gaussian Dynamics
Diffusion-based generative modeling suggests reversing a noising semigroup by adding a score drift. For continuous-variable Gaussian Markov dynamics, co...
Quantum Interval Bound Propagation for Certified Training of Quantum Neural Networks
Quantum machine learning is a promising field for efficiently learning features of a dataset to perform a specified task, such as classification. Interv...
RAMoEA-QA: Hierarchical Specialization for Robust Respiratory Audio Question Answering
Conversational generative AI is rapidly entering healthcare, where general-purpose models must integrate heterogeneous patient signals and support diver...
Randomized Subspace Nesterov Accelerated Gradient
Randomized-subspace methods reduce the cost of first-order optimization by using only low-dimensional projected-gradient information, a feature that is...
RANGER: Sparsely-Gated Mixture-of-Experts with Adaptive Retrieval Re-ranking for Pathology Report Generation
Pathology report generation remains a relatively under-explored downstream task, primarily due to the gigapixel scale and complex morphological heteroge...
RAViT: Resolution-Adaptive Vision Transformer
Vision transformers have recently made a breakthrough in computer vision showing excellent performance in terms of precision for numerous applications....
Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories
Recovering camera parameters from images and rendering scenes from novel viewpoints have long been treated as separate tasks in computer vision and grap...
ReAct: Synergizing Reasoning and Acting in Language Models
Engineering breakdown of the ReAct paper (Yao et al., 2022) - the foundation of every AI agent built today. Plain English, production viability rating, implementation notes.
Real-Time Surrogate Modeling for Personalized Blood Flow Prediction and Hemodynamic Analysis
Cardiovascular modeling has rapidly advanced over the past few decades due to the rising needs for health tracking and early detection of cardiovascular...
RecaLLM: Addressing the Lost-in-Thought Phenomenon with Explicit In-Context Retrieval
We propose RecaLLM, a set of reasoning language models post-trained to make effective use of long-context information. In-context retrieval, which ident...
Recycling Failures: Salvaging Exploration in RLVR via Fine-Grained Off-Policy Guidance
Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as a powerful paradigm for enhancing the complex reasoning capabilities of Large Reaso...
Reflective Context Learning: Studying the Optimization Primitives of Context Space
Generally capable agents must learn from experience in ways that generalize across tasks and environments. The fundamental problems of learning, includi...
Regular Fourier Features for Nonstationary Gaussian Processes
Simulating a Gaussian process requires sampling from a high-dimensional Gaussian distribution, which scales cubically with the number of sample location...
Regularized Online RLHF with Generalized Bilinear Preferences
We consider the problem of contextual online RLHF with general preferences, where the goal is to identify the Nash Equilibrium. We adopt the Generalized...
Reinforcement Learning with Markov Risk Measures and Multipattern Risk Approximation
For a risk-averse finite-horizon Markov Decision Problem, we introduce a special class of Markov coherent risk measures, called mini-batch measures. We...
Reliability Gated Multi-Teacher Distillation for Low Resource Abstractive Summarization
We study multiteacher knowledge distillation for low resource abstractive summarization from a reliability aware perspective. We introduce EWAD (Entropy...
Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding
Large language models (LLMs) have revolutionized Text-to-SQL generation, allowing users to query structured data using natural language with growing eas...
Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling
Recent research has shown that filtering massive English web corpora into high-quality subsets significantly improves training efficiency. However, for...
Representation Learning for Spatiotemporal Physical Systems
Machine learning approaches to spatiotemporal physical systems have primarily focused on next-frame prediction, with the goal of learning an accurate em...
Research Roadmap: The Evolution of AI Agents
From Chain-of-Thought to production agent architectures. Read the 9 most important agent papers in order — with full engineering context between each one.
Research Roadmap: The Evolution of Multimodal AI
From CLIP to GPT-4V to Gemini. Read the 9 most important multimodal AI papers in order — understanding how vision and language were unified.
Research Roadmap: The Evolution of RAG
Read the 8 most important RAG papers in the right order. From the original Lewis et al. through GraphRAG. Full engineering context between each paper.
Resilient Strategies for Stochastic Systems: How Much Does It Take to Break a Winning Strategy?
We study the problem of resilient strategies in the presence of uncertainty. Resilient strategies enable an agent to make decisions that are robust agai...
Resources for Automated Evaluation of Assistive RAG Systems that Help Readers with News Trustworthiness Assessment
Many readers today struggle to assess the trustworthiness of online news because reliable reporting coexists with misinformation. The TREC 2025 DRAGUN (...
Rethinking Forward Processes for Score-Based Data Assimilation in High Dimensions
Data assimilation is the process of estimating the time-evolving state of a dynamical system by integrating model predictions and noisy observations. It...
Revisiting Gene Ontology Knowledge Discovery with Hierarchical Feature Selection and Virtual Study Group of AI Agents
Large language models have achieved great success in multiple challenging tasks, and their capacity can be further boosted by the emerging agentic AI te...
RewardUQ: A Unified Framework for Uncertainty-Aware Reward Models
Reward models are central to aligning large language models (LLMs) with human preferences. Yet most approaches rely on pointwise reward estimates that o...
Risk-Aware World Model Predictive Control for Generalizable End-to-End Autonomous Driving
With advances in imitation learning (IL) and large-scale driving datasets, end-to-end autonomous driving (E2E-AD) has made great progress recently. Curr...
RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots
Recent advances in robot learning have accelerated progress toward generalist robots that can perform everyday tasks in human environments. Yet it remai...
Robust support vector model based on bounded asymmetric elastic net loss for binary classification
In this paper, we propose a novel bounded asymmetric elastic net ($L_{baen}$) loss function and combine it with the support vector machine (SVM), result...
Robust Unscented Kalman Filtering via Recurrent Meta-Adaptation of Sigma-Point Weights
The Unscented Kalman Filter (UKF) is a ubiquitous tool for nonlinear state estimation; however, its performance is limited by the static parameterizatio...
Robustness of Agentic AI Systems via Adversarially-Aligned Jacobian Regularization
As Large Language Models (LLMs) transition into autonomous multi-agent ecosystems, robust minimax training becomes essential yet remains prone to instab...
RunAgent: Interpreting Natural-Language Plans with Constraint-Guided Execution
Humans solve problems by executing targeted plans, yet large language models (LLMs) remain unreliable for structured workflow execution. We propose RunA...
SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning
Safety guarantees are a prerequisite to the deployment of reinforcement learning (RL) agents in safety-critical tasks. Often, deployment environments ex...
SafeGen-LLM: Enhancing Safety Generalization in Task Planning for Robotic Systems
Safety-critical task planning in robotic systems remains challenging: classical planners suffer from poor scalability, Reinforcement Learning (RL)-based...
SafeMind: A Risk-Aware Differentiable Control Framework for Adaptive and Safe Quadruped Locomotion
Learning-based quadruped controllers achieve impressive agility but typically lack formal safety guarantees under model uncertainty, perception noise, a...
SAHOO: Safeguarded Alignment for High-Order Optimization Objectives in Recursive Self-Improvement
Recursive self-improvement is moving from theory to practice: modern systems can critique, revise, and evaluate their own outputs, yet iterative self-mo...
Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model
We study the sample complexity of learning an $ε$-optimal policy in the Stochastic Shortest Path (SSP) problem. We first derive sample complexity bounds...
SAVGO: Learning State-Action Value Geometry with Cosine Similarity for Continuous Control
While representation and similarity learning have improved the sample efficiency of Reinforcement Learning (RL), they are rarely used to shape policy up...
Scalable Evaluation of the Realism of Synthetic Environmental Augmentations in Images
Evaluation of AI systems often requires synthetic test cases, particularly for rare or safety-critical conditions that are difficult to observe in opera...
Scalable Learning of Multivariate Distributions via Coresets
Efficient and scalable non-parametric or semi-parametric regression analysis and density estimation are of crucial importance to the fields of statistic...
Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments
Large-scale commercial search systems optimize for relevance to drive successful sessions that help users find what they are looking for. To maximize re...
SCOPE: Scene-Contextualized Incremental Few-Shot 3D Segmentation
Incremental Few-Shot (IFS) segmentation aims to learn new categories over time from only a few annotations. Although widely studied in 2D, it remains un...
Seeing is Believing: Robust Vision-Guided Cross-Modal Prompt Learning under Label Noise
Prompt learning is a parameter-efficient approach for vision-language models, yet its robustness under label noise is less investigated. Visual content...
SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation
We identify occlusion reasoning as a fundamental yet overlooked aspect for 3D layout-conditioned generation. It is essential for synthesizing partially...
SELDON: Supernova Explosions Learned by Deep ODE Networks
The discovery rate of optical transients will explode to 10 million public alerts per night once the Vera C. Rubin Observatory's Legacy Survey of Space...
Self-Distilled RLVR
On-policy distillation (OPD) has become a popular training paradigm in the LLM community. This paradigm selects a larger model as the teacher to provide...
Semantic Invariance in Agentic AI
Large Language Models (LLMs) increasingly serve as autonomous reasoning agents in decision support, scientific problem-solving, and multi-agent coordina...
Semantic Rate-Distortion for Bounded Multi-Agent Communication: Capacity-Derived Semantic Spaces and the Communication Cost of Alignment
When two agents of different computational capacities interact with the same environment, they need not compress a common semantic alphabet differently;...
Semantic Token Clustering for Efficient Uncertainty Quantification in Large Language Models
Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks. However, the truthfulness of their outputs is not guarantee...
Semantics-Aware Caching for Concept Learning
Concept learning is a form of supervised machine learning that operates on knowledge bases in description logics. State-of-the-art concept learners ofte...
Semi-Supervised Generative Learning via Latent Space Distribution Matching
We introduce Latent Space Distribution Matching (LSDM), a novel framework for semi-supervised generative modeling of conditional distributions. LSDM ope...
SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching
Diffusion models achieve state-of-the-art video generation quality, but their inference remains expensive due to the large number of sequential denoisin...
Sentiment Analysis of German Sign Language Fairy Tales
We present a dataset and a model for sentiment analysis of German sign language (DGS) fairy tales. First, we perform sentiment analysis for three levels...
Sequential Inference for Gaussian Processes: A Signal Processing Perspective
The proliferation of capable and efficient machine learning (ML) models marks one of the strongest methodological shifts in signal processing (SP) in it...
Sharp Convergence Rates for Masked Diffusion Models
Discrete diffusion models have achieved strong empirical performance in text and other symbolic domains, with masked (absorbing-rate) variants emerging...
Sharp description of local minima in the loss landscape of high-dimensional two-layer ReLU neural networks
We study the population loss landscape of two-layer ReLU networks of the form $\sum_{k=1}^K \mathrm{ReLU}(w_k^\top x)$ in a realisable teacher-student s...
Sim-to-Real Transfer for Muscle-Actuated Robots via Generalized Actuator Networks
Tendon drives paired with soft muscle actuation enable faster and safer robots while potentially accelerating skill acquisition. Still, these systems ar...
SimpliHuMoN: Simplifying Human Motion Prediction
Human motion prediction combines the tasks of trajectory forecasting and human pose prediction. For each of the two tasks, specialized models have been...
Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation
Data attribution and valuation are critical for understanding data-model synergy for Large Language Models (LLMs), yet existing gradient-based methods s...
SOLID Principles
Single Responsibility, Open/Closed, Liskov Substitution, Interface Segregation, and Dependency Inversion - applied to production Python.
SOTAlign: Semi-Supervised Alignment of Unimodal Vision and Language Models via Optimal Transport
The Platonic Representation Hypothesis posits that neural networks trained on different modalities converge toward a shared statistical model of the wor...
SPARTA: Scalable and Principled Benchmark of Tree-Structured Multi-hop QA over Text and Tables
Real-world Table-Text question answering (QA) tasks require models that can reason across long text and source tables, traversing multiple hops and exec...
Spatio-Temporal Token Pruning for Efficient High-Resolution GUI Agents
Pure-vision GUI agents provide universal interaction capabilities but suffer from severe efficiency bottlenecks due to the massive spatiotemporal redund...
Spectral Alignment in Forward-Backward Representations via Temporal Abstraction
Forward-backward (FB) representations provide a powerful framework for learning the successor representation (SR) in continuous spaces by enforcing a lo...
Splitting Argumentation Frameworks with Collective Attacks and Supports
This work proposes novel splitting techniques for argumentation formalisms that incorporate supports between defeasible elements. We base our studies on...
SpotIt+: Verification-based Text-to-SQL Evaluation with Database Constraints
We present SpotIt+, an open-source tool for evaluating Text-to-SQL systems via bounded equivalence verification. Given a generated SQL query and the gro...
SPPCSO: Adaptive Penalized Estimation Method for High-Dimensional Correlated Data
With the rise of high-dimensional correlated data, multicollinearity poses a significant challenge to model stability, often leading to unstable estimat...
SPRINT: Semi-supervised Prototypical Representation for Few-Shot Class-Incremental Tabular Learning
Real-world systems must continuously adapt to novel concepts from limited data without forgetting previously acquired knowledge. While Few-Shot Class-In...
Stable and Steerable Sparse Autoencoders with Weight Regularization
Sparse autoencoders (SAEs) are widely used to extract human-interpretable features from neural network activations, but their learned features can vary...
State estimations and noise identifications with intermittent corrupted observations via Bayesian variational inference
This paper focuses on the state estimation problem in distributed sensor networks, where intermittent packet dropouts, corrupted observations, and unkno...
Steve-Evolving: Open-World Embodied Self-Evolution via Fine-Grained Diagnosis and Dual-Track Knowledge Distillation
Open-world embodied agents must solve long-horizon tasks where the main bottleneck is not single-step planning quality but how interaction experience is...
Strait: Perceiving Priority and Interference in ML Inference Serving
Machine learning (ML) inference serving systems host deep neural network (DNN) models and schedule incoming inference requests across deployed GPUs. How...
Strategic Algorithmic Monoculture: Experimental Evidence from Coordination Games
AI agents increasingly operate in multi-agent environments where outcomes depend on coordination. We distinguish primary algorithmic monoculture -- base...
Structural interpretability in SVMs with truncated orthogonal polynomial kernels
We study post-training interpretability for Support Vector Machines (SVMs) built from truncated orthogonal polynomial kernels. Since the associated repr...
Structure-Preserving Multi-View Embedding Using Gromov-Wasserstein Optimal Transport
Multi-view data analysis seeks to integrate multiple representations of the same samples in order to recover a coherent low-dimensional structure. Class...
Structured Distillation for Personalized Agent Memory: 11x Token Reduction with Retrieval Preservation
Long conversations with an AI agent create a simple problem for one user: the history is useful, but carrying it verbatim is expensive. We study persona...
Stylistic-STORM (ST-STORM): Perceiving the Semantic Nature of Appearance
One of the dominant paradigms in self-supervised learning (SSL), illustrated by MoCo or DINO, aims to produce robust representations by capturing featur...
SUREON: A Benchmark and Vision-Language-Model for Surgical Reasoning
Surgeons don't just see -- they interpret. When an expert observes a surgical scene, they understand not only what instrument is being used, but why it...
SurvHTE-Bench: A Benchmark for Heterogeneous Treatment Effect Estimation in Survival Analysis
Estimating heterogeneous treatment effects (HTEs) from right-censored survival data is critical in high-stakes applications such as precision medicine a...
Synthetic Computers at Scale for Long-Horizon Productivity Simulation
Realistic long-horizon productivity work is strongly conditioned on user-specific computer environments, where much of the work context is stored and or...
Synthetic data in cryptocurrencies using generative models
Data plays a fundamental role in consolidating markets, services, and products in the digital financial ecosystem. However, the use of real data, especi...
Synthetic Monitoring Environments for Reinforcement Learning
Reinforcement Learning (RL) lacks benchmarks that enable precise, white-box diagnostics of agent behavior. Current environments often entangle complexit...
Takeuchi's Information Criteria as Generalization Measures for DNNs Close to NTK Regime
Generalization measures have been studied extensively in the machine learning community to better characterize generalization gaps. However, establishin...
Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation
Modern optimizers like Adam and Muon are central to training large language models, but their reliance on first- and second-order momenta introduces sig...
Task Complexity Matters: An Empirical Study of Reasoning in LLMs for Sentiment Analysis
Large language models (LLMs) with reasoning capabilities have fueled a compelling narrative that reasoning universally improves performance across langu...
Task-Centric Acceleration of Small-Language Models
Small language models (SLMs) have emerged as efficient alternatives to large language models for task-specific applications. However, they are often emp...
Tell Me What To Learn: Generalizing Neural Memory to be Controllable in Natural Language
Modern machine learning models are deployed in diverse, non-stationary environments where they must continually adapt to new tasks and evolving knowledg...
Temporal Data Requirement for Predicting Unplanned Hospital Readmissions
With the proliferation of Electronic Health Records (EHRs), a critical challenge in building predictive models is determining the optimal historical dat...
Terminology Rarity Predicts Catastrophic Failure in LLM Translation of Low-Resource Ancient Languages: Evidence from Ancient Greek
This study presents the first systematic, reference-free human evaluation of large language model (LLM) machine translation (MT) for Ancient Greek (AG)...
The $\mathbf{Y}$-Combinator for LLMs: Solving Long-Context Rot with $λ$-Calculus
LLMs are increasingly used as general-purpose reasoners, but long inputs remain bottlenecked by a fixed context window. Recursive Language Models (RLMs)...
The 12-Factor App - Building Deployable Python Apps
Apply the 12-Factor App methodology to Python applications with FastAPI, Docker, and PostgreSQL - covering all 12 factors with production-ready code examples.
The Compression Gap: Why Discrete Tokenization Limits Vision-Language-Action Model Scaling
Scaling Vision-Language-Action (VLA) models by upgrading the vision encoder is expected to improve downstream manipulation performance--as it does in vi...
The EpisTwin: A Knowledge Graph-Grounded Neuro-Symbolic Architecture for Personal AI
Personal Artificial Intelligence is currently hindered by the fragmentation of user data across isolated silos. While Retrieval-Augmented Generation off...
The Harder Path: Last Iterate Convergence for Uncoupled Learning in Zero-Sum Games with Bandit Feedback
We study the problem of learning in zero-sum matrix games with repeated play and bandit feedback. Specifically, we focus on developing uncoupled algorit...
The logic of KM belief update is contained in the logic of AGM belief revision
For each axiom of KM belief update we provide a corresponding axiom in a modal logic containing three modal operators: a unimodal belief operator $B$, a...
The Robot's Inner Critic: Self-Refinement of Social Behaviors through VLM-based Replanning
Conventional robot social behavior generation has been limited in flexibility and autonomy, relying on predefined motions or human feedback. This study...
The Stability of Online Algorithms in Performative Prediction
The use of algorithmic predictions in decision-making leads to a feedback loop where the models we deploy actively influence the data distributions we s...
Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring
Reward models (RMs) have become an indispensable fixture of the language model (LM) post-training playbook, enabling policy alignment and test-time scal...
Thermodynamic Response Functions in Singular Bayesian Models
Singular statistical models -- including mixtures, matrix factorization, and neural networks -- violate regular asymptotics due to parameter non-identifiabili...
Time Series Foundation Models as Strong Baselines in Transportation Forecasting: A Large-Scale Benchmark Analysis
Accurate forecasting of transportation dynamics is essential for urban mobility and infrastructure planning. Although recent work has achieved strong pe...
To Call or Not to Call: A Framework to Assess and Optimize LLM Tool Calling
Agentic AI architectures augment LLMs with external tools, unlocking strong capabilities. However, tool use is not always beneficial; some calls may be...
TopBench: A Benchmark for Implicit Prediction and Reasoning over Tabular Question Answering
Large Language Models (LLMs) have advanced Table Question Answering, where most queries can be answered by extracting information or simple aggregation....
Toward Expert Investment Teams: A Multi-Agent LLM System with Fine-Grained Trading Tasks
The advancement of large language models (LLMs) has accelerated the development of autonomous financial trading systems. While mainstream approaches dep...
Toward Generative Quantum Utility via Correlation-Complexity Map
We propose a Correlation-Complexity Map as a practical diagnostic tool for determining when real-world data distributions are structurally aligned with...
Toward Guarantees for Clinical Reasoning in Vision Language Models via Formal Verification
Vision-language models (VLMs) show promise in drafting radiology reports, yet they frequently suffer from logical inconsistencies, generating diagnostic...
Toward World Models for Epidemiology
World models have emerged as a unifying paradigm for learning latent dynamics, simulating counterfactual futures, and supporting planning under uncertai...
Towards Faithful Multimodal Concept Bottleneck Models
Concept Bottleneck Models (CBMs) are interpretable models that route predictions through a layer of human-interpretable concepts. While widely studied i...
Towards Improving Speaker Distance Estimation through Generative Impulse Response Augmentation
The Room Acoustics and Speaker Distance Estimation (SDE) Challenge at ICASSP 2025 explores the effectiveness of augmented room impulse response (RIR) da...
Transfer Learning for Meta-analysis Under Covariate Shift
Randomized controlled trials often do not represent the populations where decisions are made, and covariate shift across studies can invalidate standard...
Trojan horse hunt in deep forecasting models: Insights from the European Space Agency competition
Forecasting plays a crucial role in modern safety-critical applications, such as space operations. However, the increasing use of deep forecasting model...
Turning Trust to Transactions: Tracking Affiliate Marketing and FTC Compliance in YouTube's Influencer Economy
YouTube has evolved into a powerful platform where creators monetize their influence through affiliate marketing, raising concerns about transparen...
Two-Time-Scale Learning Dynamics: A Population View of Neural Network Training
Population-based learning paradigms, including evolutionary strategies, Population-Based Training (PBT), and recent model-merging methods, combine fast...
U-Cast: A Surprisingly Simple and Efficient Frontier Probabilistic AI Weather Forecaster
AI-based weather forecasting now rivals traditional physics-based ensembles, but state-of-the-art (SOTA) models rely on specialized architectures and ma...
Uncertainty Quantification for Multimodal Large Language Models with Incoherence-adjusted Semantic Volume
Despite their capabilities, Multimodal Large Language Models (MLLMs) may produce plausible but erroneous outputs, hindering reliable deployment. Accurat...
Uncovering Physical Drivers of Dark Matter Halo Structures with Auxiliary-Variable-Guided Generative Models
Deep generative models (DGMs) compress high-dimensional data but often entangle distinct physical factors in their latent spaces. We present an auxiliar...
Understanding the Role of Hallucination in Reinforcement Post-Training of Multimodal Reasoning Models
The recent success of reinforcement learning (RL) in large reasoning models has inspired the growing adoption of RL for post-training Multimodal Large L...
Understanding Usage and Engagement in AI-Powered Scientific Research Tools: The Asta Interaction Dataset
AI-powered scientific research tools are rapidly being integrated into research workflows, yet the field lacks a clear lens into how researchers use the...
Uniform-Correct Policy Optimization: Breaking RLVR's Indifference to Diversity
Reinforcement Learning with Verifiable Rewards (RLVR) has achieved substantial gains in single-attempt accuracy (Pass@1) on reasoning tasks, yet often s...
Unsupervised Continual Learning for Amortized Bayesian Inference
Amortized Bayesian Inference (ABI) enables efficient posterior estimation using generative neural networks trained on simulated data, but often suffers...
Unsupervised Denoising of Real Clinical Low Dose Liver CT with Perceptual Attention Networks
With the development of deep learning, medical image processing has been widely used to assist clinical research. This paper focuses on the denoising pr...
Using Large Language Models and Knowledge Graphs to Improve the Interpretability of Machine Learning Models in Manufacturing
Explaining Machine Learning (ML) results in a transparent and user-friendly manner remains a challenging task of Explainable Artificial Intelligence (XA...
Utilizing LLMs for Industrial Process Automation
In recent years, a growing number of publications have addressed best practices for using Large Language Models (LLMs) in software engineering. However, most...
Valence-Arousal Subspace in LLMs: Circular Emotion Geometry and Multi-Behavioral Control
We present a method to identify a valence-arousal (VA) subspace within large language model representations. From 211k emotion-labeled texts, we derive...
Var-JEPA: A Variational Formulation of the Joint-Embedding Predictive Architecture -- Bridging Predictive and Generative Self-Supervised Learning
The Joint-Embedding Predictive Architecture (JEPA) is often seen as a non-generative alternative to likelihood-based self-supervised learning, emphasizi...
Variational Garrote for Sparse Inverse Problems
Sparse regularization plays a central role in solving inverse problems arising from incomplete or corrupted measurements. Different regularizers corresp...
VaSST: Variational Inference for Symbolic Regression using Soft Symbolic Trees
Symbolic regression has recently gained traction in AI-driven scientific discovery, aiming to recover explicit closed-form expressions from data that re...
VecMol: Vector-Field Representations for 3D Molecule Generation
Generative modeling of three-dimensional (3D) molecules is a fundamental yet challenging problem in drug discovery and materials science. Existing appro...
VideoSeek: Long-Horizon Video Agent with Tool-Guided Seeking
Video agentic models have advanced challenging video-language tasks. However, most agentic approaches still heavily rely on greedy parsing over densely...
VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images
Vision-language models (VLMs) still struggle with visual perception tasks such as spatial understanding and viewpoint recognition. One plausible contrib...
VISOR: Agentic Visual Retrieval-Augmented Generation via Iterative Search and Over-horizon Reasoning
Visual Retrieval-Augmented Generation (VRAG) empowers Vision-Language Models to retrieve and reason over visually rich documents. To tackle complex quer...
Visual-ERM: Reward Modeling for Visual Equivalence
Vision-to-code tasks require models to reconstruct structured visual inputs, such as charts, tables, and SVGs, into executable or structured representat...
VL-Calibration: Decoupled Confidence Calibration for Large Vision-Language Models Reasoning
Large Vision Language Models (LVLMs) achieve strong multimodal reasoning but frequently exhibit hallucinations and incorrect responses with high certain...
What Does Flow Matching Bring To TD Learning?
Recent work shows that flow matching can be effective for scalar Q-value function estimation in reinforcement learning (RL), but it remains unclear why...
When One Modality Rules Them All: Backdoor Modality Collapse in Multimodal Diffusion Models
While diffusion models have revolutionized visual content generation, their rapid adoption has underscored the critical need to investigate vulnerabilit...
When RAG Chatbots Expose Their Backend: An Anonymized Case Study of Privacy and Security Risks in Patient-Facing Medical AI
Background: Patient-facing medical chatbots based on retrieval-augmented generation (RAG) are increasingly promoted to deliver accessible, grounded heal...
When Right Meets Wrong: Bilateral Context Conditioning with Reward-Confidence Correction for GRPO
Group Relative Policy Optimization (GRPO) has emerged as an effective method for training reasoning models. While it computes advantages based on group...
Who Guards the Guardians? The Challenges of Evaluating Identifiability of Learned Representations
Identifiability in representation learning is commonly evaluated using standard metrics (e.g., MCC, DCI, R^2) on synthetic benchmarks with known ground-...
Why Diffusion Language Models Struggle with Truly Parallel (Non-Autoregressive) Decoding?
Diffusion Language Models (DLMs) are often advertised as enabling parallel token generation, yet practical fast DLMs frequently converge to left-to-righ...
World Properties without World Models: Recovering Spatial and Temporal Structure from Co-occurrence Statistics in Static Word Embeddings
Recent work interprets the linear recoverability of geographic and temporal variables from large language model (LLM) hidden states as evidence for worl...
XFED: Non-Collusive Model Poisoning Attack Against Byzantine-Robust Federated Classifiers
Model poisoning attacks pose a significant security threat to Federated Learning (FL). Most existing model poisoning attacks rely on collusion, requirin...
Zeroth-Order Stackelberg Control in Combinatorial Congestion Games
We study Stackelberg (leader--follower) tuning of network parameters (tolls, capacities, incentives) in combinatorial congestion games, where selfish us...
ZipMap: Linear-Time Stateful 3D Reconstruction with Test-Time Training
Feed-forward transformer models have driven rapid progress in 3D vision, but state-of-the-art methods such as VGGT and $π^3$ have a computational cost t...
ZO-SAM: Zero-Order Sharpness-Aware Minimization for Efficient Sparse Training
Deep learning models, despite their impressive achievements, suffer from high computational costs and memory requirements, limiting their usability in r...