$V_1$: Unifying Generation and Self-Verification for Parallel Reasoners
Test-time scaling for complex reasoning tasks shows that leveraging inference-time compute, by methods such as independently sampling and aggregating mu...
Conversational agents are increasingly deployed in knowledge-intensive settings, where correct behavior depends on retrieving and applying domain-specif...
Large language models are beginning to show steganographic capabilities. Such capabilities could allow misaligned models to evade oversight mechanisms....
A Learning-based Multi-Frame Visual Feature Framewor... - published at NAACL 2025.
Emotion Recognition in Conversations (ERC) presents unique challenges, requiring models to capture the temporal flow of multi-turn dialogues and to effe...
Large language model (LLM) agents, such as OpenAI's Operator and Claude's Computer Use, can automate workflows but are unable to handle payment tasks. Exist...
A Practical Analysis of Human Alignment with *PO. - published at NAACL 2025.
A Semantic-Aware Layer-Freezing Approach to Computat... - published at ACL 2025.
A Training-free LLM-based Approach to General Chines... - published at ACL 2025.
Research in AI using Large Language Models (LLMs) is rapidly evolving, and the comparison of their performance with human reasoning has become a key con...
AbGen: Evaluating Large Language Models in Ablation... - published at ACL 2025.
Under the lens of Marr's levels of analysis, we critique and extend two claims about language models (LMs) and language processing: first, that predicti...
How to intelligently select which examples to annotate when you only have a handful of labeled samples per class. Combines active learning with few-shot text classification to minimize annotation cost - directly applicable to intent detection, content moderation, and domain-specific NLP tasks.
Large vision-language models (VLMs) are increasingly applied to long-video question answering, yet inference is often bottlenecked by the number of inp...
We study adaptive querying for learning user-dependent quantities of interest, such as responses to held-out items and psychometric indicators, within t...
Advancing Language Models through Instruction Tuning... - published at EMNLP 2025.
AERA Chat: An Interactive Platform for Automated Exp... - published at EMNLP 2025.
Transformer attention is typically implemented using softmax normalization, which enforces attention weights with unit sum normalization. While effectiv...
AgentCPM-GUI: Building Mobile-Use Agents with Reinfo... - published at EMNLP 2025.
While Multi-Agent Systems (MAS) excel in complex reasoning, they suffer from the cascading impact of erroneous information generated by individual parti...
Translating natural language into Jira Query Language (JQL) requires resolving ambiguous field references, instance-specific categorical values, and com...
The expansion of retrieval-augmented generation (RAG) into multimodal domains has intensified the challenge for processing complex visual documents, suc...
Deep Research agents are rapidly emerging as primary consumers of modern retrieval systems. Unlike human users who issue and refine queries without docu...
We present a winning three-stage system for SemEval 2026 Task 12: Abductive Event Reasoning that combines graph-based retrieval, LLM-driven abductive re...
AIPOM: Agent-aware Interactive Planning for Multi-Ag... - published at EMNLP 2025.
An Address Intelligence Framework for E-commerce Del... - published at EMNLP 2025.
Explainable AI (XAI) research has experienced substantial growth in recent years. Existing XAI methods, however, have been criticized for being technica...
Direct Preference Optimization (DPO) is widely used after supervised fine-tuning (SFT) to align language models, yet empirical behavior under small back...
Kimi K2.5 is an open-weight LLM that rivals closed models across coding, multimodal, and agentic benchmarks, but was released without an accompanying sa...
Analysing LLM Persona Generation and Fairness Interp... - published at EACL 2026.
Argumentative LLMs (ArgLLMs) are an existing approach leveraging Large Language Models (LLMs) and computational argumentation for decision-making, with...
Argumentation and Judgement Factors: LLM-based Disco... - published at EACL 2026.
Can narratives make arguments more persuasive? And to this end, which narrative features matter most? Although stories are often seen as powerful tools...
ASRank: Zero-Shot Re-Ranking with Answer Scent for D... - published at NAACL 2025.
The rapid advancement of large language models (LLMs) has enabled powerful authorship inference capabilities, raising growing concerns about unintended...
Large language models (LLMs) increasingly rely on chain-of-thought (CoT) reasoning to solve complex tasks. Yet ensuring that the reasoning trace both co...
This paper studies Automated Instruction Revision (AIR), a rule-induction-based method for adapting large language models (LLMs) to downstream tasks usi...
Automatically Discovering How Misogyny is Framed on... - published at NAACL 2025.
AUTOSUMM: A Comprehensive Framework for LLM-Based Co... - published at ACL 2025.
Large language models have shown strong performance on broad-domain knowledge and reasoning benchmarks, but it remains unclear how well language models...
Large language models (LLMs) often produce confident but incorrect answers in settings where abstention would be safer. Standard evaluation protocols, h...
Benchmarking and Building Zero-Shot Hindi Retrieval... - published at NAACL 2025.
Accurate evaluation is central to the large language model (LLM) ecosystem, guiding model selection and downstream adoption across diverse use cases. In...
Beyond "Not Novel Enough": Enriching Schol... - published at EACL 2026.
Large language models (LLMs) are becoming increasingly capable mathematical collaborators, but static benchmarks are no longer sufficient for evaluating...
Beyond Grid Search: Leveraging Bayesian Optimization... - published at EACL 2026.
Evaluating the factuality of long-form output generated by large language models (LLMs) remains challenging, particularly when responses are open-ended...
Recent advances in multimodal Retrieval-Augmented Generation (RAG) enable Large Language Models (LLMs) to analyze enterprise spreadsheet workbooks conta...
Large language models are increasingly deployed in settings where reliability matters, yet output-level uncertainty signals such as token probabilities,...
Large language models (LLMs) encode vast world knowledge in their parameters, yet they remain fundamentally limited by static knowledge, finite context...
Large language models with web search are increasingly used in scientific publishing agents, yet they still produce BibTeX entries with pervasive field-...
BOOKCOREF: Coreference Resolution at Book Scale. - published at ACL 2025.
Bridging Attribution and Open-Set Detection using Gr... - published at EACL 2026.
Large language models are increasingly deployed as autonomous coding agents and have achieved remarkably strong performance on software engineering benc...
Firearm violence is a pressing public health issue, yet research into survivors' lived experiences remains underfunded and difficult to scale. Qualitati...
Cards Against Contamination: TCG-Bench for Difficult... - published at EACL 2026.
Evidence-grounded reasoning requires more than attaching retrieved text to a prediction: a model should make decisions that depend on whether the provid...
Large language models (LLMs) are trained on enormous amounts of data and encode knowledge in their parameters. We propose a pipeline to elicit causal re...
CFSP: An Efficient Structured Pruning Framework for... - published at COLING 2025.
The transformer is the most popular neural architecture for language modeling. The cornerstone of the transformer is its global attention mechanism, whi...
Large language models (LLMs) have created new opportunities to enhance the efficiency of scholarly activities; however, challenges persist in the ethica...
This paper reports on the development of a leaderboard of Open Large Language Models (LLM) for European Portuguese (PT-PT), and on its associated benchm...
Building NLP pipelines on Electronic Health Records - named entity recognition for clinical text, negation detection, de-identification for HIPAA compliance, and fine-tuning BERT variants on medical corpora.
The rapid adoption of Large Language Models (LLMs) has transformed modern software development by enabling automated code generation at scale. While the...
CodeGenWrangler: Data Wrangling task automation usin... - published at NAACL 2025.
CodeTaxo: Enhancing Taxonomy Expansion with Limited... - published at ACL 2025.
Cognitive Kernel: An Open-source Agent System toward... - published at NAACL 2025.
Activation steering methods enable inference-time control of large language model (LLM) behavior without retraining, but current approaches face a funda...
Mobile Agents can autonomously execute user instructions, which requires hybrid-capabilities reasoning, including screen summary, subtask planning, acti...
Compress to Impress: Unleashing the Potential of Com... - published at COLING 2025.
Speech foundation models struggle with low-resource Pacific Indigenous languages because of severe data scarcity. Furthermore, full fine-tuning risks ca...
Clause extraction, obligation detection, risk identification, and building NLP systems for commercial contract analysis at law firm and enterprise scale.
AI agents powered by reasoning models require access to sensitive user data. However, their reasoning traces are difficult to control, which can result...
Controllable Style Arithmetic with Language Models. - published at ACL 2025.
Large language models (LLMs) can struggle to memorize factual knowledge in their parameters, often leading to hallucinations and poor performance on kno...
Craw4LLM: Efficient Web Crawling for LLM Pretraining. - published at ACL 2025.
CUFE@NLU of Devanagari Script Languages 2025: Langua... - published at COLING 2025.
CUFE@VarDial 2025 NorSID: Multilingual BERT for Norw... - published at COLING 2025.
We aim to examine the extent to which Large Language Models (LLMs) can 'talk much' about grammar modules, providing evidence from syntax core properties...
The fast-growing demands in using Large Language Models (LLMs) to tackle complex multi-step data science tasks create an emergent need for accurate benc...
DASR: Distributed Adaptive Scene Recognition - A Mul... - published at EMNLP 2025.
Large Language Model (LLM) adapters enable low-cost model specialization, but introduce complex caching and scheduling challenges in distributed serving...
Training capable software engineering (SWE) agents demands large-scale, executable, and verifiable environments that provide dynamic feedback loops for...
DEMO: Reframing Dialogue Interaction with Fine-grain... - published at ACL 2025.
Large language models and deep research agents supply citation URLs to support their claims, yet the reliability of these citations has not been systema...
Reinforcement learning with verifiable rewards (RLVR) typically optimizes for outcome rewards without imposing constraints on intermediate reasoning. Th...
The ability to provide trustworthy maternal health information using phone-based chatbots can have a significant impact, particularly in low-resource se...
Different Time, Different Language: Revisiting the B... - published at EACL 2026.
The language in online platforms, influence operations, and political rhetoric frequently directs a mix of pro-social sentiment (e.g., advocacy, helpful...
Achieving human-like responsiveness is a critical yet challenging goal for cascaded spoken dialogue systems. Conventional ASR-LLM-TTS pipelines follow a...
DIVINE: Coordinating Multimodal Disentangled Repres... - published at EACL 2026.
Multi-turn interactions with large language models typically retain the assistant's own past responses in the conversation history. In this work, we rev...
Reasoning in vision-language models (VLMs) has recently attracted significant attention due to its broad applicability across diverse downstream tasks....
Does Generative AI speak Nigerian-Pidgin?: Issues ab... - published at NAACL 2025.
Does RAG Introduce Unfairness in LLMs? Evaluating Fa... - published at COLING 2025.
Automated annotation of pedagogical dialogue is a high-stakes task where LLMs often fail without sufficient domain grounding. We present a domain-adapte...
Driving Chinese Spelling Correction from a Fine-Grai... - published at COLING 2025.
Adapting Large Language Models (LLMs) to specialized domains requires high-quality instruction tuning datasets, which are expensive to create through hu...
Dual Debiasing for Noisy In-Context Learning for Tex... - published at ACL 2025.
Multimodal web agents that process both screenshots and accessibility trees are increasingly deployed to interact with web interfaces, yet their dual-st...
The syntactic structure of a sentence can be represented as a tree where edges indicate syntactic dependencies between words. When that structure is a s...
EasyDistill: A Comprehensive Toolkit for Effective K... - published at EMNLP 2025.
Efficiency-Effectiveness Reranking FLOPs for LLM-bas... - published at EMNLP 2025.
Text-to-SQL enables non-expert users to query databases in natural language, yet real-world schemas often suffer from ambiguous, abbreviated, or inconsi...
Empathy Prediction from Diverse Perspectives. - published at ACL 2025.
The Hyperspace Analogue to Language (HAL) model relies on global word co-occurrence matrices to construct distributional semantic representations. While...
Enhancing Open-Domain Task-Solving Capability of LLM... - published at ACL 2025.
Enhancing Reliability in Community Question Answerin... - published at EACL 2026.
As corporate responsibility increasingly incorporates environmental, social, and governance (ESG) criteria, ESG reporting is becoming a legal requiremen...
In contested domains, instruction-tuned language models must balance user-alignment pressures against faithfulness to the in-context evidence. To evalua...
As large language models (LLMs) advance in linguistic competence, their reasoning abilities are gaining increasing attention. In humans, reasoning often...
Evaluation of Deontic Conditional Reasoning in Large... - published at EACL 2026.
Reinforcement learning (RL) has become essential to the post-training of large language models (LLMs) for reasoning, agentic capabilities and alignment....
FACT-AUDIT: An Adaptive Multi-Agent Framework for Dy... - published at ACL 2025.
Transformer-based large language models exhibit in-context learning, enabling adaptation to downstream tasks via few-shot prompting with demonstrations....
Large language models (LLMs) are increasingly applied in financial scenarios. However, they may produce harmful outputs, including facilitating illegal...
We present frequency-ordered tokenization, a simple preprocessing technique that improves lossless text compression by exploiting the power-law frequenc...
The complexity of Vietnam's legal texts presents a significant barrier to public access to justice. While Large Language Models offer a promising soluti...
From Feedback to Checklists: Grounded Evaluation of... - published at EMNLP 2025.
From Long Videos to Engaging Clips: A Human-Inspired... - published at EMNLP 2025.
From Paper to Structured JSON: An Agentic AI Workflo... - published at EACL 2026.
Large language models (LLMs) have recently reshaped Automated Essay Scoring (AES), yet prior studies typically examine individual techniques in isolatio...
Reinforcement learning (RL) for large language models (LLMs) increasingly relies on sparse, outcome-level rewards, yet determining which actions withi...
GADFA: Generator-Assisted Decision-Focused Approach... - published at COLING 2025.
Generating Multi-Aspect Queries for Conversational S... - published at EACL 2026.
Goal-Driven Data Story, Narrations and Explanations. - published at NAACL 2025.
GRAM: Generative Recommendation via Semantic-aware M... - published at ACL 2025.
We present H-RAG, our submission to SemEval-2026 Task 8 (MTRAGEval), addressing both Task A (Retrieval) and Task C (Generation with Retrieved Passages)....
H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasonin... - published at NAACL 2025.
Cyberbullying on social media is inherently multilingual and multi-faceted, where abusive behaviors often overlap across multiple categories. Existing m...
How Credible Is an Answer From Retrieval-Augmented L... - published at COLING 2025.
Hybrid Graphs for Table-and-Text based Question Answ... - published at NAACL 2025.
I know you are different! Towards Persona Driven Kno... - published at EACL 2026.
Industrial software development across chip design, GPU optimization, and embedded systems lacks expert reasoning traces showing how engineers reason ab...
Reducing the hardware footprint of large language models (LLMs) during decoding is critical for efficient long-sequence generation. A key bottleneck is...
Supervised Semantic Differential (SSD) is a mixed quantitative-interpretive method that models how text meaning varies with continuous individual-differ...
InTriage: Intelligent Telephone Triage in Pre-Hospit... - published at EMNLP 2025.
IrokoBench: A New Benchmark for African Languages in... - published at NAACL 2025.
Error Span Detection (ESD) is a crucial subtask in Machine Translation (MT) evaluation, aiming to identify the location and severity of translation erro...
Training Transformer language models is expensive, as performance typically improves with increasing dataset size and computational budget. Although sca...
Adapter-based methods have become a cost-effective approach to continual learning (CL) for Large Language Models (LLMs), by sequentially learning a low-...
This paper describes the KCLarity team's participation in CLARITY, a shared task at SemEval 2026 on classifying ambiguity and evasion techniques in poli...
Large language models (LLMs) undergo alignment training to avoid harmful behaviors, yet the resulting safeguards remain brittle: jailbreaks routinely by...
A speaker encoder used in multilingual voice cloning should treat the same speaker identically regardless of which script the audio was uttered in. Off-...
Latent reasoning offers a more efficient alternative to explicit reasoning by compressing intermediate reasoning into continuous representations and sub...
Research on developmentally plausible language models has largely focused on English, leaving open questions about multilingual settings. We present a s...
Large language model (LLM) agents require long-term user memory for consistent personalization, but limited context windows hinder tracking evolving pre...
All prior membership inference attacks for fine-tuned language models use hand-crafted heuristics (e.g., loss thresholding, Min-K%, reference calibrati...
Although most of the automated theorem-proving approaches depend on formal proof systems, informal theorem proving can align better with large language...
LEMUR: Robust Fine-Tuning for Multilingual Embedding... - published at EACL 2026.
Leveraging Language-based Representations for Better... - published at COLING 2025.
Leveraging LLM-GNN Integration for Open-World Questi... - published at EACL 2026.
Large language models (LLMs) perform increasingly well on biology benchmarks, but it remains unclear whether they uplift novice users, i.e., enable hu...
LLM-Coordination: Evaluating and Analyzing Multi-age... - published at NAACL 2025.
LLMInit: A Free Lunch from Large Language Models for... - published at EMNLP 2025.
Large language models (LLMs) have driven substantial advances in speech language models (SpeechLMs), yielding strong performance in automatic speech rec...
Loki: An Open-Source Tool for Fact Verification. - published at COLING 2025.
The widespread adoption of reinforcement learning-based alignment highlights the growing importance of reward models. Various benchmarks have been built...
Although Automatic Speech Recognition (ASR) in Bengali has seen significant progress, processing long-duration audio and performing robust speaker diari...
Recent advances in large language models (LLMs) have enabled the large-scale generation of highly fluent and deceptive news-like content. While prior wo...
Large language model agents receive instructions from many sources (system messages, user prompts, tool outputs, and more), each carrying different levels...
Research on classroom interaction has long been divided between large-scale observation and in-depth ethnographic work. We propose a framework mapping t...
McMining: Automated Discovery of Misconceptions in S... - published at EACL 2026.
MCPEval: Automatic MCP-based Deep Evaluation for AI... - published at EMNLP 2025.
Recent work on chain-of-thought (CoT) faithfulness reports single aggregate numbers (e.g., DeepSeek-R1 acknowledges hints 39% of the time), implying tha...
Numerous metascience studies and other initiatives have begun to monitor the prevalence of open science practices when it is more important to understan...
Large language model (LLM) agents are fundamentally bottlenecked by finite context windows on long-horizon tasks. As trajectories grow, retaining tool o...
Large Language Models (LLMs) have demonstrated remarkable capability in machine translation on high-resource language pairs, yet their performance on lo...
Meta-Reasoning Improves Tool Use in Large Language M... - published at NAACL 2025.
Large Language Models (LLMs) are increasingly being deployed in multilingual, multicultural settings, yet their reliance on predominantly English-centri...
Mirror in the Model: Ad Banner Image Generation via... - published at EMNLP 2025.
Mitigating Copy Bias in In-Context Learning through... - published at EACL 2026.
As Large Language Models (LLMs) are increasingly deployed in cross-linguistic contexts, ensuring safety in diverse regulatory and cultural environments...
Multimodal Stance Detection (MSD) is crucial for understanding public discourse, yet effectively fusing text and image, especially with conflicting sign...
Multimodal LLMs can process speech and images, but they cannot hear a speaker's voice or see an object's texture. We show this is not a failure of encod...
When researchers iteratively refine ideas with large language models, do the models preserve fidelity to the original objective? We introduce DriftBench...
Semi-structured documents integrate diverse interleaved data elements (e.g., tables, charts, hierarchical paragraphs) arranged in various and often irre...
We present a scalable methodology for evaluating language models in multi-turn interactions, using a suite of collaborative games that require effective...
We present MTRAG-UN, a benchmark for exploring open challenges in multi-turn retrieval augmented generation, a popular use of large language models. We...
MULSUM: A Multimodal Summarization System with Vis-A... - published at EACL 2026.
Knowledge distillation is an effective technique for pre-trained language model compression. However, existing methods only focus on the knowledge distr...
Multi-Task Pre-Finetuning of Lightweight Transformer... - published at EMNLP 2025.
Multilingual Self-Taught Faithfulness Evaluators. - published at EACL 2026.
Narrative Media Framing in Political Discourse. - published at ACL 2025.
Nemotron-CrossThink: Scaling Self-Learning beyond Ma... - published at EACL 2026.
Instruction Tuning (IT) has been proven to be an effective approach to unlock the powerful capabilities of large language models (LLMs). Recent studies...
This paper explores the response of Large Language Models (LLMs) to user prompts with different degrees of politeness and impoliteness. The Politeness T...
We introduce NOBLE (Nonlinear lOw-rank Branch for Linear Enhancement), an architectural augmentation that adds nonlinear low-rank branches to transforme...
NormAL LoRA: What is the perfect size? - published at EMNLP 2025.
Odysseus Navigates the Sirens' Song: Dynamic Fo... - published at ACL 2025.
Surprisal theory links human processing effort to the predictability of an upcoming linguistic unit, but empirical work often leaves the notion of a uni...
Recent works proposed test-time alignment methods that rely on a small aligned model as a proxy that guides the generation of a larger base (unaligned)...
Open Political Corpora: Structuring, Searching, and... - published at EMNLP 2025.
This paper presents a systematic benchmark of state-of-the-art multilingual large language models (LLMs) adapted via token pruning - a compression techn...
pEBR: A Probabilistic Approach to Embedding Based Re... - published at EMNLP 2025.
Persona-SQ: A Personalized Suggested Question Genera... - published at NAACL 2025.
PledgeTracker: A System for Monitoring the Fulfilmen... - published at EMNLP 2025.
Constructing computer-aided design (CAD) models is labor-intensive but essential for engineering and manufacturing. Recent advances in Large Language Mo...
Explainable Artificial Intelligence (XAI) seeks to enhance the transparency and accountability of machine learning systems, yet most methods follow a on...
Position-Aware Depth Decay Decoding (D³): Boosting L... - published at ACL 2025.
As large language models (LLMs) transition from research prototypes to real-world systems, customization has emerged as a central bottleneck. While text...
We investigate how verbal and nonverbal linguistic features, exhibited by speakers and listeners in dialogue, can contribute to predicting the listener'...
Resource-efficient training optimization techniques are becoming increasingly important as the size of large language models (LLMs) continues to grow. I...
In this paper, we propose Precision-Informed Semantic Modeling (PRISM), a structured topic modeling framework combining the benefits of rich representat...
The standard post-training recipe for large multimodal models (LMMs) applies supervised fine-tuning (SFT) on curated demonstrations followed by reinforc...
Problem-Solving Logic Guided Curriculum In-Context L... - published at ACL 2025.
Proceedings of Bridging Neurons and Symbols for Natu... - published at COLING 2025.
Proceedings of Context and Meaning: Navigating Disag... - published at COLING 2025.
Proceedings of the 5th Celtic Language Technology Wo... - published at COLING 2025.
Proceedings of the Joint Workshop of the 9th Financi... - published at COLING 2025.
PromptLab: A Collaborative Platform for Prompt Engin... - published at EACL 2026.
While second language (L2) learners may acquire target syntactic word order, mapping this syntax onto appropriate prosodic structures remains a persiste...
Rad-Flamingo: A Multimodal Prompt driven Radiology R... - published at EACL 2026.
Engineering breakdown of the ReAct paper (Yao et al., 2022) - the foundation of every AI agent built today. Plain English, production viability rating, implementation notes.
Large Language Models (LLMs) achieve strong performance on many reasoning benchmarks, yet these evaluations typically focus on isolated tasks that diffe...
Reasoning Knowledge Filter for Logical Table-to-Text... - published at COLING 2025.
Reasoning-Enhanced Domain-Adaptive Pretraining of Mu... - published at EMNLP 2025.
We propose RecaLLM, a set of reasoning language models post-trained to make effective use of long-context information. In-context retrieval, which ident...
RECIPE-TKG: From Sparse History to Structured Reason... - published at EACL 2026.
Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as a powerful paradigm for enhancing the complex reasoning capabilities of Large Reaso...
Red Queen: Exposing Latent Multi-Turn Risks in Large... - published at ACL 2025.
RedOne: Revealing Domain-specific LLM Post-Training... - published at EMNLP 2025.
Registering Source Tokens to Target Language Spaces... - published at ACL 2025.
Reinforcement Learning for Aligning Large Language M... - published at NAACL 2025.
We study multi-teacher knowledge distillation for low-resource abstractive summarization from a reliability-aware perspective. We introduce EWAD (Entropy...
Large language models (LLMs) have revolutionized Text-to-SQL generation, allowing users to query structured data using natural language with growing eas...
Recent research has shown that filtering massive English web corpora into high-quality subsets significantly improves training efficiency. However, for...
Representing the Under-Represented: Cultural and Cor... - published at COLING 2025.
Read the 8 most important RAG papers in the right order. From the original Lewis et al. through GraphRAG. Full engineering context between each paper.
Reinforcement Learning with Verifiable Rewards (RLVR) significantly enhances the reasoning capabilities of Large Language Models. When applied to RLVR,...
Retrieval-augmented generation (RAG) is a common way to ground language models in external documents and up-to-date information. Classical retrieval sys...
RevieWeaver: Weaving Together Review Insights by Lev... - published at NAACL 2025.
Translating natural language to SQL (Text-to-SQL) is a critical challenge in both database research and data analytics applications. Recent efforts have...
Reward models are central to aligning large language models (LLMs) with human preferences. Yet most approaches rely on pointwise reward estimates that o...
RichRAG: Crafting Rich Responses for Multi-faceted Q... - published at COLING 2025.
Knowledge graph question answering (KGQA) is a promising approach for mitigating LLM hallucination by grounding reasoning in structured and verifiable k...
RTSM: Knowledge Distillation with Diverse Signals fo... - published at NAACL 2025.
Humans solve problems by executing targeted plans, yet large language models (LLMs) remain unreliable for structured workflow execution. We propose RunA...
Safe: Enhancing Mathematical Reasoning in Large Lang... - published at ACL 2025.
Recursive self-improvement is moving from theory to practice: modern systems can critique, revise, and evaluate their own outputs, yet iterative self-mo...
Scientific literature is expanding at an unprecedented pace, making it increasingly challenging to efficiently organize and access domain knowledge. A h...
The lack of reasoning capabilities in Vision-Language Models (VLMs) has remained at the forefront of research discourse. We posit that this behavior ste...
SciClaims: An End-to-End Generative System for Biome... - published at EMNLP 2025.
Script-Agnosticism and its Impact on Language Identi... - published at NAACL 2025.
sDPO: Don't Use Your Data All at Once. - published at COLING 2025.
SeaLLMs 3: Open Foundation and Chat Multilingual Lar... - published at NAACL 2025.
On-policy distillation (OPD) has become a popular training paradigm in the LLM community. This paradigm selects a larger model as the teacher to provide...
Large Language Models (LLMs) increasingly serve as autonomous reasoning agents in decision support, scientific problem-solving, and multi-agent coordina...
Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks. However, the truthfulness of their outputs is not guarantee...
Semi-automatic Sequential Sentence Classification in... - published at NAACL 2025.
Sens-Merging: Sensitivity-Guided Parameter Balancing... - published at ACL 2025.
We present a dataset and a model for sentiment analysis of German sign language (DGS) fairy tales. First, we perform sentiment analysis for three levels...
SlackAgents: Scalable Collaboration of AI Agents in... - published at EMNLP 2025.
SoftCoT: Soft Chain-of-Thought for Efficient Reasoni... - published at ACL 2025.
Recently, there have been significant advancements in music generation. However, existing models primarily focus on creating modern pop songs, making it...
Real-world Table-Text question answering (QA) tasks require models that can reason across long text and source tables, traversing multiple hops and exec...
Automatic speech recognition (ASR) has benefited from advances in pretrained speech and language models, yet most systems remain constrained to monoling...
Large Language Models (LLMs) are increasingly used as proxies for human perception in urban analysis, yet it remains unclear whether persona prompting p...
As AI-generated fiction becomes increasingly prevalent, questions of authorship and originality are becoming central to how written work is evaluated. W...
Long conversations with an AI agent create a simple problem for one user: the history is useful, but carrying it verbatim is expensive. We study persona...
Structured Tender Entities Extraction from Complex T... - published at COLING 2025.
Recent advances in language models have substantially improved Natural Language Understanding (NLU). Although widely used benchmarks suggest that Large...
Realistic long-horizon productivity work is strongly conditioned on user-specific computer environments, where much of the work context is stored and or...
TableCoder: Table Extraction from Text via Reliable... - published at ACL 2025.
Modern optimizers like Adam and Muon are central to training large language models, but their reliance on first- and second-order momenta introduces sig...
Large language models (LLMs) with reasoning capabilities have fueled a compelling narrative that reasoning universally improves performance across langu...
Small language models (SLMs) have emerged as efficient alternatives to large language models for task-specific applications. However, they are often emp...
Traditional vision-language models struggle with contrastive fine-grained taxonomic reasoning, particularly when distinguishing between visually similar...
TelAgentBench: A Multi-faceted Benchmark for Evaluat... - published at EMNLP 2025.
This study presents the first systematic, reference-free human evaluation of large language model (LLM) machine translation (MT) for Ancient Greek (AG)...
Text-Attributed Graph Learning with Coupled Augmenta... - published at COLING 2025.
This study explores artificial visual creativity, focusing on ChatGPT's ability to generate new images intentionally pastiching original artworks such a...
Large Language Models (LLMs) often exhibit highly agreeable and reinforcing conversational styles, also known as AI sycophancy. Although this behavior i...
Personal Artificial Intelligence is currently hindered by the fragmentation of user data across isolated silos. While Retrieval-Augmented Generation off...
The Invalsi Benchmarks: measuring the Linguistic and... - published at COLING 2025.
The LLM Language Network: A Neuroscientific Approach... - published at NAACL 2025.
The Role of Handling Attributive Nouns in Improving... - published at COLING 2025.
Thinking with DistilQwen: A Tale of Four Distilled R... - published at EMNLP 2025.
TIPA: Typologically Informed Parameter Aggregation. - published at EACL 2026.
Large Language Models (LLMs) have advanced Table Question Answering, where most queries can be answered by extracting information or simple aggregation....
Case Report Forms (CRFs) collect data about patients and are at the core of well-established practices to conduct research in clinical settings. With th...
Vision-language models (VLMs) show promise in drafting radiology reports, yet they frequently suffer from logical inconsistencies, generating diagnostic...
Mathematical text understanding is a challenging task due to the presence of specialized entities and complex relationships between them. This study for...
TRUSTEVAL: A Dynamic Evaluation Toolkit on Trustwort... - published at NAACL 2025.
UI-to-Code generation requires vision-language models (VLMs) to produce thousands of tokens of structured HTML/CSS from a single screenshot, making visu...
Despite their capabilities, Multimodal Large Language Models (MLLMs) may produce plausible but erroneous outputs, hindering reliable deployment. Accurat...
Reinforcement Learning with Verifiable Rewards (RLVR) has achieved substantial gains in single-attempt accuracy (Pass@1) on reasoning tasks, yet often s...
Cooking is a cultural expression of human creativity that transcends geography and time through the orchestration of ingredients and techniques, much li...
We present a method to identify a valence-arousal (VA) subspace within large language model representations. From 211k emotion-labeled texts, we derive...
VCRMNER: Visual Cue Refinement in Multimodal NER usi... - published at COLING 2025.
Video agentic models have advanced challenging video-language tasks. However, most agentic approaches still heavily rely on greedy parsing over densely...
Vision-Language Models Struggle to Align Entities ac... - published at ACL 2025.
Vision-language models (VLMs) still struggle with visual perception tasks such as spatial understanding and viewpoint recognition. One plausible contrib...
Large Vision Language Models (LVLMs) achieve strong multimodal reasoning but frequently exhibit hallucinations and incorrect responses with high certain...
VoxpopuliTTS: a large-scale multilingual TTS corpus... - published at COLING 2025.
Watching the AI Watchdogs: A Fairness and Robustness... - published at NAACL 2025.
What Makes for Good Visual Instructions? Synthesizin... - published at COLING 2025.
We investigate the separation of literal interpretation from contextual inference in a collaborative block-building task where a builder must resolve un...
As Large Language Models (LLMs) increasingly mediate global information access with the potential to shape public discourse, their alignment with univer...
Large language models (LLMs) often achieve strong performance on reasoning benchmarks, but final-answer accuracy alone does not show whether they faithf...
Background: Patient-facing medical chatbots based on retrieval-augmented generation (RAG) are increasingly promoted to deliver accessible, grounded heal...
Where Do LLMs Compose Meaning? A Layerwise Analysis... - published at EACL 2026.
Diffusion Language Models (DLMs) are often advertised as enabling parallel token generation, yet practical fast DLMs frequently converge to left-to-righ...
Recent work interprets the linear recoverability of geographic and temporal variables from large language model (LLM) hidden states as evidence for worl...
Norm, the formal theoretical linguist, and Claudette, the computational language scientist, have a lovely time discussing whether modern language models...