LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

:::info Stub: Full Engineering Breakdown Coming
This paper was auto-fetched from arXiv on 2026-06-01. A full breakdown with production viability rating, implementation notes, and honest limitations is being written.
:::


Authors	Nianyi Lin et al.
Year	2026
Field	NLP
arXiv	2605.31584
Categories	cs.CL, cs.AI, cs.LG

Abstract

Long-context reasoning remains a central challenge for large language models, which often fail to locate and integrate key information in extensive distracting content. Reinforcement learning with verifiable rewards (RLVR) has shown promise for this task, yet existing methods are limited by low-confusability distractors and sparse, outcome-only reward signals that cannot supervise intermediate reasoning steps. To address these issues, we introduce LongTraceRL. For data construction, we generate multi-hop questions via knowledge graph random walks and leverage search agent trajectories to build tiered distractors: documents the agent read but did not cite (high confusability) and documents that appeared in search results but were never opened (low confusability), producing training contexts that are far more challenging than those built by random sampling or one-shot search. For reward design, we propose a rubric reward that uses the gold entities along each reasoning chain as fine-grained, entity-level process supervision. This rubric reward is applied only to responses with correct final answers (positive-only strategy), distinguishing the reasoning quality among correct responses and preventing reward hacking. Experiments on three reasoning LLMs (4B-30B) across five long-context benchmarks demonstrate that LongTraceRL consistently outperforms strong baselines and encourages comprehensive, evidence-grounded reasoning. Code, datasets, and models are available at https://github.com/THU-KEG/LongTraceRL.

Engineering Breakdown

The Problem

Long-context reasoning remains a central challenge for large language models, which often fail to locate and integrate key information buried in extensive distracting content. According to the abstract, existing RLVR methods fall short in two ways: their distractors have low confusability, and their sparse, outcome-only rewards cannot supervise intermediate reasoning steps. The paper's data construction targets the first gap. The authors generate multi-hop questions via knowledge graph random walks and use search agent trajectories to build tiered distractors: documents the agent read but did not cite (high confusability) and documents that appeared in search results but were never opened (low confusability). The abstract describes the resulting training contexts as far more challenging than those built by random sampling or one-shot search.
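The data construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the graph format, the `Trajectory` record, and the function names `random_walk_chain` and `tier_distractors` are all hypothetical, and the sketch assumes every opened document also appears in the search results.

```python
import random
from dataclasses import dataclass


def random_walk_chain(graph, start, hops, rng):
    """Walk up to `hops` edges from `start`; return (head, relation, tail) triples.

    `graph` maps an entity to a list of (relation, tail) pairs (assumed format).
    """
    chain, node, visited = [], start, {start}
    for _ in range(hops):
        # Only follow edges to unvisited entities so the chain has no cycles.
        options = [(rel, tail) for rel, tail in graph.get(node, []) if tail not in visited]
        if not options:
            break  # dead end: return a shorter chain
        rel, tail = rng.choice(options)
        chain.append((node, rel, tail))
        visited.add(tail)
        node = tail
    return chain


@dataclass
class Trajectory:
    """Hypothetical record of one search-agent run."""
    retrieved: list  # doc ids that appeared in search results, in order
    opened: set      # doc ids the agent actually read
    cited: set       # doc ids the agent cited in its answer


def tier_distractors(traj):
    """Split non-cited documents into high- and low-confusability tiers."""
    high, low, seen = [], [], set()
    for doc in traj.retrieved:
        if doc in seen or doc in traj.cited:
            continue  # skip duplicates and gold (cited) evidence
        seen.add(doc)
        # Read but not cited -> high confusability; never opened -> low.
        (high if doc in traj.opened else low).append(doc)
    return {"high": high, "low": low}


if __name__ == "__main__":
    graph = {"A": [("r1", "B")], "B": [("r2", "C"), ("r0", "A")], "C": []}
    print(random_walk_chain(graph, "A", 3, random.Random(0)))
    traj = Trajectory(retrieved=["d1", "d2", "d3", "d4"], opened={"d1", "d2"}, cited={"d1"})
    print(tier_distractors(traj))
```

In this sketch the cited documents are treated as gold evidence and excluded from both tiers. How the paper mixes the tiers into a training context is not stated in the abstract, so that step is left out.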

The Approach

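The authors introduce LongTraceRL. For reward design, they propose a rubric reward that uses the gold entities along each reasoning chain as fine-grained, entity-level process supervision. The rubric reward is applied only to responses with correct final answers (a positive-only strategy), which distinguishes reasoning quality among correct responses and is intended to prevent reward hacking.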
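A rough sketch of an entity-level rubric reward with the positive-only gating the abstract describes is shown below. The abstract does not give the scoring formula, so everything concrete here is an assumption: the exact-match answer check, the substring entity matching, the base reward of 1.0, the additive `weight`, and the function name `rubric_reward`.

```python
def rubric_reward(response, predicted, gold_answer, gold_entities, weight=0.5):
    """Outcome reward plus an entity-coverage bonus, granted only when correct.

    All scoring choices here are illustrative assumptions, not the paper's formula.
    """
    # Assumed outcome check: case-insensitive exact match on the final answer.
    correct = predicted.strip().lower() == gold_answer.strip().lower()
    if not correct:
        # Positive-only: incorrect responses get no rubric credit at all.
        return 0.0
    text = response.lower()
    # Count gold chain entities that the reasoning text mentions (substring match).
    hits = sum(1 for entity in gold_entities if entity.lower() in text)
    coverage = hits / len(gold_entities) if gold_entities else 0.0
    return 1.0 + weight * coverage


if __name__ == "__main__":
    entities = ["Marie Curie", "Warsaw"]
    full = "Marie Curie was born in Warsaw, which is in Poland."
    partial = "She was born in Warsaw, so the country is Poland."
    print(rubric_reward(full, "Poland", "poland", entities))
    print(rubric_reward(partial, "Poland", "poland", entities))
    print(rubric_reward(full, "France", "poland", entities))
```

Because the bonus is gated on correctness, a response cannot earn reward merely by listing entities, which is one plausible reading of how the positive-only strategy discourages reward hacking.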

Key Results

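Experiments on three reasoning LLMs (4B-30B) across five long-context benchmarks show that LongTraceRL consistently outperforms strong baselines and encourages comprehensive, evidence-grounded reasoning. Code, datasets, and models are available at https://github.com/THU-KEG/LongTraceRL.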

Research Areas

This paper contributes to the following areas of AI/ML engineering:

Large language models
Transformers
Text generation
Natural language processing
Language understanding
LongTraceRL
