ResearchMath-14K: Scaling Research-Level Mathematics via Agents

:::info Stub: Full Engineering Breakdown Coming
This paper was featured on Hugging Face Daily Papers on 2026-05-27 with 47 upvotes. A full breakdown with production viability rating, implementation notes, and honest limitations is being written.
:::


Authors	Guijin Son et al.
Year	2026
HF Upvotes	47
arXiv	2605.28003

Abstract

The frontier of mathematics is defined by problems whose solutions are not yet known, yet it remains unclear whether language models can meaningfully engage with such problems without human intervention. A major obstacle is the lack of large-scale research-level math datasets. To this end, we introduce ResearchMath-14k, a set of 14,056 problems curated from academic sources via a multi-agent pipeline, making it the largest collection of research-level mathematical problems to date. We further generate ResearchMath-Reasoning, 220K teacher trajectories from two open models, where we observe recurring avoidance behaviors such as non-attempts and fabricated references. Interestingly, across eight open-weight models, newer generations produce 5.6× more references and 5.0× more fake references per trace. After agentic filtering of ResearchMath-Reasoning, fine-tuning Qwen3 models from 4B to 30B parameters improves over base models by 9.2 points on average. This shows that filtered open-problem attempts can provide useful supervision even without fully correct reasoning traces. We make ResearchMath-14k publicly available for future works on research-level mathematical reasoning.
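The abstract names two avoidance behaviors in the teacher trajectories (non-attempts and fabricated references) and an agentic filtering step, but it does not describe how that filter works. As a rough, non-agentic stand-in, the sketch below replaces the agent with two rule-based checks: a phrase match for non-attempts and a lookup of cited references against a trusted bibliography. The `Trace` record, the phrase list, and the reference strings are all hypothetical; this only illustrates the two failure modes, not the paper's filter.

```python
import re
from dataclasses import dataclass


# Hypothetical trace record; field names are assumptions for illustration.
@dataclass
class Trace:
    problem_id: str
    text: str
    cited_refs: list[str]


# Hypothetical phrases signalling a non-attempt (the model declines to try).
NON_ATTEMPT_PATTERNS = [
    r"\bthis is an open problem\b",
    r"\bcannot be solved\b",
    r"\bbeyond the scope\b",
]


def is_non_attempt(trace: Trace) -> bool:
    """Flag traces that decline the problem instead of attempting it."""
    lowered = trace.text.lower()
    return any(re.search(p, lowered) for p in NON_ATTEMPT_PATTERNS)


def fabricated_refs(trace: Trace, known_refs: set[str]) -> list[str]:
    """Return cited references that are absent from a trusted bibliography."""
    return [r for r in trace.cited_refs if r not in known_refs]


def filter_traces(traces: list[Trace], known_refs: set[str]) -> list[Trace]:
    """Keep traces that attempt the problem and cite only known references."""
    return [
        t for t in traces
        if not is_non_attempt(t) and not fabricated_refs(t, known_refs)
    ]


if __name__ == "__main__":
    known = {"Ref-A"}  # toy bibliography
    traces = [
        Trace("p1", "Consider the generating function of the sequence.", ["Ref-A"]),
        Trace("p2", "This is an open problem, so I will not attempt it.", []),
        Trace("p3", "By the main lemma of Ref-Z, the claim follows.", ["Ref-Z"]),
    ]
    kept = filter_traces(traces, known)
    print([t.problem_id for t in kept])  # ['p1']
```

Note that a trace is dropped for a single unverifiable citation; a real filter might instead strip or flag the reference and keep the rest of the attempt, since the abstract argues that partially correct attempts still carry useful signal.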

Engineering Breakdown

The Problem

The frontier of mathematics is defined by problems whose solutions are not yet known, and it remains unclear whether language models can meaningfully engage with such problems without human intervention. A major obstacle to studying this is the lack of large-scale research-level math datasets.

The Approach

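The authors introduce ResearchMath-14k, a set of 14,056 problems curated from academic sources via a multi-agent pipeline, which they describe as the largest collection of research-level mathematical problems to date. They then generate ResearchMath-Reasoning, 220K teacher trajectories from two open models, apply agentic filtering to those trajectories, and fine-tune Qwen3 models on the filtered result.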
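The source says only that problems were curated from academic sources via a multi-agent pipeline; it does not name the agents or their roles. To make the general idea concrete, the sketch below models each agent as a stage that either passes a (possibly rewritten) candidate forward or rejects it, followed by exact-match deduplication across sources. Every stage here (`normalize`, `require_question`) is a hypothetical, deterministic stand-in for what would typically be an LLM call; none of it is taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    source: str     # identifier of the source document (hypothetical)
    statement: str  # extracted problem statement


# A stage ("agent") returns a possibly rewritten candidate, or None to reject it.
Stage = Callable[[Candidate], Optional[Candidate]]


def normalize(c: Candidate) -> Optional[Candidate]:
    """Hypothetical cleanup stage: collapse runs of whitespace."""
    return Candidate(c.source, " ".join(c.statement.split()))


def require_question(c: Candidate) -> Optional[Candidate]:
    """Hypothetical check stage: keep only statements posed as a problem."""
    text = c.statement.lower()
    if text.endswith("?") or "prove" in text or "determine" in text:
        return c
    return None


def run_pipeline(candidates: list[Candidate], stages: list[Stage]) -> list[Candidate]:
    """Pass each candidate through every stage, then drop exact duplicates."""
    seen: set[str] = set()
    kept: list[Candidate] = []
    for cand in candidates:
        out: Optional[Candidate] = cand
        for stage in stages:
            out = stage(out)
            if out is None:  # a stage rejected the candidate
                break
        if out is None:
            continue
        key = out.statement.lower()
        if key in seen:  # same problem extracted from another source
            continue
        seen.add(key)
        kept.append(out)
    return kept


if __name__ == "__main__":
    candidates = [
        Candidate("src-1", "Is every   even number > 2 a sum of two primes?"),
        Candidate("src-2", "Is every even number > 2 a sum of two primes?"),
        Candidate("src-3", "We thank the referees."),
    ]
    kept = run_pipeline(candidates, [normalize, require_question])
    print([c.source for c in kept])  # ['src-1']
```

Modelling agents as composable stages keeps each one independently testable and makes it easy to swap a rule for a model call without touching the orchestration loop.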

Key Results

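Fine-tuning Qwen3 models from 4B to 30B parameters on the filtered ResearchMath-Reasoning trajectories improves over the base models by 9.2 points on average, which the authors take as evidence that filtered open-problem attempts can provide useful supervision even without fully correct reasoning traces. The teacher trajectories show recurring avoidance behaviors such as non-attempts and fabricated references, and across eight open-weight models, newer generations produce 5.6× more references and 5.0× more fake references per trace. ResearchMath-14k is publicly available for future work on research-level mathematical reasoning.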
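The headline numbers in the abstract (a 9.2-point average gain, and 5.6× more references per trace) are simple aggregates. The sketch below shows one plausible reading of each: a per-trace ratio computed as the ratio of mean counts between an older and a newer model generation, and the average gain computed as the mean of per-size score differences. The abstract does not state the exact aggregation, so both definitions are assumptions, and the inputs are toy values, not data from the paper.

```python
from statistics import mean


def per_trace_ratio(new_counts: list[int], old_counts: list[int]) -> float:
    """Ratio of mean per-trace counts: newer generation over older generation."""
    return mean(new_counts) / mean(old_counts)


def mean_improvement(finetuned: dict[str, float], base: dict[str, float]) -> float:
    """Average score gain over the base models, in points, across model sizes."""
    return mean(finetuned[size] - base[size] for size in base)


if __name__ == "__main__":
    # Toy reference counts per trace (hypothetical, not from the paper).
    print(per_trace_ratio([10, 12, 14], [2, 2, 2]))  # 6.0

    # Toy benchmark scores keyed by model size (hypothetical).
    base = {"4B": 40.0, "30B": 60.0}
    tuned = {"4B": 50.0, "30B": 68.0}
    print(mean_improvement(tuned, base))  # 9.0
```

Taking the ratio of means, rather than averaging per-trace ratios, avoids dividing by zero for traces that cite nothing, which is common when a model declines to attempt a problem.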

Research Areas

This paper contributes to the following areas of AI/ML engineering:

Machine learning
Deep learning
Neural networks
Model optimization
AI systems
ResearchMath-14K
