Skip to main content

9 docs tagged with "reasoning-models"

View all tags

DeepSeek-R1 - Open Source Reasoning

How DeepSeek built an open-weights reasoning model using pure RL with GRPO, the R1-Zero experiment, distillation into smaller models, and what open-source reasoning means for the research community.

Evaluating Reasoning Models

The benchmark landscape for reasoning models - AIME, MATH-500, Codeforces, ARC-AGI, GPQA Diamond, process vs. outcome evaluation, and contamination concerns.

Module 10: Reasoning Models

How modern LLMs learn to think - test-time compute, chain-of-thought, process reward models, and the architectures behind o1, o3, and DeepSeek-R1.

Monte Carlo Tree Search for LLM Reasoning

Adapting MCTS to language model reasoning - selection, expansion, simulation, backpropagation over reasoning steps, AlphaCode 2, Tree-of-Thought, and production trade-offs.

Process Reward Models (PRMs)

How process reward models provide step-level supervision for reasoning - the Lightman et al. 2023 paper, Math-Shepherd, using PRMs for search, and their limitations.

Test-Time Compute - Scaling at Inference

The paradigm shift from training-time scaling to inference-time scaling - best-of-N sampling, majority voting, and how spending more compute at inference improves reasoning quality.