9 docs tagged with "reasoning-models"

Chain-of-Thought Reasoning at Inference Time

How chain-of-thought prompting transforms model reasoning - from the Wei et al. 2022 breakthrough to self-consistency, process supervision, and the faithfulness problem.

DeepSeek-R1 - Open Source Reasoning

How DeepSeek built an open-weights reasoning model using large-scale RL with GRPO, the pure-RL R1-Zero experiment, distillation into smaller models, and what open-weights reasoning means for the research community.

Evaluating Reasoning Models

The benchmark landscape for reasoning models - AIME, MATH-500, Codeforces, ARC-AGI, GPQA Diamond, process vs. outcome evaluation, and contamination concerns.

Module 10: Reasoning Models

How modern LLMs learn to think - test-time compute, chain-of-thought, process reward models, and the architectures behind o1, o3, and DeepSeek-R1.

Monte Carlo Tree Search for LLM Reasoning

Adapting MCTS to language model reasoning - selection, expansion, simulation, backpropagation over reasoning steps, AlphaCode 2, Tree-of-Thought, and production trade-offs.

OpenAI o1 and o3 - Architecture and Training

What we know about OpenAI's o1 and o3 reasoning models - hidden chain-of-thought, reinforcement learning from process rewards, reasoning-token compute budgets, and ARC-AGI results.

Process Reward Models (PRMs)

How process reward models provide step-level supervision for reasoning - the Lightman et al. 2023 paper, Math-Shepherd, using PRMs for search, and their limitations.

Test-Time Compute - Scaling at Inference

The paradigm shift from training-time scaling to inference-time scaling - best-of-N sampling, majority voting, and how spending more compute at inference improves reasoning quality.

When to Use Reasoning Models in Production

A practical decision framework for routing tasks to reasoning models - task taxonomy, cost-benefit analysis, latency trade-offs, and hybrid routing architectures.