Chain-of-Thought Reasoning at Inference Time
How chain-of-thought prompting transforms model reasoning - from the Wei et al. 2022 breakthrough to self-consistency, process supervision, and the faithfulness problem.
How DeepSeek built an open-weights reasoning model using pure RL with GRPO, the R1-Zero experiment, distillation into smaller models, and what open-source reasoning means for the research community.
The benchmark landscape for reasoning models - AIME, MATH-500, Codeforces, ARC-AGI, GPQA Diamond, process vs. outcome evaluation, and contamination concerns.
How modern LLMs learn to think - test-time compute, chain-of-thought, process reward models, and the architectures behind o1, o3, and DeepSeek-R1.
Adapting MCTS to language model reasoning - selection, expansion, simulation, backpropagation over reasoning steps, AlphaCode 2, Tree-of-Thought, and production trade-offs.
What we know about OpenAI's o1 and o3 reasoning models - hidden chain-of-thought, reinforcement learning from process rewards, compute budget tokens, and ARC-AGI results.
How process reward models provide step-level supervision for reasoning - the Lightman et al. 2023 paper, Math-Shepherd, using PRMs for search, and their limitations.
The paradigm shift from training-time scaling to inference-time scaling - best-of-N sampling, majority voting, and how spending more compute at inference improves reasoning quality.
A practical decision framework for routing tasks to reasoning models - task taxonomy, cost-benefit analysis, latency trade-offs, and hybrid routing architectures.