01  Module 10: Reasoning Models
    How modern LLMs learn to think: test-time compute, chain-of-thought, process reward models, and the architectures behind o1, o3, and DeepSeek-R1.

02  Test-Time Compute - Scaling at Inference
    The paradigm shift from training-time scaling to inference-time scaling: best-of-N sampling, majority voting, and how spending more compute at inference improves reasoning quality.

03  Chain-of-Thought Reasoning at Inference Time
    How chain-of-thought prompting transforms model reasoning: from the Wei et al. 2022 breakthrough to self-consistency, process supervision, and the faithfulness problem.

04  OpenAI o1 and o3 - Architecture and Training
    What we know about OpenAI's o1 and o3 reasoning models: hidden chain-of-thought, reinforcement learning from process rewards, compute budget tokens, and ARC-AGI results.

05  DeepSeek-R1 - Open Source Reasoning
    How DeepSeek built an open-weights reasoning model using pure RL with GRPO, the R1-Zero experiment, distillation into smaller models, and what open-source reasoning means for the research community.

06  Process Reward Models (PRMs)
    How process reward models provide step-level supervision for reasoning: the Lightman et al. 2023 paper, Math-Shepherd, using PRMs for search, and their limitations.

07  Monte Carlo Tree Search for LLM Reasoning
    Adapting MCTS to language model reasoning: selection, expansion, simulation, and backpropagation over reasoning steps, AlphaCode 2, Tree-of-Thought, and production trade-offs.

08  When to Use Reasoning Models in Production
    A practical decision framework for routing tasks to reasoning models: task taxonomy, cost-benefit analysis, latency trade-offs, and hybrid routing architectures.

09  Evaluating Reasoning Models
    The benchmark landscape for reasoning models: AIME, MATH-500, Codeforces, ARC-AGI, GPQA Diamond, process vs. outcome evaluation, and contamination concerns.
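As a taste of the test-time-compute ideas covered in lessons 02 and 03, here is a minimal sketch of self-consistency: sample N reasoning chains, then take a majority vote over their final answers. The `self_consistency` helper and the hard-coded sample answers are illustrative assumptions, not code from the course; a real pipeline would generate each answer with a temperature > 0 model call and parse the final answer out of the chain of thought.

```python
from collections import Counter

def self_consistency(answers: list[str]) -> str:
    """Majority vote over the final answers of N sampled reasoning chains.

    In a real pipeline, each element of `answers` would be parsed from one
    temperature > 0 chain-of-thought sample of the model; spending more
    inference compute means drawing more samples before voting.
    """
    # most_common(1) returns [(answer, count)] for the modal answer.
    return Counter(answers).most_common(1)[0][0]

# Five hypothetical samples: individual chains disagree, but the vote
# recovers the consensus answer.
samples = ["42", "17", "42", "42", "-1"]
print(self_consistency(samples))  # prints "42"
```

The same skeleton underlies best-of-N reranking: replace the vote with a learned scorer (e.g., a process reward model, lesson 06) that picks the highest-scoring chain instead of the most common answer.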