Skip to main content

11 docs tagged with "reinforcement-learning"

View all tags

Deep Q-Networks (DQN)

Scale Q-learning to high-dimensional inputs with neural networks. Learn the DQN architecture, experience replay, target networks, Double DQN, Dueling DQN, Prioritized Experience Replay, and Rainbow. Full PyTorch implementation included.

Dynamic Programming for RL

Policy evaluation, policy iteration, and value iteration - solving MDPs exactly when you know the environment model. Master the theoretical foundation that all model-free RL approximates.

MDP and the RL Framework

Master Markov Decision Processes - the mathematical foundation of all reinforcement learning. Understand states, actions, rewards, value functions, the Bellman equations, and how real-world systems are modeled as MDPs.

Module 11 - Reinforcement Learning

A comprehensive module covering RL fundamentals through modern alignment techniques including RLHF and DPO, connecting classical theory to LLM training.

Policy Gradient Methods

Directly optimize policies with gradient ascent - REINFORCE derivation, the log-derivative trick, variance reduction with baselines, actor-critic, A2C/A3C, and entropy regularization. The foundation for PPO and RLHF.

Q-Learning and SARSA

Model-free temporal difference learning - Q-learning for off-policy control and SARSA for on-policy control. Understand TD vs MC vs DP, convergence conditions, eligibility traces, Double Q-learning, and implement Q-tables in NumPy.