Module 11 - Reinforcement Learning

MDPs, dynamic programming, Q-learning, deep Q-networks, policy gradient methods, and RLHF for language models.