Module 11 - Reinforcement Learning
MDPs, dynamic programming, Q-learning, deep Q-networks, policy gradient methods, and RLHF for language models.