Reinforcement Learning for LLMs
In this course, you’ll learn about Reinforcement Learning and Deep RL, and how they are used to train and fine-tune Large Language Models, including:
Deep RL, the Bellman Equation, and Policy Gradients
Reinforcement Learning with Human Feedback (RLHF)
Proximal Policy Optimization (PPO)
Direct Preference Optimization (DPO)
Group Relative Policy Optimization (GRPO)
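As a brief preview of the math behind the first lesson (these are the standard textbook forms, not material taken from the course itself), the Bellman optimality equation and the policy gradient theorem can be written as:

% Bellman optimality equation: the value of a state is the best expected
% one-step reward plus the discounted value of the resulting next state.
V^*(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V^*(s') \bigr]

% Policy gradient theorem: the gradient of the expected return J with respect
% to the policy parameters \theta is an expectation of log-probability
% gradients weighted by the advantage of each action.
\nabla_{\theta} J(\theta) = \mathbb{E}_{\pi_{\theta}}\bigl[ \nabla_{\theta} \log \pi_{\theta}(a \mid s)\, A^{\pi_{\theta}}(s, a) \bigr]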
-
Chapter 1
-
Lesson 1: Deep Reinforcement Learning and Policy Gradients
-
Lesson 2: Reinforcement Learning with Human Feedback (RLHF)
-
Lesson 3: Proximal Policy Optimization (PPO)
-
Lesson 4: Direct Preference Optimization (DPO)
-
Lesson 5: Group Relative Policy Optimization (GRPO)