Reinforcement Learning for LLMs

Chapter 1 (5 Lessons)
Lesson 1: Deep Reinforcement Learning and Policy Gradients
Lesson 2: Reinforcement Learning with Human Feedback (RLHF)
Lesson 3: Proximal Policy Optimization (PPO)
Lesson 4: Direct Preference Optimization (DPO)
Lesson 5: Group Relative Policy Optimization (GRPO)

Chapter 1, Lesson 5: Group Relative Policy Optimization (GRPO)