Reinforcement Learning for LLMs

In this course, you’ll learn about Reinforcement Learning and Deep RL, and how they are used to train and fine-tune Large Language Models, including:

  • Deep RL, the Bellman Equation, and Policy Gradients

  • Reinforcement Learning from Human Feedback (RLHF)

  • Proximal Policy Optimization (PPO)

  • Direct Preference Optimization (DPO)

  • Group Relative Policy Optimization (GRPO)
