Reinforcement Learning for LLMs

Chapter 1

5 Lessons

Lesson 1: Deep Reinforcement Learning and Policy Gradients

Lesson 2: Reinforcement Learning with Human Feedback (RLHF)

Lesson 3: Proximal Policy Optimization (PPO)

Lesson 4: Direct Preference Optimization (DPO)

Lesson 5: Group Relative Policy Optimization (GRPO)

Lesson 5: Group Relative Policy Optimization (GRPO)
