
The goal of Reinforcement Learning from Human Feedback (RLHF) is to align LLMs with human expectations. This post explains some of its basics.