RLHF Basics
The goal of Reinforcement Learning from Human Feedback (RLHF) is to align LLMs with human expectations. This post explains some of its basics.