Alignment and Safety
Reward Modeling
An approach in which a reward model is trained to predict human preferences between model outputs, then serves as the reward signal for reinforcement learning of the main language model.
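Reward models are commonly trained on pairwise preference data with a Bradley-Terry loss: the model should assign a higher score to the human-preferred response. A minimal sketch of that loss on scalar reward values (the reward scores here are hypothetical placeholders, not outputs of a real model):

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the chosen response wins, under the
    Bradley-Terry model: P(chosen > rejected) = sigmoid(r_chosen - r_rejected)."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Loss is small when the reward model agrees with the human label
# (chosen scored higher) and large when it disagrees.
print(preference_loss(2.0, 0.0))  # model agrees: low loss
print(preference_loss(0.0, 2.0))  # model disagrees: high loss
```

Minimizing this loss over many labeled pairs pushes the reward model's scores to match human preference rankings; the trained model is then queried during reinforcement learning to score the main model's generations.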