Alignment and Safety
Reward Modeling
An approach in which a reward model is trained to predict human preferences; its scores then guide the reinforcement-learning fine-tuning of the main language model.
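A minimal sketch of the idea, using the pairwise (Bradley-Terry) preference loss commonly used to train reward models: given a human-chosen and a human-rejected response, the model is penalized unless it scores the chosen one higher. The function name and scores below are illustrative, not from any specific library.

```python
import math

def pairwise_preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry preference loss for reward-model training.

    The reward model is trained so that the response humans preferred
    (chosen) receives a higher score than the one they rejected.
    """
    # -log sigmoid(r_chosen - r_rejected); approaches 0 as the margin grows
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# Lower loss when the model confidently prefers the human-chosen response:
confident = pairwise_preference_loss(2.0, -1.0)  # large positive margin
uncertain = pairwise_preference_loss(0.1, 0.0)   # scores nearly equal
print(confident < uncertain)  # → True
```

During RLHF, the trained reward model's scalar output replaces a hand-designed reward: the language model is updated (e.g. with PPO) to produce responses the reward model rates highly.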