Alignment and Safety
Reward Modeling
An approach in which a reward model is trained to predict human preferences and then serves as the reward signal guiding reinforcement learning of the main language model.
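A common way to train such a reward model is a pairwise (Bradley-Terry) objective: given a human-preferred response and a rejected one, the model's scalar scores are pushed so the preferred response ranks higher. The sketch below shows this loss on toy scalar scores; the function name and the example reward values are illustrative, not from any specific library.

```python
import math

def pairwise_preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry loss: -log sigmoid(r_chosen - r_rejected).

    Small when the reward model scores the human-preferred response
    higher than the rejected one; large when the ranking is inverted.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy scalar rewards (illustrative values):
correct = pairwise_preference_loss(2.0, -1.0)   # model ranks correctly
inverted = pairwise_preference_loss(-1.0, 2.0)  # model ranks incorrectly
print(correct < inverted)  # correct ranking yields the smaller loss
```

In practice the scalar scores come from a learned network head over the full response, and minimizing this loss over many labeled preference pairs yields the reward model that the subsequent reinforcement learning stage optimizes against.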