Alignment and Safety
Reward Modeling
An approach in which a separate reward model is trained to predict human preferences between model outputs. Its scores then serve as the reward signal for reinforcement learning of the main language model.
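To make the idea concrete, here is a minimal sketch of fitting a reward model to preference pairs. It is an illustration under assumptions that the entry itself does not state: responses are represented as small hand-made feature vectors, the reward model is linear, and training uses the pairwise logistic (Bradley-Terry style) loss, a common choice for this task. The names `reward`, `pairwise_loss`, `train`, and the toy data in `PAIRS` are all hypothetical.

```python
import math


def reward(weights, features):
    """Scalar reward: dot product of the weights and a response's features."""
    return sum(w * x for w, x in zip(weights, features))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pairwise_loss(weights, pairs):
    """Mean of -log sigmoid(r(chosen) - r(rejected)) over all pairs."""
    total = 0.0
    for chosen, rejected in pairs:
        margin = reward(weights, chosen) - reward(weights, rejected)
        total += -math.log(sigmoid(margin))
    return total / len(pairs)


def train(pairs, dim, lr=0.5, steps=200):
    """Fit the weights by plain gradient descent on the pairwise loss."""
    weights = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Derivative of -log sigmoid(margin) with respect to margin.
            coeff = sigmoid(margin) - 1.0
            for i in range(dim):
                grad[i] += coeff * (chosen[i] - rejected[i]) / len(pairs)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Hypothetical toy data: each response is [helpfulness, rudeness], and each
# pair is (response the human preferred, response the human rejected).
PAIRS = [
    ([0.9, 0.1], [0.2, 0.8]),
    ([0.7, 0.0], [0.6, 0.9]),
    ([0.8, 0.2], [0.1, 0.3]),
]

weights = train(PAIRS, dim=2)
```

After training, `reward(weights, response)` scores the preferred response in each pair above the rejected one. In a full pipeline, a score of this kind would be the reward that the reinforcement learning step tries to maximize; real reward models are typically neural networks operating on text rather than linear models on hand-made features.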