Alignment and Safety
Reward Modeling
An approach in which a reward model is trained to predict human preferences and then serves as the reward signal guiding reinforcement learning of the main language model.
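A minimal sketch of the idea: a reward model is typically trained on pairs of responses where humans preferred one over the other, using a Bradley-Terry-style pairwise loss so the preferred response scores higher. The PyTorch code below is illustrative only; the class `RewardModel`, the linear `value_head`, and the random embeddings standing in for encoder outputs are assumptions, not any specific library's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Maps a pooled prompt+response representation to a scalar reward."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        # Scalar head on top of a (pretrained) encoder's output embedding.
        self.value_head = nn.Linear(hidden_dim, 1)

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        # emb: (batch, hidden_dim) -> reward: (batch,)
        return self.value_head(emb).squeeze(-1)

def pairwise_preference_loss(reward_chosen, reward_rejected):
    # Bradley-Terry objective: -log sigmoid(r_chosen - r_rejected),
    # i.e. the human-preferred response should receive a higher score.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: random embeddings stand in for real encoder outputs.
model = RewardModel(hidden_dim=16)
chosen_emb = torch.randn(4, 16)    # embeddings of preferred responses
rejected_emb = torch.randn(4, 16)  # embeddings of dispreferred responses
loss = pairwise_preference_loss(model(chosen_emb), model(rejected_emb))
loss.backward()
print(f"pairwise loss: {loss.item():.4f}")
```

Once trained, the scalar output of such a model is what reinforcement learning algorithms (e.g. PPO in RLHF) maximize when fine-tuning the main language model.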