RLHF (Reinforcement Learning from Human Feedback)
An alignment paradigm in which a model is fine-tuned with reinforcement learning, using a reward model trained on human preference comparisons as the reward signal to steer its behavior.
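
A minimal, hypothetical sketch of the two core steps, using toy numpy data rather than a real language model: a reward model is fit from pairwise human preferences (Bradley-Terry), then a softmax policy over candidate responses is nudged toward higher-reward outputs with a simple policy-gradient update. Real RLHF pipelines typically use PPO and a KL penalty against the original model, which this sketch omits; all names and numbers below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 4 candidate responses, each described by a 3-dim feature vector.
features = rng.normal(size=(4, 3))

# Hypothetical human preference data: (i, j) means "response i was preferred
# over response j" by an annotator.
preferences = [(0, 1), (0, 2), (3, 2), (0, 3), (3, 1)]

# --- Step 1: reward model r(x) = w . features(x), fit on preference pairs ---
w = np.zeros(3)
lr_rm = 0.1
for _ in range(200):
    for winner, loser in preferences:
        diff = features[winner] - features[loser]
        # Bradley-Terry: P(winner preferred) = sigmoid(r_winner - r_loser)
        p = 1.0 / (1.0 + np.exp(-w @ diff))
        # Gradient ascent on the log-likelihood of the observed preference.
        w += lr_rm * (1.0 - p) * diff

rewards = features @ w  # learned scalar reward for each candidate response

# --- Step 2: policy-gradient update of a softmax policy over responses ---
logits = np.zeros(4)  # the "policy" that the RL step fine-tunes
lr_pg = 0.5
for _ in range(100):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # REINFORCE-style step: shift probability toward higher-reward responses.
    baseline = probs @ rewards
    logits += lr_pg * probs * (rewards - baseline)

final = np.exp(logits - logits.max())
print("learned rewards:", np.round(rewards, 2))
print("policy probabilities:", np.round(final / final.sum(), 2))
```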