Proximal Policy Optimization (PPO)
KL Divergence Penalty
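A penalty term added to PPO's objective function to limit the KL divergence between the policy before an update and the policy after it. The penalty coefficient is adjusted adaptively: it is raised when the measured divergence overshoots a target value and lowered when it undershoots, so that updates stay within an acceptable region.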
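The adaptive adjustment can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the tolerance of 1.5, and the scaling factor of 2 are assumptions chosen for the example (they mirror the heuristic values usually quoted for the adaptive-KL variant of PPO) and should be treated as tunable.

```python
def update_kl_coefficient(beta, observed_kl, target_kl, tolerance=1.5, factor=2.0):
    """Return the penalty coefficient to use for the next policy update.

    tolerance and factor are illustrative defaults, not fixed by the definition.
    """
    if observed_kl < target_kl / tolerance:
        # Policy moved less than intended: weaken the penalty.
        return beta / factor
    if observed_kl > target_kl * tolerance:
        # Policy moved more than intended: strengthen the penalty.
        return beta * factor
    # Divergence is within the acceptable band: keep the coefficient.
    return beta


def penalized_objective(ratios, advantages, kl_values, beta):
    """Sample mean of (probability ratio * advantage) minus beta * mean KL.

    ratios are pi_new(a|s) / pi_old(a|s); kl_values are per-state estimates
    of KL(pi_old || pi_new). All inputs are plain Python sequences here.
    """
    n = len(ratios)
    surrogate = sum(r * a for r, a in zip(ratios, advantages)) / n
    mean_kl = sum(kl_values) / n
    return surrogate - beta * mean_kl
```

In a training loop, one would maximize `penalized_objective` over a batch, measure the resulting mean KL, and then call `update_kl_coefficient` to obtain the coefficient for the next iteration.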