Proximal Policy Optimization (PPO)
Trust Region
Region of policy space within which a policy update is considered safe, typically defined by an upper bound on the KL divergence between the old policy and the updated one.
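As a rough illustration of the definition, the sketch below checks whether a candidate update stays inside a KL-based trust region for a single state with three discrete actions. The function names, the example probabilities, and the threshold `delta=0.01` are assumptions chosen for illustration and are not taken from any particular library or implementation. It also assumes every probability in the new policy is strictly positive.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities.

    Assumes q has no zero entries wherever p is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def within_trust_region(old_probs, new_probs, delta=0.01):
    """Return True if KL(old || new) is at most delta (hypothetical threshold)."""
    return kl_divergence(old_probs, new_probs) <= delta

# Action probabilities of the old policy in one state (made-up numbers).
old_policy = [0.5, 0.3, 0.2]
small_step = [0.52, 0.29, 0.19]  # close to the old policy
large_step = [0.9, 0.05, 0.05]   # far from the old policy

print(within_trust_region(old_policy, small_step))
print(within_trust_region(old_policy, large_step))
```

Under these assumptions the small step falls inside the region and the large step falls outside it. In practice the divergence would usually be averaged over many sampled states rather than evaluated on one.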