Proximal Policy Optimization (PPO)
Normalized Advantage
Technique that rescales advantage estimates, typically to zero mean and unit variance within each batch, so that the scale of the policy gradient stays consistent from one update to the next and training is more stable.
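A minimal sketch of the idea, assuming the common per-batch form (subtract the batch mean, divide by the batch standard deviation). The function name `normalize_advantages` and the `eps` constant are illustrative choices, not taken from any particular PPO implementation.

```python
import numpy as np


def normalize_advantages(advantages, eps=1e-8):
    """Standardize a batch of advantage estimates (hypothetical helper).

    Assumption: normalization is per batch, to zero mean and unit variance.
    eps guards against division by zero when all advantages are equal.
    """
    advantages = np.asarray(advantages, dtype=np.float64)
    return (advantages - advantages.mean()) / (advantages.std() + eps)


# Two batches whose raw advantages differ in scale by a factor of 100.
raw = np.array([10.0, 20.0, 30.0, 40.0])
normalized = normalize_advantages(raw)
normalized_scaled = normalize_advantages(raw * 100.0)
```

Both batches map to nearly the same normalized values, so a change in reward scale between updates does not change the magnitude of the advantages that multiply the policy gradient.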