Policy Gradient Methods
Proximal Policy Optimization (PPO)
Algorithm optimizing the policy by constraining updates to stay close to the previous policy, using a clipped objective function to ensure learning stability.
← Wstecz