Proximal Policy Optimization (PPO)
Mini-batch Updates
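The PPO update scheme in which the data collected during a rollout is divided (typically after shuffling) into small batches, and several gradient passes (epochs) are made over them. Reusing the same experience for multiple updates improves computational and sample efficiency, while small batches combined with PPO's clipped objective help keep the updates stable.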
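The splitting described above can be sketched as a small index generator. This is a minimal illustration, not a reference implementation: the function name `iterate_minibatches`, its parameters, and the rollout size of 8 are assumptions made for the example, and the actual loss computation and optimizer step are left out.

```python
import random


def iterate_minibatches(n_samples, batch_size, n_epochs, seed=0):
    """Yield (epoch, indices) pairs that cover the rollout once per epoch.

    Hypothetical helper for illustration; the name and signature are
    not taken from any particular PPO library.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for epoch in range(n_epochs):
        # Reshuffle every epoch so the mini-batches differ between passes.
        rng.shuffle(indices)
        for start in range(0, n_samples, batch_size):
            # Slicing copies, so later shuffles do not alter yielded batches.
            yield epoch, indices[start:start + batch_size]


# Assumed rollout of 8 transitions, split into mini-batches of 4,
# with 3 passes over the same data.
batches = list(iterate_minibatches(n_samples=8, batch_size=4, n_epochs=3))

for epoch, batch in batches:
    # A real PPO step would compute the clipped surrogate loss on the
    # transitions selected by `batch` and apply one gradient update here.
    pass
```

Each epoch visits every collected transition exactly once, so with the values assumed here a single rollout yields six gradient updates rather than one.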