Proximal Policy Optimization (PPO)
Experience Collection
The PPO phase in which the agent interacts with the environment under the current policy to collect transitions (state, action, reward), which are then used in the optimization step.
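The collection loop described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `ToyEnv`, `Transition`, `collect_experience`, and `random_policy` are hypothetical names introduced here, the environment is a made-up corridor task, and the policy is uniformly random in place of a learned network.

```python
import random
from typing import Callable, List, NamedTuple, Tuple


class Transition(NamedTuple):
    """One collected sample: the (state, action, reward) triple."""
    state: int
    action: int
    reward: float


class ToyEnv:
    """Hypothetical 1-D corridor: start at 0, episode ends at `goal` or on timeout."""

    def __init__(self, goal: int = 3, max_steps: int = 20) -> None:
        self.goal = goal
        self.max_steps = max_steps
        self.pos = 0
        self.t = 0

    def reset(self) -> int:
        self.pos = 0
        self.t = 0
        return self.pos

    def step(self, action: int) -> Tuple[int, float, bool]:
        # Action 1 moves right; any other action moves left (clamped at 0).
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        self.t += 1
        reached = self.pos >= self.goal
        done = reached or self.t >= self.max_steps
        return self.pos, (1.0 if reached else 0.0), done


def collect_experience(
    env: ToyEnv, policy: Callable[[int], int], num_steps: int
) -> List[Transition]:
    """Run the current policy for `num_steps` steps and record each transition."""
    buffer: List[Transition] = []
    state = env.reset()
    for _ in range(num_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        buffer.append(Transition(state, action, reward))
        # Start a new episode when the current one ends.
        state = env.reset() if done else next_state
    return buffer


def random_policy(state: int) -> int:
    """Stand-in for the current policy (assumption: uniform over two actions)."""
    return random.choice([0, 1])


rollout = collect_experience(ToyEnv(), random_policy, num_steps=64)
```

In practice, PPO implementations usually store more per step than this triple, for example the log-probability of the chosen action, a value estimate, and an episode-termination flag, since the optimization step typically needs them.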