Proximal Policy Optimization (PPO)

Clipping Function

PPO mechanism that limits the size of policy updates by clipping the probability ratio between the new and old policies to the interval [1 - ε, 1 + ε], removing the incentive for overly drastic changes.
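As a minimal sketch of the clipping step, the snippet below computes the probability ratio from log-probabilities and clips it. The function name `clipped_ratio` and the default `eps=0.2` are illustrative assumptions, not part of any particular library.

```python
import numpy as np

def clipped_ratio(new_logp, old_logp, eps=0.2):
    """Return (ratio, clipped ratio) for pi_new / pi_old.

    `eps` is the clip range; 0.2 is an assumed default for illustration.
    """
    # Ratios are computed from log-probabilities for numerical stability.
    ratio = np.exp(np.asarray(new_logp, dtype=float) - np.asarray(old_logp, dtype=float))
    return ratio, np.clip(ratio, 1.0 - eps, 1.0 + eps)

# The action probabilities moved from 0.25 -> 0.5, 0.2 -> 0.2, 0.6 -> 0.3.
ratio, clipped = clipped_ratio(np.log([0.5, 0.2, 0.3]), np.log([0.25, 0.2, 0.6]))
```

Here the raw ratios 2.0 and 0.5 are pulled back to 1.2 and 0.8, while the unchanged action keeps a ratio of 1.0.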

Trust Region

Region of policy space around the current policy within which an update is considered safe, typically defined by a constraint on the KL divergence between successive policies.
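One way to picture such a constraint, assuming discrete action distributions with strictly positive probabilities, is to compute the KL divergence directly and compare it with a threshold. The names `categorical_kl`, `within_trust_region`, and the threshold `max_kl=0.01` are hypothetical choices for this sketch.

```python
import numpy as np

def categorical_kl(p_old, p_new):
    """KL(p_old || p_new) for two categorical distributions.

    Assumes every probability is strictly positive.
    """
    p_old = np.asarray(p_old, dtype=float)
    p_new = np.asarray(p_new, dtype=float)
    return float(np.sum(p_old * (np.log(p_old) - np.log(p_new))))

def within_trust_region(p_old, p_new, max_kl=0.01):
    """True if the new policy stays within the assumed KL budget."""
    return categorical_kl(p_old, p_new) <= max_kl

small_step = within_trust_region([0.5, 0.5], [0.51, 0.49])  # tiny change
large_step = within_trust_region([0.5, 0.5], [0.9, 0.1])    # drastic change
```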

Surrogate Objective

Modified objective function that PPO maximizes in place of the true expected return; it approximates that return using data from the old policy while incorporating stability constraints such as clipping to prevent performance degradation.
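The clipped form of this objective can be sketched as follows: for each sample, take the minimum of the unclipped and clipped ratio-times-advantage terms, then average. The function name and the default `eps` are assumptions made for illustration.

```python
import numpy as np

def clipped_surrogate(ratio, advantage, eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch."""
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # The minimum gives a pessimistic bound, so large ratio changes
    # cannot increase the objective beyond the clipped value.
    return float(np.mean(np.minimum(unclipped, clipped)))

objective = clipped_surrogate(ratio=[2.0, 0.5], advantage=[1.0, -1.0])
```

In a training loop this quantity would be maximized, or its negative minimized, with respect to the policy parameters.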

KL Divergence Penalty

Penalty term subtracted from PPO's objective, proportional to the KL divergence between successive policies; its coefficient can be adjusted adaptively to keep updates within an acceptable region.
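A rough sketch of the penalized objective, assuming a per-sample KL estimate is already available, is shown below. The coefficient name `beta` and the function name are illustrative assumptions.

```python
import numpy as np

def kl_penalized_objective(ratio, advantage, kl, beta=1.0):
    """Mean of (r * A - beta * KL) over a batch.

    `kl` holds per-sample KL estimates between the old and new policy;
    `beta` is the (assumed) penalty coefficient.
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    kl = np.asarray(kl, dtype=float)
    return float(np.mean(ratio * advantage - beta * kl))

value = kl_penalized_objective([1.0, 1.0], [1.0, 3.0], kl=[0.1, 0.1], beta=2.0)
```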

Mini-batch Updates

PPO optimization process in which the collected data is split into small batches and a gradient step is taken on each, improving computational efficiency and stability.
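The batching itself can be sketched as an index generator that reshuffles the collected data on every pass. The function name, its parameters, and the fixed seed are assumptions for this example; the gradient step is left out.

```python
import numpy as np

def minibatch_indices(n_samples, batch_size, n_epochs, seed=0):
    """Yield index arrays covering the data once per epoch, in shuffled order."""
    rng = np.random.default_rng(seed)
    for _ in range(n_epochs):
        perm = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            # The last batch of an epoch may be smaller than batch_size.
            yield perm[start:start + batch_size]

batches = list(minibatch_indices(n_samples=10, batch_size=4, n_epochs=2))
```

Each yielded array would be used to slice the stored states, actions, advantages, and old log-probabilities before one gradient step.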

Clip Range Parameter

The hyperparameter ε in PPO that sets the width of the clipping interval [1 - ε, 1 + ε] for the probability ratio; smaller values make policy updates more conservative.

Value Function Clipping

PPO variant that also clips the value function update, limiting how far each new value prediction can move from the previous estimate, to stabilize learning and prevent large swings in value estimates.
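One common formulation, given here as a sketch rather than a definitive implementation, clips the change in the value prediction and takes the larger of the clipped and unclipped squared errors. The function name, the 0.5 scaling, and the default `eps` are assumptions.

```python
import numpy as np

def clipped_value_loss(v_new, v_old, returns, eps=0.2):
    """Pessimistic squared-error loss with the value change clipped to +/- eps."""
    v_new, v_old, returns = (np.asarray(x, dtype=float) for x in (v_new, v_old, returns))
    # Limit how far the new prediction may move from the old one.
    v_clipped = v_old + np.clip(v_new - v_old, -eps, eps)
    loss_unclipped = (v_new - returns) ** 2
    loss_clipped = (v_clipped - returns) ** 2
    return float(0.5 * np.mean(np.maximum(loss_unclipped, loss_clipped)))

loss = clipped_value_loss(v_new=[1.0], v_old=[0.0], returns=[1.0])
```

In this example the new prediction matches the return exactly, but the loss is still positive because the prediction moved further than `eps` from the old estimate.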

Epoch Optimization

PPO process where the same experience data is reused for multiple optimization passes, improving the utilization of collected data.

Normalized Advantage

Technique that rescales advantage estimates, typically to zero mean and unit variance within each batch, to stabilize training by keeping the gradient scale consistent between updates.
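A minimal sketch of per-batch normalization, assuming standardization to zero mean and unit variance, could look like this. The small constant `eps` is an assumed guard against division by zero.

```python
import numpy as np

def normalize_advantages(advantages, eps=1e-8):
    """Standardize a batch of advantage estimates."""
    adv = np.asarray(advantages, dtype=float)
    return (adv - adv.mean()) / (adv.std() + eps)

normalized = normalize_advantages([1.0, 2.0, 3.0, 4.0])
```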

Experience Collection

PPO phase where the agent interacts with the environment following the current policy to collect transitions (state, action, reward) used for optimization.
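The collection phase can be illustrated with a toy setup. Everything here is hypothetical: `ToyEnv` is a made-up environment, the policy is a fixed probability vector rather than a network, and storing the log-probability alongside each transition is an added assumption (it is what a later ratio computation would need).

```python
import numpy as np

class ToyEnv:
    """Hypothetical environment: the state is a step counter, episodes last `horizon` steps."""
    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0  # action 1 is rewarded
        done = self.t >= self.horizon
        return self.t, reward, done

def collect_rollout(env, policy_probs, n_steps, seed=0):
    """Follow a fixed stochastic policy and record (state, action, reward, log_prob)."""
    rng = np.random.default_rng(seed)
    transitions = []
    state = env.reset()
    for _ in range(n_steps):
        action = int(rng.choice(len(policy_probs), p=policy_probs))
        next_state, reward, done = env.step(action)
        log_prob = float(np.log(policy_probs[action]))
        transitions.append((state, action, reward, log_prob))
        # Start a new episode when the current one ends.
        state = env.reset() if done else next_state
    return transitions

rollout = collect_rollout(ToyEnv(horizon=3), policy_probs=[0.5, 0.5], n_steps=7)
```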

Adaptive KL Penalty

PPO variant that dynamically adjusts the KL penalty strength based on the observed divergence between policies, ensuring controlled updates.
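A commonly used heuristic for this adjustment doubles the coefficient when the observed KL overshoots the target by a margin and halves it when it undershoots. The sketch below assumes a margin of 1.5 and a scaling factor of 2; the function and parameter names are illustrative.

```python
def update_kl_coefficient(beta, observed_kl, target_kl, tolerance=1.5, factor=2.0):
    """Return the penalty coefficient to use for the next policy update."""
    if observed_kl > target_kl * tolerance:
        return beta * factor   # policy moved too far: penalize more
    if observed_kl < target_kl / tolerance:
        return beta / factor   # policy barely moved: penalize less
    return beta                # within the tolerated band: keep as is

beta_up = update_kl_coefficient(1.0, observed_kl=0.03, target_kl=0.01)
beta_down = update_kl_coefficient(1.0, observed_kl=0.001, target_kl=0.01)
beta_same = update_kl_coefficient(1.0, observed_kl=0.01, target_kl=0.01)
```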
