Proximal Policy Optimization (PPO) - Artificial Intelligence Glossary

Clipping Function

PPO mechanism that limits the size of a policy update by clipping the probability ratio between the new and old policies to the interval [1 - ε, 1 + ε], which removes the incentive for overly drastic changes.

Trust Region

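Region of policy space in which an update is considered safe, typically defined by a bound on the KL divergence between successive policies. TRPO enforces this bound as a hard constraint; PPO approximates it with clipping or a KL penalty.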
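
As a rough illustration of the KL bound, the sketch below computes the divergence between two discrete action distributions and checks it against a threshold. The function names and the default `delta=0.01` are assumptions made for this example, and the code assumes the new policy assigns non-zero probability to every action the old policy can take.

```python
import math

def kl_divergence(p_old, p_new):
    """KL(p_old || p_new) for two discrete distributions over the same actions."""
    # Terms with p == 0 contribute nothing, so they are skipped.
    return sum(p * math.log(p / q) for p, q in zip(p_old, p_new) if p > 0.0)

def within_trust_region(p_old, p_new, delta=0.01):
    """True if the policy change stays inside the (hypothetical) KL budget delta."""
    return kl_divergence(p_old, p_new) <= delta
```

A small shift such as `[0.5, 0.5]` to `[0.52, 0.48]` stays inside this budget, while a jump to `[0.9, 0.1]` does not.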

Surrogate Objective

Objective function that PPO optimizes in place of the original objective: it approximates that objective while incorporating a stability mechanism such as clipping, so that an update cannot badly degrade performance.
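
A minimal sketch of the clipped form of this objective, assuming the probability ratios and advantage estimates have already been computed for a batch. The function name and the default `eps=0.2` are choices made for this example.

```python
def clipped_surrogate(ratios, advantages, eps=0.2):
    """Batch mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped_r = max(1.0 - eps, min(r, 1.0 + eps))
        # Taking the minimum makes the objective a pessimistic bound.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)
```

With a positive advantage, a ratio above `1 + eps` earns no extra credit. With a negative advantage, the unclipped term is kept whenever it is the more pessimistic of the two.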

KL Divergence Penalty

Penalty added to PPO's objective function to control divergence between successive policies, adaptively adjusted to maintain updates within an acceptable region.
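
The penalized objective can be sketched as the unclipped surrogate minus a weighted mean KL term. This is an illustrative version under stated assumptions: the per-sample KL values are taken as given, and the function name and the coefficient name `beta` are chosen for this example.

```python
def kl_penalized_objective(ratios, advantages, kls, beta=1.0):
    """Mean of r * A over the batch, minus beta times the mean KL divergence."""
    n = len(ratios)
    surrogate = sum(r * a for r, a in zip(ratios, advantages)) / n
    mean_kl = sum(kls) / n
    return surrogate - beta * mean_kl
```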

Mini-batch Updates

PPO optimization process in which the collected data is split into small batches and a gradient step is taken on each one, improving computational efficiency and stability.

Clip Range Parameter

Epsilon hyperparameter in PPO that defines the width of the clipping zone for the probability ratio, directly controlling the conservatism of policy updates.

Value Function Clipping

PPO variant that also applies clipping to the value function to stabilize learning and prevent large variations in value estimates.
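
One common formulation, sketched here as an assumption rather than a canonical definition, keeps the new value prediction within `eps` of the old one and takes the larger of the clipped and unclipped squared errors. All names below are chosen for this example.

```python
def clipped_value_loss(v_new, v_old, returns, eps=0.2):
    """Batch mean of max((v - R)^2, (v_clipped - R)^2)."""
    total = 0.0
    for vn, vo, ret in zip(v_new, v_old, returns):
        # Limit how far the new prediction may move from the old one.
        v_clipped = vo + max(-eps, min(vn - vo, eps))
        # Use the larger error so clipping never makes the loss look better.
        total += max((vn - ret) ** 2, (v_clipped - ret) ** 2)
    return total / len(v_new)
```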

Epoch Optimization

PPO process where the same experience data is reused for multiple optimization passes, improving the utilization of collected data.
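
The sketch below shows how the same rollout might be revisited over several epochs, reshuffled and split into mini-batches each time. The generator name, its parameters and the fixed seed are assumptions made for this example; a real training loop would run a gradient step on each yielded batch.

```python
import random

def minibatch_indices(n_samples, batch_size, n_epochs, seed=0):
    """Yield (epoch, index list) pairs; each epoch reshuffles the same rollout."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for epoch in range(n_epochs):
        rng.shuffle(indices)
        for start in range(0, n_samples, batch_size):
            # Slicing copies the indices, so later shuffles do not alter this batch.
            yield epoch, indices[start:start + batch_size]
```

For a rollout of 8 samples with `batch_size=4` and `n_epochs=3`, this yields 6 batches, and every sample appears exactly once per epoch.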

Normalized Advantage

Technique that rescales the advantage estimates in a batch, usually to zero mean and unit standard deviation, so that the gradient scale stays consistent between updates and training is more stable.
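
A minimal per-batch normalization sketch, assuming a plain list of advantage estimates. The function name and the small `eps` term, added to avoid division by zero, are choices made for this example.

```python
import statistics

def normalize_advantages(advantages, eps=1e-8):
    """Shift to zero mean and scale to (approximately) unit standard deviation."""
    mean = statistics.fmean(advantages)
    std = statistics.pstdev(advantages)  # population standard deviation
    return [(a - mean) / (std + eps) for a in advantages]
```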

Experience Collection

PPO phase in which the agent acts in the environment under the current policy and records the resulting transitions (state, action, reward) for use in optimization.

Adaptive KL Penalty

PPO variant that dynamically adjusts the KL penalty strength based on the observed divergence between policies, ensuring controlled updates.
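
The adjustment rule can be sketched as follows, using the heuristic from the original PPO paper: halve the coefficient when the observed KL falls well below the target, and double it when the KL overshoots. The function and parameter names are assumptions made for this example, and the factors 1.5 and 2 are heuristic values rather than tuned ones.

```python
def adapt_kl_coefficient(beta, observed_kl, target_kl):
    """Return the KL penalty coefficient to use for the next policy update."""
    if observed_kl < target_kl / 1.5:
        return beta / 2.0  # policy moved too little: relax the penalty
    if observed_kl > target_kl * 1.5:
        return beta * 2.0  # policy moved too much: strengthen the penalty
    return beta
```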
