AI Dictionary
A complete dictionary of artificial intelligence
Fisher Information Matrix
Matrix that measures the amount of information an observable random variable carries about an unknown parameter; in TRPO it defines the local geometry of policy parameter space used for natural gradient updates.
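As a sketch, the Fisher information can be estimated empirically as the average outer product of score vectors (gradients of log-probability with respect to the parameters). The helper `empirical_fisher` and the Bernoulli example below are illustrative assumptions, not part of TRPO itself:

```python
import numpy as np

def empirical_fisher(scores):
    """Empirical Fisher information: average outer product of score vectors,
    where each row is the gradient of log p(x; theta) at one sample."""
    scores = np.asarray(scores)
    return scores.T @ scores / len(scores)

# Bernoulli(theta): d/dtheta log p(x; theta) = (x - theta) / (theta * (1 - theta)),
# whose Fisher information is known in closed form: 1 / (theta * (1 - theta)).
theta = 0.3
rng = np.random.default_rng(0)
x = rng.binomial(1, theta, size=200_000).astype(float)
scores = ((x - theta) / (theta * (1 - theta)))[:, None]
print(empirical_fisher(scores))  # close to 1 / (0.3 * 0.7) ~ 4.76
```

In TRPO the Fisher matrix of the policy distribution plays this role, but it is never formed explicitly; only Fisher-vector products are computed.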
KL Divergence
Asymmetric measure of dissimilarity between two probability distributions, used in TRPO as a constraint that limits how far each updated policy may move from its predecessor.
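A minimal sketch of the discrete KL divergence (the function name `kl_divergence` is an assumption for illustration):

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with full support in q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
print(kl_divergence(p, q))  # positive: the distributions differ
print(kl_divergence(p, p))  # 0.0: identical distributions
```

Note the asymmetry: `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ, which is why TRPO fixes a particular direction (old policy to new policy) in its constraint.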
Conjugate Gradient
Iterative algorithm for solving linear systems, used in TRPO to compute the natural gradient direction from Fisher-vector products without explicitly forming or inverting the Fisher information matrix.
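A self-contained sketch of the conjugate gradient method on a small symmetric positive-definite system; in TRPO, `A @ p` would be replaced by a Fisher-vector product:

```python
import numpy as np

def conjugate_gradient(A, b, iters=10, tol=1e-10):
    """Solve A x = b for symmetric positive-definite A."""
    x = np.zeros_like(b)
    r = b.copy()       # residual b - A x (x starts at zero)
    p = r.copy()       # current search direction
    rs_old = r @ r
    for _ in range(iters):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)   # optimal step along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs_old) * p  # next direction, conjugate to the last
        rs_old = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # SPD matrix standing in for the FIM
b = np.array([1.0, 2.0])                # stands in for the policy gradient
x = conjugate_gradient(A, b)
print(np.allclose(A @ x, b))  # True
```

For an n-dimensional SPD system, CG converges in at most n iterations in exact arithmetic; TRPO typically runs only a small fixed number of iterations.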
Line Search
Optimization procedure that shrinks the step size until the update satisfies the trust-region (KL) constraint and improves the surrogate objective in TRPO.
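A simplified backtracking sketch: TRPO's actual acceptance test checks both the KL constraint and surrogate improvement, while this illustrative version (the helper `backtracking_line_search` is an assumed name) only checks improvement:

```python
def backtracking_line_search(f, x, direction, max_step=1.0, shrink=0.5, max_iters=10):
    """Shrink the step until the candidate point improves f.
    TRPO additionally rejects steps that violate the KL constraint."""
    fx = f(x)
    step = max_step
    for _ in range(max_iters):
        candidate = [xi + step * di for xi, di in zip(x, direction)]
        if f(candidate) < fx:
            return candidate
        step *= shrink  # halve the step and try again
    return x  # no acceptable step found; keep the current point

# Minimize f(x) = x^2 from x = 2 along the descent direction -1.
f = lambda v: v[0] ** 2
print(backtracking_line_search(f, [2.0], [-1.0]))  # [1.0]
```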
Monotonic Improvement Theory
Theoretical guarantee that the idealized update underlying TRPO never decreases expected performance; the practical algorithm approximates this bound with a hard KL constraint.
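The guarantee rests on a lower bound from the TRPO paper (Schulman et al., 2015): the new policy's performance $\eta(\tilde\pi)$ is bounded below by a surrogate objective minus a penalty on the maximum KL divergence,

```latex
\eta(\tilde{\pi}) \;\ge\; L_{\pi}(\tilde{\pi}) \;-\; C \, D_{\mathrm{KL}}^{\max}(\pi, \tilde{\pi}),
\qquad C = \frac{4 \epsilon \gamma}{(1 - \gamma)^2}
```

so maximizing the penalized surrogate at each step yields a monotonically non-decreasing sequence of expected returns.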
Reward-to-go
Return estimator that sums only the rewards received after a given timestep, exploiting the fact that actions cannot affect past rewards to reduce variance in policy gradient estimation.
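A minimal sketch computing discounted reward-to-go for one trajectory (the function name `rewards_to_go` is an assumption for illustration):

```python
def rewards_to_go(rewards, gamma=0.99):
    """Discounted sum of future rewards from each timestep onward,
    computed in one backward pass over the trajectory."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

print(rewards_to_go([1.0, 1.0, 1.0], gamma=1.0))  # [3.0, 2.0, 1.0]
```

Each entry weights the log-probability of the action at that timestep in the policy gradient, replacing the higher-variance total-return weighting.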
Sample Efficiency
Measure of how much performance a reinforcement learning algorithm extracts per unit of collected data; TRPO compares favorably with vanilla policy gradient methods in this respect.
On-policy Learning
Learning paradigm in which training data must be collected by the current policy, a fundamental characteristic of TRPO, unlike off-policy methods that can reuse data from older policies.