AI Dictionary
A complete dictionary of artificial intelligence
Fisher Information Matrix
Matrix that measures the amount of information an observable random variable carries about an unknown parameter; in TRPO it defines the local geometry of policy parameter space used for natural gradient updates.
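As a sketch, the Fisher information can be estimated empirically as the average outer product of score vectors (gradients of log-probability with respect to the parameters). The helper `empirical_fisher` and the Bernoulli example below are illustrative assumptions, not part of TRPO itself:

```python
import numpy as np

def empirical_fisher(scores):
    """Empirical Fisher information: average outer product of score vectors,
    where each row is the gradient of log p(x; theta) at one sample."""
    scores = np.asarray(scores)
    return scores.T @ scores / len(scores)

# Bernoulli(theta): d/dtheta log p(x; theta) = (x - theta) / (theta * (1 - theta)),
# whose Fisher information is known in closed form: 1 / (theta * (1 - theta)).
theta = 0.3
rng = np.random.default_rng(0)
x = rng.binomial(1, theta, size=200_000).astype(float)
scores = ((x - theta) / (theta * (1 - theta)))[:, None]
print(empirical_fisher(scores))  # close to 1 / (0.3 * 0.7) ~ 4.76
```

In TRPO the Fisher matrix of the policy distribution plays this role, but it is never formed explicitly; only Fisher-vector products are computed.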
KL Divergence
Asymmetric measure of dissimilarity between two probability distributions, used in TRPO as a constraint that limits how far each updated policy may move from its predecessor.
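A minimal sketch of the discrete KL divergence (the function name `kl_divergence` is an assumption for illustration):

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with full support in q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
print(kl_divergence(p, q))  # positive: the distributions differ
print(kl_divergence(p, p))  # 0.0: identical distributions
```

Note the asymmetry: `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ, which is why TRPO fixes a particular direction (old policy to new policy) in its constraint.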
Conjugate Gradient
Iterative algorithm for solving linear systems, used in TRPO to compute the natural gradient direction from Fisher-vector products without explicitly forming or inverting the Fisher information matrix.
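A self-contained sketch of the conjugate gradient method on a small symmetric positive-definite system; in TRPO, `A @ p` would be replaced by a Fisher-vector product:

```python
import numpy as np

def conjugate_gradient(A, b, iters=10, tol=1e-10):
    """Solve A x = b for symmetric positive-definite A."""
    x = np.zeros_like(b)
    r = b.copy()       # residual b - A x (x starts at zero)
    p = r.copy()       # current search direction
    rs_old = r @ r
    for _ in range(iters):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)   # optimal step along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs_old) * p  # next direction, conjugate to the last
        rs_old = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # SPD matrix standing in for the FIM
b = np.array([1.0, 2.0])                # stands in for the policy gradient
x = conjugate_gradient(A, b)
print(np.allclose(A @ x, b))  # True
```

For an n-dimensional SPD system, CG converges in at most n iterations in exact arithmetic; TRPO typically runs only a small fixed number of iterations.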
Line Search
Optimization procedure that shrinks the step size until the update satisfies the trust-region (KL) constraint and improves the surrogate objective in TRPO.
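A simplified backtracking sketch: TRPO's actual acceptance test checks both the KL constraint and surrogate improvement, while this illustrative version (the helper `backtracking_line_search` is an assumed name) only checks improvement:

```python
def backtracking_line_search(f, x, direction, max_step=1.0, shrink=0.5, max_iters=10):
    """Shrink the step until the candidate point improves f.
    TRPO additionally rejects steps that violate the KL constraint."""
    fx = f(x)
    step = max_step
    for _ in range(max_iters):
        candidate = [xi + step * di for xi, di in zip(x, direction)]
        if f(candidate) < fx:
            return candidate
        step *= shrink  # halve the step and try again
    return x  # no acceptable step found; keep the current point

# Minimize f(x) = x^2 from x = 2 along the descent direction -1.
f = lambda v: v[0] ** 2
print(backtracking_line_search(f, [2.0], [-1.0]))  # [1.0]
```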
Monotonic Improvement Theory
Theoretical guarantee that the idealized update underlying TRPO never decreases expected performance; the practical algorithm approximates this bound with a hard KL constraint.
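The guarantee rests on a lower bound from the TRPO paper (Schulman et al., 2015): the new policy's performance $\eta(\tilde\pi)$ is bounded below by a surrogate objective minus a penalty on the maximum KL divergence,

```latex
\eta(\tilde{\pi}) \;\ge\; L_{\pi}(\tilde{\pi}) \;-\; C \, D_{\mathrm{KL}}^{\max}(\pi, \tilde{\pi}),
\qquad C = \frac{4 \epsilon \gamma}{(1 - \gamma)^2}
```

so maximizing the penalized surrogate at each step yields a monotonically non-decreasing sequence of expected returns.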
Reward-to-go
Return estimator that sums only the rewards received after a given timestep, exploiting the fact that actions cannot affect past rewards to reduce variance in policy gradient estimation.
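A minimal sketch computing discounted reward-to-go for one trajectory (the function name `rewards_to_go` is an assumption for illustration):

```python
def rewards_to_go(rewards, gamma=0.99):
    """Discounted sum of future rewards from each timestep onward,
    computed in one backward pass over the trajectory."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

print(rewards_to_go([1.0, 1.0, 1.0], gamma=1.0))  # [3.0, 2.0, 1.0]
```

Each entry weights the log-probability of the action at that timestep in the policy gradient, replacing the higher-variance total-return weighting.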
Sample Efficiency
Measure of how much performance a reinforcement learning algorithm extracts per unit of collected data; TRPO compares favorably with vanilla policy gradient methods in this respect.
On-policy Learning
Learning paradigm in which training data must be collected by the current policy, a fundamental characteristic of TRPO, unlike off-policy methods that can reuse data from older policies.