AI Dictionary
A complete dictionary of artificial intelligence
Generative Adversarial Imitation Learning
Method combining generative adversarial networks with imitation learning, training an agent to produce behavior indistinguishable from expert demonstrations without requiring an explicit reward function.
GAIL (Generative Adversarial Imitation Learning)
Pioneering algorithm (Ho & Ermon, 2016) using an adversarial game between a discriminator and a generator to learn a policy from expert demonstrations.
Discriminator Network
Neural network trained to classify trajectories as coming from either the expert or the agent, thus providing an implicit reward signal.
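As a minimal sketch of this idea, the discriminator below is a simple logistic classifier over concatenated state-action features rather than a deep network; the `expert_sa` / `agent_sa` arrays and the toy Gaussian data are illustrative, not part of any specific GAIL implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_discriminator(expert_sa, agent_sa, lr=0.1, steps=200):
    """Logistic discriminator D(s, a) -> P(sample came from the expert).

    expert_sa, agent_sa: (N, d) arrays of concatenated state-action features.
    Returns the learned weight vector (bias folded in as the last feature).
    """
    X = np.vstack([expert_sa, agent_sa])
    X = np.hstack([X, np.ones((X.shape[0], 1))])   # append bias column
    y = np.concatenate([np.ones(len(expert_sa)),   # expert samples -> label 1
                        np.zeros(len(agent_sa))])  # agent samples  -> label 0
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        w += lr * X.T @ (y - p) / len(y)           # gradient ascent on log-likelihood
    return w

# Toy, clearly separable "expert" and "agent" feature clouds:
rng = np.random.default_rng(0)
expert = rng.normal(1.0, 0.5, size=(100, 4))
agent = rng.normal(-1.0, 0.5, size=(100, 4))
w = train_discriminator(expert, agent)
```

In a full GAIL loop this classifier would be retrained each iteration as the agent's trajectory distribution shifts.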
Generator Network
Agent's policy that generates actions in the environment, seeking to produce trajectories the discriminator cannot distinguish from expert demonstrations.
Implicit Reward Function
Reward signal derived from the discriminator's output, replacing traditional explicit reward functions in reinforcement learning.
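One common choice, a sketch under the convention that the discriminator outputs the probability a transition came from the expert, is the reward r = -log(1 - D(s, a)); the function name below is illustrative.

```python
import numpy as np

def implicit_reward(d_prob, eps=1e-8):
    """GAIL-style reward derived from the discriminator output D(s, a).

    d_prob: probability the discriminator assigns to "expert" for a
    state-action pair. The more expert-like the transition, the larger
    the reward; eps guards against log(0) when d_prob approaches 1.
    """
    return -np.log(1.0 - d_prob + eps)
```

This reward is then fed to a standard reinforcement-learning optimizer in place of an environment-defined reward.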
Behavior Distribution
Probabilistic distribution of action-state trajectories that the agent seeks to align with the distribution of expert demonstrations.
Jensen-Shannon Divergence
Symmetric, bounded measure of similarity between probability distributions, used to evaluate convergence between the agent and expert policies.
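For discrete distributions the divergence can be computed directly from its definition, JSD(p, q) = ½ KL(p ‖ m) + ½ KL(q ‖ m) with m = ½(p + q); the sketch below assumes numpy arrays that already sum to 1.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric in (p, q), bounded by log(2)."""
    m = 0.5 * (np.asarray(p, float) + np.asarray(q, float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

It evaluates to 0 for identical distributions and to log 2 for distributions with disjoint support.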
Min-Max Game
Mathematical formulation where the discriminator maximizes and the generator minimizes a common objective function, converging toward a saddle-point equilibrium.
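In GAN-style notation, with D(s, a) predicting "expert" and H denoting the entropy regularizer weighted by λ, the objective can be written as:

```latex
\min_{\pi_\theta} \; \max_{D} \;
\mathbb{E}_{(s,a)\sim\pi_E}\!\big[\log D(s,a)\big]
+ \mathbb{E}_{(s,a)\sim\pi_\theta}\!\big[\log\big(1 - D(s,a)\big)\big]
- \lambda \, H(\pi_\theta)
```

Here π_E is the expert policy and π_θ the agent's (generator's) policy.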
State-Action Trajectory
Chronological sequence of observed states and actions executed by the agent or expert in the learning environment.
Adversarial Optimization
Simultaneous training process where discriminator and generator parameters are optimized antagonistically.
Observation Space
Set of all possible observations the agent can perceive from the environment, forming the input to neural networks.
Replay Memory
Buffer storing previous trajectories of the agent and expert to stabilize training and improve sample efficiency.
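A minimal fixed-capacity buffer can be built on the standard library's `deque`; the class name and transition layout below are illustrative, not from any particular library.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity buffer of transitions; oldest entries are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, next_state):
        """Store one transition, dropping the oldest if the buffer is full."""
        self.buffer.append((state, action, next_state))

    def sample(self, batch_size):
        """Uniformly sample a batch of stored transitions without replacement."""
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Sampling uniformly from a mix of recent trajectories decorrelates updates and reduces the variance of discriminator training.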
Entropy Coefficient
Regularization parameter encouraging exploration by penalizing overly deterministic action distributions in the agent's policy.
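A common way to apply this, sketched here with illustrative function names, is to subtract the coefficient times the Shannon entropy of the action distribution from the policy loss, so higher-entropy (more exploratory) policies are rewarded.

```python
import numpy as np

def policy_entropy(probs, eps=1e-12):
    """Shannon entropy of a discrete action distribution."""
    probs = np.asarray(probs, dtype=float)
    return float(-np.sum(probs * np.log(probs + eps)))

def regularized_loss(policy_loss, probs, entropy_coef=0.01):
    """Entropy-regularized objective: a deterministic policy gets no bonus,
    while a uniform policy gets the maximum bonus of coef * log(n_actions)."""
    return policy_loss - entropy_coef * policy_entropy(probs)
```

The coefficient trades off imitation fidelity against exploration; too large a value keeps the policy overly random.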
Total Variation Distance
Alternative metric measuring dissimilarity between two probability distributions, sometimes used instead of JS divergence.
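For discrete distributions the definition TV(p, q) = ½ Σ |p_i - q_i| translates directly to code:

```python
import numpy as np

def total_variation(p, q):
    """Total variation distance: 0 for identical distributions,
    1 for distributions with disjoint support."""
    return 0.5 * float(np.abs(np.asarray(p, float) - np.asarray(q, float)).sum())
```

Unlike JS divergence, TV distance is a true metric (it satisfies the triangle inequality) but is harder to optimize with gradients.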
Importance Ratio
Correction factor weighting off-policy samples to adjust for the difference between behavior policy and target policy.
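As a sketch of basic importance sampling (the simplest use of this ratio), the estimator below reweights rewards collected under a behavior policy by w = π_target(a|s) / π_behavior(a|s); the function name is illustrative.

```python
import numpy as np

def importance_weighted_mean(rewards, target_probs, behavior_probs, eps=1e-12):
    """Off-policy estimate of the target policy's expected reward from
    samples drawn under the behavior policy.

    Each sample is weighted by pi_target(a|s) / pi_behavior(a|s); when the
    two policies coincide, every weight is 1 and this is a plain mean.
    """
    w = np.asarray(target_probs, float) / (np.asarray(behavior_probs, float) + eps)
    return float(np.mean(w * np.asarray(rewards, float)))
```

In practice the ratios are often clipped or truncated, since rare behavior-policy actions produce high-variance weights.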
Training Stabilization
Set of techniques (gradient penalty, spectral normalization) preventing oscillatory instability in adversarial learning.
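To illustrate the gradient-penalty idea in the simplest possible setting, consider a discriminator with a linear logit f(x) = w·x, whose input gradient is the constant vector w; the WGAN-GP style penalty (‖∇f‖ − 1)² then reduces to a penalty on the weight norm. This is a didactic sketch only: deep discriminators need automatic differentiation to obtain per-sample input gradients.

```python
import numpy as np

def gradient_penalty_linear(w, target_norm=1.0):
    """Gradient penalty for a discriminator with a linear logit f(x) = w @ x.

    grad_x f(x) = w everywhere, so (||grad f|| - target_norm)^2 depends
    only on the weight vector. Penalizing it keeps the discriminator
    close to 1-Lipschitz, which damps oscillation in adversarial training.
    """
    grad_norm = float(np.linalg.norm(w))
    return (grad_norm - target_norm) ** 2
```

Spectral normalization achieves a similar effect architecturally, by rescaling each layer's weights so their largest singular value is 1.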
Mode Collapse
Phenomenon where the generator only produces a limited subset of possible behaviors, ignoring the diversity of expert demonstrations.
Alignment Metric
Quantitative indicator evaluating the similarity between the behavior distributions of the agent and the expert during learning.