AI Dictionary
A complete dictionary of artificial intelligence
Generative Adversarial Imitation Learning
Method combining generative adversarial networks with imitation learning, training an agent to produce behavior indistinguishable from expert demonstrations without requiring an explicit reward function.
GAIL (Generative Adversarial Imitation Learning)
Pioneering algorithm (Ho & Ermon, 2016) using an adversarial game between a discriminator and a generator to learn a policy from expert demonstrations.
Discriminator Network
Neural network trained to classify trajectories as coming from either the expert or the agent, thus providing an implicit reward signal.
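As a minimal sketch of this idea, the discriminator below is a simple logistic classifier over concatenated state-action features rather than a deep network; the `expert_sa` / `agent_sa` arrays and the toy Gaussian data are illustrative, not part of any specific GAIL implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_discriminator(expert_sa, agent_sa, lr=0.1, steps=200):
    """Logistic discriminator D(s, a) -> P(sample came from the expert).

    expert_sa, agent_sa: (N, d) arrays of concatenated state-action features.
    Returns the learned weight vector (bias folded in as the last feature).
    """
    X = np.vstack([expert_sa, agent_sa])
    X = np.hstack([X, np.ones((X.shape[0], 1))])   # append bias column
    y = np.concatenate([np.ones(len(expert_sa)),   # expert samples -> label 1
                        np.zeros(len(agent_sa))])  # agent samples  -> label 0
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        w += lr * X.T @ (y - p) / len(y)           # gradient ascent on log-likelihood
    return w

# Toy, clearly separable "expert" and "agent" feature clouds:
rng = np.random.default_rng(0)
expert = rng.normal(1.0, 0.5, size=(100, 4))
agent = rng.normal(-1.0, 0.5, size=(100, 4))
w = train_discriminator(expert, agent)
```

In a full GAIL loop this classifier would be retrained each iteration as the agent's trajectory distribution shifts.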
Generator Network
Agent's policy that generates actions in the environment, seeking to produce trajectories the discriminator cannot distinguish from expert demonstrations.
Implicit Reward Function
Reward signal derived from the discriminator's output, replacing traditional explicit reward functions in reinforcement learning.
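One common choice, a sketch under the convention that the discriminator outputs the probability a transition came from the expert, is the reward r = -log(1 - D(s, a)); the function name below is illustrative.

```python
import numpy as np

def implicit_reward(d_prob, eps=1e-8):
    """GAIL-style reward derived from the discriminator output D(s, a).

    d_prob: probability the discriminator assigns to "expert" for a
    state-action pair. The more expert-like the transition, the larger
    the reward; eps guards against log(0) when d_prob approaches 1.
    """
    return -np.log(1.0 - d_prob + eps)
```

This reward is then fed to a standard reinforcement-learning optimizer in place of an environment-defined reward.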
Behavior Distribution
Probabilistic distribution of action-state trajectories that the agent seeks to align with the distribution of expert demonstrations.
Jensen-Shannon Divergence
Symmetric, bounded measure of similarity between probability distributions, used to evaluate convergence between the agent and expert policies.
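For discrete distributions the divergence can be computed directly from its definition, JSD(p, q) = ½ KL(p ‖ m) + ½ KL(q ‖ m) with m = ½(p + q); the sketch below assumes numpy arrays that already sum to 1.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric in (p, q), bounded by log(2)."""
    m = 0.5 * (np.asarray(p, float) + np.asarray(q, float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

It evaluates to 0 for identical distributions and to log 2 for distributions with disjoint support.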
Min-Max Game
Mathematical formulation where the discriminator maximizes and the generator minimizes a common objective function, converging toward a saddle-point equilibrium.
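In GAN-style notation, with D(s, a) predicting "expert" and H denoting the entropy regularizer weighted by λ, the objective can be written as:

```latex
\min_{\pi_\theta} \; \max_{D} \;
\mathbb{E}_{(s,a)\sim\pi_E}\!\big[\log D(s,a)\big]
+ \mathbb{E}_{(s,a)\sim\pi_\theta}\!\big[\log\big(1 - D(s,a)\big)\big]
- \lambda \, H(\pi_\theta)
```

Here π_E is the expert policy and π_θ the agent's (generator's) policy.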
State-Action Trajectory
Chronological sequence of observed states and actions executed by the agent or expert in the learning environment.
Adversarial Optimization
Simultaneous training process where discriminator and generator parameters are optimized antagonistically.
Observation Space
Set of all possible observations the agent can perceive from the environment, forming the input to neural networks.
Replay Memory
Buffer storing previous trajectories of the agent and expert to stabilize training and improve sample efficiency.
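A minimal fixed-capacity buffer can be built on the standard library's `deque`; the class name and transition layout below are illustrative, not from any particular library.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity buffer of transitions; oldest entries are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, next_state):
        """Store one transition, dropping the oldest if the buffer is full."""
        self.buffer.append((state, action, next_state))

    def sample(self, batch_size):
        """Uniformly sample a batch of stored transitions without replacement."""
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Sampling uniformly from a mix of recent trajectories decorrelates updates and reduces the variance of discriminator training.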
Entropy Coefficient
Regularization parameter encouraging exploration by penalizing overly deterministic action distributions in the agent's policy.
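A common way to apply this, sketched here with illustrative function names, is to subtract the coefficient times the Shannon entropy of the action distribution from the policy loss, so higher-entropy (more exploratory) policies are rewarded.

```python
import numpy as np

def policy_entropy(probs, eps=1e-12):
    """Shannon entropy of a discrete action distribution."""
    probs = np.asarray(probs, dtype=float)
    return float(-np.sum(probs * np.log(probs + eps)))

def regularized_loss(policy_loss, probs, entropy_coef=0.01):
    """Entropy-regularized objective: a deterministic policy gets no bonus,
    while a uniform policy gets the maximum bonus of coef * log(n_actions)."""
    return policy_loss - entropy_coef * policy_entropy(probs)
```

The coefficient trades off imitation fidelity against exploration; too large a value keeps the policy overly random.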
Total Variation Distance
Alternative metric measuring dissimilarity between two probability distributions, sometimes used instead of JS divergence.
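For discrete distributions the definition TV(p, q) = ½ Σ |p_i - q_i| translates directly to code:

```python
import numpy as np

def total_variation(p, q):
    """Total variation distance: 0 for identical distributions,
    1 for distributions with disjoint support."""
    return 0.5 * float(np.abs(np.asarray(p, float) - np.asarray(q, float)).sum())
```

Unlike JS divergence, TV distance is a true metric (it satisfies the triangle inequality) but is harder to optimize with gradients.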
Importance Ratio
Correction factor weighting off-policy samples to adjust for the difference between behavior policy and target policy.
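As a sketch of basic importance sampling (the simplest use of this ratio), the estimator below reweights rewards collected under a behavior policy by w = π_target(a|s) / π_behavior(a|s); the function name is illustrative.

```python
import numpy as np

def importance_weighted_mean(rewards, target_probs, behavior_probs, eps=1e-12):
    """Off-policy estimate of the target policy's expected reward from
    samples drawn under the behavior policy.

    Each sample is weighted by pi_target(a|s) / pi_behavior(a|s); when the
    two policies coincide, every weight is 1 and this is a plain mean.
    """
    w = np.asarray(target_probs, float) / (np.asarray(behavior_probs, float) + eps)
    return float(np.mean(w * np.asarray(rewards, float)))
```

In practice the ratios are often clipped or truncated, since rare behavior-policy actions produce high-variance weights.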
Training Stabilization
Set of techniques (gradient penalty, spectral normalization) preventing oscillatory instability in adversarial learning.
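To illustrate the gradient-penalty idea in the simplest possible setting, consider a discriminator with a linear logit f(x) = w·x, whose input gradient is the constant vector w; the WGAN-GP style penalty (‖∇f‖ − 1)² then reduces to a penalty on the weight norm. This is a didactic sketch only: deep discriminators need automatic differentiation to obtain per-sample input gradients.

```python
import numpy as np

def gradient_penalty_linear(w, target_norm=1.0):
    """Gradient penalty for a discriminator with a linear logit f(x) = w @ x.

    grad_x f(x) = w everywhere, so (||grad f|| - target_norm)^2 depends
    only on the weight vector. Penalizing it keeps the discriminator
    close to 1-Lipschitz, which damps oscillation in adversarial training.
    """
    grad_norm = float(np.linalg.norm(w))
    return (grad_norm - target_norm) ** 2
```

Spectral normalization achieves a similar effect architecturally, by rescaling each layer's weights so their largest singular value is 1.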
Mode Collapse
Phenomenon where the generator only produces a limited subset of possible behaviors, ignoring the diversity of expert demonstrations.
Alignment Metric
Quantitative indicator evaluating the similarity between the behavior distributions of the agent and the expert during learning.