AI Glossary

A complete glossary of artificial intelligence

162 categories
2,032 subcategories
23,060 terms
📖
terms

Generative Adversarial Imitation Learning

Method that combines GAN-style adversarial training with imitation learning: a policy learns to reproduce expert demonstrations, guided by a discriminator that distinguishes agent behavior from expert behavior, without requiring an explicit reward function.

GAIL (Generative Adversarial Imitation Learning)

Algorithm introduced by Ho and Ermon (2016) that frames imitation as an adversarial game between a discriminator and a generator (the policy), learning a policy that matches expert demonstrations.

Discriminator Network

Neural network trained to classify state-action pairs (or whole trajectories) as coming from either the expert or the agent; its output provides an implicit reward signal for the policy.
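
As a minimal sketch of the classification objective, the snippet below computes a binary cross-entropy loss over discriminator outputs. It assumes, purely for illustration, that D(s, a) is read as the probability that a pair comes from the expert; the function name `discriminator_loss` and the sample values are hypothetical.

```python
import math

def discriminator_loss(expert_probs, agent_probs, eps=1e-12):
    """Binary cross-entropy with expert samples labelled 1 and agent samples labelled 0.

    Each argument is a list of discriminator outputs D(s, a) in (0, 1),
    interpreted here (by assumption) as the probability of "expert".
    """
    expert_term = -sum(math.log(p + eps) for p in expert_probs) / len(expert_probs)
    agent_term = -sum(math.log(1.0 - p + eps) for p in agent_probs) / len(agent_probs)
    return expert_term + agent_term

# A discriminator that separates the two sources well has a low loss...
confident = discriminator_loss([0.9, 0.8], [0.1, 0.2])
# ...while one that outputs 0.5 everywhere cannot tell them apart.
confused = discriminator_loss([0.5, 0.5], [0.5, 0.5])
```

When the agent imitates the expert well, the best the discriminator can do approaches the "confused" case.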

Generator Network

The agent's policy, which generates actions in the environment and seeks to produce trajectories that the discriminator cannot distinguish from expert demonstrations.

Implicit Reward Function

Reward signal derived from the discriminator's output, replacing traditional explicit reward functions in reinforcement learning.
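
One common way to turn a discriminator output into a reward is sketched below. It assumes D(s, a) estimates the probability that a pair is expert data and uses r = -log(1 - D); other forms, such as log D, also appear in practice, and label conventions differ between papers. The name `implicit_reward` is hypothetical.

```python
import math

def implicit_reward(d_output, eps=1e-8):
    """Reward from a discriminator output D(s, a) in (0, 1).

    Assumes D is the estimated probability that (s, a) is expert data,
    and uses r = -log(1 - D), which grows as the pair looks more expert-like.
    """
    d = min(max(d_output, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(1.0 - d)

# Pairs that look more expert-like receive a larger reward.
rewards = [implicit_reward(d) for d in (0.1, 0.5, 0.9)]
```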

Behavior Distribution

Probability distribution over the state-action pairs or trajectories induced by a policy; the agent seeks to match its own distribution to that of the expert demonstrations.

Jensen-Shannon Divergence

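Symmetric, bounded measure of dissimilarity between two probability distributions (its square root is a true metric). With an optimal discriminator, the adversarial objective corresponds to minimizing the JS divergence between the agent's and the expert's state-action distributions.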
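
For discrete distributions the divergence can be computed directly, as in the sketch below. It uses natural logarithms, so the maximum value is log 2; the function names are illustrative.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

identical = js_divergence([0.5, 0.5], [0.5, 0.5])  # same distribution
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])   # no overlap at all
```

Because the mixture m is positive wherever p or q is, the divergence stays finite even when the two distributions do not overlap.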

Min-Max Game

Mathematical formulation in which the discriminator maximizes and the generator minimizes a shared objective function; at the saddle point (equilibrium), the agent's behavior is indistinguishable from the expert's.
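
One way to write such an objective is shown below, in the GAN-style convention where D(s, a) is the probability that a pair comes from the expert. This is a sketch under that assumption: the symbols \pi (agent policy), \pi_E (expert policy) and V are introduced here for illustration, some formulations swap the labels, and many add an entropy regularization term on the policy.

```latex
\min_{\pi} \max_{D} \; V(\pi, D)
  = \mathbb{E}_{(s,a) \sim \pi_E}\big[\log D(s,a)\big]
  + \mathbb{E}_{(s,a) \sim \pi}\big[\log\big(1 - D(s,a)\big)\big]
```

The discriminator raises V by scoring expert pairs near 1 and agent pairs near 0; the policy lowers V by producing pairs that the discriminator scores as expert-like.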

State-Action Trajectory

Chronological sequence of observed states and actions executed by the agent or expert in the learning environment.

Adversarial Optimization

Training process in which discriminator and generator parameters are updated with opposing objectives, typically in alternating steps.

Observation Space

Set of all possible observations the agent can receive from the environment; observations form the input to the agent's neural networks.

Replay Memory

Buffer storing past agent transitions, sampled alongside the fixed set of expert demonstrations; used in off-policy variants to improve sample efficiency and stabilize training.
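
A minimal sketch of such a buffer, using a fixed-capacity deque, is shown below. The class name `ReplayMemory` and the transition layout are assumptions for illustration; no reward is stored because, in this setting, it would be computed from the discriminator at training time.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity buffer of agent transitions; the oldest entries are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, next_state, done):
        # No reward field: it is assumed to come from the discriminator later.
        self.buffer.append((state, action, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement, capped at the current size.
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)

memory = ReplayMemory(capacity=3)
for t in range(5):
    memory.add(state=t, action=0, next_state=t + 1, done=False)
batch = memory.sample(2)  # drawn from the three most recent transitions
```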

Entropy Coefficient

Regularization weight on the policy's entropy bonus; larger values encourage exploration by penalizing overly deterministic action distributions.
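
The effect of the coefficient can be illustrated with a discrete action distribution, as in the sketch below. The function names and the numbers are hypothetical; the point is only that, for equal expected return, a more uniform policy scores higher once the entropy bonus is added.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def regularized_objective(expected_return, action_probs, entropy_coef):
    """Expected return plus an entropy bonus weighted by the coefficient."""
    return expected_return + entropy_coef * entropy(action_probs)

# Same expected return, different action distributions.
uniform = regularized_objective(1.0, [0.25, 0.25, 0.25, 0.25], entropy_coef=0.01)
peaked = regularized_objective(1.0, [0.97, 0.01, 0.01, 0.01], entropy_coef=0.01)
```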

Total Variation Distance

Alternative measure of dissimilarity between two probability distributions, equal to half the L1 distance between them for discrete distributions; sometimes used instead of the JS divergence.
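
For discrete distributions the computation is short enough to sketch directly; the function name is illustrative.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions (value in [0, 1])."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

same = total_variation([0.5, 0.5], [0.5, 0.5])      # identical distributions
disjoint = total_variation([1.0, 0.0], [0.0, 1.0])  # no overlap
partial = total_variation([0.5, 0.5], [0.9, 0.1])   # some overlap
```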

Importance Ratio

Ratio of the target policy's action probability to the behavior policy's, used to reweight off-policy samples and correct for the difference between the two policies.
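
A minimal sketch of the ratio, computed from log-probabilities for numerical stability, is given below. The optional `clip` argument is an assumption added for illustration (truncating large weights is a common variance-reduction choice); it is not part of the definition.

```python
import math

def importance_ratio(target_logp, behavior_logp, clip=None):
    """Ratio pi_target(a|s) / pi_behavior(a|s), computed from log-probabilities.

    `clip` (hypothetical, optional) truncates large ratios to limit variance.
    """
    ratio = math.exp(target_logp - behavior_logp)
    if clip is not None:
        ratio = min(ratio, clip)
    return ratio

# The target policy picks this action twice as often as the behavior policy did.
weight = importance_ratio(math.log(0.5), math.log(0.25))
clipped = importance_ratio(math.log(0.5), math.log(0.25), clip=1.5)
```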

Training Stabilization

Set of techniques (such as gradient penalty and spectral normalization) that mitigate oscillation and instability in adversarial training.

Mode Collapse

Phenomenon where the generator only produces a limited subset of possible behaviors, ignoring the diversity of expert demonstrations.

Alignment Metric

Quantitative indicator evaluating the similarity between the behavior distributions of the agent and the expert during learning.
