AI Glossary

The complete glossary of Artificial Intelligence

162 categories
2,032 subcategories
23,060 terms
📖 Terms

Generative Adversarial Imitation Learning

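Method that combines the adversarial training scheme of generative adversarial networks with imitation learning: a discriminator learns to distinguish agent behavior from expert demonstrations, and the agent's policy is trained to make the two indistinguishable, without requiring an explicit reward function.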

GAIL (Generative Adversarial Imitation Learning)

Algorithm that sets up an adversarial game between a discriminator and a generator (the agent's policy) to learn a policy that reproduces expert behavior directly from demonstrations.

Discriminator Network

Neural network trained to classify state-action pairs (or whole trajectories) as coming from either the expert or the agent; its output provides an implicit reward signal for the policy.
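
As a rough illustration of the classification step, the sketch below fits a one-parameter logistic discriminator instead of a neural network. The scalar "feature" standing in for a state-action pair, the sample values, and the names `train_discriminator`, `expert_features`, and `agent_features` are all assumptions made for this example, not part of any particular GAIL implementation.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_discriminator(expert, agent, lr=0.5, steps=200):
    """Fit D(x) = sigmoid(w*x + b) with expert samples labeled 1 and agent samples labeled 0."""
    w, b = 0.0, 0.0
    data = [(x, 1.0) for x in expert] + [(x, 0.0) for x in agent]
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y  # gradient of binary cross-entropy w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b

# Hypothetical scalar features of state-action pairs (made-up numbers).
expert_features = [0.9, 1.1, 1.0, 1.2]
agent_features = [-1.0, -0.8, -1.1, -0.9]

w, b = train_discriminator(expert_features, agent_features)
d_expert = sigmoid(w * 1.0 + b)   # discriminator output on an expert-like sample
d_agent = sigmoid(w * -1.0 + b)   # discriminator output on an agent-like sample
```

In a full setup the agent samples would change as the policy improves, so the discriminator would be retrained repeatedly rather than once.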

Generator Network

Agent's policy, which generates actions in the environment and is trained to produce trajectories that the discriminator cannot distinguish from expert demonstrations.

Implicit Reward Function

Reward signal derived from the discriminator's output, replacing traditional explicit reward functions in reinforcement learning.
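
One common way to turn a discriminator output into a reward can be sketched as follows. The sketch assumes the discriminator output D is the estimated probability that a state-action pair came from the expert, and uses the form r = -log(1 - D); other conventions and reward forms exist, and the function name `implicit_reward` is hypothetical.

```python
import math

def implicit_reward(d_out: float, eps: float = 1e-8) -> float:
    """Reward r = -log(1 - D), assuming D is the estimated probability of 'expert'."""
    d = min(max(d_out, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(1.0 - d)

# The more expert-like the discriminator judges a pair, the larger the reward.
rewards = [implicit_reward(d) for d in (0.1, 0.5, 0.9)]
```

The policy would then be trained with an ordinary reinforcement learning algorithm on these rewards in place of an environment reward.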

Behavior Distribution

Probability distribution over state-action trajectories induced by a policy; the agent seeks to align its own distribution with that of the expert demonstrations.

Jensen-Shannon Divergence

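Symmetric, bounded measure of dissimilarity between two probability distributions (its square root, not the divergence itself, is a metric). With an ideal discriminator, the adversarial objective corresponds, up to constants, to minimizing this divergence between the agent's and the expert's state-action distributions.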
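
For discrete distributions the divergence can be computed directly from its definition, JS(P, Q) = 0.5 KL(P || M) + 0.5 KL(Q || M) with M = (P + Q) / 2. The sketch below assumes both inputs are lists of probabilities over the same outcomes and uses natural logarithms; the function names are chosen for this example.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions (natural log)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

identical = js_divergence([0.5, 0.5], [0.5, 0.5])  # no difference between the distributions
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])   # no overlap: the upper bound, log(2)
```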

Min-Max Game

Mathematical formulation in which the discriminator maximizes and the generator minimizes a shared objective function; its solution is a saddle point (equilibrium) of that objective, which training approaches only approximately in practice.
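
The shared objective can be estimated from samples. The sketch below assumes the standard GAN-style value V = E_expert[log D] + E_agent[log(1 - D)], where D is the discriminator's estimated probability of "expert" and all outputs lie strictly between 0 and 1; the function name `gan_value` and the sample values are made up for illustration.

```python
import math

def gan_value(d_on_expert, d_on_agent):
    """Sample estimate of V = E_expert[log D] + E_agent[log(1 - D)]."""
    expert_term = sum(math.log(d) for d in d_on_expert) / len(d_on_expert)
    agent_term = sum(math.log(1.0 - d) for d in d_on_agent) / len(d_on_agent)
    return expert_term + agent_term

# A discriminator that separates the two sources well achieves a high value ...
sharp = gan_value([0.9, 0.9], [0.1, 0.1])
# ... while one the generator has fully confused outputs 0.5 everywhere.
confused = gan_value([0.5, 0.5], [0.5, 0.5])
```

The discriminator's updates push this value up; the generator's updates push it down, toward the fully confused value of -2 log 2.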

State-Action Trajectory

Chronological sequence of observed states and actions executed by the agent or expert in the learning environment.
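
A trajectory can be represented as two aligned sequences. The sketch below is one possible layout, assuming states are tuples of floats and actions are integer indices; the `Trajectory` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Trajectory:
    """Time-ordered states and the action taken in each of them."""
    states: List[Tuple[float, ...]]
    actions: List[int]

    def pairs(self):
        """Return the (state, action) pairs in chronological order."""
        return list(zip(self.states, self.actions))

# Made-up three-step trajectory.
traj = Trajectory(
    states=[(0.0, 0.0), (0.1, 0.0), (0.2, 0.1)],
    actions=[1, 1, 0],
)
```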

Adversarial Optimization

Training process in which discriminator and generator parameters are updated toward opposing objectives, typically in alternating steps.

Observation Space

Set of all possible observations the agent can perceive from the environment; observations form the input to the policy and discriminator networks.

Replay Memory

Buffer storing past transitions or trajectories of the agent and expert, sampled during training to stabilize updates and improve sample efficiency.
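
A minimal fixed-capacity buffer might look like the sketch below. It assumes each stored item is a (state, action) pair and that the oldest items are discarded first; the `ReplayMemory` class and its method names are chosen for this example.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity buffer that drops the oldest entries when full."""

    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action):
        self.buffer.append((state, action))

    def sample(self, batch_size: int):
        """Draw up to batch_size distinct entries uniformly at random."""
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)

memory = ReplayMemory(capacity=3)
for step in range(5):
    memory.add(state=step, action=step % 2)  # made-up transitions
batch = memory.sample(2)
```

With a capacity of 3, only the three most recent of the five added pairs remain available for sampling.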

Entropy Coefficient

Weight on the entropy regularization term of the policy objective. Larger values encourage exploration by penalizing overly deterministic action distributions in the agent's policy.
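
The effect of the coefficient can be seen in a small sketch, assuming a discrete action distribution and a loss of the form `policy_loss - entropy_coef * entropy`; the function names and numbers are illustrative only.

```python
import math

def entropy(probs):
    """Shannon entropy (natural log) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def regularized_loss(policy_loss, probs, entropy_coef):
    """Subtract an entropy bonus so that more random policies get a lower loss."""
    return policy_loss - entropy_coef * entropy(probs)

uniform = [0.5, 0.5]        # maximally random over two actions
deterministic = [1.0, 0.0]  # always picks the first action

loss_uniform = regularized_loss(1.0, uniform, entropy_coef=0.1)
loss_deterministic = regularized_loss(1.0, deterministic, entropy_coef=0.1)
```

With the same base loss, the uniform policy ends up with the lower regularized loss, which is how the term discourages premature collapse to a single action.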

Total Variation Distance

Metric measuring dissimilarity between two probability distributions, equal to half the L1 distance between them for discrete distributions; sometimes used as an alternative to JS divergence.
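
For discrete distributions the distance is half the sum of absolute differences. The sketch below assumes both inputs are probability lists over the same outcomes; the function name is chosen for this example.

```python
def total_variation(p, q):
    """Total variation distance: half the L1 distance between two discrete distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

same = total_variation([0.5, 0.5], [0.5, 0.5])       # identical distributions
shifted = total_variation([0.5, 0.5], [0.25, 0.75])  # a quarter of the mass moved
disjoint = total_variation([1.0, 0.0], [0.0, 1.0])   # no overlap: the maximum, 1.0
```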

Importance Ratio

Correction factor, the ratio of an action's probability under the target policy to its probability under the behavior policy, used to reweight off-policy samples for the difference between the two policies.
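
The reweighting can be sketched as below, assuming the action probabilities under both policies are known and the behavior probability is nonzero. The optional `clip` argument (a cap on the ratio, often used to limit variance) and the function names are assumptions of this example.

```python
def importance_ratio(target_prob, behavior_prob, clip=None):
    """Ratio pi_target(a|s) / pi_behavior(a|s), optionally capped at `clip`."""
    ratio = target_prob / behavior_prob
    if clip is not None:
        ratio = min(ratio, clip)
    return ratio

def weighted_mean(values, target_probs, behavior_probs):
    """Importance-weighted average of values collected under the behavior policy."""
    weights = [importance_ratio(t, b) for t, b in zip(target_probs, behavior_probs)]
    return sum(w * v for w, v in zip(weights, values)) / len(values)

# An action the target policy prefers twice as much counts double.
ratio = importance_ratio(0.5, 0.25)
capped = importance_ratio(0.5, 0.25, clip=1.5)
```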

Training Stabilization

Set of techniques (such as gradient penalty and spectral normalization) that reduce oscillation and instability in adversarial learning.

Mode Collapse

Phenomenon where the generator only produces a limited subset of possible behaviors, ignoring the diversity of expert demonstrations.

Alignment Metric

Quantitative indicator evaluating the similarity between the behavior distributions of the agent and the expert during learning.
