
AI Glossary

The complete glossary of Artificial Intelligence

162 categories
2,032 subcategories
23,060 terms

Actor-Critic

Reinforcement learning architecture combining an actor network that learns a stochastic policy with a critic network that estimates the value function, reducing the variance of the policy gradient.
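
As an illustration, one actor-critic update step can be sketched in a minimal tabular setting (a hypothetical setup with made-up learning rates, not from any particular library): the actor holds softmax action preferences, the critic a state-value estimate, and both update from the same TD error.

```python
import math

# Hypothetical one-state, two-action sketch of a single actor-critic update.
prefs = [0.0, 0.0]                  # actor: action preferences for one state
value = 0.0                         # critic: V(s) estimate
alpha_actor, alpha_critic, gamma = 0.1, 0.2, 0.9

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

# One interaction: suppose action 0 is taken, yields reward 1,
# and the next state's estimated value is 0.
probs = softmax(prefs)
action, reward, next_value = 0, 1.0, 0.0

td_error = reward + gamma * next_value - value       # critic's prediction error
value += alpha_critic * td_error                     # critic moves toward the TD target
# Policy-gradient step: d/d(pref) log pi(a) = 1 - pi(a) for the taken action.
prefs[action] += alpha_actor * td_error * (1.0 - probs[action])

print(round(value, 3), round(prefs[0], 3))  # → 0.2 0.05
```

The same TD error drives both networks: the critic shrinks it, while the actor uses its sign and magnitude to shift probability toward actions that did better than expected.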

Value Function

Mathematical function estimating the expected cumulative return from a state or state-action pair; in the Actor-Critic architecture, it is the quantity the critic learns to estimate and the basis of the actor's learning signal.
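
For a single sampled trajectory, the quantity the value function estimates in expectation can be computed directly as the discounted return (illustrative rewards and discount factor below):

```python
# Sketch: the discounted return G for one rollout of rewards r_1, r_2, r_3.
# V(s) is the expectation of G over all rollouts starting from s.
gamma = 0.9
rewards = [1.0, 0.0, 2.0]   # hypothetical sampled rewards

G = 0.0
for r in reversed(rewards):  # backward pass: G_t = r_t + gamma * G_{t+1}
    G = r + gamma * G
print(round(G, 4))
```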

Asynchronous Advantage Actor-Critic

Distributed actor-critic architecture (A3C) in which multiple workers train in parallel on independent copies of the environment, asynchronously sharing gradients with shared parameters to accelerate learning.

Deep Deterministic Policy Gradient

Actor-Critic algorithm (DDPG) for continuous action spaces that uses deep neural networks with a deterministic policy and a replay buffer for stable off-policy learning.
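
The replay buffer that enables DDPG's off-policy learning can be sketched with a bounded deque (hypothetical capacity and transitions, not DDPG itself): old transitions are stored and sampled uniformly at random, which decorrelates the critic's training batches.

```python
import random
from collections import deque

random.seed(0)
buffer = deque(maxlen=1000)   # oldest transitions are evicted past capacity

# Store a few (state, action, reward, next_state) transitions.
for t in range(5):
    buffer.append((t, 0.1 * t, float(t % 2), t + 1))

batch = random.sample(list(buffer), 3)   # uniform minibatch for the critic update
print(len(buffer), len(batch))  # → 5 3
```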

Twin Delayed Deep Deterministic Policy Gradient

Improvement over DDPG (TD3) that uses twin critics to reduce value overestimation, plus delayed updates of the actor and target networks for better stability.
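
The twin-critic idea reduces to taking the minimum of the two target critics when forming the TD target (illustrative numbers below, not a full implementation):

```python
# Sketch of TD3's clipped double-Q target: the pessimistic min over the two
# target critics' estimates counteracts the overestimation bias of a single critic.
gamma = 0.99
reward = 1.0
q1_target, q2_target = 5.0, 4.2   # hypothetical twin target-critic estimates at (s', a')

y = reward + gamma * min(q1_target, q2_target)
print(round(y, 3))  # → 5.158
```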

Soft Actor-Critic

Actor-Critic algorithm (SAC) that maximizes an entropy-augmented objective, combining expected return with policy entropy to encourage exploration, via stable and sample-efficient off-policy updates.
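
The entropy-augmented objective can be illustrated for a small discrete policy (hypothetical probabilities and temperature; SAC itself typically works with continuous Gaussian policies):

```python
import math

# Sketch of SAC's soft objective: reward is augmented with the policy's
# entropy, scaled by a temperature alpha that trades off reward vs. exploration.
alpha = 0.2
probs = [0.7, 0.3]                              # stochastic policy over 2 actions
entropy = -sum(p * math.log(p) for p in probs)  # H(pi) = -sum p log p
reward = 1.0

soft_reward = reward + alpha * entropy
print(round(entropy, 4), round(soft_reward, 4))
```

A uniform policy maximizes the entropy term, so the bonus pushes the actor away from premature determinism.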

Advantage Actor-Critic

Synchronous variant of A3C (A2C) that uses advantage estimation to reduce policy-gradient variance, with batched updates for better stability on GPUs.
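
Advantage estimation as used in A2C amounts to subtracting the critic's baseline from the sampled return (hypothetical numbers below): centering the signal this way lowers the variance of the policy gradient without biasing it.

```python
# Sketch: advantage A(s) = G - V(s) for a small batch of states.
returns = [2.0, 0.5, 1.5]       # sampled discounted returns G
values  = [1.8, 1.0, 1.2]       # critic's baseline estimates V(s)

advantages = [g - v for g, v in zip(returns, values)]
print([round(a, 2) for a in advantages])  # → [0.2, -0.5, 0.3]
```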

Critic Network

Neural network that estimates the value function V(s) or Q(s,a) and provides the temporal-difference (TD) learning signal to the actor, using its prediction error as the optimization signal.
