AI Glossary

The complete dictionary of Artificial Intelligence

162 categories · 2,032 subcategories · 23,060 terms

Conservative Q-Learning (CQL)

Offline reinforcement learning method that actively penalizes overestimated Q-values to keep the policy close to the behavioral data distribution and prevent divergence.
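
A minimal tabular sketch of one such update, assuming a discrete state and action space; `cql_update`, its hyperparameters, and the toy batch are illustrative, not taken from any particular implementation:

```python
import numpy as np

def cql_update(Q, batch, alpha=1.0, gamma=0.99, lr=0.1):
    """One conservative Q-update on a tabular Q-function.

    Q     : (n_states, n_actions) array of Q-values
    batch : iterable of (s, a, r, s_next) transitions from a fixed dataset
    alpha : weight of the conservative penalty
    """
    for s, a, r, s_next in batch:
        # Standard TD target, as in ordinary Q-learning.
        td_target = r + gamma * Q[s_next].max()
        td_error = td_target - Q[s, a]

        # Gradient of the conservative penalty (logsumexp over all actions
        # minus the dataset action): push down a soft maximum of Q,
        # push back up the action actually observed in the data.
        probs = np.exp(Q[s] - Q[s].max())
        probs /= probs.sum()             # softmax over Q[s]
        Q[s] -= lr * alpha * probs       # lower all actions, softmax-weighted
        Q[s, a] += lr * alpha            # restore mass on the dataset action

        # Gradient step on the TD loss.
        Q[s, a] += lr * td_error
    return Q

Q = np.zeros((5, 3))
batch = [(0, 1, 1.0, 2), (2, 0, 0.0, 3)]
Q = cql_update(Q, batch)
```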

Offline data distribution

Fixed and predefined dataset collected from a behavioral policy, serving as the sole source of information for offline RL training.
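
A sketch of what "fixed and predefined" means in practice: the dataset is created once and minibatches are drawn only from it, never from fresh environment interaction. The arrays and sizes below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000

# Hypothetical offline dataset: transitions recorded once by a
# behavioral policy and never extended during training.
dataset = {
    "state":      rng.integers(0, 50, size=N),
    "action":     rng.integers(0, 4, size=N),
    "reward":     rng.normal(size=N),
    "next_state": rng.integers(0, 50, size=N),
}

def sample_batch(dataset, batch_size=64):
    """Minibatches come only from the fixed dataset: no env interaction."""
    idx = rng.integers(0, len(dataset["state"]), size=batch_size)
    return {k: v[idx] for k, v in dataset.items()}
```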

Conservative penalty

Regularization term added to the loss function to penalize high Q-values for state-action pairs absent from training data, thus preventing overestimation.
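
In the widely used CQL(H) form, this penalty can be written as follows, where α is the regularization weight, D the offline dataset, and π̂_β the empirical behavior policy:

```latex
\mathcal{R}(Q) \;=\; \alpha \, \mathbb{E}_{s \sim \mathcal{D}} \!\left[
    \log \sum_{a} \exp Q(s,a)
    \;-\; \mathbb{E}_{a \sim \hat{\pi}_\beta(\cdot \mid s)} \big[ Q(s,a) \big]
\right]
```

The log-sum-exp acts as a soft maximum over all actions, so minimizing it pushes down exactly the Q-values most likely to be overestimated, while the second term protects actions that actually appear in the data.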

Q-value overestimation

Inherent problem in offline RL where Q-values are artificially inflated for unobserved actions, leading to suboptimal and unstable policies.
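
A toy illustration of why overestimation is inherent: the maximum over noisy estimates is biased upward, which is what inflates Q-values for poorly covered actions. The numbers are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)

# True value of every action is 0; the estimates are pure noise.
n_actions, n_trials = 10, 100_000
noisy_q = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n_actions))

# Each per-action estimate is unbiased ...
print(noisy_q.mean())               # ~0.0
# ... but taking a max, as Q-learning targets do, is biased upward.
print(noisy_q.max(axis=1).mean())   # ~1.54 for 10 standard normals
```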

Conservative policy

Action strategy that intentionally stays close to behaviors observed in the dataset, minimizing the risk of divergence due to extrapolation on unseen data.

Distribution correction

Mechanism in CQL that adjusts Q-estimates to correct the mismatch between the behavioral distribution and the target policy distribution.

Policy gap

Measure of divergence between the learned policy and the behavioral policy, crucial for ensuring stability in offline reinforcement learning.
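
One common way to quantify this gap, sketched here for discrete action distributions at a single state, is the KL divergence; the two distributions below are made up for illustration:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete action distributions."""
    p, q = np.asarray(p), np.asarray(q)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

learned_policy  = [0.70, 0.20, 0.05, 0.05]   # hypothetical pi(a|s)
behavior_policy = [0.25, 0.25, 0.25, 0.25]   # hypothetical pi_beta(a|s)
print(kl_divergence(learned_policy, behavior_policy))  # larger = wider gap
```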

CQL loss function

Objective function combining the standard Q-learning loss with a conservative term that minimizes Q-values for out-of-distribution actions; the term takes the form log Σₐ exp Q(s,a) − Q(s,a'), where a ranges over all actions and a' is the action recorded in the dataset.
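
Written out in full for the CQL(H) variant, the objective combines the conservative term R(Q) given above with the standard TD regression; here B̂^π is the empirical Bellman operator and Q̄ the target network:

```latex
\min_{Q} \;\;
\alpha \, \mathbb{E}_{s \sim \mathcal{D}} \!\left[
    \log \sum_{a} \exp Q(s,a)
    - \mathbb{E}_{a \sim \hat{\pi}_\beta(\cdot \mid s)} \big[ Q(s,a) \big]
\right]
+ \frac{1}{2} \, \mathbb{E}_{(s,a,s') \sim \mathcal{D}} \!\left[
    \big( Q(s,a) - \hat{\mathcal{B}}^{\pi} \bar{Q}(s,a) \big)^{2}
\right]
```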

Importance Sampling Ratio

Coefficient weighting transitions according to their probability of occurrence under the target policy relative to the behavioral policy, essential for correcting bias.
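
A small sketch of the ratio for discrete actions; the action probabilities and rewards below are invented for illustration:

```python
import numpy as np

# Hypothetical action probabilities at a batch of sampled (s, a) pairs.
pi_target   = np.array([0.60, 0.10, 0.30, 0.80])  # pi(a|s), target policy
pi_behavior = np.array([0.25, 0.40, 0.30, 0.10])  # pi_beta(a|s), behavior
rewards     = np.array([1.0, 0.0, 0.5, 2.0])

rho = pi_target / pi_behavior        # importance sampling ratios
# Reweighting corrects the bias of evaluating pi with data from pi_beta.
print(np.mean(rho * rewards))        # off-policy estimate of E_pi[r]
```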

Distributional Shift

Fundamental difference between the distribution of available data and the one required to accurately evaluate the learned policy; the central challenge of offline RL.

Learning Stabilization

Objective of CQL: guaranteeing that the algorithm converges by avoiding the oscillations and divergence caused by extrapolation on limited data.

Conservative Safeguard

Safety mechanism built into CQL limiting Q-value optimization for state-action pairs that are infrequent or absent from the training dataset.

Conservative Q-update

Iterative process modifying Q-values by penalizing overestimations while preserving reliable estimates based on observed data.

Extrapolation Error

Inaccuracy introduced when a model makes predictions for states or actions not represented in the training dataset; a major problem in offline RL.
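
A quick demonstration outside RL: a model fit on a narrow input range degrades sharply when queried beyond it, which is exactly what happens to a Q-function on unseen actions. The function and fit below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)

# Fit a cubic polynomial to sin(x), but only on x in [0, 3].
x_train = np.linspace(0, 3, 50)
y_train = np.sin(x_train) + rng.normal(scale=0.05, size=50)
coeffs = np.polyfit(x_train, y_train, deg=3)

for x in (1.5, 3.0, 6.0, 9.0):      # the last two are extrapolations
    err = abs(np.polyval(coeffs, x) - np.sin(x))
    print(f"x={x:4.1f}  |error|={err:8.3f}")
```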

Conservative Critic

CQL component evaluating actions with a conservative bias, assigning lower scores to actions potentially overestimated due to lack of data.

Constrained Action Space

Subset of possible actions limited to those observed in the dataset, reducing the risk of policies exploiting extrapolation artifacts.
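
A minimal sketch of the idea: mask the greedy step so the argmax only considers actions actually observed for that state. Names, shapes, and values are illustrative:

```python
import numpy as np

def constrained_greedy(q_values, seen_mask):
    """Greedy action restricted to dataset-observed actions.

    q_values  : (n_actions,) Q-estimates at the current state
    seen_mask : (n_actions,) bool, True if the action appears in the dataset
    """
    masked = np.where(seen_mask, q_values, -np.inf)
    return int(np.argmax(masked))

q    = np.array([0.2, 1.7, 0.9, 3.5])        # 3.5 may be pure extrapolation
seen = np.array([True, True, True, False])   # action 3 never observed
print(constrained_greedy(q, seen))           # -> 1, not the inflated 3
```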

Behavior Sampling

Process of collecting transitions (state, action, reward, next state) according to a fixed behavioral policy, constituting the offline dataset.
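
A sketch of the collection step with a toy random-walk environment; everything here is illustrative, and real pipelines would log from an actual system or simulator instead:

```python
import numpy as np

rng = np.random.default_rng(3)
n_states = 10

def step(s, a):
    """Toy random walk: action 0 moves left, action 1 moves right."""
    s_next = int(np.clip(s + (1 if a == 1 else -1), 0, n_states - 1))
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward

def behavior_policy(s):
    return int(rng.random() < 0.7)   # fixed policy: go right 70% of the time

buffer, s = [], 0
for _ in range(1_000):
    a = behavior_policy(s)
    s_next, r = step(s, a)
    buffer.append((s, a, r, s_next)) # the offline dataset grows only here
    s = s_next
print(len(buffer), buffer[:3])
```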

Policy Divergence

Phenomenon where the learned policy dangerously deviates from the data distribution, leading to degraded performance or total learning collapse.
