AI Glossary

The Complete Dictionary of Artificial Intelligence

162 Categories · 2,032 Subcategories · 23,060 Terms
📖
Terms

Conservative Q-Learning (CQL)

Offline reinforcement learning method that actively penalizes overestimated Q-values to keep the policy close to the behavioral data distribution and prevent divergence.

Offline data distribution

Fixed and predefined dataset collected from a behavioral policy, serving as the sole source of information for offline RL training.

Conservative penalty

Regularization term added to the loss function to penalize high Q-values for state-action pairs absent from training data, thus preventing overestimation.

Q-value overestimation

Inherent problem in offline RL where Q-values are artificially inflated for unobserved actions, leading to suboptimal and unstable policies.
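
The bias behind this entry can be seen in a few lines: even when every action is truly worthless, taking the greedy maximum over noisy Q-estimates inflates the apparent value, because E[max] ≥ max(E). This is an illustrative NumPy sketch, not CQL itself.

```python
import numpy as np

# Illustrative demo of overestimation bias: all 10 actions have a true
# value of 0, yet greedily maximizing over noisy estimates reports a
# value far above 0.
rng = np.random.default_rng(0)

true_q = np.zeros(10)                 # every action is truly worth 0
n_trials = 10_000

# Each trial: a noisy estimate of every Q-value, then a greedy max.
noisy_estimates = true_q + rng.normal(0.0, 1.0, size=(n_trials, 10))
greedy_values = noisy_estimates.max(axis=1)

print(greedy_values.mean())           # well above the true optimum of 0
```

In offline RL this bias cannot be corrected by collecting more experience, which is why CQL penalizes it directly.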

Conservative policy

Action strategy that intentionally stays close to behaviors observed in the dataset, minimizing the risk of divergence due to extrapolation on unseen data.

Distribution correction

Mechanism in CQL that adjusts Q-estimates to correct the mismatch between the behavioral distribution and the target policy distribution.

Policy gap

Measure of divergence between the learned policy and the behavioral policy, crucial for ensuring stability in offline reinforcement learning.

CQL loss function

Objective function combining the standard Q-learning (Bellman) loss with a conservative regularizer that pushes down Q-values for out-of-distribution actions, of the form E_s[log Σ_a exp(Q(s,a)) − Q(s, a_data)], where a_data is the action recorded in the dataset.
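
For a discrete action space, the conservative term can be sketched directly. This is a minimal NumPy illustration assuming a Q-table of shape (batch, n_actions); the function name and inputs are illustrative, not from any particular library.

```python
import numpy as np

def cql_penalty(q_values, dataset_actions):
    # Sketch of the CQL regularizer for one batch of states.
    # log-sum-exp over all actions: a soft maximum of Q(s, .)
    lse = np.log(np.exp(q_values).sum(axis=1))
    # Q-value of the action actually observed in the offline dataset
    q_data = q_values[np.arange(len(q_values)), dataset_actions]
    # Large when out-of-distribution actions score higher than
    # in-distribution ones; added to the standard Bellman loss.
    return (lse - q_data).mean()

q = np.array([[1.0, 5.0, 0.5],    # action 1 looks best but was never taken
              [2.0, 2.0, 2.0]])
acts = np.array([0, 1])           # actions recorded in the dataset
print(cql_penalty(q, acts))
```

Because log-sum-exp is always at least the maximum Q-value, the penalty is non-negative and grows exactly when unseen actions look better than logged ones.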

Importance Sampling Ratio

Coefficient weighting transitions according to their probability of occurrence under the target policy relative to the behavioral policy, essential for correcting bias.
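
For a discrete action space the ratio is a simple element-wise division of action probabilities. The two policy tables below are illustrative assumptions; in practice the behavioral policy β is usually estimated from the offline dataset.

```python
import numpy as np

def importance_weights(pi_probs, beta_probs, actions):
    # Per-transition weight pi(a|s) / beta(a|s) for the logged actions.
    idx = np.arange(len(actions))
    return pi_probs[idx, actions] / beta_probs[idx, actions]

pi = np.array([[0.8, 0.2],    # target policy strongly prefers action 0
               [0.5, 0.5]])
beta = np.array([[0.4, 0.6],  # behavioral policy that collected the data
                 [0.5, 0.5]])
actions = np.array([0, 1])    # actions taken in the logged transitions
print(importance_weights(pi, beta, actions))  # [2. 1.]
```

Transitions the target policy favors more than the behavioral policy did get weights above 1; rare behavioral actions can make the ratio explode, which is one reason CQL avoids relying on it alone.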

Distributional Shift

Fundamental difference between the distribution of available data and the one required to accurately evaluate the learned policy; the central challenge of offline RL.

Learning Stabilization

Objective of CQL aiming to guarantee algorithm convergence by avoiding oscillations and divergences caused by extrapolation on limited data.

Conservative Safeguard

Safety mechanism built into CQL limiting Q-value optimization for state-action pairs that are infrequent or absent from the training dataset.

Conservative Q-update

Iterative process modifying Q-values by penalizing overestimations while preserving reliable estimates based on observed data.

Extrapolation Error

Inaccuracy introduced when a model makes predictions for states or actions not represented in the training dataset; a major problem in offline RL.

Conservative Critic

CQL component evaluating actions with a conservative bias, assigning lower scores to actions potentially overestimated due to lack of data.

Constrained Action Space

Subset of possible actions limited to those observed in the dataset, reducing the risk of policies exploiting extrapolation artifacts.

Behavior Sampling

Process of collecting transitions (state, action, reward, next state) according to a fixed behavioral policy, constituting the offline dataset.
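
The process can be sketched with a toy random-walk environment: a fixed behavioral policy is rolled out once and the resulting (state, action, reward, next state) tuples become the immutable offline dataset. Both the environment and the uniform policy here are illustrative assumptions.

```python
import numpy as np

def collect_offline_dataset(n_steps, n_states=5, seed=0):
    # Roll out a fixed (uniform random) behavioral policy in a small
    # chain environment and record every transition.
    rng = np.random.default_rng(seed)
    dataset = []
    s = 0
    for _ in range(n_steps):
        a = rng.integers(0, 2)                        # behavioral policy
        s_next = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
        r = 1.0 if s_next == n_states - 1 else 0.0    # reward at the goal
        dataset.append((s, a, r, s_next))
        s = s_next
    return dataset                                    # fixed once collected

data = collect_offline_dataset(100)
print(len(data), data[0])
```

After collection the dataset never changes: an offline RL algorithm such as CQL trains on these tuples without ever querying the environment again.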

Policy Divergence

Phenomenon where the learned policy dangerously deviates from the data distribution, leading to degraded performance or total learning collapse.
