AI Glossary

A complete dictionary of artificial intelligence

162 categories, 2,032 subcategories, 23,060 terms

Conservative Q-Learning (CQL)

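Offline reinforcement learning method that learns a conservative Q-function: it pushes down the Q-values of actions that look best under the current estimate, which are often overestimated out-of-distribution actions, and pushes up the Q-values of actions present in the dataset. This keeps the learned policy close to the behavioral data distribution and prevents divergence.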

Offline data distribution

Fixed and predefined dataset collected from a behavioral policy, serving as the sole source of information for offline RL training.

Conservative penalty

Regularization term added to the loss function to penalize high Q-values for state-action pairs absent from training data, thus preventing overestimation.

Q-value overestimation

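Inherent problem in offline RL where Q-values are artificially inflated for unobserved actions: the Bellman backup takes a maximum over actions and therefore favors actions whose estimation errors happen to be positive, and because no new data is collected, these errors are never corrected. The result is suboptimal and unstable policies.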
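
The effect can be reproduced with a small simulation. Assuming, for illustration only, that every action's true Q-value is 0 and that each estimate adds independent zero-mean Gaussian noise, every individual estimate is unbiased, yet the maximum over actions (the quantity a Bellman backup uses) is biased upward, and the bias grows with the number of actions. The function name and parameters are hypothetical.

```python
import random
import statistics


def mean_max_estimate(n_actions: int, noise_std: float, trials: int, seed: int = 0) -> float:
    """Average of max_a Q_hat(s, a) over many trials when every true Q(s, a) is 0."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(trials):
        # Each estimate is unbiased: true value 0 plus zero-mean noise.
        estimates = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        maxima.append(max(estimates))
    return statistics.fmean(maxima)


if __name__ == "__main__":
    for n in (1, 2, 10):
        # The true maximum is always 0; the estimated maximum drifts above it as n grows.
        print(n, round(mean_max_estimate(n, noise_std=1.0, trials=10_000), 3))
```

In online RL, acting on an inflated action produces new data that corrects it; offline, that correction never arrives, so the error persists.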

Conservative policy

Action strategy that intentionally stays close to behaviors observed in the dataset, minimizing the risk of divergence due to extrapolation on unseen data.

Distribution correction

Mechanism in CQL that adjusts Q-value estimates to correct the mismatch between the behavioral distribution and the target policy distribution.

Policy gap

Measure of divergence between the learned policy and the behavioral policy, crucial for ensuring stability in offline reinforcement learning.
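
One common way to put a number on this gap, assumed here rather than taken from the entry, is the KL divergence $D_{\mathrm{KL}}(\pi(\cdot \mid s) \,\|\, \beta(\cdot \mid s))$ between the learned policy $\pi$ and the behavioral policy $\beta$, averaged over dataset states. The sketch uses discrete actions and small hand-written policy tables; the helper names are hypothetical.

```python
import math
from typing import Mapping, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same action set."""
    total = 0.0
    for p_a, q_a in zip(p, q):
        if p_a == 0.0:
            continue  # convention: 0 * log(0 / q) = 0
        if q_a == 0.0:
            return math.inf  # p uses an action q never takes: unbounded gap
        total += p_a * math.log(p_a / q_a)
    return total


def mean_policy_gap(pi: Mapping[str, Sequence[float]],
                    beta: Mapping[str, Sequence[float]]) -> float:
    """Average KL(pi(.|s) || beta(.|s)) over the states listed in beta."""
    return sum(kl_divergence(pi[s], beta[s]) for s in beta) / len(beta)


if __name__ == "__main__":
    beta = {"s0": [0.5, 0.5, 0.0], "s1": [0.9, 0.1, 0.0]}
    close_pi = {"s0": [0.6, 0.4, 0.0], "s1": [0.8, 0.2, 0.0]}
    risky_pi = {"s0": [0.0, 0.0, 1.0], "s1": [0.9, 0.1, 0.0]}
    print(mean_policy_gap(close_pi, beta))  # small, finite
    print(mean_policy_gap(risky_pi, beta))  # inf: action 2 never appears in the data
```

The direction $\pi \,\|\, \beta$ is chosen so that any probability the learned policy places on actions the behavioral policy never takes makes the gap infinite, which matches the offline concern about unsupported actions.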

CQL loss function

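Objective function combining the standard Q-Learning loss with a conservative term. For each dataset state $s$, the conservative term is $\log \sum_{a} \exp Q(s,a) - Q(s,a')$, where the sum runs over all actions and $a'$ is the action recorded in the dataset; it is averaged over the dataset, scaled by a weight $\alpha$, and added to the Bellman error. The log-sum-exp acts as a soft maximum, so minimizing it lowers the largest Q-values (typically those of out-of-distribution actions), while the $-Q(s,a')$ term keeps dataset actions from being pushed down.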
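
For discrete actions, the objective can be written out for a tabular Q-function. This is a minimal sketch, not a reference implementation: it assumes a Q-table indexed as `q[state][action]`, a separate target table for the Bellman target, a hypothetical transition tuple `(s, a_data, r, s_next, done)` where `a_data` is the dataset action (the $a'$ of the formula), and the usual one-half squared TD error as the standard Q-Learning part. `alpha` and `gamma` are illustrative defaults.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical transition layout: (state, dataset action, reward, next state, terminal flag)
Transition = Tuple[int, int, float, int, bool]


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def cql_loss(q: List[List[float]], q_target: List[List[float]],
             batch: Sequence[Transition], alpha: float = 1.0, gamma: float = 0.99) -> float:
    conservative = 0.0
    td = 0.0
    for s, a_data, r, s_next, done in batch:
        # Conservative term: soft maximum over all actions minus the dataset action's value.
        conservative += logsumexp(q[s]) - q[s][a_data]
        # Standard Q-Learning term: squared error to a target from the target table.
        target = r if done else r + gamma * max(q_target[s_next])
        td += 0.5 * (q[s][a_data] - target) ** 2
    n = len(batch)
    return alpha * conservative / n + td / n


if __name__ == "__main__":
    batch = [(0, 0, 1.0, 0, True)]
    print(cql_loss([[1.0, 1.0]], [[1.0, 1.0]], batch))  # log(2): TD error is 0
    print(cql_loss([[1.0, 3.0]], [[1.0, 3.0]], batch))  # larger: unseen action 1 is inflated
```

Raising the value of an action that never appears in the batch increases only the conservative term, so minimizing this loss drives such values down.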

Importance Sampling Ratio

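Coefficient $\rho = \pi(a \mid s) / \beta(a \mid s)$ that weights a transition by how likely its action is under the target policy $\pi$ relative to the behavioral policy $\beta$ that generated it; reweighting in this way corrects the bias of estimates computed from behavioral data.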
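
In the simplest one-step (bandit-style) case, the ratio reweights rewards logged under $\beta$ into an estimate of the expected reward under $\pi$. The sketch assumes both policies are given as probability functions `pi(s, a)` and `beta(s, a)` (hypothetical names) and that $\beta(a \mid s) > 0$ for every logged action, which holds automatically for actions that $\beta$ actually sampled.

```python
from typing import Callable, Sequence, Tuple

Policy = Callable[[int, int], float]  # (state, action) -> probability of the action


def importance_weighted_value(data: Sequence[Tuple[int, int, float]],
                              pi: Policy, beta: Policy) -> float:
    """Estimate E_pi[r] from (state, action, reward) samples logged under beta."""
    total = 0.0
    for s, a, r in data:
        rho = pi(s, a) / beta(s, a)  # importance sampling ratio pi(a|s) / beta(a|s)
        total += rho * r
    return total / len(data)


if __name__ == "__main__":
    beta = lambda s, a: 0.5                      # uniform over two actions
    pi = lambda s, a: 1.0 if a == 1 else 0.0     # always picks action 1
    logged = [(0, 0, 0.0), (0, 1, 1.0)]          # action 1 pays 1, action 0 pays 0
    print(importance_weighted_value(logged, pi, beta))  # 1.0, the true value of pi here
```

Over multi-step trajectories the per-step ratios are multiplied together, so the variance of such estimates can grow quickly; this is one motivation often given for regularizing the Q-function, as CQL does, instead of relying on importance weights.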

Distributional Shift

Fundamental difference between the distribution of available data and the distribution required to accurately evaluate the learned policy; it is the main challenge of offline RL.

Learning Stabilization

Objective of CQL: keeping training stable and convergent by avoiding the oscillations and divergence caused by extrapolation on limited data.

Conservative Safeguard

Safety mechanism built into CQL limiting Q-value optimization for state-action pairs that are infrequent or absent from the training dataset.

Conservative Q-update

Iterative process modifying Q-values by penalizing overestimations while preserving reliable estimates based on observed data.
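
A toy run shows both halves of the update. It assumes, purely for illustration, a single state, a dataset that only ever records action 0 with reward 1 (terminal, so the target is the reward), and an initial value of 5.0 on the unseen action 1 to mimic an overestimate. Each step is one gradient step on the tabular objective $\alpha(\log\sum_b \exp Q(s,b) - Q(s,a)) + \tfrac{1}{2}(Q(s,a) - r)^2$; with `alpha=0` it reduces to plain Q-learning on the same data. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def conservative_q_update(q: List[List[float]], s: int, a: int, target: float,
                          alpha: float, lr: float) -> None:
    """One gradient step on alpha*(logsumexp(Q[s]) - Q[s][a]) + 0.5*(Q[s][a] - target)^2."""
    probs = softmax(q[s])
    td_error = q[s][a] - target
    for b in range(len(q[s])):
        grad = alpha * probs[b]          # gradient of logsumexp: pushes every action down
        if b == a:
            grad += -alpha + td_error    # dataset action: pushed back up and fit to target
        q[s][b] -= lr * grad


def train(alpha: float, steps: int = 2000, lr: float = 0.1) -> List[List[float]]:
    # Hypothetical one-state problem: the dataset only shows action 0 with reward 1.
    # Action 1 starts at 5.0 to mimic an overestimated, unseen action.
    q = [[0.0, 5.0]]
    dataset: List[Tuple[int, int, float]] = [(0, 0, 1.0)]  # (state, action, reward), terminal
    for _ in range(steps):
        for s, a, r in dataset:
            conservative_q_update(q, s, a, target=r, alpha=alpha, lr=lr)
    return q


if __name__ == "__main__":
    print("plain Q-learning:", train(alpha=0.0))  # action 1 keeps its inflated 5.0
    print("conservative    :", train(alpha=1.0))  # action 1 is pushed below action 0
```

The reliable estimate for the observed action settles near its true value of 1 in both runs; only the conservative run corrects the unobserved action, so the greedy choice switches from the inflated action 1 to the supported action 0.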

Extrapolation Error

Inaccuracy introduced when a model makes predictions for states or actions not represented in the training dataset; a major problem in offline RL.
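
A regression example, outside RL but showing the same mechanism: a line is fit by least squares to the hypothetical function $y = x^2$ using samples only from $[0, 1]$. Inside that range the error stays small; far outside it, the model still returns a confident prediction that is badly wrong, which is analogous to a Q-function being queried on actions absent from the dataset. Function names are illustrative.

```python
from typing import Callable, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def abs_error(true_fn: Callable[[float], float], slope: float, intercept: float, x: float) -> float:
    """Absolute gap between the true function and the fitted line at x."""
    return abs(true_fn(x) - (slope * x + intercept))


if __name__ == "__main__":
    true_fn = lambda x: x * x
    xs = [i / 10 for i in range(11)]          # training data covers only [0, 1]
    slope, intercept = fit_line(xs, [true_fn(x) for x in xs])
    print(abs_error(true_fn, slope, intercept, 0.5))  # inside the data: about 0.1
    print(abs_error(true_fn, slope, intercept, 3.0))  # far outside: about 6.15
```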

Conservative Critic

CQL component evaluating actions with a conservative bias, assigning lower scores to actions potentially overestimated due to lack of data.

Constrained Action Space

Subset of possible actions limited to those observed in the dataset, reducing the risk of policies exploiting extrapolation artifacts.
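
A hard version of this restriction can be sketched as a greedy policy that only considers actions recorded for the current state. This filter is in the spirit of batch-constrained methods; CQL itself applies a soft penalty to Q-values rather than an explicit filter. The dataset here is a hypothetical list of `(state, action)` pairs, and the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def observed_actions(dataset: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Map each state to the set of actions that appear with it in the dataset."""
    support: Dict[int, Set[int]] = defaultdict(set)
    for s, a in dataset:
        support[s].add(a)
    return dict(support)


def constrained_greedy(q: Dict[int, List[float]], support: Dict[int, Set[int]], s: int) -> int:
    """Highest-valued action among those observed in state s."""
    allowed = support.get(s)
    if not allowed:
        raise ValueError(f"state {s} never appears in the dataset")
    return max(allowed, key=lambda a: q[s][a])


if __name__ == "__main__":
    support = observed_actions([(0, 0), (0, 2), (1, 1)])
    q = {0: [1.0, 9.0, 2.0], 1: [0.0, 0.5, 3.0]}
    # An unconstrained argmax would pick action 1 in state 0 (value 9.0, never observed).
    print(constrained_greedy(q, support, 0))  # 2
    print(constrained_greedy(q, support, 1))  # 1
```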

Behavior Sampling

Process of collecting transitions (state, action, reward, next state) according to a fixed behavioral policy, constituting the offline dataset.
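
A minimal sketch of behavior sampling on a hypothetical five-state chain environment: a fixed behavioral policy (assumed here to move right with probability 0.7) is rolled out and every transition is logged. The environment, the policy, and all names are illustrative, not part of any library.

```python
import random
from typing import List, NamedTuple, Tuple


class Transition(NamedTuple):
    state: int
    action: int
    reward: float
    next_state: int
    done: bool


def chain_step(state: int, action: int, n_states: int) -> Tuple[int, float, bool]:
    """Hypothetical chain MDP: action 1 moves right, action 0 moves left; reward 1 at the right end."""
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def behavior_policy(state: int, rng: random.Random) -> int:
    """Fixed behavioral policy (assumed): move right with probability 0.7."""
    return 1 if rng.random() < 0.7 else 0


def collect_dataset(n_transitions: int, n_states: int = 5, seed: int = 0) -> List[Transition]:
    rng = random.Random(seed)
    data: List[Transition] = []
    state = 0
    while len(data) < n_transitions:
        action = behavior_policy(state, rng)
        next_state, reward, done = chain_step(state, action, n_states)
        data.append(Transition(state, action, reward, next_state, done))
        state = 0 if done else next_state  # start a new episode after reaching the goal
    return data


if __name__ == "__main__":
    dataset = collect_dataset(1000)
    print(len(dataset), dataset[:3])
```

Once collected, the list is treated as frozen: offline training only samples from it and never calls `chain_step` again.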

Policy Divergence

Phenomenon where the learned policy drifts far from the data distribution, leading to degraded performance or a complete collapse of learning.
