
AI Glossary

A complete dictionary of Artificial Intelligence terms

162 categories · 2,032 subcategories · 23,060 terms
📖 Implicit Max Operator

Mathematical technique in IQL that avoids direct calculation of the maximum over actions by using conservative upper bounds based on the behavior distribution.
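In the original IQL formulation, this conservative bound is obtained with expectile regression: an asymmetric squared loss that pushes a value estimate toward an upper expectile of the Q-targets drawn from dataset actions, instead of an explicit max. A minimal NumPy sketch (the target values and step sizes are illustrative):

```python
import numpy as np

def expectile_loss(diff, tau=0.9):
    """Asymmetric squared loss: for tau > 0.5, errors where the target
    exceeds the estimate are weighted more heavily, so the minimizer is
    an upper expectile of the targets rather than their mean -- a
    conservative stand-in for a max over dataset actions."""
    weight = np.where(diff > 0, tau, 1.0 - tau)
    return weight * diff ** 2

# Fitting a scalar value v to illustrative Q-targets by gradient descent.
targets = np.array([0.0, 1.0, 2.0, 10.0])
tau, v = 0.9, 0.0
for _ in range(5000):
    diff = targets - v
    grad = -2.0 * np.mean(np.where(diff > 0, tau, 1.0 - tau) * diff)
    v -= 0.05 * grad
# v converges near 7.75, the 0.9-expectile, well above the mean of 3.25.
```

Note how no action ever leaves the dataset: the "max" emerges purely from the asymmetric weighting of observed targets.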

📖 Behavior Distribution

Probability distribution of actions in the offline dataset that represents the policy that generated the training data used by IQL.

📖 Conservative Loss Function

Mathematical objective in IQL that penalizes overestimations of Q-values outside the behavior distribution to ensure learning stability.

📖 Implicit Q-Target Estimation

IQL mechanism that calculates target values without explicit maximization, using conditional expectations based on the behavior distribution.
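Concretely, the TD target uses the learned state-value V(s') in place of an explicit max over next actions. A one-line sketch (array names and values are illustrative):

```python
import numpy as np

rewards = np.array([1.0, 0.0, 0.5])   # r_t for a small batch
next_v = np.array([2.0, 1.0, 0.0])    # V(s_{t+1}) from the value network
dones = np.array([0.0, 0.0, 1.0])     # episode-termination mask
gamma = 0.99

# No max over actions: the conditional expectation is carried by V,
# which was itself fit only on actions present in the dataset.
q_targets = rewards + gamma * (1.0 - dones) * next_v
```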

📖 Value-Policy Decoupling

Fundamental principle of IQL separating value function learning from policy extraction to avoid optimization biases in the offline setting.
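In practice the decoupling means the value functions are fit first, and the policy is extracted afterwards by advantage-weighted behavior cloning. A toy sketch with made-up per-sample estimates (all names and numbers are illustrative):

```python
import numpy as np

# Hypothetical per-sample estimates from the two decoupled value networks.
q_values = np.array([1.0, 2.0, 0.5, 3.0])   # Q(s_i, a_i)
v_values = np.array([1.5, 1.5, 1.5, 1.5])   # V(s_i)
beta = 2.0                                   # inverse temperature

# Policy extraction never maximizes over actions; it reweights the
# dataset's own actions by their estimated advantage.
advantages = q_values - v_values
weights = np.exp(beta * advantages)
# The policy is then trained by weighted behavior cloning:
#   maximize  sum_i weights[i] * log pi(a_i | s_i)
```

Because the weights only rescale dataset actions, the extracted policy cannot query actions the value functions were never trained on.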

📖 Offline Training Period

Learning phase where IQL uses only a fixed dataset without environment interaction, ensuring safety and computational efficiency.

📖 Weighted Importance Sampling

Technique used in IQL to correct the mismatch between behavior distribution and target policy by weighting samples according to their relevance.
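The weighting idea can be sketched as per-sample probability ratios; the clipping threshold below is an assumed variance-control choice for illustration, not a fixed part of the method:

```python
import numpy as np

def importance_weights(target_probs, behavior_probs, clip=10.0):
    """Per-sample ratios pi_target(a|s) / pi_beta(a|s). Clipping bounds
    the variance that very large ratios would introduce (the threshold
    here is an illustrative choice)."""
    return np.minimum(target_probs / behavior_probs, clip)

# Actions the target policy prefers more than the behavior policy did
# receive weights above 1; the reverse receive weights below 1.
w = importance_weights(np.array([0.6, 0.1]), np.array([0.3, 0.5]))
```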

📖 Batch-Constrained Optimization

Strategy in IQL that constrains learned actions to remain close to those observed in the dataset to avoid unreliable extrapolations.

📖 Offline Distribution Bias

Major challenge in IQL where limited and biased data can lead to incorrect estimates if not properly managed by conservative mechanisms.

📖 Implicit Advantage Function

Extension of IQL that estimates the relative advantages of actions without explicit maximization, enabling more robust action selection in offline contexts.

📖 Behavior Regularization

Mechanism in IQL that penalizes significant deviations from the behavior distribution to maintain stability and avoid risky actions.
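One common form of such a penalty is a KL divergence between the learned policy and the behavior distribution over a discrete action set; a small sketch (the distributions are made up for illustration):

```python
import numpy as np

def kl_penalty(policy_probs, behavior_probs):
    """KL(pi || pi_beta): zero when the policy matches the behavior
    distribution, growing as it drifts toward actions the dataset
    rarely contains."""
    return float(np.sum(policy_probs * np.log(policy_probs / behavior_probs)))

pi = np.array([0.7, 0.2, 0.1])
pi_beta = np.array([0.4, 0.4, 0.2])
penalty = kl_penalty(pi, pi_beta)  # added to the loss as lambda * penalty
```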

📖 Implicit Termination Criterion

Method in IQL for determining learning convergence based on the stability of Q-estimates rather than explicit performance metrics.

📖 Demonstration Experience

Pre-collected dataset used by IQL as the sole learning source, typically originating from experts or existing policies.
