AI Glossary

A complete dictionary of artificial intelligence

162 categories · 2,032 subcategories · 23,060 terms
📖 Terms

DAgger (Dataset Aggregation)

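Imitation learning algorithm that iteratively collects data by querying an expert on the states visited by the current policy, adds the labeled states to a growing dataset, and retrains the policy on all data collected so far. This reduces the gap between the state distribution seen in training and the one encountered at deployment.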
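
The loop described above can be sketched on a toy problem. Everything here is an illustrative assumption and not a reference implementation: the one-dimensional environment, the `GOAL` and `HORIZON` constants, the rule-based `expert_action`, and the tabular `fit_policy` that stands in for supervised learning.

```python
from collections import Counter, defaultdict

GOAL = 5       # hypothetical target position on the integer line
HORIZON = 10   # steps per rollout

def expert_action(state):
    """Toy expert: step toward GOAL (+1 or -1), or stay (0) once there."""
    return (state < GOAL) - (state > GOAL)

def fit_policy(dataset):
    """Stand-in for supervised learning: majority expert label per state."""
    votes = defaultdict(Counter)
    for state, action in dataset:
        votes[state][action] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    return lambda state: table.get(state, 0)  # unseen state: stay put

def rollout(policy, start):
    """Return the states visited when running `policy` for HORIZON steps."""
    state, visited = start, []
    for _ in range(HORIZON):
        visited.append(state)
        state += policy(state)
    return visited

def dagger(starts):
    """Run one DAgger iteration per start state; return (policy, dataset)."""
    dataset = []
    policy = expert_action  # the first rollout follows the expert
    for start in starts:
        states = rollout(policy, start)                     # visit states
        dataset += [(s, expert_action(s)) for s in states]  # query the expert
        policy = fit_policy(dataset)                        # retrain on all data
    return policy, dataset

cloned, _ = dagger(starts=[0])                 # expert demonstrations only
learned, data = dagger(starts=[0] + [10] * 5)  # five further iterations
print(cloned(10), learned(10), len(data))      # prints: 0 -1 60
```

With expert data alone, the policy has never seen state 10 and stays put there. After the extra iterations, the states the learner actually reaches have been labeled by the expert, so the learned policy also moves toward the goal from that side.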

Data aggregation

Process of collecting and combining multiple datasets from different sources or learning iterations. In DAgger, the expert-labeled data from each iteration is merged into a single dataset on which the policy is retrained, which progressively improves the robustness of the learned policy.

Iterative collection

Method of gathering data over several successive cycles, with each cycle using information from the previous ones. In DAgger, this allows the policy to be refined continually and exposes it to states it has not seen before.

Behavioral policy

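Policy (a probability distribution over actions given a state) that the agent follows while collecting data in DAgger. In the original formulation it is a mixture of the expert policy and the current learned policy, with the expert's share typically shrinking over iterations.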
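
A minimal sketch of such a data-collection policy, assuming a per-step coin flip with mixing weight `beta`. The function name `make_behavior_policy` and the toy `expert` and `learner` callables are hypothetical, introduced only for illustration.

```python
import random

def make_behavior_policy(expert, learner, beta, rng):
    """With probability `beta` act like the expert, otherwise like the learner."""
    def behavior(state):
        chosen = expert if rng.random() < beta else learner
        return chosen(state)
    return behavior

# Hypothetical stand-ins for the two policies being mixed.
expert = lambda state: "expert"
learner = lambda state: "learner"

rng = random.Random(0)
all_expert = make_behavior_policy(expert, learner, beta=1.0, rng=rng)
all_learner = make_behavior_policy(expert, learner, beta=0.0, rng=rng)
print(all_expert(0), all_learner(0))  # prints: expert learner
```

Setting `beta=1.0` reproduces plain expert demonstrations, while `beta=0.0` collects states purely under the learner's own choices.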

State distribution

Probability distribution over the states that the agent is likely to visit while executing a policy. DAgger seeks to align the distribution of states in the training data with the one the learned policy encounters in real deployment.

Distribution bias

Difference between the distribution of the training data and the distribution encountered during production deployment, also known as distribution shift. DAgger reduces this bias by collecting data on states actually visited by the current policy.
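
One way to make this gap concrete is to compare empirical state distributions. The sketch below uses total variation distance over discrete states; that choice of metric, the helper names, and the sample state lists are assumptions for illustration and are not part of DAgger itself.

```python
from collections import Counter

def state_distribution(states):
    """Empirical distribution: fraction of visits per state."""
    counts = Counter(states)
    total = len(states)
    return {s: c / total for s, c in counts.items()}

def total_variation(p, q):
    """Half the L1 distance between two discrete distributions (0 to 1)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)

train_states = [0, 0, 1, 1]    # hypothetical states seen in expert data
deploy_states = [1, 1, 2, 2]   # hypothetical states visited by the learner

shift = total_variation(state_distribution(train_states),
                        state_distribution(deploy_states))
print(shift)  # prints: 0.5
```

A value of 0 means the two sets of states are distributed identically, and 1 means they do not overlap at all.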

Error correction

Process by which an expert provides the correct actions when the current agent policy makes mistakes. These corrections serve as new training data to improve the policy.

Expert querying

Mechanism for soliciting the correct action from an expert (a human or an automated system) for specific states visited by the agent. These queries are essential for generating high-quality training data.

Visited state

Specific configuration or situation of the environment that the agent reaches during the execution of its current policy. These states become query points for the expert in DAgger.

Current policy

Current version of the agent's decision-making strategy that evolves at each iteration of the DAgger algorithm. It is used to explore the environment and identify states requiring expert corrections.

Adaptive aggregation

Variant of DAgger that dynamically adjusts the proportion of expert actions versus current policy actions used during data collection. This adaptation helps balance expert-guided rollouts against exposure to the states the learner reaches on its own.
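
Two ways of setting the expert's share can be sketched as follows. The exponential decay is a common fixed schedule; `adaptive_beta` is a purely hypothetical rule (expert share equal to the learner's recent disagreement rate) and is not taken from any specific published variant.

```python
def beta_schedule(iteration, decay=0.5):
    """Fixed schedule: expert share decays exponentially, starting at 1.0."""
    return decay ** iteration

def adaptive_beta(learner_actions, expert_actions, floor=0.0):
    """Hypothetical adaptive rule: expert share = recent disagreement rate."""
    if not expert_actions:
        return 1.0  # no evidence yet: rely fully on the expert
    disagreements = sum(a != e for a, e in zip(learner_actions, expert_actions))
    return max(floor, disagreements / len(expert_actions))

print(beta_schedule(0), beta_schedule(2))            # prints: 1.0 0.25
print(adaptive_beta([1, 1, 0, 0], [1, 1, 1, 1]))     # prints: 0.5
```

Under the adaptive rule, a learner that already agrees with the expert is given more control, while frequent mistakes hand control back to the expert.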

Feedback loop

Continuous cycle in which executing the current policy produces new states, which the expert then labels with corrections that are used to retrain the policy. This iterative loop is the fundamental improvement mechanism in DAgger.

Online correction

Expert intervention process that occurs during real-time execution of the agent's policy. These immediate corrections help prevent the propagation of errors in trajectories.
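
A minimal sketch of such an intervention, assuming the expert overrides the learner's action whenever the two disagree and that each override is logged as a training example. The function `rollout_with_corrections` and the toy `learner`, `expert`, and `step` callables are hypothetical.

```python
def rollout_with_corrections(learner, expert, step, start, horizon):
    """Run the learner, letting the expert override disagreeing actions."""
    state, corrections = start, []
    for _ in range(horizon):
        action = learner(state)
        correct = expert(state)
        if action != correct:
            corrections.append((state, correct))  # log as new training data
            action = correct                      # expert takes over this step
        state = step(state, action)
    return state, corrections

# Toy setup: the learner always moves right, the expert stops at state 3.
learner = lambda s: 1
expert = lambda s: 1 if s < 3 else 0
step = lambda s, a: s + a

final_state, corrections = rollout_with_corrections(
    learner, expert, step, start=0, horizon=5)
print(final_state, corrections)  # prints: 3 [(3, 0), (3, 0)]
```

Without the override, the learner would overshoot to state 5. With it, the trajectory stays at state 3 and the two logged corrections can be added to the training set.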

Trajectory distribution

Probability distribution over the sequences of states and actions that the agent generates by following its current policy. DAgger aims to align this distribution with the one produced by the expert policy.

Target policy

Policy that the agent seeks to imitate, usually the expert policy, which is assumed to be optimal or near-optimal and is typically observed through expert demonstrations. The goal of DAgger is to make the learned policy converge toward this target policy.

Progressive aggregation

Data accumulation strategy where each new iteration adds complementary information to existing data. This approach ensures growing coverage of the relevant state space.

Compaction error

Performance gap between the learned policy and the expert policy due to representation limitations. DAgger reduces this error by collecting data on the state distribution that the learned policy actually induces.
