DAgger Data Aggregation - AI Glossarium

DAgger (Dataset Aggregation)

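Imitation learning algorithm that iteratively rolls out the current policy, queries an expert for the correct action in each visited state, adds the labeled states to a growing dataset, and retrains the policy on all data collected so far. Training on states the learner itself reaches reduces the gap between the training distribution and the deployment distribution.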
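
A minimal sketch of this loop, under stated assumptions: the one-dimensional chain environment, the `expert_action` rule, the lookup-table learner, and all function names are hypothetical and chosen only for illustration. The first iteration rolls out the expert and later iterations roll out the learner, which is one simple choice among several.

```python
import random
from collections import Counter, defaultdict


def expert_action(state):
    """Hypothetical expert: always step toward the origin."""
    if state > 0:
        return -1
    if state < 0:
        return 1
    return 0


def fit_policy(dataset):
    """Supervised step: majority vote per state over (state, action) pairs."""
    votes = defaultdict(Counter)
    for state, action in dataset:
        votes[state][action] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    # Assumption: unseen states fall back to a fixed default action of +1.
    return lambda state: table.get(state, 1)


def rollout(policy, start, horizon):
    """Run a policy on the toy chain and return the states it visits."""
    state, visited = start, []
    for _ in range(horizon):
        visited.append(state)
        state = max(-10, min(10, state + policy(state)))
    return visited


def dagger(iterations=5, horizon=15, seed=0):
    rng = random.Random(seed)
    dataset = []
    learner = None
    for i in range(iterations):
        # Iteration 0 follows the expert; later iterations follow the learner.
        acting_policy = expert_action if i == 0 else learner
        states = rollout(acting_policy, rng.randint(-8, 8), horizon)
        # Query the expert on every visited state and aggregate the labels.
        dataset.extend((s, expert_action(s)) for s in states)
        # Retrain on all data collected so far, not just the latest batch.
        learner = fit_policy(dataset)
    return learner, dataset


learner, dataset = dagger()
```

In this toy setting the learner agrees with the expert on every state in `dataset`, but only because the expert is deterministic and a lookup table can memorize it; a real learner would be a function approximator with nonzero training error.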

Data aggregation

Process of collecting and combining multiple datasets from different sources or learning iterations. In DAgger, the policy is retrained on the union of all datasets collected so far, which progressively improves the robustness of the learned policy.

Iterative collection

Methodology of gathering data performed in several successive cycles, with each cycle using information from previous cycles. This approach allows for continuously refining the policy and exploring new states.

Behavioral policy

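Policy that actually selects actions during data collection in DAgger. It is typically a mixture that follows the expert with some probability and the current learned policy otherwise, and that probability shrinks across iterations so that collection is increasingly driven by the learner.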
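
One common way to realize such a policy is to mix the expert and the learner with a probability, here called `beta`. The sketch below is illustrative only: `make_behavior_policy`, `beta`, and the two stand-in policies are hypothetical names, not part of any particular library.

```python
import random


def make_behavior_policy(expert, learner, beta, rng=None):
    """Return a policy that follows `expert` with probability `beta`."""
    rng = rng or random.Random()

    def behavior(state):
        # random() lies in [0, 1), so beta=1.0 always picks the expert
        # and beta=0.0 always picks the learner.
        return expert(state) if rng.random() < beta else learner(state)

    return behavior


# Hypothetical stand-ins for an expert and a learned policy.
expert = lambda state: "expert"
learner = lambda state: "learner"

mostly_learner = make_behavior_policy(expert, learner, beta=0.2, rng=random.Random(0))
choices = [mostly_learner(0) for _ in range(1000)]
```

Whichever policy acts, the label stored for a visited state is still the expert's action; the mixture only decides which states get visited.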

State distribution

Probability distribution over the states that the agent visits while executing a given policy. DAgger seeks to align the state distribution of its training data with the one the learned policy encounters in real deployment.

Distribution bias

Difference between the training data distribution and that encountered during production deployment. DAgger reduces this bias by collecting data on states actually visited by the current policy.

Error correction

Process by which an expert provides the correct actions when the current agent policy makes mistakes. These corrections serve as new training data to improve the policy.

Expert querying

Mechanism for soliciting optimal actions from a human expert or system for specific states visited by the agent. These queries are essential for generating high-quality training data.

Visited state

Specific configuration or situation of the environment that the agent reaches during the execution of its current policy. These states become query points for the expert in DAgger.

Current policy

Current version of the agent's decision-making strategy that evolves at each iteration of the DAgger algorithm. It is used to explore the environment and identify states requiring expert corrections.

Adaptive aggregation

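Variant of DAgger that adjusts, across iterations, the proportion of actions taken by the expert versus the current policy during data collection. Starting with more expert control and shifting toward the learner keeps early rollouts in useful states, while later rollouts expose the learner's own mistakes.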
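
A simple way to vary this proportion is a fixed exponential decay of the probability of following the expert. The sketch below assumes that schedule purely for illustration; `beta_schedule` and `decay` are hypothetical names, and a truly adaptive variant might instead set the value from observed performance.

```python
def beta_schedule(iteration, decay=0.5):
    """Probability of following the expert at a zero-indexed iteration."""
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    return decay ** iteration


# Expert share for the first five iterations with the default decay.
betas = [beta_schedule(i) for i in range(5)]
```

Setting `decay=0.0` gives the expert full control in the first iteration and the learner full control afterwards, since Python evaluates `0.0 ** 0` as `1.0`.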

Feedback loop

Continuous cycle in which executing the current policy leads to new states, the expert labels those states with corrections, and the policy is retrained on the enlarged dataset. This iterative loop is the fundamental improvement mechanism in DAgger.

Online correction

Expert intervention process that occurs during real-time execution of the agent's policy. These immediate corrections help prevent the propagation of errors in trajectories.

Trajectory distribution

Set of state and action sequences that the agent generates by following its current policy. DAgger aims to align this distribution with that produced by the optimal expert policy.

Target policy

Optimal policy that the agent seeks to imitate, typically represented by expert demonstrations. The goal of DAgger is to make the learned policy converge toward this target policy.

Progressive aggregation

Data accumulation strategy where each new iteration adds complementary information to existing data. This approach ensures growing coverage of the relevant state space.

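Compounding error

Performance gap between the learned policy and the expert policy that grows as small mistakes accumulate along a trajectory and push the agent into states absent from its training data. DAgger reduces this error by collecting data on the state distribution that the learned policy actually induces.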
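
As a rough guide to how this gap scales, the bounds usually quoted for imitation learning can be written as follows. The symbols are introduced here for illustration: $T$ is the task horizon, $\epsilon$ is the learner's per-step imitation error, $J$ is the expected total cost, $\hat{\pi}$ is the learned policy, and $\pi^{*}$ is the expert. Constants are omitted, and the second bound relies on additional assumptions, for example that the expert can recover from a mistake at bounded cost.

```latex
\begin{align}
  \text{Behavioral cloning:} \quad & J(\hat{\pi}) \le J(\pi^{*}) + O(\epsilon T^{2}) \\
  \text{DAgger:} \quad & J(\hat{\pi}) \le J(\pi^{*}) + O(\epsilon T)
\end{align}
```

The quadratic term reflects that a single early mistake can leave the learner in unfamiliar states for the rest of the episode, whereas training on the learner's own visited states keeps the growth linear in the horizon.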
