DAgger (Dataset Aggregation) Glossary

DAgger (Dataset Aggregation)

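Imitation learning algorithm that iteratively rolls out the current policy, queries an expert for the correct action in each state that policy visits, adds the labeled states to one growing dataset, and retrains on the aggregate. Because the training states come from the learner's own rollouts, the gap between the training distribution and the deployment distribution shrinks with each iteration.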
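The loop described above can be sketched on a toy problem. This is a minimal illustration, not a reference implementation: the one-dimensional task, the names `expert_action`, `fit_policy`, `rollout`, and `dagger`, and the majority-vote "learner" are all assumptions made for the example.

```python
from collections import Counter, defaultdict

GOAL = 5        # hypothetical toy task: reach position 5 on an integer line
HORIZON = 12    # steps per rollout (arbitrary choice)

def expert_action(state):
    """Hypothetical expert: step toward GOAL, stay put once there."""
    if state < GOAL:
        return 1
    if state > GOAL:
        return -1
    return 0

def fit_policy(dataset):
    """Stand-in for supervised learning: majority expert label per state."""
    votes = defaultdict(Counter)
    for state, action in dataset:
        votes[state][action] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    # States never seen in training get an arbitrary default action (+1).
    return lambda state: table.get(state, 1)

def rollout(policy, start, horizon=HORIZON):
    """Return the states visited when following `policy` from `start`."""
    visited, state = [], start
    for _ in range(horizon):
        visited.append(state)
        state += policy(state)
    return visited

def dagger(n_iters=5, starts=(0, 8, -3)):
    # Iteration 0: plain behavioral cloning on one expert demonstration.
    dataset = [(s, expert_action(s)) for s in rollout(expert_action, 0)]
    policy = fit_policy(dataset)
    for _ in range(n_iters):
        for start in starts:
            # Roll out the CURRENT policy, then ask the expert to label
            # every state it visited.
            visited = rollout(policy, start)
            dataset += [(s, expert_action(s)) for s in visited]
        # Retrain on the aggregate of all data collected so far.
        policy = fit_policy(dataset)
    return policy, dataset

cloned = fit_policy([(s, expert_action(s)) for s in rollout(expert_action, 0)])
learned, data = dagger()
print(cloned(8), learned(8))  # compare the two policies at a state beyond the goal
```

The cloned policy never saw states beyond the goal, so it falls back to its default there. The DAgger policy has expert labels for those states because its own rollouts reached them.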

Data aggregation

Process of collecting and combining multiple datasets from different sources or learning iterations. In DAgger, the policy is retrained on the union of all data gathered so far, which makes it progressively more robust to the states it reaches on its own.

Iterative collection

Methodology of gathering data performed in several successive cycles, with each cycle using information from previous cycles. This approach allows for continuously refining the policy and exploring new states.

Behavioral policy

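Policy, expressed as a probability distribution over actions, that the agent actually executes while collecting data in DAgger. In the original formulation it is a mixture that follows the expert with probability β and the learned policy otherwise, and it changes at every iteration as the learned policy is retrained.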
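One common way to build such a data-collection policy is to mix the expert and the learner with a probability β that shrinks over iterations. The sketch below assumes an exponential decay schedule. The names `mixture_policy` and `beta_schedule` and the string-returning stand-in policies are hypothetical.

```python
import random

def mixture_policy(expert, learner, beta, rng):
    """Behavioral policy: act like the expert with probability beta."""
    def act(state):
        if rng.random() < beta:
            return expert(state)
        return learner(state)
    return act

def beta_schedule(iteration, p=0.5):
    """Assumed exponential decay: 1.0 at iteration 0, then p, p**2, ..."""
    return p ** iteration

# Hypothetical stand-in policies that just report who chose the action.
expert = lambda state: "expert"
learner = lambda state: "learner"

rng = random.Random(0)  # seeded so the run is repeatable
for i in range(4):
    policy = mixture_policy(expert, learner, beta_schedule(i), rng)
    actions = [policy(None) for _ in range(1000)]
    # Fraction of steps on which the expert was in control.
    print(i, beta_schedule(i), actions.count("expert") / 1000)
```

With β equal to 1 the expert controls every step, and as β approaches 0 the rollouts reflect only the learned policy.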

State distribution

Probability distribution over the states that the agent is likely to visit while executing its policy. DAgger trains on states drawn from the learner's own distribution so that training matches what is encountered in real deployment.

Distribution bias

Mismatch between the state distribution of the training data and the one encountered during production deployment, often called distribution shift or covariate shift. DAgger reduces this bias by collecting data on states actually visited by the current policy.
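For discrete states, one simple way to put a number on this mismatch is the total variation distance between the empirical state distributions of the training data and of the deployed policy's rollouts. The sketch below is one possible measure, not something DAgger prescribes. The helper names and the sample state lists are made up for illustration.

```python
from collections import Counter

def empirical_dist(states):
    """Relative frequency of each state in a list of visited states."""
    counts = Counter(states)
    total = len(states)
    return {state: count / total for state, count in counts.items()}

def total_variation(p, q):
    """Total variation distance: 0 for identical, 1 for disjoint support."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)

# Hypothetical data: states seen in expert demos vs. states the learner hits.
train_states = [0, 1, 2, 3, 4, 5, 5, 5]
deploy_states = [8, 9, 10, 11, 5, 5, 5, 5]

shift = total_variation(empirical_dist(train_states),
                        empirical_dist(deploy_states))
print(shift)  # closer to 1 means the learner is mostly in unfamiliar states
```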

Error correction

Process by which an expert provides the correct actions when the current agent policy makes mistakes. These corrections serve as new training data to improve the policy.

Expert querying

Mechanism for soliciting the correct action from a human or automated expert for specific states visited by the agent. These queries are essential for generating high-quality training data.

Visited state

Specific configuration or situation of the environment that the agent reaches during the execution of its current policy. These states become query points for the expert in DAgger.

Current policy

Current version of the agent's decision-making strategy that evolves at each iteration of the DAgger algorithm. It is used to explore the environment and identify states requiring expert corrections.

Adaptive aggregation

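Variant of DAgger that dynamically adjusts the proportion of expert actions versus current-policy actions executed during data collection. Relying on the expert early keeps rollouts in sensible states, while shifting toward the current policy exposes the states the learner will actually reach.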

Feedback loop

Continuous cycle in which executing the current policy produces new states, which in turn receive expert corrections that are used to retrain the policy. This iterative loop is the fundamental improvement mechanism in DAgger.

Online correction

Expert intervention process that occurs during real-time execution of the agent's policy. These immediate corrections help prevent the propagation of errors in trajectories.
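A rough sketch of this idea follows, under the assumption that the expert is consulted at every step and takes over whenever the agent's choice differs. Real systems may gate interventions differently. The function `rollout_with_corrections`, the `step` transition function, and the toy policies are all hypothetical.

```python
def rollout_with_corrections(policy, expert, step, start, horizon):
    """Run `policy`, letting the expert override any action it disagrees with.

    Returns the final state and the (state, expert_action) pairs recorded
    at each intervention, which can later serve as training data.
    """
    corrections = []
    state = start
    for _ in range(horizon):
        action = policy(state)
        expert_choice = expert(state)
        if action != expert_choice:
            # Record the correction and execute the expert's action instead,
            # so the mistake does not propagate along the trajectory.
            corrections.append((state, expert_choice))
            action = expert_choice
        state = step(state, action)
    return state, corrections

# Hypothetical toy setup: walk along an integer line toward position 3.
GOAL = 3
expert = lambda s: 1 if s < GOAL else (-1 if s > GOAL else 0)
always_right = lambda s: 1           # flawed agent that never stops
step = lambda s, a: s + a            # deterministic transition

final_state, corrections = rollout_with_corrections(
    always_right, expert, step, start=0, horizon=6)
print(final_state, corrections)
```

Without the override, the flawed agent would overshoot the goal and keep moving away from it. With the override, it is held at the goal and each intervention is logged.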

Trajectory distribution

Probability distribution over the state-action sequences that the agent generates by following its current policy. DAgger aims to bring this distribution close to the one produced by the expert policy.

Target policy

Policy that the agent seeks to imitate, typically the expert's policy as revealed through demonstrations and corrections, and treated as optimal for the task. The goal of DAgger is to make the learned policy converge toward this target policy.

Progressive aggregation

Data accumulation strategy where each new iteration adds complementary information to existing data. This approach ensures growing coverage of the relevant state space.

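Compounding error

Performance gap between the learned policy and the expert policy that grows as small mistakes accumulate along a trajectory and carry the agent into states absent from the training data. DAgger limits this error by collecting data on the state distribution that the learned policy actually induces.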
