AI Glossary
The complete dictionary of Artificial Intelligence
DAgger (Dataset Aggregation)
Imitation learning algorithm that iteratively collects data by querying an expert on states visited by the current policy. This approach reduces the gap between the training distribution and the deployment distribution.
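The iterative loop can be sketched in a few lines. This is a minimal illustration on a toy 1-D steering task, not a reference implementation: the environment dynamics, the `expert_action` oracle, and the nearest-neighbour "learner" are all simplifying assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def expert_action(state):
    # Hypothetical expert: steer the state back toward 0.
    return -np.sign(state)

def rollout(policy, steps=20):
    # Collect the states visited by the *current* policy.
    state, states = rng.normal(), []
    for _ in range(steps):
        states.append(state)
        state = state + 0.1 * policy(state) + 0.05 * rng.normal()
    return states

dataset = []            # aggregated (state, expert label) pairs
policy = expert_action  # iteration 0: roll out the expert itself

for iteration in range(3):
    # 1. roll out the current policy; 2. query the expert on the visited
    # states; 3. aggregate; 4. retrain (here a 1-nearest-neighbour lookup).
    visited = rollout(policy)
    dataset.extend((s, expert_action(s)) for s in visited)
    xs = np.array([s for s, _ in dataset])
    ys = np.array([a for _, a in dataset])
    policy = lambda s, xs=xs, ys=ys: ys[np.argmin(np.abs(xs - s))]
```

Because each rollout uses the latest learned policy, the expert labels concentrate exactly on the states that policy actually reaches, which is the mechanism that shrinks the train/deploy gap.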
Data aggregation
Process of collecting and combining multiple datasets from different sources or learning iterations. In DAgger, this allows for progressively improving the robustness of the learned policy.
Iterative collection
Methodology of gathering data performed in several successive cycles, with each cycle using information from previous cycles. This approach allows for continuously refining the policy and exploring new states.
Behavioral policy
Strategy or probability distribution over actions that the agent follows during data collection in DAgger. It evolves across iterations to approach the optimal policy.
State distribution
Probability distribution over the states that the agent is likely to visit while executing its policy. DAgger seeks to align this distribution with the one encountered in real deployment.
Distribution shift
Mismatch (also called covariate shift) between the training data distribution and the one encountered during production deployment. DAgger reduces this shift by collecting data on states actually visited by the current policy.
Error correction
Process by which an expert provides the correct actions when the current agent policy makes mistakes. These corrections serve as new training data to improve the policy.
Expert querying
Mechanism for soliciting optimal actions from a human expert or system for specific states visited by the agent. These queries are essential for generating high-quality training data.
Visited state
Specific configuration or situation of the environment that the agent reaches during the execution of its current policy. These states become query points for the expert in DAgger.
Current policy
Current version of the agent's decision-making strategy that evolves at each iteration of the DAgger algorithm. It is used to explore the environment and identify states requiring expert corrections.
Adaptive aggregation
Variant of DAgger that dynamically adjusts the proportion of expert actions versus current policy actions. This adaptation helps balance exploration and exploitation during learning.
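The expert/learner mixing described above is usually controlled by a coefficient beta that decays across iterations. A minimal sketch, assuming the common geometric schedule beta_i = p^i (function and parameter names here are illustrative):

```python
import random

def beta_schedule(iteration, p=0.5):
    # Exponentially decaying reliance on the expert across iterations:
    # beta = 1.0 at iteration 0 (pure expert), then p, p**2, ...
    return p ** iteration

def mixture_action(state, learner, expert, beta, rng=random.Random(0)):
    # With probability beta act with the expert, otherwise with the learner.
    return expert(state) if rng.random() < beta else learner(state)
```

Early iterations lean on the expert so that rollouts stay near sensible states; later iterations hand control to the learner so data is collected where it will actually operate.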
Feedback loop
Continuous cycle where the performance of the current policy generates new states, which in turn require expert corrections. This iterative loop is the fundamental improvement mechanism in DAgger.
Online correction
Expert intervention process that occurs during real-time execution of the agent's policy. These immediate corrections help prevent the propagation of errors in trajectories.
Trajectory distribution
Set of state and action sequences that the agent generates by following its current policy. DAgger aims to align this distribution with that produced by the optimal expert policy.
Target policy
Optimal policy that the agent seeks to imitate, typically represented by expert demonstrations. The goal of DAgger is to make the learned policy converge toward this target policy.
Progressive aggregation
Data accumulation strategy where each new iteration adds complementary information to existing data. This approach ensures growing coverage of the relevant state space.
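The aggregation itself is just a running union of all iterations' data, which makes the coverage growth easy to see. A deliberately tiny sketch with hypothetical state sets:

```python
# Each iteration contributes the states it visited; training always uses
# the union of everything collected so far, so coverage never shrinks.
per_iteration_data = [{1, 2}, {2, 3}, {4}]

aggregate, coverage = set(), []
for data in per_iteration_data:
    aggregate |= data
    coverage.append(len(aggregate))

# coverage grows monotonically: [2, 3, 4]
```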
Compounding error
Accumulation of small per-step prediction mistakes along a trajectory, which progressively drives the learned policy into states absent from the training data. DAgger mitigates this by collecting expert labels on the true state distribution induced by the learned policy, improving on the worst-case regret of plain behavior cloning.