AI Glossary
The complete dictionary of Artificial Intelligence
Behavioral Cloning
Imitation learning technique where an agent learns to directly reproduce an expert's actions by minimizing the error between its predictions and the provided demonstrations. This approach transforms the imitation problem into a standard supervised learning problem.
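The reduction to supervised learning can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes a hypothetical expert that acts as a linear controller, and clones it by least squares on (state, action) pairs.

```python
import numpy as np

# Minimal behavioral-cloning sketch: fit a linear policy to expert
# (state, action) pairs by ordinary least squares. The expert here is
# a hypothetical linear controller a = K s with a known gain K.
rng = np.random.default_rng(0)

K_expert = np.array([[1.5, -0.5]])      # hypothetical expert gain
states = rng.normal(size=(200, 2))      # demonstration states
actions = states @ K_expert.T           # noise-free expert actions

# Supervised step: minimize ||states @ K.T - actions||^2 over K
K_cloned, *_ = np.linalg.lstsq(states, actions, rcond=None)

print(np.allclose(K_cloned.T, K_expert, atol=1e-6))  # → True
```

With noise-free demonstrations and a model class containing the expert, the clone recovers the expert policy exactly; real settings add noise and model mismatch.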
Imitation Learning
Machine learning paradigm where an agent acquires skills by observing and reproducing expert behavior, without requiring explicit rewards. This method accelerates learning by capitalizing on pre-existing knowledge.
Action Policy
Mathematical function that maps each state to a probability distribution over possible actions, determining the agent's behavior. In behavioral cloning, this policy is learned directly from expert demonstrations.
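A stochastic policy of this kind can be sketched as a function from a state to a categorical distribution via a softmax; the weight matrix `W` below is illustrative (in behavioral cloning it would be fit to demonstrations).

```python
import numpy as np

# Sketch of a stochastic action policy: pi(state) returns a probability
# distribution over a discrete action set.
def pi(state, W):
    logits = W @ state
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()

W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])              # 3 actions, 2-dim state
probs = pi(np.array([0.5, -0.5]), W)

print(probs.sum())                        # probabilities sum to 1
```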
Expert Demonstrations
Set of trajectories or state-action examples provided by a human expert or optimal system, serving as training data for imitation learning. These demonstrations encapsulate the optimal strategy to be reproduced.
Prediction Error
Measure quantifying the difference between actions predicted by the agent and the expert's actions in the same states, often calculated via mean squared error or KL divergence. Minimizing this error is the primary objective of behavioral cloning.
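The two measures named above can be computed directly; the data below is illustrative.

```python
import numpy as np

# Mean squared error for continuous actions, and KL divergence
# D_KL(p || q) for discrete action distributions.
def mse(pred, target):
    return np.mean((pred - target) ** 2)

def kl_divergence(p, q):
    # assumes p, q are strictly positive and each sums to 1
    return np.sum(p * np.log(p / q))

expert_actions = np.array([0.2, -0.1, 0.4])
agent_actions  = np.array([0.25, -0.05, 0.35])
print(mse(agent_actions, expert_actions))   # small positive value

p = np.array([0.7, 0.2, 0.1])   # expert action distribution
q = np.array([0.6, 0.3, 0.1])   # agent action distribution
print(kl_divergence(p, q))      # >= 0, zero only when p == q
```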
Supervised Learning
Learning framework where the model is trained on labeled input-output pairs, used in behavioral cloning to learn the expert policy. This approach allows transforming the imitation problem into a classification or regression task.
Action Distribution
Probabilistic representation of possible actions in a given state, capturing the expert's preferences and uncertainty. Behavioral cloning aims to reproduce this distribution rather than a single deterministic action.
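One simple way to estimate such a distribution is maximum likelihood from demonstration counts in a given state, rather than collapsing to the single most frequent action. The demonstration data here is hypothetical.

```python
import numpy as np

# Estimate the expert's action distribution in one state from
# demonstration counts (maximum likelihood for a categorical).
expert_actions = np.array([0, 0, 1, 0, 2, 0, 1, 0])   # hypothetical demos
counts = np.bincount(expert_actions, minlength=3)
action_dist = counts / counts.sum()

print(action_dist)    # fraction of each action in the demonstrations

# The agent then samples, preserving the expert's uncertainty.
rng = np.random.default_rng(1)
sampled = rng.choice(3, p=action_dist)
```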
Generalization
Ability of the cloned model to perform correctly on unseen states during training, crucial for robust application of behavioral cloning. Good generalization avoids overfitting to specific demonstrations.
Overfitting
Phenomenon where the model perfectly learns the training demonstrations but fails to generalize to new situations, limiting the effectiveness of behavioral cloning. This problem is exacerbated by the temporal correlation of samples within trajectories, which violates the i.i.d. assumption of supervised learning.
Offline Learning
Paradigm where the agent learns exclusively from a fixed dataset without interacting with the environment, a key characteristic of behavioral cloning. This approach eliminates the costs and risks associated with active exploration.
Error Correction
Ability of a behavioral cloning system to recover after making an error, often limited by the lack of experience on incorrect states. This limitation motivates the use of hybrid techniques with reinforcement learning.
Reinforcement Learning
Learning paradigm where an agent maximizes cumulative reward through trial and error, often combined with behavioral cloning to improve robustness. This approach allows correcting errors not present in demonstrations.
Inverse Reinforcement Learning
Process of inferring the reward function or underlying intentions from expert demonstrations, an alternative to direct behavioral cloning. This approach can generalize better, since the recovered reward transfers to new dynamics, but it is more complex to implement.
Imitative Reinforcement Learning
Family of algorithms combining imitation learning and reinforcement learning to benefit from the advantages of both approaches, using demonstrations as an exploration guide. These methods improve robustness and error correction.
Policy Divergence
Phenomenon where the learned policy gradually drifts from the expert policy during interaction with the environment, compromising performance: small per-step errors lead the agent into states absent from the demonstrations, where its predictions are unreliable (covariate shift), causing errors to compound. This divergence is a major limitation of pure behavioral cloning.
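A toy simulation makes the drift concrete. The linear dynamics and the small gain bias below are invented for illustration: even a slight per-step imitation error opens a gap from the expert trajectory that accumulates over the horizon instead of staying at its initial size.

```python
import numpy as np

# Toy illustration of policy divergence: a cloned policy with a small
# per-step bias drifts away from the expert trajectory over time.
def rollout(gain, horizon=50, s0=1.0):
    s, traj = s0, []
    for _ in range(horizon):
        s = 0.9 * s + gain * s    # hypothetical linear dynamics
        traj.append(s)
    return np.array(traj)

expert = rollout(gain=0.05)
cloned = rollout(gain=0.06)       # slightly biased imitation

gap = np.abs(expert - cloned)
print(gap[0] < gap[-1])           # the gap grows over the rollout → True
```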
Learning Stability
Property of a learning algorithm to converge predictably towards a satisfactory solution without oscillations or divergence, critical in behavioral cloning systems. Stability depends on the quality and coverage of demonstrations.
Knowledge Transfer
Ability to apply skills learned through behavioral cloning to similar but different tasks or environments, essential for scalability. Successful transfer requires a robust and invariant state representation.