AI Glossary
The complete dictionary of Artificial Intelligence
Offline imitation learning
Learning paradigm in which an agent learns to imitate expert behavior without interacting with the environment, using only a fixed set of pre-recorded demonstrations.
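By way of illustration, here is a minimal Python sketch of the simplest instance of this paradigm, behavioral cloning: the policy is fit by supervised regression on the demonstrations alone, never touching the environment. The linear policy class and the helper name fit_bc_policy are assumptions made for the example, not a reference implementation.

```python
import numpy as np

def fit_bc_policy(states, actions):
    """Behavioral cloning: supervised regression from states to expert actions.

    states:  (N, state_dim) array of observed states
    actions: (N, action_dim) array of the expert's actions in those states
    Returns a weight matrix W such that policy(s) = s @ W.
    """
    # Least-squares fit on the fixed dataset; no environment interaction.
    W, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return W

# Toy demonstration set: the expert's action is a linear function of state.
rng = np.random.default_rng(0)
states = rng.normal(size=(500, 4))
actions = states @ np.array([[1.0], [-0.5], [0.2], [0.0]])

W = fit_bc_policy(states, actions)
print("Action for a new state:", rng.normal(size=4) @ W)
```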
Demonstration set
Static collection of trajectories or expert action examples used as the sole source of information for offline imitation learning.
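One plausible in-memory layout for such a collection, sketched in Python; the Demonstration container and its field names are illustrative assumptions, not a standard API.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Demonstration:
    """One expert trajectory: parallel arrays of states and actions."""
    states: np.ndarray   # shape (T, state_dim)
    actions: np.ndarray  # shape (T, action_dim)

def flatten(demos):
    """Concatenate a demonstration set into one (state, action) training set."""
    X = np.concatenate([d.states for d in demos])
    Y = np.concatenate([d.actions for d in demos])
    return X, Y
```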
Offline reinforcement learning
Reinforcement learning approach that uses only a pre-existing dataset without real-time interaction with the environment.
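A tabular sketch of the idea on a toy two-state problem, assuming discrete states and actions: Q-iteration runs entirely over a fixed list of recorded transitions, averaging the bootstrapped targets per state-action pair. The function name and dataset are illustrative.

```python
import numpy as np

def offline_q_iteration(transitions, n_states, n_actions, gamma=0.9, iters=50):
    """Q-iteration over a fixed transition dataset; no new interaction occurs."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        targets = {}  # (s, a) -> list of bootstrapped targets
        for s, a, r, s_next in transitions:
            y = r + gamma * Q[s_next].max()   # greedy bootstrap
            targets.setdefault((s, a), []).append(y)
        for (s, a), ys in targets.items():
            Q[s, a] = np.mean(ys)             # average over recorded outcomes
    return Q

# Toy dataset of (state, action, reward, next_state) tuples.
data = [(0, 0, 0.0, 1), (0, 1, 1.0, 0), (1, 0, 2.0, 0), (1, 1, 0.0, 1)]
Q = offline_q_iteration(data, n_states=2, n_actions=2)
print("Greedy offline policy:", Q.argmax(axis=1))
```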
Importance sampling
Statistical technique that corrects the mismatch between the data-generating distribution and the target policy by weighting each sample by the ratio of its probability under the target policy to its probability under the behavior policy.
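A self-contained numeric sketch in Python: data are drawn under a behavior distribution b, and re-weighting each sample by pi/b recovers the expected reward under the target policy pi. The distributions and rewards below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Behavior policy b and target policy pi over 3 discrete actions.
b  = np.array([0.5, 0.3, 0.2])            # distribution that generated the data
pi = np.array([0.2, 0.3, 0.5])            # policy whose value we want to estimate
true_reward = np.array([1.0, 0.0, 2.0])   # expected reward of each action

# Data are collected under b only.
actions = rng.choice(3, size=10_000, p=b)
rewards = true_reward[actions] + rng.normal(scale=0.1, size=10_000)

# Importance weights correct for the distribution mismatch.
weights = pi[actions] / b[actions]
estimate = np.mean(weights * rewards)
print(estimate, "vs true value", pi @ true_reward)
```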
Distribution preservation
Constraint imposed on the learned policy to remain close to the demonstration distribution, thus avoiding risky extrapolations in unknown regions.
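One common way to impose this constraint is a divergence penalty on the training objective. A hedged sketch with discrete actions, where demo_probs, beta, and the scalar reward term are illustrative assumptions:

```python
import numpy as np

def kl_penalty(policy_probs, demo_probs, eps=1e-8):
    """KL(policy || demo): grows as the policy drifts from the demonstrations."""
    p = np.clip(policy_probs, eps, 1.0)
    q = np.clip(demo_probs, eps, 1.0)
    return np.sum(p * np.log(p / q))

demo_probs   = np.array([0.6, 0.3, 0.1])  # empirical action frequencies in the data
policy_probs = np.array([0.1, 0.2, 0.7])  # candidate policy

reward_term = 1.5   # hypothetical expected return of the candidate
beta = 0.5          # penalty strength (assumed hyperparameter)
penalized_objective = reward_term - beta * kl_penalty(policy_probs, demo_probs)
print(penalized_objective)
```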
Offline trajectory
Complete sequence of states, actions, and rewards recorded from an expert policy, constituting the basic unit of learning data.
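A minimal container for this unit of data, sketched in Python; the exact indexing convention (T + 1 states versus T actions and rewards) varies between codebases and is an assumption here.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Trajectory:
    """A recorded rollout: states s_0..s_T, actions a_0..a_{T-1}, rewards r_1..r_T."""
    states: np.ndarray   # (T + 1, state_dim)
    actions: np.ndarray  # (T, action_dim)
    rewards: np.ndarray  # (T,)

    def __post_init__(self):
        T = len(self.actions)
        assert len(self.states) == T + 1 and len(self.rewards) == T
```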
Expert policy
Reference strategy that generated the demonstrations, serving as a model to imitate and defining the desired optimal behavior.
Offline estimator
Value or policy estimation algorithm specifically designed to work with static data without requiring interaction with the environment.
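As one concrete instance, a tabular sketch of fitted Q evaluation: the value of a fixed target policy pi is estimated purely from a static transition set by repeatedly regressing onto bootstrapped Bellman targets. Unlike the greedy backup in the offline Q-iteration sketch above, the bootstrap here follows the target policy's own action distribution. Names and the tabular setting are illustrative.

```python
import numpy as np

def fitted_q_evaluation(transitions, pi, n_states, n_actions, gamma=0.9, iters=50):
    """Estimate Q^pi for a fixed target policy from static (s, a, r, s') data.

    pi: (n_states, n_actions) action probabilities of the policy to evaluate.
    """
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        targets = {}  # (s, a) -> list of bootstrapped targets
        for s, a, r, s_next in transitions:
            y = r + gamma * pi[s_next] @ Q[s_next]   # backup under the target policy
            targets.setdefault((s, a), []).append(y)
        for (s, a), ys in targets.items():
            Q[s, a] = np.mean(ys)   # regression step: the mean minimizes squared error
    return Q
```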
Conservative bias correction
Bias correction approach that prioritizes safety by penalizing actions that are under-represented in the demonstration data.
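A simple instantiation is count-based pessimism: subtract from each Q-value a penalty that shrinks as the corresponding state-action pair appears more often in the data. The 1/sqrt(n + 1) schedule and the alpha coefficient are illustrative choices, not a canonical formula.

```python
import numpy as np
from collections import Counter

def pessimistic_q(Q, transitions, alpha=1.0):
    """Penalize Q-values for (s, a) pairs with little support in the data."""
    counts = Counter((s, a) for s, a, _, _ in transitions)
    Q_pess = Q.copy()
    for s in range(Q.shape[0]):
        for a in range(Q.shape[1]):
            n = counts.get((s, a), 0)
            # 1/sqrt(n + 1) penalty: heaviest for unseen pairs, vanishing with coverage.
            Q_pess[s, a] -= alpha / np.sqrt(n + 1)
    return Q_pess
```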
Constrained imitation learning
Method incorporating explicit constraints on the divergence between the learned policy and the data distribution to ensure stability.
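In contrast to the soft penalty sketched under "Distribution preservation" above, the constraint here is explicit: candidates that violate a divergence budget are ruled out rather than merely penalized. The selection procedure below is a toy illustration, with max_kl an assumed hyperparameter.

```python
import numpy as np

def kl(p, q, eps=1e-8):
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q))

def pick_policy(candidates, values, demo_probs, max_kl=0.1):
    """Choose the highest-value candidate whose divergence from the
    demonstration distribution stays within an explicit budget."""
    feasible = [i for i, c in enumerate(candidates) if kl(c, demo_probs) <= max_kl]
    if not feasible:
        return demo_probs   # fall back to pure imitation
    return candidates[max(feasible, key=lambda i: values[i])]

demo = np.array([0.6, 0.3, 0.1])
cands = [np.array([0.55, 0.35, 0.10]), np.array([0.1, 0.1, 0.8])]
# The second candidate has higher value but violates the constraint.
print(pick_policy(cands, values=[1.0, 3.0], demo_probs=demo))
```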
Transition set
Data structure storing tuples (state, action, next state, reward) extracted from expert trajectories for offline training.
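A direct Python rendering of this structure, with a helper that slices one expert trajectory into individual training tuples; the names are illustrative.

```python
import numpy as np
from typing import NamedTuple

class Transition(NamedTuple):
    state: np.ndarray
    action: np.ndarray
    next_state: np.ndarray
    reward: float

def to_transitions(states, actions, rewards):
    """Slice one trajectory (T + 1 states, T actions/rewards) into tuples."""
    return [Transition(states[t], actions[t], states[t + 1], rewards[t])
            for t in range(len(actions))]
```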
Adaptive importance weighting
Dynamic weighting technique that adjusts importance weights according to confidence in the data quality across different regions of the state space.
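A hedged sketch of one such scheme: confidence is proxied by a per-state sample count, and weights in sparsely covered regions are shrunk toward the neutral value 1, trading variance for bias. The count threshold min_count and the linear confidence proxy are assumptions for the example.

```python
import numpy as np
from collections import Counter

def adaptive_weights(raw_weights, states, min_count=20):
    """Shrink importance weights toward 1 where the data are sparse.

    raw_weights: per-sample importance ratios (array of floats)
    states: hashable state labels, one per sample
    """
    counts = Counter(states)
    conf = np.array([min(counts[s] / min_count, 1.0) for s in states])
    # Full confidence keeps the raw weight; zero confidence yields weight 1.
    return conf * raw_weights + (1.0 - conf)
```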
Coverage error
Measure quantifying the mismatch between the support of the data distribution and that of the optimal policy in offline learning.
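A toy estimate of this quantity for discrete states and actions: the probability mass the target policy assigns to state-action pairs that never appear in the offline dataset. The dictionaries below are made-up example data.

```python
def coverage_error(dataset_sa, policy_sa_probs):
    """Mass the target policy puts on (state, action) pairs outside the data support."""
    support = set(dataset_sa)   # (s, a) pairs observed in the dataset
    return sum(p for (s, a), p in policy_sa_probs.items()
               if (s, a) not in support)

# Example: the policy puts 30% of its mass outside the data support.
data = [(0, 0), (0, 1), (1, 0)]
pi_mass = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.3}
print(coverage_error(data, pi_mass))  # 0.3
```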