AI Glossary
The complete dictionary of Artificial Intelligence
Implicit Max Operator
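Technique in IQL (Implicit Q-Learning) that avoids computing the maximum over actions directly. Instead, it approximates that maximum with an upper expectile of the Q-values, fitted only on actions that appear in the dataset, that is, within the support of the behavior distribution.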
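In the published IQL formulation, the implicit maximum comes from expectile regression with an asymmetric squared loss. The following is a minimal NumPy sketch of that idea, not a reference implementation: the function names `expectile_loss` and `fit_expectile`, the learning rate, and the step count are illustrative assumptions.

```python
import numpy as np

def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss: mean of |tau - 1(diff < 0)| * diff**2."""
    diff = np.asarray(diff, dtype=float)
    # Positive residuals are weighted by tau, negative ones by (1 - tau).
    weight = np.where(diff < 0, 1.0 - tau, tau)
    return float(np.mean(weight * diff ** 2))

def fit_expectile(samples, tau, steps=2000, lr=0.1):
    """Fit a scalar v that minimizes expectile_loss(samples - v, tau).

    Plain gradient descent; steps and lr are illustrative choices.
    """
    samples = np.asarray(samples, dtype=float)
    v = float(samples.mean())
    for _ in range(steps):
        diff = samples - v
        weight = np.where(diff < 0, 1.0 - tau, tau)
        grad = -2.0 * np.mean(weight * diff)  # derivative of the loss w.r.t. v
        v -= lr * grad
    return v

# Hypothetical Q-values of the dataset actions observed in one state.
q_values = [1.0, 2.0, 3.0, 10.0]
mean_like = fit_expectile(q_values, tau=0.5)   # tau = 0.5 recovers the mean
max_like = fit_expectile(q_values, tau=0.99)   # tau near 1 approaches the max
```

With tau = 0.5 the fitted value is the ordinary mean of the samples. As tau moves toward 1 it moves toward the largest Q-value among the observed actions, so no action outside the dataset is ever evaluated.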
Behavior Distribution
Probability distribution of actions in the offline dataset that represents the policy that generated the training data used by IQL.
Conservative Loss Function
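Training objective in IQL that limits overestimation of Q-values for actions outside the behavior distribution. It does so by fitting values only on state-action pairs from the dataset with an asymmetric (expectile) loss, which improves learning stability.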
Implicit Q-Target Estimation
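IQL mechanism that computes target values without an explicit maximization over actions: the target bootstraps from a learned state value V(s'), fitted by expectile regression on dataset actions, instead of from the maximum of Q(s', a') over a'.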
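A minimal sketch of such a target, assuming a separately learned state-value estimate for the next state is already available. The function name `implicit_q_target` and the argument layout are illustrative assumptions.

```python
import numpy as np

def implicit_q_target(rewards, next_values, dones, gamma=0.99):
    """Target r + gamma * V(s') with no max over next actions.

    next_values holds a learned state-value estimate V(s') for each
    transition; terminal transitions (done = 1) do not bootstrap.
    """
    rewards = np.asarray(rewards, dtype=float)
    next_values = np.asarray(next_values, dtype=float)
    not_done = 1.0 - np.asarray(dones, dtype=float)
    return rewards + gamma * not_done * next_values

# Two hypothetical transitions; the second one ends the episode.
targets = implicit_q_target(
    rewards=[1.0, 0.0], next_values=[2.0, 5.0], dones=[0, 1], gamma=0.5
)
```

The Q-function is then regressed onto these targets with an ordinary squared error, so the update never queries Q for an action that is absent from the dataset.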
Value-Policy Decoupling
Fundamental principle of IQL: the value functions are learned first, and the policy is extracted from them in a separate step, so that value learning in the offline setting never depends on actions proposed by the learned policy.
Offline Training Period
Learning phase in which IQL uses only a fixed dataset, with no environment interaction, so no exploratory actions are executed and no new data has to be collected during training.
Weighted Importance Sampling
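Technique for correcting the mismatch between the behavior distribution and the target policy by weighting samples according to the ratio of their probabilities under the two. IQL itself does not rely on importance ratios; its policy extraction instead weights dataset actions by their exponentiated advantage (advantage-weighted regression).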
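For reference, the general weighted (self-normalized) importance sampling estimator can be sketched as follows. This illustrates the generic technique, not a component of the IQL algorithm; the function name and inputs are illustrative assumptions.

```python
import numpy as np

def weighted_importance_sampling(returns, target_probs, behavior_probs):
    """Self-normalized importance sampling estimate of the target policy's mean return.

    Each sample is weighted by pi_target(a|s) / pi_behavior(a|s), and the
    weights are normalized by their sum rather than by the sample count.
    """
    returns = np.asarray(returns, dtype=float)
    ratios = (np.asarray(target_probs, dtype=float)
              / np.asarray(behavior_probs, dtype=float))
    return float(np.sum(ratios * returns) / np.sum(ratios))

# Two hypothetical samples; the second is twice as likely under the target policy.
estimate = weighted_importance_sampling(
    returns=[1.0, 3.0], target_probs=[0.5, 0.5], behavior_probs=[0.5, 0.25]
)
```

Normalizing by the sum of the ratios trades a small bias for lower variance compared with dividing by the number of samples.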
Batch-Constrained Optimization
Strategy in IQL that constrains learned actions to remain close to those observed in the dataset to avoid unreliable extrapolations.
Offline Distribution Bias
Major challenge in offline reinforcement learning, and therefore for IQL: limited and biased data can lead to incorrect value estimates unless conservative mechanisms keep learning within the data's coverage.
Implicit Advantage Function
Quantity used in IQL, A(s, a) = Q(s, a) - V(s), that estimates how much better an action is than the state's value without explicit maximization over actions, enabling more robust action selection in offline contexts.
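A minimal sketch of how such advantages can drive policy extraction, assuming the advantage is the difference between learned Q and V estimates and that each dataset action is weighted by its exponentiated advantage. The function name `advantage_weights`, the temperature `beta`, and the clipping value `max_weight` are illustrative assumptions.

```python
import numpy as np

def advantage_weights(q_values, state_values, beta=3.0, max_weight=100.0):
    """Return advantages A = Q - V and clipped weights exp(beta * A).

    The weights scale the log-likelihood of each dataset action when the
    policy is fitted; clipping prevents a few samples from dominating.
    """
    advantages = (np.asarray(q_values, dtype=float)
                  - np.asarray(state_values, dtype=float))
    weights = np.minimum(np.exp(beta * advantages), max_weight)
    return advantages, weights

# Hypothetical Q and V estimates for three dataset transitions.
adv, w = advantage_weights(
    q_values=[1.0, 2.0, 10.0], state_values=[1.0, 1.0, 1.0], beta=1.0
)
```

Actions with higher advantage receive larger weights, so the extracted policy favors the better actions in the dataset while only ever imitating actions that were actually observed.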
Behavior Regularization
Mechanism in IQL that penalizes significant deviations from the behavior distribution to maintain stability and avoid risky actions.
Implicit Termination Criterion
Method in IQL for determining learning convergence based on the stability of Q-estimates rather than explicit performance metrics.
Demonstration Experience
Pre-collected dataset used by IQL as the sole learning source, typically originating from experts or existing policies.