
Glossario IA

The complete Artificial Intelligence dictionary

162 categories · 2,032 subcategories · 23,060 terms

Implicit Max Operator

Mathematical technique in IQL that avoids direct calculation of the maximum over actions by using conservative upper bounds based on the behavior distribution.
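The original IQL paper realizes this implicit max through expectile regression: an asymmetric squared loss whose minimizer approaches the maximum of the sampled Q-values as the expectile parameter τ approaches 1, without ever querying actions outside the dataset. A minimal NumPy sketch with illustrative numbers:

```python
import numpy as np

def expectile(values, tau=0.9, iters=200, lr=0.5):
    """Compute the tau-expectile of a sample by gradient descent on the
    asymmetric squared loss: positive errors weighted by tau, negative
    by (1 - tau). As tau -> 1 the result approaches max(values)."""
    v = np.mean(values)
    for _ in range(iters):
        diff = values - v
        weight = np.where(diff > 0, tau, 1 - tau)
        grad = -2 * np.mean(weight * diff)
        v -= lr * grad
    return v

# Q-values of the actions observed in the dataset (made-up numbers).
qs = np.array([1.0, 2.0, 3.0, 10.0])
print(expectile(qs, tau=0.5))   # ≈ 4.0, the mean
print(expectile(qs, tau=0.99))  # ≈ 9.76, approaching the max of 10
```

Raising τ interpolates between the mean and the maximum, which is exactly the "implicit max" used when fitting the value function.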

Behavior Distribution

The distribution over actions in the offline dataset, induced by the (possibly unknown) policy that generated the training data used by IQL.

Conservative Loss Function

Mathematical objective in IQL that penalizes overestimations of Q-values outside the behavior distribution to ensure learning stability.

Implicit Q-Target Estimation

IQL mechanism that calculates target values without explicit maximization, using conditional expectations based on the behavior distribution.
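Concretely, the learned state value V(s') stands in for the usual max over next-state actions, so the Q-target reduces to a simple Bellman backup. A sketch with a hypothetical mini-batch (all numbers illustrative):

```python
import numpy as np

# Hypothetical mini-batch drawn from a fixed offline dataset.
rewards = np.array([1.0, 0.0, 0.5])
dones   = np.array([0.0, 1.0, 0.0])   # 1.0 marks terminal transitions
v_next  = np.array([2.0, 3.0, 1.0])   # V(s') from the value network
gamma   = 0.99

# Q-targets computed without any max over actions: V(s') replaces
# max_a Q(s', a), so no out-of-distribution action is ever evaluated.
q_target = rewards + gamma * (1.0 - dones) * v_next
print(q_target)  # [2.98, 0.0, 1.49]
```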

Value-Policy Decoupling

Fundamental principle of IQL separating value function learning from policy extraction to avoid optimization biases in the offline setting.
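In practice the decoupling is typically completed by advantage-weighted behavior cloning: the value side produces advantages, and the policy side fits dataset actions weighted by their exponentiated advantages. A minimal sketch (all numbers and the temperature are illustrative):

```python
import numpy as np

# Hypothetical Q estimates for three dataset actions in one state.
q = np.array([1.0, 2.5, 0.5])
v = 1.2          # state value V(s)
beta = 3.0       # inverse-temperature hyperparameter (illustrative)

adv = q - v                  # advantages of the observed actions
w = np.exp(beta * adv)       # exponentiated advantages
w /= w.sum()                 # normalized weights, for readability

# The policy is then trained by behavior cloning on dataset actions,
# each weighted by w -- no action outside the data is ever evaluated.
print(int(w.argmax()))  # 1: the highest-advantage dataset action dominates
```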

Offline Training Period

Learning phase where IQL uses only a fixed dataset without environment interaction, ensuring safety and computational efficiency.

Weighted Importance Sampling

Technique used in IQL to correct the mismatch between behavior distribution and target policy by weighting samples according to their relevance.
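Weighted importance sampling itself is a standard off-policy estimator: instead of averaging ratio-weighted returns directly, it normalizes by the sum of the ratios, trading a small bias for much lower variance. A generic sketch with hypothetical ratios and returns:

```python
import numpy as np

# Per-sample importance ratios pi(a|s) / beta(a|s) and observed returns
# (all values illustrative).
ratios  = np.array([0.5, 2.0, 1.0, 0.25])
returns = np.array([1.0, 3.0, 2.0, 0.0])

# Ordinary importance sampling: unbiased, but high variance.
ois = np.mean(ratios * returns)
# Weighted importance sampling: normalize by the sum of the ratios.
wis = np.sum(ratios * returns) / np.sum(ratios)
print(ois, wis)  # 2.125 vs. ≈ 2.267
```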

Batch-Constrained Optimization

Strategy in IQL that constrains learned actions to remain close to those observed in the dataset to avoid unreliable extrapolations.

Offline Distribution Bias

Major challenge in IQL where limited and biased data can lead to incorrect estimates if not properly managed by conservative mechanisms.

Implicit Advantage Function

Extension of IQL that estimates the relative advantages of actions without explicit maximization, enabling more robust action selection in offline contexts.

Behavior Regularization

Mechanism in IQL that penalizes significant deviations from the behavior distribution to maintain stability and avoid risky actions.
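One simple form of such a penalty scores each action by its Q-value minus a term that grows when the learned policy assigns an action far more probability than the behavior policy did. The sketch below is a generic illustration of this idea, not IQL's exact objective; the weight and all inputs are made up:

```python
import numpy as np

def behavior_regularized_score(q, log_pi, log_beta, alpha=0.1):
    """Score actions by Q minus a behavior penalty: the penalty grows
    when the learned policy pi puts much more probability on an action
    than the behavior policy beta did (alpha is illustrative)."""
    return q - alpha * (log_pi - log_beta)

# Two candidate actions: similar Q, but the second is far off-distribution.
scores = behavior_regularized_score(
    q=np.array([1.0, 1.1]),
    log_pi=np.array([-0.5, -0.5]),
    log_beta=np.array([-0.7, -6.0]),
    alpha=0.3,
)
print(int(scores.argmax()))  # 0: the in-distribution action wins despite lower Q
```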

Implicit Termination Criterion

Method in IQL for determining learning convergence based on the stability of Q-estimates rather than explicit performance metrics.

Demonstration Experience

Pre-collected dataset used by IQL as the sole learning source, typically originating from experts or existing policies.
