
AI Glossary

The complete Artificial Intelligence dictionary

162 categories · 2,032 subcategories · 23,060 terms

Multi-Armed Bandit

Fundamental reinforcement learning problem in which an agent must sequentially choose among multiple options (arms) to maximize the cumulative reward obtained.
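The setup described above can be sketched as a tiny environment in Python; the arm success probabilities below are illustrative placeholders, not values from this glossary.

```python
import random

class BernoulliBandit:
    """Minimal k-armed bandit environment: each arm pays 1 with a
    fixed (hypothetical) success probability, otherwise 0."""
    def __init__(self, probs, seed=0):
        self.probs = probs
        self.rng = random.Random(seed)

    def pull(self, arm):
        """Draw one stochastic reward from the chosen arm."""
        return 1.0 if self.rng.random() < self.probs[arm] else 0.0

bandit = BernoulliBandit([0.1, 0.6, 0.3])
rewards = [bandit.pull(1) for _ in range(1000)]
mean = sum(rewards) / len(rewards)  # concentrates near the arm's true 0.6
```

An agent never sees `probs` directly; it can only learn about each arm through repeated calls to `pull`, which is exactly what makes the selection problem non-trivial.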


Exploration-Exploitation Dilemma

Central conflict between exploring new options to discover their potential rewards and exploiting options known to be the most profitable.


Regret Rate

Performance measure quantifying the cumulative gap between the rewards actually obtained and those an optimal policy would have earned, used to evaluate the effectiveness of a learning strategy.
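The definition above translates directly into a short computation: sum, over all steps, the gap between the best arm's mean and the mean of the arm actually chosen. The two-armed means and the uniformly random policy below are hypothetical examples.

```python
import random

def cumulative_regret(choices, means):
    """Sum over steps of (best mean - mean of the chosen arm)."""
    best = max(means)
    return sum(best - means[a] for a in choices)

random.seed(0)
means = [0.5, 0.8]
# A purely random policy picks each arm uniformly for 1000 steps.
choices = [random.randrange(len(means)) for _ in range(1000)]
regret = cumulative_regret(choices, means)
# Each pull of arm 0 costs 0.3, so random play accrues roughly 150 regret.
```

A good bandit algorithm makes this quantity grow sublinearly in the horizon, whereas the random policy here accrues regret linearly.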


UCB Algorithm

Optimistic strategy that selects the arm with the highest upper confidence bound, balancing exploration and exploitation through statistical confidence intervals.
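The classic UCB1 instance of this strategy can be sketched in a few lines; the index it maximizes is the empirical mean plus a confidence radius sqrt(2 ln t / n). The arm probabilities and horizon below are illustrative assumptions.

```python
import math
import random

def ucb1(probs, horizon, seed=0):
    """UCB1 on Bernoulli arms: after one initial pull per arm, pick the
    arm maximizing mean + sqrt(2 ln t / n)."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k          # pulls per arm
    sums = [0.0] * k          # total reward per arm
    for t in range(1, horizon + 1):
        if t <= k:            # play each arm once to initialize estimates
            arm = t - 1
        else:
            arm = max(range(k),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = ucb1([0.2, 0.8], horizon=2000)
# The better arm (index 1) should receive the vast majority of pulls.
```

The confidence term shrinks as an arm accumulates pulls, so rarely tried arms keep a high index and get revisited: exploration emerges from the bound itself rather than from explicit randomization.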


ε-greedy Algorithm

Simple policy choosing the optimal arm with probability (1-ε) and exploring randomly with probability ε, controlling the exploration-exploitation trade-off.
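This policy is simple enough to sketch directly; the arm probabilities, horizon, and ε value below are hypothetical choices for illustration.

```python
import random

def epsilon_greedy(probs, horizon, eps=0.1, seed=0):
    """ε-greedy on Bernoulli arms: exploit the best empirical mean with
    probability 1-ε, otherwise pick a uniformly random arm."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    values = [0.0] * k                 # incremental empirical means
    for _ in range(horizon):
        if rng.random() < eps:
            arm = rng.randrange(k)     # explore
        else:
            arm = max(range(k), key=values.__getitem__)  # exploit
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values

counts, values = epsilon_greedy([0.3, 0.7], horizon=5000)
```

A constant ε keeps exploring forever, which wastes pulls once the best arm is identified; decaying ε over time is a common refinement.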


Stochastic Reward

Random return following an unknown probability distribution associated with each arm, modeling the inherent uncertainty in real environments.


Action Policy

Rule or algorithm determining the choice of arm at each step based on accumulated information, defining the agent's behavior.


Bernoulli Distribution

Binary reward model (success/failure) frequently used in bandit problems, characterized by a single success probability parameter.


Bayesian Update

Iterative process of updating beliefs about reward distribution parameters by combining prior information and new observations.


Non-Stationary Bandit

Variant where reward distributions change over time, requiring adaptive strategies capable of tracking these variations.
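One common adaptive device is a constant step size, which turns the running estimate into an exponentially recency-weighted average that can track a drifting mean. The abrupt jump in the reward sequence below is a hypothetical drift scenario.

```python
def track(rewards, alpha=0.1):
    """Recency-weighted mean with constant step size: q += alpha * (r - q).
    Old observations decay geometrically instead of being averaged forever."""
    q = 0.0
    for r in rewards:
        q += alpha * (r - q)
    return q

# The arm's mean jumps from 0.2 to 0.9 halfway through the run.
rewards = [0.2] * 100 + [0.9] * 100
q = track(rewards)   # ends near 0.9, largely forgetting the 0.2 phase
```

A plain sample mean over the same sequence would sit at 0.55, badly underestimating the arm's current value; the constant step size trades some variance for responsiveness.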


Optimism in the Face of Uncertainty

Algorithmic principle favoring arms with high uncertainty and high reward potential, ensuring efficient exploration.


Convergence Rate

Speed at which the algorithm approaches the optimal policy, measuring the asymptotic efficiency of the learning strategy.


Adversarial Bandit

Scenario where rewards are chosen by an adversary rather than following stochastic distributions, requiring robust strategies.


Optimistic Initialization

Technique initializing reward estimates to high values to encourage early exploration of all available arms.
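A sketch of the technique with a purely greedy policy: the optimistic prior is treated as one pseudo-observation, so the inflated estimates decay gradually and force every arm to be sampled before the policy settles. The arm probabilities and the initial value of 5.0 are illustrative assumptions.

```python
import random

def optimistic_greedy(probs, horizon, init=5.0, seed=0):
    """Greedy selection whose estimates start at an optimistic value far
    above any achievable reward, counted as one pseudo-observation."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [1] * k          # pseudo-count for the optimistic prior
    values = [init] * k
    for _ in range(horizon):
        arm = max(range(k), key=values.__getitem__)  # no explicit exploration
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts

counts = optimistic_greedy([0.1, 0.5, 0.9], horizon=1000)
# Every arm is tried early; the best arm dominates once estimates settle.
```

Because real rewards are at most 1, each pull drags an arm's estimate down from 5.0, so the greedy rule cycles through all arms until their inflated values fall near the true means, after which it exploits the best one.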


Linear Bandit

Generalization where the expected reward is a linear function of contextual features, allowing for more complex structures.


Variance Reduction

Technique aimed at decreasing the uncertainty of reward estimates to accelerate convergence to the optimal policy.
