
AI Glossary

The complete dictionary of Artificial Intelligence

162 categories · 2,032 subcategories · 23,060 terms

LinUCB

Contextual bandit algorithm that assumes a linear relationship between the context features and the expected reward of each arm. Uses an upper confidence bound on that linear estimate to balance exploration and exploitation.
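
For illustration, a minimal sketch of the per-arm ("disjoint") variant in Python, assuming numpy is available; the class name, the alpha parameter, and the A/b statistics follow the standard textbook formulation and are not taken from any particular implementation.

import numpy as np

class LinUCB:
    # Minimal per-arm ("disjoint") LinUCB sketch: one ridge regression per arm.
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha                               # width of the confidence bound
        self.A = [np.eye(dim) for _ in range(n_arms)]    # A_a = I + sum of x x^T
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # b_a = sum of r * x

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # ridge estimate of the linear model
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))                    # arm with the highest upper confidence bound

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

A round then consists of calling select on the current context vector and feeding the observed reward back through update.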

Contextual Thompson Sampling

Bayesian approach to contextual bandits that draws model parameters from their posterior distribution at each round. Selecting the arm that maximizes expected reward under the sampled parameters yields exploration naturally, in proportion to the remaining posterior uncertainty.
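
A minimal sketch of the linear-Gaussian case, assuming numpy; the class name LinearThompsonSampling and the scaling parameter v are illustrative choices, not a reference implementation.

import numpy as np

class LinearThompsonSampling:
    # Sketch: Gaussian posterior over a shared linear reward model, one sample per round.
    def __init__(self, dim, v=1.0):
        self.v = v                    # scales the posterior covariance (larger v = more exploration)
        self.B = np.eye(dim)          # precision matrix: I + sum of x x^T
        self.f = np.zeros(dim)        # sum of reward-weighted contexts

    def select(self, arm_contexts):
        B_inv = np.linalg.inv(self.B)
        mu = B_inv @ self.f                                             # posterior mean
        theta = np.random.multivariate_normal(mu, self.v ** 2 * B_inv)  # draw from the posterior
        return int(np.argmax([x @ theta for x in arm_contexts]))        # greedy w.r.t. the sample

    def update(self, x, reward):
        self.B += np.outer(x, x)
        self.f += reward * x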

Context Vector

Vector representation of observable environmental characteristics at a given time. Serves as the basis for contextual bandit models to predict conditional rewards.

Contextual Regret Rate

Performance measure quantifying the cumulative gap between the reward actually collected and the reward of the best fixed policy chosen in hindsight. Used to compare the effectiveness of contextual bandit algorithms.
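
One common way to formalize this, writing \Pi for the reference policy class, x_t for the context, a_t for the arm played and r_t for the reward at round t, is the cumulative regret

R_T = \max_{\pi \in \Pi} \sum_{t=1}^{T} \mathbb{E}\big[ r_t(\pi(x_t)) \big] \;-\; \sum_{t=1}^{T} \mathbb{E}\big[ r_t(a_t) \big]

An algorithm is usually considered effective when R_T grows sublinearly in T, so that the average per-round regret R_T / T vanishes as T grows.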

Kernel Bandits

Extension of contextual bandits using kernel methods to capture non-linear relationships between context and reward. Enables flexible modeling without strict linearity assumptions.
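
As a sketch of the idea, the upper-confidence index below replaces the linear estimate with kernel ridge regression; numpy, the RBF kernel, and the function names are assumptions made for illustration.

import numpy as np

def rbf(a, b, gamma=1.0):
    # Illustrative RBF kernel; any positive-definite kernel can be substituted.
    return np.exp(-gamma * np.sum((a - b) ** 2))

def kernel_ucb_score(x, X_hist, y_hist, alpha=1.0, lam=1.0, kernel=rbf):
    # Upper-confidence score for features x given past feature/reward pairs.
    if len(X_hist) == 0:
        return float("inf")                          # unexplored: force an initial pull
    K = np.array([[kernel(a, b) for b in X_hist] for a in X_hist])
    k_x = np.array([kernel(x, a) for a in X_hist])
    K_inv = np.linalg.inv(K + lam * np.eye(len(X_hist)))
    mean = k_x @ K_inv @ np.asarray(y_hist)          # kernel ridge estimate of the reward
    var = kernel(x, x) - k_x @ K_inv @ k_x           # predictive uncertainty at x
    return mean + alpha * np.sqrt(max(var, 0.0))

Arm selection then amounts to computing this score for each candidate's feature vector and playing the argmax.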

Matrix Factorization for Bandits

Technique combining contextual bandits and matrix factorization to handle high-dimensional action or context spaces. Efficiently shares information between different contextual configurations.

Hierarchical Bandits

Structure of contextual bandits organized into multiple levels where high-level decisions influence choices available at lower levels. Enables structured and efficient decision-making.

Contextual Exploration

Adaptive exploration strategy that uses contextual information to decide where to collect data. Reduces regret by concentrating exploration on the most promising regions of the context space.

Bandits with Delayed Feedback

Variant of contextual bandits in which the reward for an action is observed only after a significant delay. Requires algorithms designed to handle this temporal lag while keeping learning efficient.
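
One common pattern is to buffer feedback until it becomes observable and only then update the underlying learner. The wrapper below is a hypothetical sketch around any bandit exposing select and update methods (for example, the LinUCB sketch above).

import heapq
from itertools import count

class DelayedFeedbackWrapper:
    # Sketch: hold back (arrival_time, arm, context, reward) tuples until they become observable.
    def __init__(self, base_bandit):
        self.base = base_bandit       # any learner exposing select(x) and update(arm, x, reward)
        self.pending = []             # min-heap of (arrival_time, tie_breaker, arm, context, reward)
        self._tick = count()          # tie-breaker so the heap never compares contexts directly

    def select(self, now, x):
        while self.pending and self.pending[0][0] <= now:   # flush feedback that has now arrived
            _, _, arm, ctx, reward = heapq.heappop(self.pending)
            self.base.update(arm, ctx, reward)
        return self.base.select(x)

    def record(self, arrival_time, arm, x, reward):
        heapq.heappush(self.pending, (arrival_time, next(self._tick), arm, x, reward))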

Non-Stationary Bandits

Contextual bandit problem where the reward distribution evolves over time. Requires algorithms capable of adapting to changes to maintain optimal performance.
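
A deliberately simplified, context-free sketch of one standard remedy, the sliding window, is shown below: only recent rewards enter the estimates, so older observations age out as the distribution drifts. The same windowing (or exponential discounting) can be applied to the sufficient statistics of a contextual learner such as LinUCB; the class and parameter names are illustrative.

import random
from collections import deque

class SlidingWindowBandit:
    # Sketch: epsilon-greedy over a sliding window so old observations age out under drift.
    def __init__(self, n_arms, window=200, eps=0.1):
        self.eps = eps
        self.history = [deque(maxlen=window) for _ in range(n_arms)]  # recent rewards per arm

    def select(self):
        if random.random() < self.eps or any(len(h) == 0 for h in self.history):
            return random.randrange(len(self.history))                # explore / cold start
        means = [sum(h) / len(h) for h in self.history]               # windowed estimates only
        return max(range(len(means)), key=means.__getitem__)

    def update(self, arm, reward):
        self.history[arm].append(reward)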

Adversarial Bandits

Framework where rewards are generated by an adversary rather than following a fixed stochastic distribution. Requires robust strategies guaranteeing worst-case regret bounds.

Bandits with Constraints

Extension of contextual bandits incorporating constraints on resources or costs. Optimizes rewards while respecting limitations imposed by the environment.

Policy Learning

Approach where the algorithm directly learns a policy function mapping contexts to optimal actions. Avoids explicit value estimation for more direct decision-making.

Combinatorial Bandits

Generalization allowing simultaneous selection of multiple arms with combinatorial constraints. Applied to online advertising, set recommendation, and portfolio optimization.
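
For the simplest cardinality constraint, playing exactly k arms per round, the combinatorial action reduces to a top-k selection over per-arm index scores, as in the hypothetical helper below; richer constraints generally require a dedicated combinatorial solver.

import numpy as np

def select_top_k(scores, k):
    # Pick the k arms with the highest index scores (e.g. per-arm UCB values).
    return [int(i) for i in np.argsort(scores)[-k:][::-1]]

print(select_top_k(np.array([0.2, 0.9, 0.5, 0.7]), k=2))   # -> [1, 3]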

Meta-Learning for Bandits

Approach transferring knowledge acquired across multiple bandit tasks to accelerate learning on new tasks. Particularly useful in contexts with limited initial data.
