
AI Glossary

The complete dictionary of Artificial Intelligence

162 categories · 2,032 subcategories · 23,060 terms

LinUCB

Contextual bandit algorithm that assumes a linear relationship between the context features and the expected reward of each arm. Uses an upper confidence bound on that linear estimate to balance exploration and exploitation.
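
For illustration, a minimal sketch of the per-arm ("disjoint") variant in Python, assuming numpy is available; the class name, the alpha parameter, and the A/b statistics follow the standard textbook formulation and are not taken from any particular implementation.

import numpy as np

class LinUCB:
    # Minimal per-arm ("disjoint") LinUCB sketch: one ridge regression per arm.
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha                               # width of the confidence bound
        self.A = [np.eye(dim) for _ in range(n_arms)]    # A_a = I + sum of x x^T
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # b_a = sum of r * x

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # ridge estimate of the linear model
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))                    # arm with the highest upper confidence bound

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

A round then consists of calling select on the current context vector and feeding the observed reward back through update.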

Contextual Thompson Sampling

Bayesian approach to contextual bandits that draws model parameters from their posterior distribution at each round. Selecting the arm that maximizes expected reward under the sampled parameters yields exploration naturally, in proportion to the remaining posterior uncertainty.
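
A minimal sketch of the linear-Gaussian case, assuming numpy; the class name LinearThompsonSampling and the scaling parameter v are illustrative choices, not a reference implementation.

import numpy as np

class LinearThompsonSampling:
    # Sketch: Gaussian posterior over a shared linear reward model, one sample per round.
    def __init__(self, dim, v=1.0):
        self.v = v                    # scales the posterior covariance (larger v = more exploration)
        self.B = np.eye(dim)          # precision matrix: I + sum of x x^T
        self.f = np.zeros(dim)        # sum of reward-weighted contexts

    def select(self, arm_contexts):
        B_inv = np.linalg.inv(self.B)
        mu = B_inv @ self.f                                             # posterior mean
        theta = np.random.multivariate_normal(mu, self.v ** 2 * B_inv)  # draw from the posterior
        return int(np.argmax([x @ theta for x in arm_contexts]))        # greedy w.r.t. the sample

    def update(self, x, reward):
        self.B += np.outer(x, x)
        self.f += reward * x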

Context Vector

Vector representation of observable environmental characteristics at a given time. Serves as the basis for contextual bandit models to predict conditional rewards.

Contextual Regret Rate

Performance measure quantifying the cumulative gap between the reward actually collected and the reward of the best fixed policy chosen in hindsight. Used to compare the effectiveness of contextual bandit algorithms.
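
One common way to formalize this, writing \Pi for the reference policy class, x_t for the context, a_t for the arm played and r_t for the reward at round t, is the cumulative regret

R_T = \max_{\pi \in \Pi} \sum_{t=1}^{T} \mathbb{E}\big[ r_t(\pi(x_t)) \big] \;-\; \sum_{t=1}^{T} \mathbb{E}\big[ r_t(a_t) \big]

An algorithm is usually considered effective when R_T grows sublinearly in T, so that the average per-round regret R_T / T vanishes as T grows.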

Kernel Bandits

Extension of contextual bandits using kernel methods to capture non-linear relationships between context and reward. Enables flexible modeling without strict linearity assumptions.
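
As a sketch of the idea, the upper-confidence index below replaces the linear estimate with kernel ridge regression; numpy, the RBF kernel, and the function names are assumptions made for illustration.

import numpy as np

def rbf(a, b, gamma=1.0):
    # Illustrative RBF kernel; any positive-definite kernel can be substituted.
    return np.exp(-gamma * np.sum((a - b) ** 2))

def kernel_ucb_score(x, X_hist, y_hist, alpha=1.0, lam=1.0, kernel=rbf):
    # Upper-confidence score for features x given past feature/reward pairs.
    if len(X_hist) == 0:
        return float("inf")                          # unexplored: force an initial pull
    K = np.array([[kernel(a, b) for b in X_hist] for a in X_hist])
    k_x = np.array([kernel(x, a) for a in X_hist])
    K_inv = np.linalg.inv(K + lam * np.eye(len(X_hist)))
    mean = k_x @ K_inv @ np.asarray(y_hist)          # kernel ridge estimate of the reward
    var = kernel(x, x) - k_x @ K_inv @ k_x           # predictive uncertainty at x
    return mean + alpha * np.sqrt(max(var, 0.0))

Arm selection then amounts to computing this score for each candidate's feature vector and playing the argmax.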

Matrix Factorization for Bandits

Technique combining contextual bandits and matrix factorization to handle high-dimensional action or context spaces. Efficiently shares information between different contextual configurations.

Hierarchical Bandits

Structure of contextual bandits organized into multiple levels where high-level decisions influence choices available at lower levels. Enables structured and efficient decision-making.

Contextual Exploration

Adaptive exploration strategy that uses contextual information to decide where to collect data. Reduces regret by concentrating exploration on the most promising regions of the context space.

Bandits with Delayed Feedback

Variant of contextual bandits in which the reward for an action is observed only after a significant delay. Requires algorithms designed to handle this temporal lag while keeping learning efficient.
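
One common pattern is to buffer feedback until it becomes observable and only then update the underlying learner. The wrapper below is a hypothetical sketch around any bandit exposing select and update methods (for example, the LinUCB sketch above).

import heapq
from itertools import count

class DelayedFeedbackWrapper:
    # Sketch: hold back (arrival_time, arm, context, reward) tuples until they become observable.
    def __init__(self, base_bandit):
        self.base = base_bandit       # any learner exposing select(x) and update(arm, x, reward)
        self.pending = []             # min-heap of (arrival_time, tie_breaker, arm, context, reward)
        self._tick = count()          # tie-breaker so the heap never compares contexts directly

    def select(self, now, x):
        while self.pending and self.pending[0][0] <= now:   # flush feedback that has now arrived
            _, _, arm, ctx, reward = heapq.heappop(self.pending)
            self.base.update(arm, ctx, reward)
        return self.base.select(x)

    def record(self, arrival_time, arm, x, reward):
        heapq.heappush(self.pending, (arrival_time, next(self._tick), arm, x, reward))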

Non-Stationary Bandits

Contextual bandit problem where the reward distribution evolves over time. Requires algorithms capable of adapting to changes to maintain optimal performance.
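
A deliberately simplified, context-free sketch of one standard remedy, the sliding window, is shown below: only recent rewards enter the estimates, so older observations age out as the distribution drifts. The same windowing (or exponential discounting) can be applied to the sufficient statistics of a contextual learner such as LinUCB; the class and parameter names are illustrative.

import random
from collections import deque

class SlidingWindowBandit:
    # Sketch: epsilon-greedy over a sliding window so old observations age out under drift.
    def __init__(self, n_arms, window=200, eps=0.1):
        self.eps = eps
        self.history = [deque(maxlen=window) for _ in range(n_arms)]  # recent rewards per arm

    def select(self):
        if random.random() < self.eps or any(len(h) == 0 for h in self.history):
            return random.randrange(len(self.history))                # explore / cold start
        means = [sum(h) / len(h) for h in self.history]               # windowed estimates only
        return max(range(len(means)), key=means.__getitem__)

    def update(self, arm, reward):
        self.history[arm].append(reward)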

Adversarial Bandits

Framework where rewards are generated by an adversary rather than following a fixed stochastic distribution. Requires robust strategies guaranteeing worst-case regret bounds.

Bandits with Constraints

Extension of contextual bandits incorporating constraints on resources or costs. Optimizes rewards while respecting limitations imposed by the environment.

Policy Learning

Approach where the algorithm directly learns a policy function mapping contexts to optimal actions. Avoids explicit value estimation for more direct decision-making.

Combinatorial Bandits

Generalization allowing simultaneous selection of multiple arms with combinatorial constraints. Applied to online advertising, set recommendation, and portfolio optimization.
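
For the simplest cardinality constraint, playing exactly k arms per round, the combinatorial action reduces to a top-k selection over per-arm index scores, as in the hypothetical helper below; richer constraints generally require a dedicated combinatorial solver.

import numpy as np

def select_top_k(scores, k):
    # Pick the k arms with the highest index scores (e.g. per-arm UCB values).
    return [int(i) for i in np.argsort(scores)[-k:][::-1]]

print(select_top_k(np.array([0.2, 0.9, 0.5, 0.7]), k=2))   # -> [1, 3]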

Meta-Learning for Bandits

Approach transferring knowledge acquired across multiple bandit tasks to accelerate learning on new tasks. Particularly useful in contexts with limited initial data.
