AI Glossary
The complete Artificial Intelligence dictionary
LinUCB
Contextual bandit algorithm assuming a linear relationship between context and expected reward. Selects the arm with the highest upper confidence bound on its estimated reward, balancing exploration and exploitation.
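A minimal pure-Python sketch of the disjoint variant (one linear model per arm). The class name, the exploration parameter `alpha`, and the toy two-arm environment are illustrative choices, not part of the definition; the inverse design matrix is maintained incrementally with the Sherman-Morrison formula.

```python
import math

class LinUCBArm:
    """One arm of disjoint LinUCB: ridge-regression estimate of the reward
    weights, with A^{-1} maintained incrementally via Sherman-Morrison."""
    def __init__(self, d, alpha=1.0):
        self.alpha = alpha
        self.A_inv = [[float(i == j) for j in range(d)] for i in range(d)]
        self.b = [0.0] * d

    def _mv(self, M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    def ucb(self, x):
        theta = self._mv(self.A_inv, self.b)           # theta = A^{-1} b
        Ax = self._mv(self.A_inv, x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(a * xi for a, xi in zip(Ax, x)))
        return mean + self.alpha * width

    def update(self, x, reward):
        Ax = self._mv(self.A_inv, x)
        denom = 1.0 + sum(a * xi for a, xi in zip(Ax, x))
        for i in range(len(x)):                        # Sherman-Morrison rank-1 update
            for j in range(len(x)):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
        for i in range(len(x)):
            self.b[i] += reward * x[i]

# toy run: under a fixed context, arm 0 always pays 1 and arm 1 pays 0
arms = [LinUCBArm(2), LinUCBArm(2)]
x = [1.0, 0.0]
for _ in range(50):
    choice = max(range(2), key=lambda a: arms[a].ucb(x))
    arms[choice].update(x, 1.0 if choice == 0 else 0.0)
best = max(range(2), key=lambda a: arms[a].ucb(x))
```

The unpulled arm keeps a wide confidence interval, but the rewarded arm's mean estimate grows fast enough that its upper bound stays on top.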
Contextual Thompson Sampling
Bayesian approach for contextual bandits that samples model parameters from their posterior distribution and selects the arm maximizing expected reward under that sample, so posterior uncertainty drives exploration naturally.
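A minimal sketch in one dimension, under simplifying assumptions: each arm's reward is taken to be linear in a scalar context with unit-variance Gaussian noise and a standard-normal prior, so the posterior over the slope stays Gaussian and can be sampled with the standard library. The toy environment and all names are illustrative.

```python
import random

class GaussianTSArm:
    """1-D Bayesian linear model r = theta * x + noise, with a conjugate
    Gaussian posterior over theta (unit noise variance, N(0, 1) prior)."""
    def __init__(self):
        self.precision = 1.0   # posterior precision, starts at the prior's
        self.xr_sum = 0.0      # running sum of x_t * r_t

    def sample_theta(self):
        mu = self.xr_sum / self.precision
        return random.gauss(mu, (1.0 / self.precision) ** 0.5)

    def update(self, x, r):
        self.precision += x * x
        self.xr_sum += x * r

random.seed(0)
arms = [GaussianTSArm(), GaussianTSArm()]
true_theta = [1.0, 0.0]                      # arm 0 is genuinely better
counts = [0, 0]
for _ in range(200):
    x = 1.0                                  # fixed context for the demo
    choice = max(range(2), key=lambda a: arms[a].sample_theta() * x)
    arms[choice].update(x, true_theta[choice] * x)   # noiseless toy rewards
    counts[choice] += 1
```

As each posterior concentrates, the sampled slopes stop overlapping and the better arm is picked almost every round.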
Context Vector
Vector representation of observable environmental characteristics at a given time. Serves as the basis for contextual bandit models to predict conditional rewards.
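A small illustration of what such a vector might contain; the specific features (a bias term, a normalized hour of day, a one-hot device type) are an arbitrary example, not a standard.

```python
def context_vector(hour, device, devices=("mobile", "desktop", "tablet")):
    """Encode observable environmental features as a fixed-length numeric
    vector: bias term, normalized hour of day, one-hot device type."""
    one_hot = [1.0 if device == d else 0.0 for d in devices]
    return [1.0, hour / 23.0] + one_hot

v = context_vector(12, "desktop")
```

The fixed length and numeric encoding are what let a contextual bandit model (e.g. a linear one) score every arm against the same input.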
Contextual Regret Rate
Performance measure quantifying the cumulative gap between the reward actually collected and the reward of the best policy in hindsight. Used to evaluate the effectiveness of contextual bandit algorithms.
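A sketch of the computation in the simplest case, where the hindsight comparator is reduced to the best single fixed arm (in the full contextual setting it would be the best policy in a class):

```python
def cumulative_regret(obtained, per_arm_rewards):
    """Regret against the best fixed arm in hindsight: the total reward that
    arm would have earned minus the reward actually collected."""
    best_fixed = max(sum(rewards) for rewards in per_arm_rewards)
    return best_fixed - sum(obtained)

# the learner collected 2; always playing arm 0 would have collected 3
regret = cumulative_regret([0.0, 1.0, 1.0],
                           [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]])
```

An algorithm is considered effective when this quantity grows sublinearly in the number of rounds.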
Kernel Bandits
Extension of contextual bandits using kernel methods to capture non-linear relationships between context and reward. Enables flexible modeling without strict linearity assumptions.
Matrix Factorization for Bandits
Technique combining contextual bandits and matrix factorization to handle high-dimensional action or context spaces. Efficiently shares information between different contextual configurations.
Hierarchical Bandits
Structure of contextual bandits organized into multiple levels where high-level decisions influence choices available at lower levels. Enables structured and efficient decision-making.
Contextual Exploration
Adaptive exploration strategy taking into account contextual information to optimize data collection. Reduces regret by focusing on the most promising contextual regions.
Bandits with Delayed Feedback
Variant of contextual bandits where the reward is only observed after a significant delay. Requires adapted algorithms to handle temporal uncertainty and maintain efficient learning.
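One simple way to adapt to delay, sketched here with epsilon-greedy and a priority queue of pending feedback; the class, parameters, and toy environment are illustrative assumptions.

```python
import heapq
import random

class DelayedBandit:
    """Epsilon-greedy that only incorporates a reward once its delay elapses,
    keeping pending feedback in a min-heap keyed by arrival time."""
    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.pending = []          # heap of (arrival_time, arm, reward)

    def select(self, t):
        while self.pending and self.pending[0][0] <= t:   # flush matured feedback
            _, arm, r = heapq.heappop(self.pending)
            self.counts[arm] += 1
            self.sums[arm] += r
        if random.random() < self.epsilon or not any(self.counts):
            return random.randrange(len(self.counts))
        means = [s / c if c else 0.0 for s, c in zip(self.sums, self.counts)]
        return max(range(len(means)), key=means.__getitem__)

    def observe(self, t, arm, reward, delay):
        heapq.heappush(self.pending, (t + delay, arm, reward))

random.seed(1)
b = DelayedBandit(2)
for t in range(300):
    a = b.select(t)
    b.observe(t, a, 1.0 if a == 0 else 0.0, delay=5)   # feedback arrives 5 steps late
```

The key design point is that estimates lag behind actions by the delay, so early rounds run on little or no information.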
Non-Stationary Bandits
Contextual bandit problem where the reward distribution evolves over time. Requires algorithms capable of adapting to changes to maintain optimal performance.
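One standard adaptation is sliding-window UCB, sketched below: statistics are computed only over the most recent observations, so old evidence is forgotten. The window size and the abrupt-change toy environment are illustrative assumptions.

```python
import math
from collections import deque

class SlidingWindowUCB:
    """UCB computed only over the last `window` observations, so the estimates
    track a drifting (non-stationary) reward distribution."""
    def __init__(self, n_arms, window=50):
        self.n_arms = n_arms
        self.history = deque(maxlen=window)   # (arm, reward) pairs

    def select(self):
        t = max(len(self.history), 1)
        scores = []
        for a in range(self.n_arms):
            rs = [r for arm, r in self.history if arm == a]
            if not rs:
                return a                       # pull arms absent from the window
            mean = sum(rs) / len(rs)
            scores.append(mean + math.sqrt(2 * math.log(t) / len(rs)))
        return max(range(self.n_arms), key=scores.__getitem__)

    def update(self, arm, reward):
        self.history.append((arm, reward))

# abrupt change: arm 0 pays 1 before round 200, arm 1 pays 1 afterwards
bandit = SlidingWindowUCB(2, window=50)
late_counts = [0, 0]
for t in range(400):
    a = bandit.select()
    bandit.update(a, 1.0 if (a == 0) == (t < 200) else 0.0)
    if t >= 300:
        late_counts[a] += 1
```

Because stale rewards slide out of the window, the formerly best arm loses its advantage within roughly one window length of the change.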
Adversarial Bandits
Framework where rewards are generated by an adversary rather than following a fixed stochastic distribution. Requires robust strategies guaranteeing worst-case regret bounds.
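The classic algorithm for this setting is EXP3 (exponential weights with importance-weighted reward estimates); a compact sketch, with the mixing parameter `gamma` and the toy adversary as illustrative choices:

```python
import math
import random

def exp3(reward_fn, n_arms, rounds, gamma=0.1):
    """EXP3: sample from an exponential-weights distribution mixed with
    uniform exploration; rewards in [0, 1] may be adversarially chosen."""
    weights = [1.0] * n_arms
    counts = [0] * n_arms
    for t in range(rounds):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        counts[arm] += 1
        r = reward_fn(t, arm)
        estimate = r / probs[arm]              # unbiased importance-weighted estimate
        weights[arm] *= math.exp(gamma * estimate / n_arms)
    return counts

random.seed(0)
# a (trivially constant) adversary; in general rewards may depend on t
counts = exp3(lambda t, arm: 1.0 if arm == 0 else 0.0, n_arms=3, rounds=500)
```

The uniform `gamma / n_arms` term guarantees every arm keeps being sampled, which is what makes the importance-weighted estimates well defined and the worst-case regret bound hold.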
Bandits with Constraints
Extension of contextual bandits incorporating constraints on resources or costs. Optimizes rewards while respecting limitations imposed by the environment.
Policy Learning
Approach where the algorithm directly learns a policy function mapping contexts to optimal actions. Avoids explicit value estimation for more direct decision-making.
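A minimal illustration of the idea (shown here without context, via the gradient-bandit algorithm): a softmax policy over action preferences is updated by stochastic gradient ascent on reward, with only a running baseline instead of explicit value estimates per action. All names and the toy environment are illustrative.

```python
import math
import random

class SoftmaxPolicy:
    """Direct policy learning: preferences -> softmax action probabilities,
    updated by the gradient-bandit rule with an average-reward baseline."""
    def __init__(self, n_actions, lr=0.1):
        self.h = [0.0] * n_actions   # action preferences
        self.lr = lr
        self.baseline = 0.0          # running average reward
        self.t = 0

    def probs(self):
        m = max(self.h)
        exps = [math.exp(x - m) for x in self.h]   # shift for numerical stability
        z = sum(exps)
        return [e / z for e in exps]

    def act(self):
        return random.choices(range(len(self.h)), weights=self.probs())[0]

    def update(self, action, reward):
        self.t += 1
        self.baseline += (reward - self.baseline) / self.t
        adv = reward - self.baseline
        for a, p in enumerate(self.probs()):
            grad = (1.0 if a == action else 0.0) - p
            self.h[a] += self.lr * adv * grad

random.seed(0)
policy = SoftmaxPolicy(2)
for _ in range(500):
    a = policy.act()
    policy.update(a, 1.0 if a == 0 else 0.0)   # action 0 is always better
```

The update nudges probability mass toward actions whose reward beats the baseline, without ever estimating per-action values explicitly.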
Combinatorial Bandits
Generalization allowing simultaneous selection of multiple arms with combinatorial constraints. Applied to online advertising, set recommendation, and portfolio optimization.
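A sketch of the simplest combinatorial constraint, "choose m of K arms", in the semi-bandit setting where each selected arm's reward is observed individually; the class name, the means, and the deterministic toy rewards are illustrative assumptions.

```python
import math

class TopMUCB:
    """Combinatorial semi-bandit: each round play the m arms with the highest
    UCB scores, then update each played arm from its own observed reward."""
    def __init__(self, n_arms, m):
        self.m = m
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        def ucb(a):
            if self.counts[a] == 0:
                return float("inf")            # force initial exploration
            mean = self.sums[a] / self.counts[a]
            return mean + math.sqrt(2 * math.log(self.t) / self.counts[a])
        return sorted(range(len(self.counts)), key=ucb, reverse=True)[:self.m]

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

bandit = TopMUCB(4, m=2)
means = [0.9, 0.8, 0.1, 0.0]
for _ in range(200):
    for arm in bandit.select():
        bandit.update(arm, means[arm])         # deterministic toy rewards
```

For richer constraints (e.g. a budget or a matroid), only the argmax step changes: instead of taking the top m scores, one solves the constrained combinatorial optimization over the UCB scores.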
Meta-Learning for Bandits
Approach transferring knowledge acquired across multiple bandit tasks to accelerate learning on new tasks. Particularly useful in contexts with limited initial data.