AI Glossary
The complete Artificial Intelligence dictionary
LinUCB
Contextual bandit algorithm assuming a linear relationship between context and expected reward. Selects the arm with the highest upper confidence bound on its estimated reward, balancing exploration and exploitation.
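A minimal pure-Python sketch of the disjoint variant (one linear model per arm). The class name, the exploration parameter `alpha`, and the toy two-arm environment are illustrative choices, not part of the definition; the inverse design matrix is maintained incrementally with the Sherman-Morrison formula.

```python
import math

class LinUCBArm:
    """One arm of disjoint LinUCB: ridge-regression estimate of the reward
    weights, with A^{-1} maintained incrementally via Sherman-Morrison."""
    def __init__(self, d, alpha=1.0):
        self.alpha = alpha
        self.A_inv = [[float(i == j) for j in range(d)] for i in range(d)]
        self.b = [0.0] * d

    def _mv(self, M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    def ucb(self, x):
        theta = self._mv(self.A_inv, self.b)           # theta = A^{-1} b
        Ax = self._mv(self.A_inv, x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(a * xi for a, xi in zip(Ax, x)))
        return mean + self.alpha * width

    def update(self, x, reward):
        Ax = self._mv(self.A_inv, x)
        denom = 1.0 + sum(a * xi for a, xi in zip(Ax, x))
        for i in range(len(x)):                        # Sherman-Morrison rank-1 update
            for j in range(len(x)):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
        for i in range(len(x)):
            self.b[i] += reward * x[i]

# toy run: under a fixed context, arm 0 always pays 1 and arm 1 pays 0
arms = [LinUCBArm(2), LinUCBArm(2)]
x = [1.0, 0.0]
for _ in range(50):
    choice = max(range(2), key=lambda a: arms[a].ucb(x))
    arms[choice].update(x, 1.0 if choice == 0 else 0.0)
best = max(range(2), key=lambda a: arms[a].ucb(x))
```

The unpulled arm keeps a wide confidence interval, but the rewarded arm's mean estimate grows fast enough that its upper bound stays on top.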
Contextual Thompson Sampling
Bayesian approach for contextual bandits that samples model parameters from their posterior distribution and selects the arm maximizing expected reward under that sample, so posterior uncertainty drives exploration naturally.
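A minimal sketch in one dimension, under simplifying assumptions: each arm's reward is taken to be linear in a scalar context with unit-variance Gaussian noise and a standard-normal prior, so the posterior over the slope stays Gaussian and can be sampled with the standard library. The toy environment and all names are illustrative.

```python
import random

class GaussianTSArm:
    """1-D Bayesian linear model r = theta * x + noise, with a conjugate
    Gaussian posterior over theta (unit noise variance, N(0, 1) prior)."""
    def __init__(self):
        self.precision = 1.0   # posterior precision, starts at the prior's
        self.xr_sum = 0.0      # running sum of x_t * r_t

    def sample_theta(self):
        mu = self.xr_sum / self.precision
        return random.gauss(mu, (1.0 / self.precision) ** 0.5)

    def update(self, x, r):
        self.precision += x * x
        self.xr_sum += x * r

random.seed(0)
arms = [GaussianTSArm(), GaussianTSArm()]
true_theta = [1.0, 0.0]                      # arm 0 is genuinely better
counts = [0, 0]
for _ in range(200):
    x = 1.0                                  # fixed context for the demo
    choice = max(range(2), key=lambda a: arms[a].sample_theta() * x)
    arms[choice].update(x, true_theta[choice] * x)   # noiseless toy rewards
    counts[choice] += 1
```

As each posterior concentrates, the sampled slopes stop overlapping and the better arm is picked almost every round.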
Context Vector
Vector representation of observable environmental characteristics at a given time. Serves as the basis for contextual bandit models to predict conditional rewards.
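A small illustration of what such a vector might contain; the specific features (a bias term, a normalized hour of day, a one-hot device type) are an arbitrary example, not a standard.

```python
def context_vector(hour, device, devices=("mobile", "desktop", "tablet")):
    """Encode observable environmental features as a fixed-length numeric
    vector: bias term, normalized hour of day, one-hot device type."""
    one_hot = [1.0 if device == d else 0.0 for d in devices]
    return [1.0, hour / 23.0] + one_hot

v = context_vector(12, "desktop")
```

The fixed length and numeric encoding are what let a contextual bandit model (e.g. a linear one) score every arm against the same input.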
Contextual Regret Rate
Performance measure quantifying the cumulative gap between the reward actually collected and the reward of the best policy in hindsight. Used to evaluate the effectiveness of contextual bandit algorithms.
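A sketch of the computation in the simplest case, where the hindsight comparator is reduced to the best single fixed arm (in the full contextual setting it would be the best policy in a class):

```python
def cumulative_regret(obtained, per_arm_rewards):
    """Regret against the best fixed arm in hindsight: the total reward that
    arm would have earned minus the reward actually collected."""
    best_fixed = max(sum(rewards) for rewards in per_arm_rewards)
    return best_fixed - sum(obtained)

# the learner collected 2; always playing arm 0 would have collected 3
regret = cumulative_regret([0.0, 1.0, 1.0],
                           [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]])
```

An algorithm is considered effective when this quantity grows sublinearly in the number of rounds.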
Kernel Bandits
Extension of contextual bandits using kernel methods to capture non-linear relationships between context and reward. Enables flexible modeling without strict linearity assumptions.
Matrix Factorization for Bandits
Technique combining contextual bandits and matrix factorization to handle high-dimensional action or context spaces. Efficiently shares information between different contextual configurations.
Hierarchical Bandits
Structure of contextual bandits organized into multiple levels where high-level decisions influence choices available at lower levels. Enables structured and efficient decision-making.
Contextual Exploration
Adaptive exploration strategy taking into account contextual information to optimize data collection. Reduces regret by focusing on the most promising contextual regions.
Bandits with Delayed Feedback
Variant of contextual bandits where the reward is only observed after a significant delay. Requires adapted algorithms to handle temporal uncertainty and maintain efficient learning.
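One simple way to adapt to delay, sketched here with epsilon-greedy and a priority queue of pending feedback; the class, parameters, and toy environment are illustrative assumptions.

```python
import heapq
import random

class DelayedBandit:
    """Epsilon-greedy that only incorporates a reward once its delay elapses,
    keeping pending feedback in a min-heap keyed by arrival time."""
    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.pending = []          # heap of (arrival_time, arm, reward)

    def select(self, t):
        while self.pending and self.pending[0][0] <= t:   # flush matured feedback
            _, arm, r = heapq.heappop(self.pending)
            self.counts[arm] += 1
            self.sums[arm] += r
        if random.random() < self.epsilon or not any(self.counts):
            return random.randrange(len(self.counts))
        means = [s / c if c else 0.0 for s, c in zip(self.sums, self.counts)]
        return max(range(len(means)), key=means.__getitem__)

    def observe(self, t, arm, reward, delay):
        heapq.heappush(self.pending, (t + delay, arm, reward))

random.seed(1)
b = DelayedBandit(2)
for t in range(300):
    a = b.select(t)
    b.observe(t, a, 1.0 if a == 0 else 0.0, delay=5)   # feedback arrives 5 steps late
```

The key design point is that estimates lag behind actions by the delay, so early rounds run on little or no information.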
Non-Stationary Bandits
Contextual bandit problem where the reward distribution evolves over time. Requires algorithms capable of adapting to changes to maintain optimal performance.
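One standard adaptation is sliding-window UCB, sketched below: statistics are computed only over the most recent observations, so old evidence is forgotten. The window size and the abrupt-change toy environment are illustrative assumptions.

```python
import math
from collections import deque

class SlidingWindowUCB:
    """UCB computed only over the last `window` observations, so the estimates
    track a drifting (non-stationary) reward distribution."""
    def __init__(self, n_arms, window=50):
        self.n_arms = n_arms
        self.history = deque(maxlen=window)   # (arm, reward) pairs

    def select(self):
        t = max(len(self.history), 1)
        scores = []
        for a in range(self.n_arms):
            rs = [r for arm, r in self.history if arm == a]
            if not rs:
                return a                       # pull arms absent from the window
            mean = sum(rs) / len(rs)
            scores.append(mean + math.sqrt(2 * math.log(t) / len(rs)))
        return max(range(self.n_arms), key=scores.__getitem__)

    def update(self, arm, reward):
        self.history.append((arm, reward))

# abrupt change: arm 0 pays 1 before round 200, arm 1 pays 1 afterwards
bandit = SlidingWindowUCB(2, window=50)
late_counts = [0, 0]
for t in range(400):
    a = bandit.select()
    bandit.update(a, 1.0 if (a == 0) == (t < 200) else 0.0)
    if t >= 300:
        late_counts[a] += 1
```

Because stale rewards slide out of the window, the formerly best arm loses its advantage within roughly one window length of the change.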
Adversarial Bandits
Framework where rewards are generated by an adversary rather than following a fixed stochastic distribution. Requires robust strategies guaranteeing worst-case regret bounds.
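The classic algorithm for this setting is EXP3 (exponential weights with importance-weighted reward estimates); a compact sketch, with the mixing parameter `gamma` and the toy adversary as illustrative choices:

```python
import math
import random

def exp3(reward_fn, n_arms, rounds, gamma=0.1):
    """EXP3: sample from an exponential-weights distribution mixed with
    uniform exploration; rewards in [0, 1] may be adversarially chosen."""
    weights = [1.0] * n_arms
    counts = [0] * n_arms
    for t in range(rounds):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        counts[arm] += 1
        r = reward_fn(t, arm)
        estimate = r / probs[arm]              # unbiased importance-weighted estimate
        weights[arm] *= math.exp(gamma * estimate / n_arms)
    return counts

random.seed(0)
# a (trivially constant) adversary; in general rewards may depend on t
counts = exp3(lambda t, arm: 1.0 if arm == 0 else 0.0, n_arms=3, rounds=500)
```

The uniform `gamma / n_arms` term guarantees every arm keeps being sampled, which is what makes the importance-weighted estimates well defined and the worst-case regret bound hold.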
Bandits with Constraints
Extension of contextual bandits incorporating constraints on resources or costs. Optimizes rewards while respecting limitations imposed by the environment.
Policy Learning
Approach where the algorithm directly learns a policy function mapping contexts to optimal actions. Avoids explicit value estimation for more direct decision-making.
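A minimal illustration of the idea (shown here without context, via the gradient-bandit algorithm): a softmax policy over action preferences is updated by stochastic gradient ascent on reward, with only a running baseline instead of explicit value estimates per action. All names and the toy environment are illustrative.

```python
import math
import random

class SoftmaxPolicy:
    """Direct policy learning: preferences -> softmax action probabilities,
    updated by the gradient-bandit rule with an average-reward baseline."""
    def __init__(self, n_actions, lr=0.1):
        self.h = [0.0] * n_actions   # action preferences
        self.lr = lr
        self.baseline = 0.0          # running average reward
        self.t = 0

    def probs(self):
        m = max(self.h)
        exps = [math.exp(x - m) for x in self.h]   # shift for numerical stability
        z = sum(exps)
        return [e / z for e in exps]

    def act(self):
        return random.choices(range(len(self.h)), weights=self.probs())[0]

    def update(self, action, reward):
        self.t += 1
        self.baseline += (reward - self.baseline) / self.t
        adv = reward - self.baseline
        for a, p in enumerate(self.probs()):
            grad = (1.0 if a == action else 0.0) - p
            self.h[a] += self.lr * adv * grad

random.seed(0)
policy = SoftmaxPolicy(2)
for _ in range(500):
    a = policy.act()
    policy.update(a, 1.0 if a == 0 else 0.0)   # action 0 is always better
```

The update nudges probability mass toward actions whose reward beats the baseline, without ever estimating per-action values explicitly.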
Combinatorial Bandits
Generalization allowing simultaneous selection of multiple arms with combinatorial constraints. Applied to online advertising, set recommendation, and portfolio optimization.
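A sketch of the simplest combinatorial constraint, "choose m of K arms", in the semi-bandit setting where each selected arm's reward is observed individually; the class name, the means, and the deterministic toy rewards are illustrative assumptions.

```python
import math

class TopMUCB:
    """Combinatorial semi-bandit: each round play the m arms with the highest
    UCB scores, then update each played arm from its own observed reward."""
    def __init__(self, n_arms, m):
        self.m = m
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        def ucb(a):
            if self.counts[a] == 0:
                return float("inf")            # force initial exploration
            mean = self.sums[a] / self.counts[a]
            return mean + math.sqrt(2 * math.log(self.t) / self.counts[a])
        return sorted(range(len(self.counts)), key=ucb, reverse=True)[:self.m]

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

bandit = TopMUCB(4, m=2)
means = [0.9, 0.8, 0.1, 0.0]
for _ in range(200):
    for arm in bandit.select():
        bandit.update(arm, means[arm])         # deterministic toy rewards
```

For richer constraints (e.g. a budget or a matroid), only the argmax step changes: instead of taking the top m scores, one solves the constrained combinatorial optimization over the UCB scores.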
Meta-Learning for Bandits
Approach transferring knowledge acquired across multiple bandit tasks to accelerate learning on new tasks. Particularly useful in contexts with limited initial data.