
AI Glossary

The complete AI glossary

162 categories · 2,032 subcategories · 23,060 terms

LinUCB

Contextual bandit algorithm assuming a linear relationship between context and expected reward. Uses an upper confidence bound to optimally balance exploration and exploitation.
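A minimal NumPy sketch of the disjoint LinUCB arm described above, assuming a ridge-regularized linear reward model (class and parameter names are illustrative):

```python
import numpy as np

class LinUCBArm:
    """One arm of disjoint LinUCB: maintains a ridge-regression
    estimate of reward as a linear function of the context."""
    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha            # exploration strength (assumed default)
        self.A = np.eye(dim)          # regularized Gram matrix
        self.b = np.zeros(dim)        # accumulated reward-weighted contexts

    def ucb(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b                       # point estimate of weights
        bonus = self.alpha * np.sqrt(x @ A_inv @ x)  # confidence width
        return theta @ x + bonus

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

def select(arms, x):
    """Choose the arm with the highest upper confidence bound."""
    return max(range(len(arms)), key=lambda i: arms[i].ucb(x))
```

The `alpha` parameter controls the width of the confidence bound; larger values explore more aggressively.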


Contextual Thompson Sampling

Bayesian approach for contextual bandits that samples parameters from their posterior distribution and selects the arm maximizing expected reward under the sampled parameters, yielding natural exploration.
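The sampling step can be sketched with a Gaussian posterior over a linear weight vector (a common instantiation; names and the `v` scale are assumptions):

```python
import numpy as np

class LinTSArm:
    """Linear Thompson sampling arm: maintains a Gaussian posterior
    over the weight vector and draws one sample per decision."""
    def __init__(self, dim, v=1.0):
        self.v2 = v * v              # posterior variance scale (assumption)
        self.B = np.eye(dim)         # precision-like matrix
        self.f = np.zeros(dim)       # accumulated reward-weighted contexts

    def sample_reward(self, x, rng):
        B_inv = np.linalg.inv(self.B)
        mu = B_inv @ self.f                                   # posterior mean
        theta = rng.multivariate_normal(mu, self.v2 * B_inv)  # posterior sample
        return theta @ x

    def update(self, x, reward):
        self.B += np.outer(x, x)
        self.f += reward * x
```

At each round the arm with the highest sampled reward is played; arms with little data have wide posteriors and so are still sampled occasionally.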


Context Vector

Vector representation of observable environmental characteristics at a given time. Serves as the input from which contextual bandit models predict conditional rewards.


Contextual Regret Rate

Performance measure quantifying the cumulative difference between the reward obtained and that of the best fixed policy in hindsight. Used to evaluate the effectiveness of contextual bandit algorithms.
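A hypothetical measurement of this quantity on synthetic data, comparing a uniformly random policy to the best fixed arm in hindsight (all names, parameters, and reward models are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
T, dim = 500, 3
contexts = rng.normal(size=(T, dim))
true_theta = np.array([[1.0, 0.0, 0.0],   # arm 0 weight vector (assumed)
                       [0.0, 0.5, 0.0]])  # arm 1 weight vector (assumed)

rewards = contexts @ true_theta.T          # expected reward per (round, arm)
chosen = rng.integers(0, 2, size=T)        # the policy being evaluated
obtained = rewards[np.arange(T), chosen]

best_fixed = rewards.sum(axis=0).argmax()  # best single arm in hindsight
regret = rewards[:, best_fixed].sum() - obtained.sum()
print(f"cumulative regret vs best fixed arm: {regret:.1f}")
```

A good contextual algorithm keeps this cumulative gap growing sublinearly in the number of rounds.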


Kernel Bandits

Extension of contextual bandits using kernel methods to capture non-linear relationships between context and reward. Enables flexible modeling without strict linearity assumptions.
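One way to sketch the non-linear reward estimate is kernel ridge regression with an RBF kernel (a simplified reward model, not a full kernel-UCB algorithm; all names are illustrative):

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    """RBF (Gaussian) kernel between two context vectors."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

class KernelRidgeReward:
    """Kernelized reward model: kernel ridge regression replaces the
    linear estimate, capturing non-linear context-reward relations."""
    def __init__(self, gamma=1.0, lam=1.0):
        self.gamma, self.lam = gamma, lam
        self.X, self.y = [], []

    def update(self, x, r):
        self.X.append(np.asarray(x))
        self.y.append(r)

    def predict(self, x):
        if not self.X:
            return 0.0
        # Gram matrix over observed contexts, plus ridge regularization
        K = np.array([[rbf(a, b, self.gamma) for b in self.X] for a in self.X])
        k = np.array([rbf(np.asarray(x), a, self.gamma) for a in self.X])
        alpha = np.linalg.solve(K + self.lam * np.eye(len(self.X)),
                                np.array(self.y))
        return float(k @ alpha)
```

A confidence bonus derived from the kernel matrix can be added on top of `predict` to recover a UCB-style selection rule.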


Matrix Factorization for Bandits

Technique combining contextual bandits and matrix factorization to handle high-dimensional action or context spaces. Efficiently shares information between different contextual configurations.


Hierarchical Bandits

Structure of contextual bandits organized into multiple levels where high-level decisions influence choices available at lower levels. Enables structured and efficient decision-making.


Contextual Exploration

Adaptive exploration strategy taking into account contextual information to optimize data collection. Reduces regret by focusing on the most promising contextual regions.


Bandits with Delayed Feedback

Variant of contextual bandits where the reward is only observed after a significant delay. Requires adapted algorithms to handle temporal uncertainty and maintain efficient learning.


Non-Stationary Bandits

Contextual bandit problem where the reward distribution evolves over time. Requires algorithms capable of adapting to changes to maintain optimal performance.
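A common adaptation is a sliding window that discards old observations so estimates track a drifting distribution; a minimal sketch (window size and names are assumptions):

```python
from collections import deque
import numpy as np

class SlidingWindowArm:
    """Sliding-window reward estimate: only the last `window` rewards
    count, so the arm adapts to a non-stationary distribution."""
    def __init__(self, window=100):
        self.rewards = deque(maxlen=window)  # old rewards evicted automatically

    def update(self, r):
        self.rewards.append(r)

    def ucb(self, t):
        n = len(self.rewards)
        if n == 0:
            return float("inf")  # force initial exploration
        mean = sum(self.rewards) / n
        return mean + np.sqrt(2 * np.log(t + 1) / n)  # UCB1-style bonus
```

Discounted estimates (exponentially down-weighting old rewards) are a common alternative to a hard window.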


Adversarial Bandits

Framework where rewards are generated by an adversary rather than following a fixed stochastic distribution. Requires robust strategies guaranteeing worst-case regret bounds.
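EXP3 is the classic algorithm for this setting; a compact sketch using exponential weights and importance-weighted reward estimates (parameter values are illustrative):

```python
import numpy as np

class Exp3:
    """EXP3: exponential weights with importance-weighted reward
    estimates, giving worst-case regret guarantees against an adversary."""
    def __init__(self, n_arms, gamma=0.1, rng=None):
        self.gamma = gamma                       # exploration rate (assumed)
        self.w = np.ones(n_arms)                 # exponential weights
        self.rng = rng or np.random.default_rng()

    def probs(self):
        p = self.w / self.w.sum()
        # mix in uniform exploration so every arm keeps positive probability
        return (1 - self.gamma) * p + self.gamma / len(self.w)

    def select(self):
        return int(self.rng.choice(len(self.w), p=self.probs()))

    def update(self, arm, reward):
        # importance weighting keeps the reward estimate unbiased
        x_hat = reward / self.probs()[arm]
        self.w[arm] *= np.exp(self.gamma * x_hat / len(self.w))
```

Because only the played arm's reward is observed, the importance-weighted estimate `x_hat` is what allows the regret bound to hold for any reward sequence.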


Bandits with Constraints

Extension of contextual bandits incorporating constraints on resources or costs. Optimizes rewards while respecting limitations imposed by the environment.


Policy Learning

Approach where the algorithm directly learns a policy function mapping contexts to optimal actions. Avoids explicit value estimation for more direct decision-making.


Combinatorial Bandits

Generalization allowing simultaneous selection of multiple arms with combinatorial constraints. Applied to online advertising, set recommendation, and portfolio optimization.


Meta-Learning for Bandits

Approach transferring knowledge acquired across multiple bandit tasks to accelerate learning on new tasks. Particularly useful in contexts with limited initial data.
