AI Glossary
The complete glossary of Artificial Intelligence
LinUCB
Contextual bandit algorithm that models expected reward with linear regression on the context and adds an Upper Confidence Bound term to balance exploration and exploitation in continuous context spaces.
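As an illustration of the definition above, here is a minimal sketch of one common formulation, with a separate ridge-regression model per arm. The class name `LinUCBArm`, the helper `select_arm`, the identity prior, and the default `alpha=1.0` are assumptions made for this example, not part of the glossary entry. The inverse matrix is maintained with the Sherman-Morrison identity so the sketch needs only the standard library.

```python
import math


class LinUCBArm:
    """One arm of a per-arm LinUCB sketch (hypothetical helper class)."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha  # exploration coefficient (assumed default)
        # A starts as the identity matrix, so its inverse is the identity too.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * context

    def _matvec(self, x):
        # Multiply the stored inverse matrix by a vector.
        return [sum(a * xj for a, xj in zip(row, x)) for row in self.A_inv]

    def ucb(self, x):
        theta = self._matvec(self.b)  # ridge estimate: theta = A^{-1} b
        mean = sum(t * xi for t, xi in zip(theta, x))
        v = self._matvec(x)
        # Confidence width: sqrt(x^T A^{-1} x), clipped at zero for safety.
        width = math.sqrt(max(sum(xi * vi for xi, vi in zip(x, v)), 0.0))
        return mean + self.alpha * width

    def update(self, x, reward):
        # Sherman-Morrison: (A + x x^T)^{-1} = A^{-1} - v v^T / (1 + x^T v),
        # where v = A^{-1} x (valid because A is symmetric).
        v = self._matvec(x)
        denom = 1.0 + sum(xi * vi for xi, vi in zip(x, v))
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= v[i] * v[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select_arm(arms, x):
    """Return the index of the arm with the highest upper confidence bound."""
    scores = [arm.ucb(x) for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)
```

A typical loop would call `select_arm` with the current context, observe the reward for the chosen arm, and pass both to that arm's `update`.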
Regret
Performance measure quantifying the difference between the optimal cumulative reward and that obtained by the algorithm, essential for evaluating the effectiveness of contextual bandit strategies.
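A small sketch may make the definition concrete. It assumes the per-round optimal reward is known, which is usually true only in simulation; the function name `cumulative_regret` is hypothetical.

```python
def cumulative_regret(optimal_rewards, obtained_rewards):
    """Return the running sum of (optimal reward - obtained reward) per round."""
    if len(optimal_rewards) != len(obtained_rewards):
        raise ValueError("both sequences must cover the same rounds")
    total = 0.0
    curve = []
    for best, got in zip(optimal_rewards, obtained_rewards):
        total += best - got  # regret incurred in this round
        curve.append(total)
    return curve
```

A curve that flattens over time suggests the algorithm is increasingly choosing near-optimal actions.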
Context
Set of observable features that influence the optimal decision at a given time, serving as the basis for personalized action selection in contextual bandits.
Off-policy Evaluation
Evaluation technique that estimates the performance of a new policy using data collected by an existing policy, without requiring direct deployment.
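The entry does not name a specific estimator. One widely used choice is inverse propensity scoring (IPS), sketched below under the assumption that each log record stores the probability with which the existing policy chose its action. The record layout and the names `ips_estimate` and `target_prob` are illustrative.

```python
def ips_estimate(logs, target_prob):
    """Estimate a new policy's average reward from logged data via IPS.

    logs: list of (context, action, reward, logging_prob) tuples, where
          logging_prob is the logging policy's probability of that action.
    target_prob: function (context, action) -> probability under the new policy.
    """
    if not logs:
        raise ValueError("need at least one logged interaction")
    total = 0.0
    for context, action, reward, logging_prob in logs:
        if logging_prob <= 0.0:
            raise ValueError("logging probabilities must be positive")
        # Reweight each reward by how much more (or less) likely the
        # new policy is to take the logged action.
        total += reward * target_prob(context, action) / logging_prob
    return total / len(logs)
```

The estimate is unbiased only if the logging policy gives positive probability to every action the new policy might take; its variance grows when logging probabilities are small.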
Hyperparameters
Configuration parameters of contextual bandit algorithms (such as the exploration coefficient or minibatch size) that influence convergence and performance.
Binary Reward
Type of feedback in contextual bandits where the outcome is limited to success (1) or failure (0), common in recommendation and advertising applications.
Logistic Bandit
Contextual bandit variant using logistic regression to model the probability of binary reward based on context, particularly suited to classification problems.
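The following sketch shows only the reward model of such a bandit: one logistic regression per arm, trained online with a gradient step after each binary reward. The exploration rule is deliberately left out, and the class name `LogisticArm` and the learning rate are assumptions for illustration.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticArm:
    """Per-arm logistic model of P(reward = 1 | context) (hypothetical helper)."""

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim  # weights start at zero, so predictions start at 0.5
        self.lr = lr          # learning rate (assumed value)

    def predict(self, x):
        return _sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)))

    def update(self, x, reward):
        # One stochastic gradient ascent step on the log-likelihood:
        # the gradient is (reward - predicted probability) * x.
        err = reward - self.predict(x)
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
```

In a full bandit, `predict` would be combined with an exploration mechanism such as a confidence bonus or posterior sampling before choosing an arm.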
Neural Bandit
Contextual bandit approach using neural networks to model the complex relationship between context and reward, capable of capturing nonlinearities in the data.
Policy Gradient
Method that optimizes the policy directly in contextual bandits, adjusting its parameters to maximize expected reward rather than first estimating action values.
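As a sketch of the idea, the code below uses a softmax policy that is linear in the context and applies a REINFORCE-style update, reward times the gradient of the log-probability of the chosen action. The choice of softmax, the class name `SoftmaxPolicy`, and the learning rate are assumptions; no baseline or variance reduction is included.

```python
import math
import random


class SoftmaxPolicy:
    """Linear softmax policy over a fixed set of actions (hypothetical helper)."""

    def __init__(self, n_actions, dim, lr=0.1):
        self.theta = [[0.0] * dim for _ in range(n_actions)]
        self.lr = lr  # learning rate (assumed value)

    def probs(self, x):
        logits = [sum(t * xi for t, xi in zip(row, x)) for row in self.theta]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        s = sum(exps)
        return [e / s for e in exps]

    def sample(self, x, rng=random):
        # Draw an action index according to the current policy.
        return rng.choices(range(len(self.theta)), weights=self.probs(x))[0]

    def update(self, x, action, reward):
        # Gradient of log pi(action | x) with respect to theta[a] is
        # (1[a == action] - pi(a | x)) * x; scale it by the observed reward.
        p = self.probs(x)
        for a, row in enumerate(self.theta):
            coeff = (1.0 if a == action else 0.0) - p[a]
            for j, xj in enumerate(x):
                row[j] += self.lr * reward * coeff * xj
```

Each round would sample an action with `sample`, observe its reward, and call `update`; actions that earn reward become more probable in similar contexts.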
Contextual UCB
Family of algorithms combining UCB principles with contextual models to obtain theoretical upper bounds on regret.