
AI Glossary

The complete Artificial Intelligence dictionary

162 categories · 2,032 subcategories · 23,060 terms

Multi-Armed Bandit

Fundamental reinforcement learning problem in which an agent must sequentially choose among multiple options (arms) to maximize the cumulative reward obtained.
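The setup described above can be sketched as a tiny environment in Python; the arm success probabilities below are illustrative placeholders, not values from this glossary.

```python
import random

class BernoulliBandit:
    """Minimal k-armed bandit environment: each arm pays 1 with a
    fixed (hypothetical) success probability, otherwise 0."""
    def __init__(self, probs, seed=0):
        self.probs = probs
        self.rng = random.Random(seed)

    def pull(self, arm):
        """Draw one stochastic reward from the chosen arm."""
        return 1.0 if self.rng.random() < self.probs[arm] else 0.0

bandit = BernoulliBandit([0.1, 0.6, 0.3])
rewards = [bandit.pull(1) for _ in range(1000)]
mean = sum(rewards) / len(rewards)  # concentrates near the arm's true 0.6
```

An agent never sees `probs` directly; it can only learn about each arm through repeated calls to `pull`, which is exactly what makes the selection problem non-trivial.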


Exploration-Exploitation Dilemma

Central conflict between exploring new options to discover their potential rewards and exploiting options known to be the most profitable.


Regret Rate

Performance measure quantifying the cumulative gap between the rewards actually obtained and those an optimal policy would have earned, used to evaluate the effectiveness of a learning strategy.
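The definition above translates directly into a short computation: sum, over all steps, the gap between the best arm's mean and the mean of the arm actually chosen. The two-armed means and the uniformly random policy below are hypothetical examples.

```python
import random

def cumulative_regret(choices, means):
    """Sum over steps of (best mean - mean of the chosen arm)."""
    best = max(means)
    return sum(best - means[a] for a in choices)

random.seed(0)
means = [0.5, 0.8]
# A purely random policy picks each arm uniformly for 1000 steps.
choices = [random.randrange(len(means)) for _ in range(1000)]
regret = cumulative_regret(choices, means)
# Each pull of arm 0 costs 0.3, so random play accrues roughly 150 regret.
```

A good bandit algorithm makes this quantity grow sublinearly in the horizon, whereas the random policy here accrues regret linearly.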


UCB Algorithm

Optimistic strategy that selects the arm with the highest upper confidence bound, balancing exploration and exploitation through statistical confidence intervals.
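The classic UCB1 instance of this strategy can be sketched in a few lines; the index it maximizes is the empirical mean plus a confidence radius sqrt(2 ln t / n). The arm probabilities and horizon below are illustrative assumptions.

```python
import math
import random

def ucb1(probs, horizon, seed=0):
    """UCB1 on Bernoulli arms: after one initial pull per arm, pick the
    arm maximizing mean + sqrt(2 ln t / n)."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k          # pulls per arm
    sums = [0.0] * k          # total reward per arm
    for t in range(1, horizon + 1):
        if t <= k:            # play each arm once to initialize estimates
            arm = t - 1
        else:
            arm = max(range(k),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = ucb1([0.2, 0.8], horizon=2000)
# The better arm (index 1) should receive the vast majority of pulls.
```

The confidence term shrinks as an arm accumulates pulls, so rarely tried arms keep a high index and get revisited: exploration emerges from the bound itself rather than from explicit randomization.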


ε-greedy Algorithm

Simple policy choosing the optimal arm with probability (1-ε) and exploring randomly with probability ε, controlling the exploration-exploitation trade-off.
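This policy is simple enough to sketch directly; the arm probabilities, horizon, and ε value below are hypothetical choices for illustration.

```python
import random

def epsilon_greedy(probs, horizon, eps=0.1, seed=0):
    """ε-greedy on Bernoulli arms: exploit the best empirical mean with
    probability 1-ε, otherwise pick a uniformly random arm."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    values = [0.0] * k                 # incremental empirical means
    for _ in range(horizon):
        if rng.random() < eps:
            arm = rng.randrange(k)     # explore
        else:
            arm = max(range(k), key=values.__getitem__)  # exploit
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values

counts, values = epsilon_greedy([0.3, 0.7], horizon=5000)
```

A constant ε keeps exploring forever, which wastes pulls once the best arm is identified; decaying ε over time is a common refinement.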


Stochastic Reward

Random return following an unknown probability distribution associated with each arm, modeling the inherent uncertainty in real environments.


Action Policy

Rule or algorithm determining the choice of arm at each step based on accumulated information, defining the agent's behavior.


Bernoulli Distribution

Binary reward model (success/failure) frequently used in bandit problems, characterized by a single success probability parameter.


Bayesian Update

Iterative process of updating beliefs about reward distribution parameters by combining prior information and new observations.


Non-Stationary Bandit

Variant where reward distributions change over time, requiring adaptive strategies capable of tracking these variations.
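One common adaptive device is a constant step size, which turns the running estimate into an exponentially recency-weighted average that can track a drifting mean. The abrupt jump in the reward sequence below is a hypothetical drift scenario.

```python
def track(rewards, alpha=0.1):
    """Recency-weighted mean with constant step size: q += alpha * (r - q).
    Old observations decay geometrically instead of being averaged forever."""
    q = 0.0
    for r in rewards:
        q += alpha * (r - q)
    return q

# The arm's mean jumps from 0.2 to 0.9 halfway through the run.
rewards = [0.2] * 100 + [0.9] * 100
q = track(rewards)   # ends near 0.9, largely forgetting the 0.2 phase
```

A plain sample mean over the same sequence would sit at 0.55, badly underestimating the arm's current value; the constant step size trades some variance for responsiveness.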


Optimism in the Face of Uncertainty

Algorithmic principle favoring arms with high uncertainty and high reward potential, ensuring efficient exploration.


Convergence Rate

Speed at which the algorithm approaches the optimal policy, measuring the asymptotic efficiency of the learning strategy.


Adversarial Bandit

Scenario where rewards are chosen by an adversary rather than following stochastic distributions, requiring robust strategies.


Optimistic Initialization

Technique initializing reward estimates to high values to encourage early exploration of all available arms.
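A sketch of the technique with a purely greedy policy: the optimistic prior is treated as one pseudo-observation, so the inflated estimates decay gradually and force every arm to be sampled before the policy settles. The arm probabilities and the initial value of 5.0 are illustrative assumptions.

```python
import random

def optimistic_greedy(probs, horizon, init=5.0, seed=0):
    """Greedy selection whose estimates start at an optimistic value far
    above any achievable reward, counted as one pseudo-observation."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [1] * k          # pseudo-count for the optimistic prior
    values = [init] * k
    for _ in range(horizon):
        arm = max(range(k), key=values.__getitem__)  # no explicit exploration
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts

counts = optimistic_greedy([0.1, 0.5, 0.9], horizon=1000)
# Every arm is tried early; the best arm dominates once estimates settle.
```

Because real rewards are at most 1, each pull drags an arm's estimate down from 5.0, so the greedy rule cycles through all arms until their inflated values fall near the true means, after which it exploits the best one.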


Linear Bandit

Generalization where the expected reward is a linear function of contextual features, allowing for more complex structures.


Variance Reduction

Technique aimed at decreasing the uncertainty of reward estimates to accelerate convergence to the optimal policy.
