AI Glossary
The complete dictionary of Artificial Intelligence
Bandit Algorithm
Family of online learning algorithms in which the agent must sequentially select actions with uncertain rewards to maximize cumulative reward.
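As a minimal illustration (not any standard library API; the function name, arm means, horizon, and epsilon value are all assumed), the epsilon-greedy strategy below plays a random arm with probability epsilon and the empirically best arm otherwise, which also makes the exploration-exploitation trade-off defined later in this glossary concrete:

```python
import random

def epsilon_greedy_bandit(arm_means, horizon=10_000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Bernoulli multi-armed bandit (illustrative sketch)."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda i: estimates[i])  # exploit
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return total_reward, estimates
```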
Follow the Leader (FTL)
Online optimization strategy where the algorithm chooses at each step the action that would have been optimal on the observed past data up to that point.
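In standard notation, given the losses $f_1, \dots, f_{t-1}$ observed so far over a decision set $\mathcal{K}$, FTL plays

$$x_t = \arg\min_{x \in \mathcal{K}} \sum_{s=1}^{t-1} f_s(x).$$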
Follow the Regularized Leader (FTRL)
Variant of FTL incorporating regularization to stabilize sequential decisions and guarantee better regret bounds in adversarial environments.
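In the same notation as FTL above, FTRL adds a regularizer $R$ (for example $R(x) = \frac{1}{2\eta}\|x\|_2^2$) to the cumulative past loss:

$$x_t = \arg\min_{x \in \mathcal{K}} \Big( \sum_{s=1}^{t-1} f_s(x) + R(x) \Big).$$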
Online Gradient Descent
Optimization algorithm that updates model parameters in the direction opposite to the gradient of the loss function computed on each new observation.
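A minimal sketch, assuming squared loss on a stream of (feature vector, label) pairs and a Euclidean-ball constraint; the function name and parameters are illustrative:

```python
import numpy as np

def online_gradient_descent(stream, dim, eta=0.1, radius=1.0):
    """Projected OGD: x <- project(x - eta * grad f_t(x))."""
    x = np.zeros(dim)
    for features, label in stream:
        # squared loss f_t(x) = 0.5 * (x . a_t - y_t)^2, so the gradient is:
        grad = (x @ features - label) * features
        x = x - eta * grad
        norm = np.linalg.norm(x)   # Euclidean projection onto the ball
        if norm > radius:
            x *= radius / norm
    return x
```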
Multiplicative Weights Update
Online optimization method that exponentially adjusts weights assigned to experts based on their past performance to combine their predictions.
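The core update, in standard notation: with $\ell_{t,i}$ the loss of expert $i$ at round $t$ and step size $\eta > 0$,

$$w_{t+1,i} = w_{t,i}\, e^{-\eta\, \ell_{t,i}}, \qquad p_{t,i} = \frac{w_{t,i}}{\sum_j w_{t,j}},$$

where $p_{t,i}$ is the probability (or mixture weight) placed on expert $i$.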
Expert Advice
Online learning framework where the algorithm must aggregate recommendations from multiple experts to minimize regret relative to the best expert.
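One common way to state the objective: with $\ell_{t,i}$ the loss of expert $i$ at round $t$ and $i_t$ the expert followed, the regret relative to the best expert in hindsight is

$$R_T = \sum_{t=1}^{T} \ell_{t, i_t} - \min_{i} \sum_{t=1}^{T} \ell_{t, i}.$$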
Online Convex Optimization
Mathematical theory studying the sequential optimization of convex functions whose losses are revealed one at a time.
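The round-by-round protocol, in standard notation: for $t = 1, \dots, T$, the learner plays $x_t \in \mathcal{K}$ (a convex set), the environment reveals a convex loss $f_t$, and the learner suffers $f_t(x_t)$; performance is measured by the regret defined under Regret Bound below.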
Adversarial Online Learning
Online learning scenario where data is generated by a potentially malicious adversary seeking to maximize the algorithm's regret.
Exploration-Exploitation Trade-off
Fundamental dilemma in online learning between exploring new actions to discover their rewards and exploiting actions known to be high-performing.
Online Mirror Descent
Generalization of gradient descent that performs updates through a Bregman divergence induced by a mirror map, projecting iterates into a constrained space and offering greater flexibility in matching the geometry of the decision set.
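In standard notation, with mirror map $\psi$ and Bregman divergence $D_\psi(x, y) = \psi(x) - \psi(y) - \langle \nabla\psi(y), x - y \rangle$, the update is

$$x_{t+1} = \arg\min_{x \in \mathcal{K}} \big( \eta\, \langle \nabla f_t(x_t), x \rangle + D_\psi(x, x_t) \big).$$

Choosing $\psi(x) = \frac{1}{2}\|x\|_2^2$ recovers projected gradient descent, while the negative entropy on the probability simplex yields multiplicative weights.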
Learning with Partial Information
Paradigm where the algorithm only receives information about the chosen action (bandit) rather than all possible actions (full information).
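A standard device in this setting is the importance-weighted loss estimator: if action $i$ is chosen at round $t$ with probability $p_{t,i}$, then

$$\hat{\ell}_{t,i} = \frac{\ell_{t,i}\, \mathbb{1}[i = a_t]}{p_{t,i}}$$

is an unbiased estimate of the unobserved loss $\ell_{t,i}$; it underlies bandit variants of Hedge such as EXP3.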
Adaptive Learning Rate
Mechanism dynamically adjusting the learning step based on local properties of the loss landscape to optimize convergence in non-stationary environments.
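A representative example is the AdaGrad-style per-coordinate schedule, which shrinks the step along coordinates that have accumulated large gradients:

$$\eta_{t,i} = \frac{\eta_0}{\sqrt{\varepsilon + \sum_{s=1}^{t} g_{s,i}^2}},$$

where $g_{s,i}$ is the $i$-th coordinate of the gradient at round $s$ and $\varepsilon > 0$ prevents division by zero.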
Hedge Algorithm
Expert aggregation algorithm using multiplicative weight updates; over $T$ rounds with $N$ experts, its regret relative to the best expert grows as $O(\sqrt{T \ln N})$, i.e. only logarithmically in the number of experts.
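A minimal sketch, assuming full-information feedback with losses in $[0, 1]$ and a known horizon; the function name and step-size tuning are illustrative:

```python
import math
import random

def hedge(expert_losses, eta=None, seed=0):
    """Hedge on a T x N matrix of expert losses in [0, 1] (illustrative sketch)."""
    rng = random.Random(seed)
    T, n = len(expert_losses), len(expert_losses[0])
    if eta is None:
        eta = math.sqrt(2.0 * math.log(n) / T)  # one standard tuning for known T
    weights = [1.0] * n
    total_loss = 0.0
    for losses in expert_losses:
        z = sum(weights)
        probs = [w / z for w in weights]
        expert = rng.choices(range(n), weights=probs)[0]  # follow a sampled expert
        total_loss += losses[expert]
        # multiplicative (exponential) weight update on all experts
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return total_loss, weights
```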
Regret Bound
Theoretical upper limit on the cumulative regret an algorithm may suffer, allowing comparison and performance guarantees for online optimization methods.
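In standard notation the regret is

$$R_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x),$$

and a bound is meaningful when it is sublinear, i.e. $R_T / T \to 0$. Typical guarantees include $O(\sqrt{T})$ for online gradient descent on convex losses, $O(\log T)$ for strongly convex losses, and $O(\sqrt{T \ln N})$ for Hedge with $N$ experts.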
Stochastic Online Learning
Learning framework where data follows a fixed but unknown probability distribution, enabling performance guarantees in expectation rather than worst-case.
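In the stochastic bandit case, with arm means $\mu_i$ and $\mu^* = \max_i \mu_i$, performance is commonly measured by the pseudo-regret

$$\bar{R}_T = T\,\mu^* - \mathbb{E}\Big[ \sum_{t=1}^{T} \mu_{a_t} \Big],$$

which algorithms such as UCB keep at a gap-dependent $O(\log T)$.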