Online Optimization - AI Glossary

Bandit Algorithm

Family of online learning algorithms in which the agent sequentially selects actions with uncertain rewards, observes only the reward of the action it chose, and aims to maximize cumulative reward.
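
As an illustration, the sketch below implements UCB1, one well-known bandit algorithm, which adds an optimism bonus to each arm's running mean. It assumes rewards in [0, 1]; the Bernoulli arms and their success probabilities are hypothetical and chosen only for the demo.

```python
import math
import random


def ucb1(reward_fns, horizon):
    """Run UCB1 for `horizon` rounds; reward_fns[i]() returns a reward in [0, 1]."""
    n_arms = len(reward_fns)
    counts = [0] * n_arms      # number of pulls per arm
    means = [0.0] * n_arms     # running mean reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1        # pull every arm once to initialize
        else:
            # pick the arm with the highest optimistic estimate
            arm = max(
                range(n_arms),
                key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = reward_fns[arm]()
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        total += reward
    return counts, total


rng = random.Random(0)
# Hypothetical Bernoulli arms with success probabilities 0.2, 0.5 and 0.8.
arms = [lambda p=p: 1.0 if rng.random() < p else 0.0 for p in (0.2, 0.5, 0.8)]
counts, total = ucb1(arms, horizon=2000)
print(counts)  # most pulls are expected to go to the last arm
```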

Follow the Leader (FTL)

Online optimization strategy in which, at each step, the algorithm plays the action that would have had the lowest cumulative loss on all data observed so far.
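
A minimal sketch of FTL over a finite set of actions follows. The alternating loss sequence is a standard illustrative construction (not taken from this glossary) on which FTL keeps switching to the action that is about to be penalized.

```python
def follow_the_leader(loss_rounds):
    """loss_rounds: list of rounds, each a list with one loss per action."""
    n_actions = len(loss_rounds[0])
    cumulative = [0.0] * n_actions
    choices, incurred = [], 0.0
    for losses in loss_rounds:
        # leader = action with the smallest cumulative past loss (ties: lowest index)
        leader = min(range(n_actions), key=lambda a: cumulative[a])
        choices.append(leader)
        incurred += losses[leader]
        for a in range(n_actions):
            cumulative[a] += losses[a]
    return choices, incurred, min(cumulative)


# Illustrative sequence: after the first round the losses alternate between actions.
rounds = [[0.5, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
choices, incurred, best_fixed = follow_the_leader(rounds)
print(choices, incurred, best_fixed)  # [0, 1, 0, 1, 0] 4.5 2.0
```

On this sequence FTL pays almost the maximum loss every round, while the best fixed action pays about half as much, so its regret grows linearly with the number of rounds.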

Follow the Regularized Leader (FTRL)

Variant of FTL that adds a regularization term to the cumulative past loss, which stabilizes successive decisions and yields sublinear regret in adversarial environments where plain FTL can suffer linear regret.
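
The following sketch shows one simple instance of FTRL, assuming linear losses g_t * x on the interval [-radius, radius] and a quadratic regularizer x^2 / (2 * eta). Under these assumptions the regularized minimizer has a closed form: the clipped value of -eta times the sum of past gradients. The function name and parameters are hypothetical.

```python
def ftrl_l2(gradients, eta, radius=1.0):
    """FTRL with regularizer x^2 / (2 * eta) on linear losses g_t * x."""
    grad_sum = 0.0
    plays = []
    for g in gradients:
        # argmin over [-radius, radius] of grad_sum * x + x^2 / (2 * eta)
        x = max(-radius, min(radius, -eta * grad_sum))
        plays.append(x)
        grad_sum += g   # the gradient is revealed only after playing x
    return plays


plays = ftrl_l2([0.5, -1.0, 1.0, -1.0, 1.0], eta=0.1)
print(plays)  # the plays stay close to 0 instead of jumping between extremes
```

With a small `eta` the regularizer dominates and the decisions move slowly, which is the stabilizing effect described above.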

Online Gradient Descent

Online algorithm that, after each new observation, moves the model parameters a step in the direction opposite to the gradient of that observation's loss, projecting back onto the feasible set when one is specified.
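
A minimal, unconstrained sketch of this update for a linear model with squared loss is given below. The data stream, the target relation y = 2*x1 - x2, and the step size are assumptions made for the example.

```python
def online_gradient_descent(stream, dim, eta):
    """One pass of OGD on the squared loss 0.5 * (w.x - y)^2 over (x, y) pairs."""
    w = [0.0] * dim
    for x, y in stream:
        pred = sum(wi * xi for wi, xi in zip(w, x))
        grad = [(pred - y) * xi for xi in x]             # gradient of this round's loss
        w = [wi - eta * gi for wi, gi in zip(w, grad)]   # step against the gradient
    return w


# Hypothetical stream generated by y = 2*x1 - x2, cycled 200 times.
points = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]
w = online_gradient_descent(points * 200, dim=2, eta=0.1)
print(w)  # close to [2.0, -1.0]
```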

Multiplicative Weights Update

Online optimization method that exponentially adjusts weights assigned to experts based on their past performance to combine their predictions.
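
As a sketch, one common form of the update multiplies each expert's weight by (1 - eta * loss), assuming losses in [0, 1] and eta at most 1/2. The loss values below are hypothetical.

```python
def multiplicative_weights(loss_rounds, eta=0.5):
    """Return the normalized expert weights after processing all rounds."""
    n_experts = len(loss_rounds[0])
    weights = [1.0] * n_experts
    for losses in loss_rounds:
        # shrink each weight in proportion to the loss that expert just suffered
        weights = [w * (1.0 - eta * l) for w, l in zip(weights, losses)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical losses: the third expert is always right, the others always wrong.
probs = multiplicative_weights([[1.0, 1.0, 0.0]] * 4)
print(probs)  # the third expert holds about 89% of the weight
```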

Expert Advice

Online learning framework where the algorithm must aggregate recommendations from multiple experts to minimize regret relative to the best expert.

Online Convex Optimization

Framework for sequential optimization in which, at each round, the learner picks a point from a convex set and then incurs a convex loss that is revealed only after the choice is made.

Adversarial Online Learning

Online learning scenario where data is generated by a potentially malicious adversary seeking to maximize the algorithm's regret.

Exploration-Exploitation Trade-off

Fundamental dilemma in online learning between exploring new actions to discover their rewards and exploiting actions known to be high-performing.
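
The trade-off can be seen in a small epsilon-greedy sketch, where `epsilon` is the probability of exploring a random action. The constant-reward arms are hypothetical and chosen so that pure exploitation visibly gets stuck on the first arm it tries.

```python
import random


def epsilon_greedy(reward_fns, horizon, epsilon, seed=0):
    """Explore a random arm with probability epsilon, otherwise exploit the best mean."""
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=lambda i: means[i])  # exploit
        reward = reward_fns[arm]()
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


arms = [lambda: 0.3, lambda: 0.9]   # hypothetical arms with constant rewards
greedy_counts, _ = epsilon_greedy(arms, horizon=1000, epsilon=0.0)
mixed_counts, _ = epsilon_greedy(arms, horizon=1000, epsilon=0.1)
print(greedy_counts)  # [1000, 0]: never discovers the better arm
print(mixed_counts)   # most pulls go to the second arm once it has been explored
```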

Online Mirror Descent

Generalization of online gradient descent that replaces the Euclidean distance with a Bregman divergence induced by a mirror map, so that updates and projections can be matched to the geometry of the constraint set.
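
To make the role of the mirror map concrete, the sketch below shows a single mirror descent step for two standard choices: the negative entropy on the probability simplex, which gives a multiplicative update, and the squared Euclidean norm without constraints, which recovers the ordinary gradient step. Function names are hypothetical.

```python
import math


def omd_step_entropic(x, grad, eta):
    """Mirror step with the negative-entropy mirror map on the probability simplex."""
    y = [xi * math.exp(-eta * gi) for xi, gi in zip(x, grad)]
    z = sum(y)
    # the Bregman (KL) projection onto the simplex reduces to normalization
    return [yi / z for yi in y]


def omd_step_euclidean(x, grad, eta):
    """Mirror step with 0.5 * ||x||^2 and no constraint: a plain gradient step."""
    return [xi - eta * gi for xi, gi in zip(x, grad)]


print(omd_step_entropic([0.5, 0.5], [1.0, 0.0], eta=math.log(2)))  # about [1/3, 2/3]
print(omd_step_euclidean([0.5, 0.5], [1.0, 0.0], eta=0.1))         # [0.4, 0.5]
```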

Learning with Partial Information

Paradigm in which the algorithm observes feedback only for the action it chose (bandit feedback) rather than for all possible actions (full information).
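
One common way to cope with bandit feedback, used by EXP3-style algorithms, is to build an importance-weighted estimate of the full loss vector from the single observed loss. The sketch below assumes every action has positive probability; the probabilities and losses are hypothetical.

```python
def importance_weighted_estimate(chosen, observed_loss, probs):
    """Estimate the full loss vector from the loss of the chosen action only."""
    estimate = [0.0] * len(probs)
    # dividing by the sampling probability makes the estimate unbiased
    estimate[chosen] = observed_loss / probs[chosen]
    return estimate


probs = [0.2, 0.3, 0.5]          # hypothetical sampling distribution
true_losses = [0.4, 0.1, 0.9]    # hypothetical losses, hidden from the learner

# Averaging the estimate over the random choice recovers the true losses.
expected = [0.0] * 3
for j, pj in enumerate(probs):
    est = importance_weighted_estimate(j, true_losses[j], probs)
    expected = [e + pj * v for e, v in zip(expected, est)]
print(expected)  # approximately [0.4, 0.1, 0.9]
```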

Adaptive Learning Rate

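Mechanism that adjusts the step size during learning based on the gradients observed so far (for example their accumulated squared magnitudes, as in AdaGrad), so that the step size need not be hand-tuned to the scale of the problem and can track changing conditions.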
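
As one example of such a mechanism, the sketch below implements a per-coordinate AdaGrad-style update, which divides each step by the square root of the accumulated squared gradients. The quadratic objective and all parameter values are assumptions for the demo.

```python
import math


def adagrad(grad_fn, w0, eta, steps, eps=1e-8):
    """Per-coordinate AdaGrad: the step shrinks as squared gradients accumulate."""
    w = list(w0)
    sq_sum = [0.0] * len(w)   # running sum of squared gradients per coordinate
    for _ in range(steps):
        grad = grad_fn(w)
        for i, gi in enumerate(grad):
            sq_sum[i] += gi * gi
            w[i] -= eta * gi / (math.sqrt(sq_sum[i]) + eps)
    return w


# Hypothetical badly scaled objective f(w) = 0.5 * (100 * w0^2 + w1^2).
def grad_fn(w):
    return [100.0 * w[0], w[1]]


w = adagrad(grad_fn, [1.0, 1.0], eta=0.5, steps=100)
print(w)  # both coordinates end up very close to 0
```

Because each coordinate is normalized by its own gradient history, both coordinates follow nearly the same trajectory here even though their curvatures differ by a factor of 100.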

Hedge Algorithm

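Expert aggregation algorithm using exponential (multiplicative) weight updates; for losses in [0, 1] its regret relative to the best expert is O(sqrt(T ln N)), which is sublinear in the number of rounds T and only logarithmic in the number of experts N.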
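
A minimal sketch of Hedge follows, assuming losses in [0, 1] and a horizon T known in advance. With the learning rate sqrt(8 ln N / T), the standard analysis bounds the expected regret by sqrt(T ln N / 2), which depends on the number of experts N only through ln N. The loss sequence is hypothetical.

```python
import math


def hedge(loss_rounds, eta):
    """Return the regret of Hedge (expected loss) against the best single expert."""
    n_experts = len(loss_rounds[0])
    cumulative = [0.0] * n_experts
    learner_loss = 0.0
    for losses in loss_rounds:
        # weights proportional to exp(-eta * cumulative loss); shift by the
        # minimum for numerical stability (does not change the distribution)
        base = min(cumulative)
        weights = [math.exp(-eta * (c - base)) for c in cumulative]
        total = sum(weights)
        probs = [w / total for w in weights]
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return learner_loss - min(cumulative)


T, N = 1000, 4
# Hypothetical deterministic losses in [0, 1], used only for illustration.
rounds = [[((t * (i + 1)) % 5) / 4 for i in range(N)] for t in range(T)]
eta = math.sqrt(8 * math.log(N) / T)
regret = hedge(rounds, eta)
bound = math.sqrt(T * math.log(N) / 2)
print(regret <= bound)  # True
```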

Regret Bound

Theoretical upper bound on an algorithm's cumulative regret, that is, its total loss minus the total loss of the best fixed decision in hindsight; such bounds provide performance guarantees and allow online optimization methods to be compared.
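
For reference, regret and the meaning of a typical bound can be written as follows. The notation is an assumption introduced here: x_t is the decision at round t, \ell_t the loss revealed at round t, \mathcal{K} the decision set, and C a constant that does not depend on T.

```latex
R_T \;=\; \sum_{t=1}^{T} \ell_t(x_t) \;-\; \min_{x \in \mathcal{K}} \sum_{t=1}^{T} \ell_t(x),
\qquad
R_T \le C\sqrt{T} \;\Longrightarrow\; \frac{R_T}{T} \le \frac{C}{\sqrt{T}} \xrightarrow[T \to \infty]{} 0 .
```

A bound that grows sublinearly in T therefore means the algorithm's average loss approaches that of the best fixed decision.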

Stochastic Online Learning

Learning framework in which data points are drawn independently from a fixed but unknown probability distribution, enabling performance guarantees that hold in expectation rather than in the worst case.
