AI Glossary
The Complete Artificial Intelligence Dictionary
Multi-Armed Bandits
Fundamental sequential decision problem in which an agent repeatedly chooses among several options (arms) with stochastic rewards, trading off exploration against exploitation to maximize cumulative gain.
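The classic strategy for this trade-off is epsilon-greedy: explore a random arm with small probability, otherwise exploit the arm with the best running estimate. A minimal sketch on Bernoulli arms (all function names and parameters here are illustrative, not from the source):

```python
import random

def epsilon_greedy_bandit(true_means, n_rounds=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a stochastic bandit with Bernoulli arms."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms      # pulls per arm
    values = [0.0] * n_arms    # running mean reward per arm
    total = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:                       # explore
            arm = rng.randrange(n_arms)
        else:                                            # exploit best estimate
            arm = max(range(n_arms), key=lambda a: values[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return values, counts, total
```

With arms of means 0.2, 0.5, 0.8, the agent quickly concentrates most pulls on the best arm while the epsilon fraction keeps the other estimates from going stale.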
Contextual Bandits
Extension of bandits where rewards depend on an observable context, enabling personalized adaptive decisions.
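When the context is a small discrete set (e.g. user segments), the simplest contextual scheme is one value table per context. A minimal sketch under that assumption (the function and its arguments are illustrative):

```python
import random

def contextual_epsilon_greedy(reward_fn, contexts, n_arms,
                              n_rounds=3000, epsilon=0.1, seed=0):
    """Per-context epsilon-greedy: one independent value table per context."""
    rng = random.Random(seed)
    values = {c: [0.0] * n_arms for c in contexts}
    counts = {c: [0] * n_arms for c in contexts}
    total = 0.0
    for _ in range(n_rounds):
        ctx = rng.choice(contexts)          # context arrives exogenously
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: values[ctx][a])
        r = reward_fn(ctx, arm, rng)
        counts[ctx][arm] += 1
        values[ctx][arm] += (r - values[ctx][arm]) / counts[ctx][arm]
        total += r
    return values, total
```

The point of the context is visible in the result: the agent learns a different best arm for each context rather than one global winner.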
Combinatorial Bandits
Variant where the agent must select combinations of actions simultaneously with complex constraints and rewards.
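A common special case is the top-m semi-bandit: each round the agent plays a set of m arms and observes each played arm's individual reward. A minimal epsilon-greedy sketch (illustrative, not a reference implementation):

```python
import random

def semi_bandit_topm(true_means, m=2, n_rounds=2000, epsilon=0.1, seed=0):
    """Semi-bandit feedback: play a set of m arms per round and observe
    the reward of every arm in the played set."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n
    values = [0.0] * n
    total = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            chosen = rng.sample(range(n), m)   # explore a random set
        else:                                  # exploit the current top-m set
            chosen = sorted(range(n), key=lambda a: values[a], reverse=True)[:m]
        for a in chosen:
            r = 1.0 if rng.random() < true_means[a] else 0.0
            counts[a] += 1
            values[a] += (r - values[a]) / counts[a]
            total += r
    return values, total
```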
Linear Bandits
Approach where rewards are modeled as linear functions of action features or context.
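The standard algorithm in this setting is LinUCB-style ridge regression: estimate a weight vector theta from observed (feature, reward) pairs and add an uncertainty bonus x^T A^{-1} x. A minimal 2-dimensional sketch with a hand-rolled matrix inverse (names and parameters are illustrative):

```python
import random

def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def linucb(arm_features, reward_fn, n_rounds=2000, alpha=1.0, seed=0):
    """LinUCB-style bandit: rewards modeled as a linear function of features."""
    rng = random.Random(seed)
    A = [[1.0, 0.0], [0.0, 1.0]]  # ridge-regularized Gram matrix (d = 2)
    b = [0.0, 0.0]                # accumulated reward-weighted features
    total = 0.0
    for _ in range(n_rounds):
        Ainv = inv2(A)
        theta = [Ainv[0][0] * b[0] + Ainv[0][1] * b[1],
                 Ainv[1][0] * b[0] + Ainv[1][1] * b[1]]
        def ucb(x):
            mean = theta[0] * x[0] + theta[1] * x[1]
            var = (x[0] * (Ainv[0][0] * x[0] + Ainv[0][1] * x[1])
                   + x[1] * (Ainv[1][0] * x[0] + Ainv[1][1] * x[1]))
            return mean + alpha * var ** 0.5   # optimism bonus x^T A^-1 x
        i = max(range(len(arm_features)), key=lambda k: ucb(arm_features[k]))
        x = arm_features[i]
        r = reward_fn(x, rng)
        for p in range(2):                      # rank-one update of A and b
            for q in range(2):
                A[p][q] += x[p] * x[q]
            b[p] += r * x[p]
        total += r
    Ainv = inv2(A)
    theta = [Ainv[0][0] * b[0] + Ainv[0][1] * b[1],
             Ainv[1][0] * b[0] + Ainv[1][1] * b[1]]
    return theta, total
```

Because rewards share one linear model, pulling any arm sharpens the estimate for every arm with correlated features.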
Non-Stationary Bandits
Scenario where reward distributions change over time, requiring adaptive algorithms.
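A common adaptive heuristic is a constant step size instead of a sample mean: old rewards are exponentially forgotten, so estimates track drifting means. A minimal sketch with an abrupt mid-run change (all parameters illustrative):

```python
import random

def nonstationary_bandit(n_rounds=4000, step=0.1, epsilon=0.1, seed=0):
    """Epsilon-greedy with a constant step size, so estimates discount old
    rewards and can track an abrupt change in the arm means."""
    rng = random.Random(seed)
    means = [0.9, 0.1]            # arm 0 starts as the best arm
    values = [0.0, 0.0]
    hits_after_switch = 0
    for t in range(n_rounds):
        if t == n_rounds // 2:    # abrupt change: arm 1 becomes the best
            means = [0.1, 0.9]
        if rng.random() < epsilon:
            arm = rng.randrange(2)
        else:
            arm = max(range(2), key=lambda a: values[a])
        r = 1.0 if rng.random() < means[arm] else 0.0
        values[arm] += step * (r - values[arm])  # constant step = forgetting
        if t >= 3 * n_rounds // 4 and arm == 1:  # track late-run behavior
            hits_after_switch += 1
    return values, hits_after_switch
```

A sample-mean agent would need thousands of pulls to unlearn arm 0; the constant-step agent re-ranks the arms within a few dozen rounds of the switch.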
Bandits with Delay
Problem where rewards are observed only after a delay, complicating credit assignment between actions and their outcomes.
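A simple way to handle delay is to queue each pull's reward and apply the update only when it arrives, acting in the meantime on possibly stale estimates. A minimal sketch (names and the fixed-delay assumption are illustrative):

```python
import random
from collections import deque

def delayed_epsilon_greedy(true_means, delay=10, n_rounds=3000,
                           epsilon=0.1, seed=0):
    """Epsilon-greedy where each reward is revealed `delay` rounds after
    the pull; pending rewards sit in a FIFO queue until they arrive."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n
    values = [0.0] * n
    pending = deque()   # (arrival_round, arm, reward)
    total = 0.0
    for t in range(n_rounds):
        while pending and pending[0][0] <= t:   # apply matured rewards
            _, arm, r = pending.popleft()
            counts[arm] += 1
            values[arm] += (r - values[arm]) / counts[arm]
        if rng.random() < epsilon:
            a = rng.randrange(n)
        else:
            a = max(range(n), key=lambda i: values[i])
        r = 1.0 if rng.random() < true_means[a] else 0.0
        pending.append((t + delay, a, r))       # reward observed later
        total += r
    return values, total
```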
Adversarial Bandits
Model where rewards are generated by an adversary rather than a stochastic process.
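The canonical algorithm here is EXP3: exponential weights over arms with importance-weighted reward estimates, which gives regret guarantees even against an adversary. A minimal sketch (weight normalization added only to avoid float overflow; parameters are illustrative):

```python
import math
import random

def exp3(rewards_per_round, n_arms, gamma=0.1, seed=0):
    """EXP3 for adversarial bandits. rewards_per_round: reward vectors in
    [0, 1] fixed by an adversary; the agent only sees the pulled arm's reward."""
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    total = 0.0
    for rewards in rewards_per_round:
        wsum = sum(weights)
        # mix exponential weights with uniform exploration (floor gamma/K)
        probs = [(1 - gamma) * w / wsum + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs, k=1)[0]
        r = rewards[arm]                  # bandit feedback: pulled arm only
        est = r / probs[arm]              # importance-weighted estimate
        weights[arm] *= math.exp(gamma * est / n_arms)
        m = max(weights)                  # rescale to keep weights bounded
        weights = [w / m for w in weights]
        total += r
    return total, weights
```

Against an oblivious adversary that always rewards one arm, the weight of that arm dominates and the agent's total reward approaches the best fixed arm's, minus the forced exploration floor.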
Bayesian Bandits
Approach using Bayesian inference to model uncertainty about reward distributions.
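The best-known Bayesian algorithm is Thompson sampling: keep a posterior per arm, sample from each, and pull the arm whose sample is highest. A minimal sketch for Bernoulli arms with Beta(1, 1) priors (function name and defaults are illustrative):

```python
import random

def thompson_sampling(true_means, n_rounds=3000, seed=0):
    """Thompson sampling for Bernoulli arms with Beta(1, 1) priors."""
    rng = random.Random(seed)
    n = len(true_means)
    alpha = [1.0] * n   # 1 + observed successes per arm
    beta = [1.0] * n    # 1 + observed failures per arm
    total = 0.0
    for _ in range(n_rounds):
        # draw one plausible mean per arm from its posterior
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n)]
        arm = max(range(n), key=lambda a: samples[a])
        r = 1.0 if rng.random() < true_means[arm] else 0.0
        alpha[arm] += r                   # conjugate Beta-Bernoulli update
        beta[arm] += 1.0 - r
        total += r
    return alpha, beta, total
```

Exploration emerges automatically: an uncertain arm has a wide posterior, so it occasionally produces the highest sample and gets pulled without any explicit epsilon.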
Hierarchical Bandits
Multi-level structure where decisions are organized hierarchically to efficiently explore large action spaces.
Bandits with Constraints
Constrained optimization where the agent must maximize rewards while respecting certain limitations.
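One concrete instance is a budget constraint: every pull consumes its arm's cost, and play stops once no arm is affordable. A minimal epsilon-greedy sketch that exploits estimated reward per unit cost (entirely illustrative):

```python
import random

def budgeted_epsilon_greedy(true_means, costs, budget=300.0,
                            epsilon=0.1, seed=0):
    """Epsilon-greedy under a hard budget: each pull of arm a spends
    costs[a], and play stops when no arm fits the remaining budget."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n
    values = [0.0] * n
    spent = 0.0
    total = 0.0
    while True:
        feasible = [a for a in range(n) if spent + costs[a] <= budget]
        if not feasible:                      # budget exhausted
            break
        if rng.random() < epsilon:
            arm = rng.choice(feasible)
        else:                                 # exploit reward-per-cost ratio
            arm = max(feasible, key=lambda a: values[a] / costs[a])
        r = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]
        spent += costs[arm]
        total += r
    return total, spent
```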
Bandits for Recommendation
Specific application to recommendation systems for balancing exploration and exploitation of content.
Online Bandits
Continuous learning where the agent adapts in real-time to new information without a prior training phase.