AI Glossary
The Complete Artificial Intelligence Dictionary
Multi-Armed Bandits
Fundamental sequential decision problem in which an agent repeatedly chooses among several options (arms) with stochastic rewards, trading off exploration against exploitation to maximize cumulative gain.
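The classic strategy for this trade-off is epsilon-greedy: explore a random arm with small probability, otherwise exploit the arm with the best running estimate. A minimal sketch on Bernoulli arms (all function names and parameters here are illustrative, not from the source):

```python
import random

def epsilon_greedy_bandit(true_means, n_rounds=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a stochastic bandit with Bernoulli arms."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms      # pulls per arm
    values = [0.0] * n_arms    # running mean reward per arm
    total = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:                       # explore
            arm = rng.randrange(n_arms)
        else:                                            # exploit best estimate
            arm = max(range(n_arms), key=lambda a: values[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return values, counts, total
```

With arms of means 0.2, 0.5, 0.8, the agent quickly concentrates most pulls on the best arm while the epsilon fraction keeps the other estimates from going stale.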
Contextual Bandits
Extension of bandits where rewards depend on an observable context, enabling personalized adaptive decisions.
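When the context is a small discrete set (e.g. user segments), the simplest contextual scheme is one value table per context. A minimal sketch under that assumption (the function and its arguments are illustrative):

```python
import random

def contextual_epsilon_greedy(reward_fn, contexts, n_arms,
                              n_rounds=3000, epsilon=0.1, seed=0):
    """Per-context epsilon-greedy: one independent value table per context."""
    rng = random.Random(seed)
    values = {c: [0.0] * n_arms for c in contexts}
    counts = {c: [0] * n_arms for c in contexts}
    total = 0.0
    for _ in range(n_rounds):
        ctx = rng.choice(contexts)          # context arrives exogenously
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: values[ctx][a])
        r = reward_fn(ctx, arm, rng)
        counts[ctx][arm] += 1
        values[ctx][arm] += (r - values[ctx][arm]) / counts[ctx][arm]
        total += r
    return values, total
```

The point of the context is visible in the result: the agent learns a different best arm for each context rather than one global winner.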
Combinatorial Bandits
Variant where the agent must select combinations of actions simultaneously with complex constraints and rewards.
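A common special case is the top-m semi-bandit: each round the agent plays a set of m arms and observes each played arm's individual reward. A minimal epsilon-greedy sketch (illustrative, not a reference implementation):

```python
import random

def semi_bandit_topm(true_means, m=2, n_rounds=2000, epsilon=0.1, seed=0):
    """Semi-bandit feedback: play a set of m arms per round and observe
    the reward of every arm in the played set."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n
    values = [0.0] * n
    total = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            chosen = rng.sample(range(n), m)   # explore a random set
        else:                                  # exploit the current top-m set
            chosen = sorted(range(n), key=lambda a: values[a], reverse=True)[:m]
        for a in chosen:
            r = 1.0 if rng.random() < true_means[a] else 0.0
            counts[a] += 1
            values[a] += (r - values[a]) / counts[a]
            total += r
    return values, total
```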
Linear Bandits
Approach where rewards are modeled as linear functions of action features or context.
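The standard algorithm in this setting is LinUCB-style ridge regression: estimate a weight vector theta from observed (feature, reward) pairs and add an uncertainty bonus x^T A^{-1} x. A minimal 2-dimensional sketch with a hand-rolled matrix inverse (names and parameters are illustrative):

```python
import random

def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def linucb(arm_features, reward_fn, n_rounds=2000, alpha=1.0, seed=0):
    """LinUCB-style bandit: rewards modeled as a linear function of features."""
    rng = random.Random(seed)
    A = [[1.0, 0.0], [0.0, 1.0]]  # ridge-regularized Gram matrix (d = 2)
    b = [0.0, 0.0]                # accumulated reward-weighted features
    total = 0.0
    for _ in range(n_rounds):
        Ainv = inv2(A)
        theta = [Ainv[0][0] * b[0] + Ainv[0][1] * b[1],
                 Ainv[1][0] * b[0] + Ainv[1][1] * b[1]]
        def ucb(x):
            mean = theta[0] * x[0] + theta[1] * x[1]
            var = (x[0] * (Ainv[0][0] * x[0] + Ainv[0][1] * x[1])
                   + x[1] * (Ainv[1][0] * x[0] + Ainv[1][1] * x[1]))
            return mean + alpha * var ** 0.5   # optimism bonus x^T A^-1 x
        i = max(range(len(arm_features)), key=lambda k: ucb(arm_features[k]))
        x = arm_features[i]
        r = reward_fn(x, rng)
        for p in range(2):                      # rank-one update of A and b
            for q in range(2):
                A[p][q] += x[p] * x[q]
            b[p] += r * x[p]
        total += r
    Ainv = inv2(A)
    theta = [Ainv[0][0] * b[0] + Ainv[0][1] * b[1],
             Ainv[1][0] * b[0] + Ainv[1][1] * b[1]]
    return theta, total
```

Because rewards share one linear model, pulling any arm sharpens the estimate for every arm with correlated features.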
Non-Stationary Bandits
Scenario where reward distributions change over time, requiring adaptive algorithms.
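A common adaptive heuristic is a constant step size instead of a sample mean: old rewards are exponentially forgotten, so estimates track drifting means. A minimal sketch with an abrupt mid-run change (all parameters illustrative):

```python
import random

def nonstationary_bandit(n_rounds=4000, step=0.1, epsilon=0.1, seed=0):
    """Epsilon-greedy with a constant step size, so estimates discount old
    rewards and can track an abrupt change in the arm means."""
    rng = random.Random(seed)
    means = [0.9, 0.1]            # arm 0 starts as the best arm
    values = [0.0, 0.0]
    hits_after_switch = 0
    for t in range(n_rounds):
        if t == n_rounds // 2:    # abrupt change: arm 1 becomes the best
            means = [0.1, 0.9]
        if rng.random() < epsilon:
            arm = rng.randrange(2)
        else:
            arm = max(range(2), key=lambda a: values[a])
        r = 1.0 if rng.random() < means[arm] else 0.0
        values[arm] += step * (r - values[arm])  # constant step = forgetting
        if t >= 3 * n_rounds // 4 and arm == 1:  # track late-run behavior
            hits_after_switch += 1
    return values, hits_after_switch
```

A sample-mean agent would need thousands of pulls to unlearn arm 0; the constant-step agent re-ranks the arms within a few dozen rounds of the switch.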
Bandits with Delay
Problem where rewards are observed only after a delay, complicating credit assignment between actions and their outcomes.
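A simple way to handle delay is to queue each pull's reward and apply the update only when it arrives, acting in the meantime on possibly stale estimates. A minimal sketch (names and the fixed-delay assumption are illustrative):

```python
import random
from collections import deque

def delayed_epsilon_greedy(true_means, delay=10, n_rounds=3000,
                           epsilon=0.1, seed=0):
    """Epsilon-greedy where each reward is revealed `delay` rounds after
    the pull; pending rewards sit in a FIFO queue until they arrive."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n
    values = [0.0] * n
    pending = deque()   # (arrival_round, arm, reward)
    total = 0.0
    for t in range(n_rounds):
        while pending and pending[0][0] <= t:   # apply matured rewards
            _, arm, r = pending.popleft()
            counts[arm] += 1
            values[arm] += (r - values[arm]) / counts[arm]
        if rng.random() < epsilon:
            a = rng.randrange(n)
        else:
            a = max(range(n), key=lambda i: values[i])
        r = 1.0 if rng.random() < true_means[a] else 0.0
        pending.append((t + delay, a, r))       # reward observed later
        total += r
    return values, total
```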
Adversarial Bandits
Model where rewards are generated by an adversary rather than a stochastic process.
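The canonical algorithm here is EXP3: exponential weights over arms with importance-weighted reward estimates, which gives regret guarantees even against an adversary. A minimal sketch (weight normalization added only to avoid float overflow; parameters are illustrative):

```python
import math
import random

def exp3(rewards_per_round, n_arms, gamma=0.1, seed=0):
    """EXP3 for adversarial bandits. rewards_per_round: reward vectors in
    [0, 1] fixed by an adversary; the agent only sees the pulled arm's reward."""
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    total = 0.0
    for rewards in rewards_per_round:
        wsum = sum(weights)
        # mix exponential weights with uniform exploration (floor gamma/K)
        probs = [(1 - gamma) * w / wsum + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs, k=1)[0]
        r = rewards[arm]                  # bandit feedback: pulled arm only
        est = r / probs[arm]              # importance-weighted estimate
        weights[arm] *= math.exp(gamma * est / n_arms)
        m = max(weights)                  # rescale to keep weights bounded
        weights = [w / m for w in weights]
        total += r
    return total, weights
```

Against an oblivious adversary that always rewards one arm, the weight of that arm dominates and the agent's total reward approaches the best fixed arm's, minus the forced exploration floor.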
Bayesian Bandits
Approach using Bayesian inference to model uncertainty about reward distributions.
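The best-known Bayesian algorithm is Thompson sampling: keep a posterior per arm, sample from each, and pull the arm whose sample is highest. A minimal sketch for Bernoulli arms with Beta(1, 1) priors (function name and defaults are illustrative):

```python
import random

def thompson_sampling(true_means, n_rounds=3000, seed=0):
    """Thompson sampling for Bernoulli arms with Beta(1, 1) priors."""
    rng = random.Random(seed)
    n = len(true_means)
    alpha = [1.0] * n   # 1 + observed successes per arm
    beta = [1.0] * n    # 1 + observed failures per arm
    total = 0.0
    for _ in range(n_rounds):
        # draw one plausible mean per arm from its posterior
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n)]
        arm = max(range(n), key=lambda a: samples[a])
        r = 1.0 if rng.random() < true_means[arm] else 0.0
        alpha[arm] += r                   # conjugate Beta-Bernoulli update
        beta[arm] += 1.0 - r
        total += r
    return alpha, beta, total
```

Exploration emerges automatically: an uncertain arm has a wide posterior, so it occasionally produces the highest sample and gets pulled without any explicit epsilon.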
Hierarchical Bandits
Multi-level structure where decisions are organized hierarchically to efficiently explore large action spaces.
Bandits with Constraints
Constrained optimization where the agent must maximize rewards while respecting certain limitations.
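One concrete instance is a budget constraint: every pull consumes its arm's cost, and play stops once no arm is affordable. A minimal epsilon-greedy sketch that exploits estimated reward per unit cost (entirely illustrative):

```python
import random

def budgeted_epsilon_greedy(true_means, costs, budget=300.0,
                            epsilon=0.1, seed=0):
    """Epsilon-greedy under a hard budget: each pull of arm a spends
    costs[a], and play stops when no arm fits the remaining budget."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n
    values = [0.0] * n
    spent = 0.0
    total = 0.0
    while True:
        feasible = [a for a in range(n) if spent + costs[a] <= budget]
        if not feasible:                      # budget exhausted
            break
        if rng.random() < epsilon:
            arm = rng.choice(feasible)
        else:                                 # exploit reward-per-cost ratio
            arm = max(feasible, key=lambda a: values[a] / costs[a])
        r = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]
        spent += costs[arm]
        total += r
    return total, spent
```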
Bandits for Recommendation
Specific application to recommendation systems for balancing exploration and exploitation of content.
Online Bandits
Continuous learning where the agent adapts in real-time to new information without a prior training phase.