Bandits with Limited Feedback - AI Glossary

Binary Feedback

Type of feedback in which only a positive/negative indication is observed after each action, with no information about the reward's magnitude. Each round therefore reveals at most one bit of information to the learning algorithm.
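
A minimal sketch of one standard way to learn from binary feedback: Thompson sampling with a Beta-Bernoulli model, which only needs a count of positive and negative signals per arm. The function name `thompson_binary` and the simulated success probabilities are illustrative assumptions, not part of any particular library.

```python
import random


def thompson_binary(success_probs, horizon, seed=0):
    """Thompson sampling when each pull returns only a positive/negative signal.

    success_probs are hypothetical true P(positive) values used only to
    simulate feedback; the learner never sees them. Returns pulls per arm.
    """
    rng = random.Random(seed)
    k = len(success_probs)
    alpha = [1] * k  # 1 + positive signals (uniform Beta(1, 1) prior)
    beta = [1] * k   # 1 + negative signals
    pulls = [0] * k
    for _ in range(horizon):
        # Draw a plausible success rate per arm from its Beta posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Only a binary outcome is observed, never a reward magnitude.
        if rng.random() < success_probs[arm]:
            alpha[arm] += 1
        else:
            beta[arm] += 1
        pulls[arm] += 1
    return pulls


pulls = thompson_binary([0.2, 0.5, 0.7], horizon=2000)
```

The Beta posterior fits here precisely because each observation is a single bit; if reward magnitudes were observed, a different likelihood model would be needed.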

Pairwise Feedback

Feedback from comparing two actions in which only the winner is revealed, not the absolute reward values. Used in recommendation systems where only relative preferences can be observed; this setting is also studied under the name dueling bandits.
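
A rough sketch of learning from winner-only comparisons, assuming a Bradley-Terry preference model with hidden utilities. It explores pairs uniformly at random and returns the Copeland winner (the arm that beats the most opponents). This is a deliberately simple scheme for illustration, not a regret-optimal dueling-bandit algorithm; `duel` and `copeland_winner` are hypothetical names.

```python
import math
import random


def duel(utilities, i, j, rng):
    """Return the winner of comparing arms i and j.

    Assumes a Bradley-Terry model, P(i beats j) = 1 / (1 + exp(u_j - u_i)).
    The utilities are hidden from the learner; only the winner is revealed.
    """
    p_i = 1.0 / (1.0 + math.exp(utilities[j] - utilities[i]))
    return i if rng.random() < p_i else j


def copeland_winner(utilities, rounds, seed=0):
    """Explore pairs uniformly and return (Copeland winner, win-count matrix)."""
    rng = random.Random(seed)
    k = len(utilities)
    wins = [[0] * k for _ in range(k)]  # wins[i][j] = times i beat j
    for _ in range(rounds):
        i, j = rng.sample(range(k), 2)
        winner = duel(utilities, i, j, rng)
        loser = j if winner == i else i
        wins[winner][loser] += 1

    def score(a):
        # Number of opponents that arm a beats in a majority of their duels.
        return sum(1 for b in range(k) if b != a and wins[a][b] > wins[b][a])

    return max(range(k), key=score), wins


best, wins = copeland_winner([0.0, 0.5, 1.5], rounds=3000)
```

The learner never uses the utilities directly: every decision is based on the win counts, which is all that pairwise feedback provides.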

Noisy Feedback

Reward observations corrupted by random noise, which reduces how much each observation reveals about the true expected reward. The noise may come from imperfect measurements or unpredictable user behavior.
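
A sketch of how noise enters an index policy, assuming each reward is observed as the arm's true mean plus Gaussian noise with a known standard deviation `noise_sd`. The confidence term follows the usual UCB form for sub-Gaussian noise; the function and its parameters are illustrative.

```python
import math
import random


def ucb_noisy(true_means, noise_sd, horizon, seed=0):
    """UCB index policy for rewards observed as true mean + Gaussian noise.

    The confidence width is scaled by noise_sd (assumed known), so noisier
    feedback keeps inferior arms plausible for longer. Returns pulls per arm.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialise its estimate
        else:
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + noise_sd * math.sqrt(2 * math.log(t) / counts[a]))
        reward = true_means[arm] + rng.gauss(0.0, noise_sd)  # noisy observation
        counts[arm] += 1
        sums[arm] += reward
    return counts


counts = ucb_noisy([0.0, 0.5, 1.0], noise_sd=1.0, horizon=5000)
```

Raising `noise_sd` widens every confidence interval, so the policy needs more pulls before it stops sampling inferior arms.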

Censored Feedback

Situation in which observed rewards are capped at a threshold: any reward above it is recorded as the threshold value, so true values beyond the threshold are masked. Common in systems with technical or business constraints.
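
A worked example of why censoring matters for estimation, under the assumption (for illustration only) that rewards are exponentially distributed. If each reward is observed as min(X, c), the naive sample mean is biased low, while the maximum-likelihood estimate divides the total observed reward by the number of uncensored observations. The function name and the simulation parameters are hypothetical.

```python
import random


def censored_exponential_mle(observations, cap):
    """MLE of an exponential reward's mean when values are capped at `cap`.

    Each observation is min(true_reward, cap). Censored points still add
    their capped value to the total, but only uncensored points count:
        mean_hat = sum(observations) / number_of_uncensored
    """
    uncensored = sum(1 for y in observations if y < cap)
    if uncensored == 0:
        raise ValueError("all observations are censored; the mean is not identifiable")
    return sum(observations) / uncensored


rng = random.Random(0)
cap = 2.0
true_mean = 3.0  # hypothetical ground truth for the simulation
obs = [min(rng.expovariate(1 / true_mean), cap) for _ in range(20000)]
naive = sum(obs) / len(obs)               # biased low: ignores mass above the cap
mle = censored_exponential_mle(obs, cap)  # accounts for censoring
```

With a cap of 2 and a true mean of 3, the naive average settles near 3(1 - e^(-2/3)) ≈ 1.46, while the censored MLE stays close to 3. The exponential assumption is what makes this closed form possible; other reward distributions need their own likelihood or a nonparametric estimator.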

Truncated Feedback

Partial information in which only the rank or relative position of rewards is known, not their absolute values. Used particularly in ranking systems.
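
A minimal sketch of learning from rank-only feedback: each round the arms' noisy scores stay hidden and only their ordering is revealed, and the learner ranks arms by average position (a Borda-style rule). The Gaussian score model and the function names are assumptions made for the example.

```python
import random


def rank_only_round(true_means, rng):
    """Simulate one round: noisy scores are hidden, only their order is
    revealed as a list of arms from best (position 0) to worst."""
    scores = [m + rng.gauss(0.0, 1.0) for m in true_means]
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def best_by_average_rank(true_means, rounds, seed=0):
    """Return (arm with the best average position, list of average positions)."""
    rng = random.Random(seed)
    k = len(true_means)
    rank_sum = [0] * k
    for _ in range(rounds):
        for position, arm in enumerate(rank_only_round(true_means, rng)):
            rank_sum[arm] += position  # only positions are used, never scores
    avg_rank = [s / rounds for s in rank_sum]
    return min(range(k), key=lambda i: avg_rank[i]), avg_rank


best, avg_rank = best_by_average_rank([0.1, 0.9, 0.4], rounds=2000)
```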

Exploration-Exploitation with Partial Feedback

Fundamental dilemma in which the algorithm must balance trying new or little-tested actions (exploration) against using the actions currently believed best (exploitation), working from incomplete information. Partial feedback increases uncertainty, so strategies must remain robust to it.
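
The trade-off can be sketched with epsilon-greedy under bandit feedback, where only the chosen arm's reward is observed. This is one simple strategy among many (UCB and Thompson sampling are common alternatives); the Bernoulli reward simulation and the parameter names are illustrative assumptions.

```python
import random


def epsilon_greedy(true_probs, horizon, epsilon=0.1, seed=0):
    """epsilon-greedy with Bernoulli rewards; returns (pulls per arm, total reward).

    true_probs are hypothetical success probabilities used only to simulate
    feedback. Each round reveals the reward of the chosen arm alone.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    counts = [0] * k
    means = [0.0] * k
    total = 0.0
    for _ in range(horizon):
        untried = [a for a in range(k) if counts[a] == 0]
        if untried:
            arm = untried[0]                                     # try every arm once
        elif rng.random() < epsilon:
            arm = rng.randrange(k)                               # explore
        else:
            arm = max(range(k), key=lambda a: means[a])          # exploit
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0  # only this arm's reward is seen
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]        # incremental mean update
        total += reward
    return counts, total


counts, total = epsilon_greedy([0.3, 0.6, 0.8], horizon=5000)
```

A larger `epsilon` gathers more information about neglected arms but spends more rounds on arms already known to be worse.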

Contextual Bandits with Limited Feedback

Extension of the bandit problem in which the learner observes a context before each decision and rewards depend on both the context and the chosen action, but only partial information about the rewards is revealed. Requires estimation methods that cope with uncertainty across contexts.
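
A minimal sketch, assuming a small set of discrete contexts: the learner runs an independent UCB1 instance per context and observes only the chosen arm's 0/1 reward. The function name, context labels, and success probabilities in the usage line are hypothetical.

```python
import math
import random


def contextual_ucb(reward_prob, contexts, horizon, seed=0):
    """UCB1 run separately for each discrete context; returns pulls per context.

    reward_prob[c][a] is a hypothetical success probability for arm a in
    context c. The learner sees c, picks an arm, and observes only that
    arm's 0/1 reward.
    """
    rng = random.Random(seed)
    k = len(reward_prob[contexts[0]])
    counts = {c: [0] * k for c in contexts}
    sums = {c: [0.0] * k for c in contexts}
    for _ in range(horizon):
        c = rng.choice(contexts)  # observe the context
        n, s = counts[c], sums[c]
        t = sum(n) + 1
        untried = [a for a in range(k) if n[a] == 0]
        if untried:
            arm = untried[0]
        else:
            arm = max(range(k), key=lambda a: s[a] / n[a]
                      + math.sqrt(2 * math.log(t) / n[a]))
        reward = 1.0 if rng.random() < reward_prob[c][arm] else 0.0
        n[arm] += 1
        s[arm] += reward
    return counts


probs = {"mobile": [0.8, 0.3], "desktop": [0.2, 0.7]}
counts = contextual_ucb(probs, ["mobile", "desktop"], horizon=6000)
```

Keeping separate statistics per context is the simplest design but shares nothing between contexts; models such as LinUCB instead assume rewards are linear in context features, so that data from one context informs others.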

Reward Distribution Estimation

Process of inferring the underlying reward distribution from partial or noisy observations. Fundamental for making optimal decisions under limited feedback.
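
One basic building block for estimating an arm's reward distribution from a stream of observations is an online estimate of its mean and variance. The sketch uses Welford's algorithm plus a normal-approximation interval; the class name `RunningStats` is illustrative, and the interval is only approximate for small samples or heavy-tailed rewards.

```python
import math


class RunningStats:
    """Online estimate of an arm's reward mean and variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Unbiased sample variance; undefined with fewer than two observations.
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")

    def mean_interval(self, z=1.96):
        """Approximate 95% normal-theory interval for the mean."""
        half = z * math.sqrt(self.variance / self.n)
        return self.mean - half, self.mean + half


stats = RunningStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(x)
```

For these eight observations the mean is 5.0 and the unbiased variance is 32/7 ≈ 4.571. Welford's update avoids the numerical cancellation of the naive sum-of-squares formula.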

Combinatorial Bandits with Partial Feedback

Problem in which several actions are selected simultaneously but only aggregated information about their joint performance is available. Requires algorithms that handle the combinatorial number of possible action sets.
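
A sketch of why aggregated feedback can still identify individual arms: if a random subset of arms is played each round and only the noisy sum of their rewards is observed, least squares on the 0/1 selection vectors recovers the per-arm means. Uniform random subsets, Gaussian noise, and the function name are assumptions for illustration; an actual combinatorial bandit algorithm would also steer its choices toward good subsets.

```python
import numpy as np


def estimate_from_sums(true_means, subset_size, rounds, noise_sd=0.1, seed=0):
    """Recover per-arm means when only the SUM of the chosen subset is observed."""
    rng = np.random.default_rng(seed)
    means = np.asarray(true_means, dtype=float)  # hypothetical ground truth
    k = len(means)
    X = np.zeros((rounds, k))  # row t: 0/1 indicator of the arms played in round t
    y = np.zeros(rounds)       # row t: the single aggregated reward observed
    for t in range(rounds):
        chosen = rng.choice(k, size=subset_size, replace=False)
        X[t, chosen] = 1.0
        y[t] = means[chosen].sum() + rng.normal(0.0, noise_sd)
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # per-arm estimates
    return theta_hat


est = estimate_from_sums([0.1, 0.4, 0.7, 0.2], subset_size=2, rounds=2000)
```

Recovery requires the selection vectors to span all arms; playing the same fixed subset every round, for instance, would reveal only its total.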

Linear Bandit with Noise

Model in which the expected reward is a linear combination of the action's features, but rewards are observed with additive noise. Requires estimation techniques that are robust to these perturbations.
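
A compact sketch of LinUCB with a single shared parameter vector, assuming rewards are `x_a · theta` plus Gaussian noise. It keeps a ridge-regression estimate of `theta` and adds an exploration bonus that shrinks along feature directions that have already been measured well. The parameter names and defaults (`alpha`, `lam`, `noise_sd`) are illustrative, not tuned values.

```python
import numpy as np


def linucb(arm_features, theta_true, horizon, alpha=1.0, lam=1.0, noise_sd=0.1, seed=0):
    """LinUCB on a fixed set of arms; returns (final ridge estimate, pulls per arm)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(arm_features, dtype=float)  # one feature row per arm
    theta_true = np.asarray(theta_true, dtype=float)
    d = X.shape[1]
    A = lam * np.eye(d)  # regularised Gram matrix
    b = np.zeros(d)      # accumulated reward-weighted features
    counts = np.zeros(len(X), dtype=int)
    for _ in range(horizon):
        A_inv = np.linalg.inv(A)
        theta_hat = A_inv @ b  # current ridge-regression estimate
        # Bonus alpha * sqrt(x^T A^{-1} x) for every arm at once.
        bonus = alpha * np.sqrt(np.einsum("ij,jk,ik->i", X, A_inv, X))
        arm = int(np.argmax(X @ theta_hat + bonus))
        x = X[arm]
        r = x @ theta_true + rng.normal(0.0, noise_sd)  # noisy linear reward
        A += np.outer(x, x)
        b += r * x
        counts[arm] += 1
    return np.linalg.solve(A, b), counts


theta_hat, counts = linucb([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]], [0.2, 0.9], horizon=2000)
```

Because all arms share `theta`, pulling one arm also tightens the estimate for any arm with correlated features, which is the main advantage over treating the arms independently.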

Adversarial Bandit with Limited Feedback

Setting in which an adversary chooses the rewards, possibly arbitrarily, while the learner observes only partial information about them (typically just the reward of the action it chose). Demands robust adaptive strategies.
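
A sketch of EXP3, a classical algorithm for adversarial bandits under bandit feedback. It maintains exponential weights and, because only the chosen arm's reward is seen, updates that arm with the importance-weighted estimate x / p. Rewards are assumed to lie in [0, 1]; `reward_fn` is a hypothetical stand-in for the adversary, and the fixed `gamma` is an illustrative value rather than the tuned rate from the theory.

```python
import math
import random


def exp3(reward_fn, k, horizon, gamma=0.1, seed=0):
    """EXP3 with rewards in [0, 1]; returns (pulls per arm, total reward).

    reward_fn(t, arm) plays the adversary; only the chosen arm's reward
    is revealed each round.
    """
    rng = random.Random(seed)
    log_w = [0.0] * k  # log-weights, kept in log space to avoid overflow
    counts = [0] * k
    total = 0.0
    for t in range(horizon):
        m = max(log_w)  # subtract the max before exponentiating for stability
        w = [math.exp(lw - m) for lw in log_w]
        s = sum(w)
        probs = [(1 - gamma) * wi / s + gamma / k for wi in w]
        arm = rng.choices(range(k), weights=probs)[0]
        x = reward_fn(t, arm)    # bandit feedback: one reward only
        x_hat = x / probs[arm]   # unbiased estimate of this arm's reward
        log_w[arm] += gamma * x_hat / k
        counts[arm] += 1
        total += x
    return counts, total


def adversary(t, arm):
    # Hypothetical sequence: arm 1 pays in two rounds out of three, arm 0 in the third.
    return 1.0 if arm == (1 if t % 3 else 0) else 0.0


counts, total = exp3(adversary, k=3, horizon=3000)
```

The uniform mixing term `gamma / k` keeps every probability bounded away from zero, which bounds the importance weights 1 / p.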

Aggregated Feedback

Cumulative information on the performance of a set of actions rather than on each individual action. Typical of systems with measurement or cost constraints.

Delayed Feedback

Situation in which the reward of an action is observed only after a significant delay, creating temporal uncertainty. This complicates attributing rewards to the actions that produced them.
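
A minimal sketch of handling delayed feedback: each pull's reward is queued with its arrival round, and a UCB1-style index is computed only from rewards that have actually arrived. The fixed `delay` and the Bernoulli rewards are simplifying assumptions; in practice delays are usually random.

```python
import heapq
import math
import random


def ucb_delayed(true_probs, horizon, delay, seed=0):
    """UCB1-style policy whose reward for a pull arrives `delay` rounds later.

    Returns (pulls per arm, rewards actually received per arm).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    pulls = [0] * k   # actions taken
    n_obs = [0] * k   # rewards received so far
    sums = [0.0] * k
    pending = []      # min-heap of (arrival_round, arm, reward)
    for t in range(horizon):
        # Deliver every reward whose arrival round has come.
        while pending and pending[0][0] <= t:
            _, a, r = heapq.heappop(pending)
            n_obs[a] += 1
            sums[a] += r
        unseen = [a for a in range(k) if n_obs[a] == 0]
        if unseen:
            # No feedback yet for some arms: cycle through them by pull count.
            arm = min(unseen, key=lambda a: pulls[a])
        else:
            received = sum(n_obs)
            arm = max(range(k), key=lambda a: sums[a] / n_obs[a]
                      + math.sqrt(2 * math.log(received) / n_obs[a]))
        r = 1.0 if rng.random() < true_probs[arm] else 0.0
        heapq.heappush(pending, (t + delay, arm, r))  # observed only later
        pulls[arm] += 1
    return pulls, n_obs


pulls, n_obs = ucb_delayed([0.2, 0.8], horizon=3000, delay=50)
```

Until the first rewards arrive the policy cannot distinguish the arms, so it cycles through them; a longer `delay` stretches this blind phase and makes every later update lag behind the decisions it informs.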

Regret Bound with Partial Feedback

Theoretical upper bound on an algorithm's regret (its cumulative shortfall relative to the best action in hindsight) under limited-information constraints. Such bounds guarantee how efficiently an algorithm learns despite incomplete feedback.
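
For reference, a summary of standard definitions and rates from the adversarial bandit literature, with K arms, T rounds, rewards x_{i,t} assumed to lie in [0, 1], and I_t the arm chosen in round t. These are quoted as well-known results rather than derived here, and constants depend on the particular analysis.

```latex
% Regret against the best fixed arm in hindsight
R_T = \max_{i \in \{1,\dots,K\}} \sum_{t=1}^{T} x_{i,t}
      \;-\; \mathbb{E}\Big[\sum_{t=1}^{T} x_{I_t,t}\Big]

% EXP3 (bandit feedback) with \gamma = \min\{1, \sqrt{K \ln K / ((e-1)T)}\}
R_T \le 2\sqrt{e-1}\,\sqrt{T K \ln K} \approx 2.63\,\sqrt{T K \ln K}

% Lower bound under bandit feedback: for every algorithm some sequence gives
R_T = \Omega\big(\sqrt{K T}\big)

% Full information (all x_{i,t} observed), exponentially weighted forecaster
R_T = O\big(\sqrt{T \ln K}\big)
```

The extra factor of roughly √K between the bandit and full-information rates is the price of observing only the chosen arm's reward.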
