AI Glossary
The Complete Dictionary of Artificial Intelligence
Binary Feedback
Feedback in which the learner observes only a positive or negative signal after each action, with no information about the magnitude of the reward. Because a large gain and a small gain look identical, each observation carries at most one bit of information for the learning algorithm.
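A minimal sketch of how magnitude is lost under this format. The function names, the threshold of 0.0, and the sample rewards are illustrative assumptions, not part of the glossary.

```python
def to_binary(reward, threshold=0.0):
    """Collapse a real-valued reward into a positive (1) or negative (0) signal."""
    return 1 if reward > threshold else 0


def success_rate(signals):
    """Fraction of positive signals: the main statistic left to the learner."""
    return sum(signals) / len(signals) if signals else 0.0


# Hypothetical latent rewards; 0.1 and 9.0 become indistinguishable once binarized.
latent_rewards = [0.1, 9.0, -2.0, 0.5]
signals = [to_binary(r) for r in latent_rewards]
print(signals, success_rate(signals))
```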
Pairwise Feedback
Comparative feedback on two actions in which only the winner is revealed and the absolute reward values stay hidden. It arises in recommendation systems, where users can express a relative preference but not a numeric score.
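One simple way to use winner-only comparisons is to tally duels and rank actions by their empirical win rate. The sketch below assumes that approach; the helper names and the four sample duels are hypothetical.

```python
def record_duel(wins, winner, loser):
    """wins[i][j] counts how often action i beat action j."""
    wins[winner][loser] += 1


def win_rates(wins):
    """Empirical fraction of duels won by each action (0.0 if it never played)."""
    k = len(wins)
    rates = []
    for i in range(k):
        won = sum(wins[i])
        lost = sum(wins[j][i] for j in range(k))
        played = won + lost
        rates.append(won / played if played else 0.0)
    return rates


n_actions = 3
wins = [[0] * n_actions for _ in range(n_actions)]
# Hypothetical duel outcomes as (winner, loser); no reward values are ever seen.
for winner, loser in [(0, 1), (0, 2), (1, 2), (0, 1)]:
    record_duel(wins, winner, loser)
rates = win_rates(wins)
print(rates)
```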
Noisy Feedback
Reward observations corrupted by random noise, so that a single observation is an unreliable estimate of an action's true value. The noise may come from imperfect measurement or from unpredictable user behavior.
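The usual remedy is to average repeated observations. The sketch below assumes additive Gaussian noise with a known standard deviation; the true mean of 0.7, the noise level, and the seed are arbitrary illustrative values.

```python
import random


def observe(true_mean, sigma, rng):
    """Return the true reward plus zero-mean Gaussian noise (an assumed noise model)."""
    return true_mean + rng.gauss(0.0, sigma)


def running_mean(samples):
    """Incremental average; its error shrinks roughly as sigma / sqrt(n)."""
    mean = 0.0
    for n, x in enumerate(samples, start=1):
        mean += (x - mean) / n
    return mean


rng = random.Random(0)
samples = [observe(0.7, 1.0, rng) for _ in range(10_000)]
estimate = running_mean(samples)
print(round(estimate, 3))
```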
Censored Feedback
Situation in which observed rewards are capped at a threshold, so that any reward above it is recorded as the threshold itself and the true value is hidden. Common in systems with technical or business limits, for example when observed demand cannot exceed available inventory.
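A small sketch of the effect, assuming a single upper cap. The cap of 10.0 and the reward values are made up for illustration; the point is that a naive average of capped observations underestimates the true mean.

```python
def censor(reward, cap):
    """Return (observed value, was_censored); rewards above the cap are clipped to it."""
    return min(reward, cap), reward > cap


true_rewards = [2.0, 5.0, 12.0, 20.0]  # hypothetical values, unknown to the learner
cap = 10.0
observations = [censor(r, cap) for r in true_rewards]

observed_values = [value for value, _ in observations]
naive_mean = sum(observed_values) / len(observed_values)
true_mean = sum(true_rewards) / len(true_rewards)
n_censored = sum(1 for _, flag in observations if flag)
print(naive_mean, true_mean, n_censored)
```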
Ordinal Feedback
Partial information in which only the rank, or relative position, of the rewards is available, not their absolute values. Used in particular in ranking systems.
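To illustrate what rank-only information discards, the sketch below converts rewards to ranks (1 is best). The function name and the sample values are assumptions for illustration; two very different reward vectors yield the same ranks.

```python
def to_ranks(rewards):
    """Return the rank of each reward, with 1 for the largest value."""
    order = sorted(range(len(rewards)), key=lambda i: rewards[i], reverse=True)
    ranks = [0] * len(rewards)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


small_gaps = to_ranks([0.2, 5.0, 1.1])
large_gaps = to_ranks([2.0, 500.0, 11.0])
print(small_gaps, large_gaps)
```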
Exploration-Exploitation with Partial Feedback
Fundamental dilemma in which the algorithm must balance trying new actions against exploiting the best known ones while observing only incomplete information. Because each observation reveals less, more exploration is typically needed to reach the same confidence in an action's value.
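Epsilon-greedy is one of the simplest strategies for this trade-off, shown here with positive/negative signals only. This is a sketch under assumed values (three actions, success probabilities 0.2, 0.5, 0.8, epsilon of 0.1), not a recommended configuration.

```python
import random


def epsilon_greedy(success_probs, rounds, epsilon, seed=0):
    """Run epsilon-greedy when only a 0/1 signal is observed for the chosen action."""
    rng = random.Random(seed)
    k = len(success_probs)
    counts = [0] * k
    estimates = [0.0] * k
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick an action uniformly at random
        else:
            arm = max(range(k), key=lambda i: estimates[i])  # exploit current best
        signal = 1 if rng.random() < success_probs[arm] else 0
        counts[arm] += 1
        estimates[arm] += (signal - estimates[arm]) / counts[arm]
    return counts, estimates


counts, estimates = epsilon_greedy([0.2, 0.5, 0.8], rounds=5000, epsilon=0.1)
print(counts)
```

A larger epsilon explores more and converges on the best action sooner, at the cost of more rounds spent on inferior actions afterwards.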
Contextual Bandits with Limited Feedback
Extension of bandits in which the learner observes a context before choosing an action and the reward depends on both, but only partial information about the reward is revealed. Requires estimation methods that generalize across contexts despite the missing information.
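With a small number of discrete contexts, a minimal approach is to keep a separate set of estimates per context. The sketch below assumes that tabular setup with 0/1 signals; the context names and success probabilities are hypothetical.

```python
import random


def run_contextual(success_probs, rounds, epsilon, seed=0):
    """Tabular contextual epsilon-greedy; returns the greedy action per context."""
    rng = random.Random(seed)
    contexts = sorted(success_probs)
    n_arms = len(success_probs[contexts[0]])
    counts = {c: [0] * n_arms for c in contexts}
    estimates = {c: [0.0] * n_arms for c in contexts}
    for _ in range(rounds):
        context = rng.choice(contexts)  # the context is observed before acting
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda i: estimates[context][i])
        # Only the signal for the chosen action is revealed.
        signal = 1 if rng.random() < success_probs[context][arm] else 0
        counts[context][arm] += 1
        estimates[context][arm] += (signal - estimates[context][arm]) / counts[context][arm]
    return {c: max(range(n_arms), key=lambda i: estimates[c][i]) for c in contexts}


policy = run_contextual(
    {"mobile": [0.8, 0.2], "desktop": [0.3, 0.7]}, rounds=4000, epsilon=0.1
)
print(policy)
```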
Reward Distribution Estimation
Process of inferring the underlying reward distribution from partial or noisy observations. Fundamental for making optimal decisions under limited feedback.
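For positive/negative observations, one common choice is a Beta posterior over the success probability. The sketch below assumes a uniform Beta(1, 1) prior and a made-up sequence of ten signals.

```python
def beta_posterior(signals, prior_alpha=1.0, prior_beta=1.0):
    """Update a Beta prior with 0/1 observations; returns (alpha, beta)."""
    successes = sum(signals)
    failures = len(signals) - successes
    return prior_alpha + successes, prior_beta + failures


def beta_mean_and_variance(alpha, beta):
    """Mean and variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1))
    return mean, variance


# Hypothetical observations: 7 positives and 3 negatives.
alpha, beta = beta_posterior([1, 1, 1, 0, 1, 1, 0, 1, 0, 1])
mean, variance = beta_mean_and_variance(alpha, beta)
print(alpha, beta, round(mean, 4), round(variance, 4))
```

The variance shrinks as observations accumulate, which gives a direct measure of how much uncertainty remains about the action.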
Combinatorial Bandits with Partial Feedback
Problem in which several actions are selected at once but only aggregate information about their joint performance is observed. Algorithms must cope with a number of candidate subsets that grows combinatorially with the number of base actions.
Linear Bandit with Noise
Model in which the expected reward is a linear function of the action's feature vector, but each observation includes additive noise. Requires estimation techniques, such as regularized least squares, that remain stable under these perturbations.
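A sketch of the estimation step using ridge regression in two dimensions, written without external libraries. The true parameter (1.5, -0.5), the noise level of 0.5, and the regularization weight are assumptions chosen only to make the example concrete.

```python
import random


def ridge_estimate_2d(features, rewards, lam=1.0):
    """Solve (lam*I + sum x x^T) theta = sum r*x for a 2-dimensional theta."""
    a11 = a22 = lam
    a12 = 0.0
    b1 = b2 = 0.0
    for (x1, x2), r in zip(features, rewards):
        a11 += x1 * x1
        a12 += x1 * x2
        a22 += x2 * x2
        b1 += r * x1
        b2 += r * x2
    det = a11 * a22 - a12 * a12  # positive whenever lam > 0
    theta1 = (a22 * b1 - a12 * b2) / det
    theta2 = (a11 * b2 - a12 * b1) / det
    return theta1, theta2


rng = random.Random(0)
true_theta = (1.5, -0.5)  # hypothetical parameter, unknown to the learner
features = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(2000)]
rewards = [
    true_theta[0] * x1 + true_theta[1] * x2 + rng.gauss(0.0, 0.5)
    for x1, x2 in features
]
theta_hat = ridge_estimate_2d(features, rewards)
print(theta_hat)
```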
Adversarial Bandit with Limited Feedback
Setting in which an adversary may choose the rewards arbitrarily, while the learner observes only the reward of the action it actually played. Demands randomized, adaptive strategies that do not rely on any statistical assumption about the rewards.
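EXP3 is a standard algorithm for this setting: it keeps exponential weights and corrects for unseen rewards with importance weighting. The sketch below assumes rewards in [0, 1]; the reward function, gamma of 0.1, and the round count are illustrative choices.

```python
import math
import random


def exp3(reward_fn, n_arms, rounds, gamma=0.1, seed=0):
    """Run EXP3 and return the final action probabilities."""
    rng = random.Random(seed)
    weights = [1.0] * n_arms

    def probabilities():
        total = sum(weights)
        return [(1 - gamma) * w / total + gamma / n_arms for w in weights]

    for t in range(rounds):
        probs = probabilities()
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, arm)  # only the played action's reward is revealed
        estimate = reward / probs[arm]  # importance-weighted, unbiased estimate
        weights[arm] *= math.exp(gamma * estimate / n_arms)
    return probabilities()


# Hypothetical reward sequence: action 1 always pays 1, action 0 always pays 0.
final_probs = exp3(lambda t, arm: 1.0 if arm == 1 else 0.0, n_arms=2, rounds=200)
print(final_probs)
```

The gamma / n_arms term keeps every probability bounded away from zero, which bounds the importance-weighted estimates and guarantees continued exploration.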
Aggregated Feedback
Cumulative information on the performance of a set of actions rather than on each individual action. Typical of systems with measurement or cost constraints.
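When enough distinct groupings are observed, individual values can sometimes be recovered from the totals by solving a linear system. The sketch below assumes noise-free totals and a hypothetical membership matrix in which each row marks the actions included in one aggregate.

```python
def solve_linear_system(matrix, rhs):
    """Gauss-Jordan elimination with partial pivoting for a square system."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("aggregates do not identify the individual rewards")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Row i lists which of three actions contributed to aggregate i (hypothetical data).
membership = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
totals = [5.0, 7.0, 6.0]
individual = solve_linear_system(membership, totals)
print(individual)
```

If the groupings never vary, the system is singular and the individual contributions cannot be separated, which is why the sketch raises an error in that case.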
Delayed Feedback
Situation in which the reward for an action is observed only after a significant delay, during which the learner must keep acting without knowing the outcome. This complicates attributing each reward to the action that produced it.
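One way to handle the attribution problem is to tag each pending reward with the action that produced it and release it once its arrival time is reached. The class below is a hypothetical helper sketched for illustration, with discrete time steps assumed.

```python
import heapq


class DelayedFeedbackBuffer:
    """Holds rewards until their arrival time, keeping the originating action."""

    def __init__(self):
        self._pending = []  # heap of (arrival_time, sequence, action, reward)
        self._sequence = 0  # tie-breaker so equal arrival times keep submit order

    def submit(self, now, delay, action, reward):
        heapq.heappush(self._pending, (now + delay, self._sequence, action, reward))
        self._sequence += 1

    def collect(self, now):
        """Return all (action, reward) pairs whose arrival time is <= now."""
        ready = []
        while self._pending and self._pending[0][0] <= now:
            _, _, action, reward = heapq.heappop(self._pending)
            ready.append((action, reward))
        return ready


buffer = DelayedFeedbackBuffer()
buffer.submit(now=0, delay=3, action=0, reward=1.0)
buffer.submit(now=1, delay=1, action=1, reward=0.0)
arrivals = [buffer.collect(now) for now in (1, 2, 3)]
print(arrivals)
```

Note that the reward for the later action arrives first, so updates cannot assume that feedback comes back in the order the actions were taken.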
Regret Bound with Partial Feedback
Theoretical guarantee that upper-bounds an algorithm's regret, that is, its cumulative reward shortfall relative to the best fixed action in hindsight, when only limited information is observed. It quantifies how much performance is lost because feedback is incomplete.
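As a reference point, regret over T rounds with K actions can be written as below. The notation (R_T, x_{a,t} for the reward of action a at round t, a_t for the action played) is introduced here for illustration. The two orders shown are the standard adversarial bounds for full information (Hedge) and for bandit feedback (EXP3); the extra factor of K under the square root is the price of seeing only the played action's reward.

```latex
R_T \;=\; \max_{a \in \{1,\dots,K\}} \sum_{t=1}^{T} x_{a,t}
      \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} x_{a_t,t}\right]

\text{Full information (Hedge):}\quad R_T = O\!\left(\sqrt{T \log K}\right)

\text{Bandit feedback (EXP3):}\quad R_T = O\!\left(\sqrt{T K \log K}\right)
```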