AI Glossary
The Complete Glossary of AI
Multi-Objective Q-Learning
Extension of the traditional Q-Learning algorithm that handles reward vectors instead of scalar rewards, enabling simultaneous optimization of multiple conflicting objectives.
Q-value Vector
Multi-dimensional data structure where each element represents the Q-value for a specific objective, replacing the single scalar value of classical Q-Learning.
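As a minimal sketch of how a Q-value vector can be stored and updated, the example below applies the usual Q-Learning update component-wise to a reward vector. The function names (`make_q_table`, `greedy_action`, `update`) and the weighted-sum rule used to pick the next action are illustrative assumptions, not a prescribed method.

```python
from collections import defaultdict


def make_q_table(n_objectives):
    """Q-table mapping (state, action) -> list with one Q-value per objective."""
    return defaultdict(lambda: [0.0] * n_objectives)


def greedy_action(q, state, actions, weights):
    """Pick the action whose Q-vector has the highest weighted sum.

    The weighted sum is only one possible selection rule (an assumption here).
    """
    return max(
        actions,
        key=lambda a: sum(w * v for w, v in zip(weights, q[(state, a)])),
    )


def update(q, state, action, reward_vec, next_state, actions, weights,
           alpha=0.1, gamma=0.9):
    """Apply the Q-Learning update to every component of the Q-vector."""
    best_next = greedy_action(q, next_state, actions, weights)
    target = q[(next_state, best_next)]
    current = q[(state, action)]
    q[(state, action)] = [
        c + alpha * (r + gamma * t - c)
        for c, r, t in zip(current, reward_vec, target)
    ]
    return q[(state, action)]


q = make_q_table(n_objectives=2)
print(update(q, "s0", "a", [1.0, -2.0], "s1", ["a", "b"], [0.5, 0.5], alpha=0.5))
```

Each component is learned independently; only the choice of the next action couples the objectives.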
Lexicographic Approach
Multi-objective resolution strategy in which objectives are ordered by priority and optimized sequentially: a lower-priority objective is considered only to choose among options that are already optimal for every higher-priority objective.
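The priority ordering can be sketched as a filtering loop over the candidate actions. The function name `lexicographic_action` and the optional `tolerance` parameter (which lets near-optimal actions survive to the next objective) are assumptions made for illustration.

```python
def lexicographic_action(q_vectors, tolerance=0.0):
    """Select an action by filtering on objectives in priority order.

    q_vectors: dict mapping action -> list of Q-values, ordered from the
    highest-priority objective to the lowest.
    tolerance: hypothetical slack; actions within this distance of the best
    value on an objective are kept for the next objective.
    """
    candidates = list(q_vectors)
    n_objectives = len(next(iter(q_vectors.values())))
    for i in range(n_objectives):
        best = max(q_vectors[a][i] for a in candidates)
        candidates = [a for a in candidates if q_vectors[a][i] >= best - tolerance]
        if len(candidates) == 1:
            break
    return candidates[0]


q = {"left": [1.0, 0.2], "right": [1.0, 0.8], "stay": [0.5, 5.0]}
print(lexicographic_action(q))  # "left" and "right" tie on objective 0; objective 1 breaks the tie
```

With `tolerance=0.0` the second objective only ever breaks exact ties, which is the strict reading of the definition above.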
Multi-objective Trade-off
Necessary balance between improving certain objectives and potential degradation of others, inherent to optimization problems with conflicting objectives.
Weighted Q-value
Linear combination of individual Q-values from each objective using specific weights to reflect the relative importance of each objective in the final decision.
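A minimal sketch of this linear scalarization, assuming Q-vectors are plain lists and that higher values are better for every objective. The names `weighted_q` and `best_action` are illustrative.

```python
def weighted_q(q_vector, weights):
    """Scalarize a Q-vector as the weighted sum of its components."""
    if len(q_vector) != len(weights):
        raise ValueError("q_vector and weights must have the same length")
    return sum(w * q for w, q in zip(weights, q_vector))


def best_action(q_vectors, weights):
    """Return the action with the highest weighted Q-value."""
    return max(q_vectors, key=lambda a: weighted_q(q_vectors[a], weights))


q = {"fast": [10.0, -8.0], "safe": [4.0, -1.0]}
print(best_action(q, [1.0, 0.0]))  # only the first objective matters
print(best_action(q, [0.5, 0.5]))  # both objectives weighted equally
```

Changing the weights changes which action wins, which is how the weights express the relative importance of each objective.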
Pareto Q-Learning Algorithm
Variant of Q-Learning that maintains sets of non-dominated Q-value vectors rather than a single value per state-action pair, so that the Pareto-optimal policies for all trade-offs between objectives are learned at the same time.
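The core operation behind maintaining such a set is a Pareto-dominance filter. The sketch below shows only that filter, not the full learning algorithm, and assumes every objective is to be maximized; `dominates` and `pareto_front` are illustrative names.

```python
def dominates(u, v):
    """True if u is at least as good as v everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(vectors):
    """Keep only the vectors that no other vector dominates."""
    return [v for v in vectors if not any(dominates(u, v) for u in vectors)]


candidates = [(1, 5), (3, 3), (2, 2), (5, 1)]
print(pareto_front(candidates))  # (2, 2) is dominated by (3, 3) and is dropped
```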
Multi-objective Exploration
Exploration strategy adapted to multi-objective environments, which must discover the range of trade-offs between objectives while keeping learning efficient.
Nash Equilibrium in Q-Learning
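Game-theoretic concept applied to multi-objective Q-Learning by treating each objective as a player: a joint policy is an equilibrium when no player can improve its own return by unilaterally changing its strategy. It is distinct from Pareto optimality, in which no objective can be improved without degrading another.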
Objective Decomposition
Technique transforming a multi-objective problem into several single-objective subproblems optimized simultaneously, facilitating the discovery of diverse solutions on the Pareto front.
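One common way to decompose is to give each subproblem its own weight vector. The sketch below assumes two objectives and fixed, already-known Q-vectors, and solves the subproblems one after another rather than simultaneously; `weight_grid`, `solve_subproblem`, and `decompose` are hypothetical helper names.

```python
def weight_grid(n_steps):
    """Evenly spaced weight pairs for two objectives, each pair summing to 1."""
    return [(i / n_steps, 1 - i / n_steps) for i in range(n_steps + 1)]


def solve_subproblem(q_vectors, weights):
    """Solve one single-objective subproblem: maximize the weighted sum."""
    return max(
        q_vectors,
        key=lambda a: sum(w * q for w, q in zip(weights, q_vectors[a])),
    )


def decompose(q_vectors, n_steps=4):
    """Map each weight vector to the solution of its subproblem."""
    return {w: solve_subproblem(q_vectors, w) for w in weight_grid(n_steps)}


q = {"fast": [10.0, 0.0], "balanced": [6.0, 6.0], "safe": [0.0, 10.0]}
for weights, action in decompose(q).items():
    print(weights, action)
```

Different weight vectors select different actions, so the collection of subproblem solutions spreads across the front. Weighted sums can only reach solutions on the convex part of the Pareto front.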
Reward Vector
Multidimensional reward signal in which each component is the reward associated with a specific objective, replacing the traditional scalar reward.
Policy Space Adaptation
Mechanism that dynamically adapts the policy space to manage the additional complexity introduced by the multi-objective nature of the learning problem.