AI Glossary
The Complete Dictionary of Artificial Intelligence
Centralised Training with Decentralised Execution (CTDE)
Architectural paradigm in multi-agent reinforcement learning (MARL) in which agents are trained with access to global, shared information, but execute their policies independently from local observations alone. The approach combines the efficiency of centralized training with the robustness of decentralized execution.
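A minimal sketch of the pattern, assuming a PyTorch setup with hypothetical Actor and CentralCritic modules: only the critic, which is discarded after training, ever sees global information.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralized policy: consumes only the agent's local observation."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, local_obs):
        return torch.softmax(self.net(local_obs), dim=-1)

class CentralCritic(nn.Module):
    """Centralized value function: sees the global state, training only."""
    def __init__(self, state_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, global_state):
        return self.net(global_state)

# Training: critic(global_state) provides the learning signal for the actors.
# Execution: each deployed agent calls only actor(local_obs).
```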
Value Decomposition Networks (VDN)
MARL architecture that decomposes the joint team action-value into the sum of individual per-agent values, ensuring consistency between individual and collective greedy policies. VDN's simple additivity assumption makes coordinated learning tractable but limits which value functions it can represent.
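At VDN's core is the additive decomposition, written in the usual notation where τ_i is agent i's action-observation history and u_i its action:

\[
Q_{\mathrm{tot}}(\boldsymbol{\tau}, \mathbf{u}) \;=\; \sum_{i=1}^{n} Q_i(\tau_i, u_i)
\]

Because a sum is maximized by maximizing each term, every agent can act greedily on its own Q_i and the joint action remains greedy with respect to Q_tot.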
QMIX
Value decomposition algorithm that uses a nonlinear but monotonic mixing network to combine individual Q-values into a team Q-value. QMIX represents richer agent interactions than VDN's simple sum while still guaranteeing Individual-Global-Max (IGM) consistency.
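A condensed sketch of the mixing network, assuming PyTorch: hypernetworks generate state-dependent mixing weights, and taking their absolute value enforces ∂Q_tot/∂Q_i ≥ 0, the monotonicity condition behind IGM. (The original QMIX uses a two-layer hypernetwork for the final bias; a single layer stands in here for brevity.)

```python
import torch
import torch.nn as nn

class QMixer(nn.Module):
    """Mixes per-agent Q-values into Q_tot, monotonically in each Q_i."""
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        # Hypernetworks: mixing weights are functions of the global state.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        w1 = torch.abs(self.hyper_w1(state)).view(-1, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).unsqueeze(1)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).unsqueeze(-1)
        b2 = self.hyper_b2(state).unsqueeze(1)
        return (torch.bmm(hidden, w2) + b2).view(-1)  # Q_tot per batch element
```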
Multi-Agent Deep Deterministic Policy Gradient (MADDPG)
Extension of DDPG to multi-agent settings following the CTDE paradigm: each agent has a centralized critic that conditions on all agents' observations and actions, and a decentralized actor that acts from local observations alone. Conditioning the critic on the other agents' actions mitigates the non-stationarity that would arise from treating those agents as part of the environment.
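A sketch of the per-agent centralized critic, assuming PyTorch and continuous actions; the defining detail is that the critic's input concatenates every agent's observation and action.

```python
import torch
import torch.nn as nn

class MaddpgCritic(nn.Module):
    """Centralized critic Q_i(o_1..o_n, a_1..a_n) for one agent."""
    def __init__(self, joint_obs_dim, joint_act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, all_obs, all_actions):
        # all_obs / all_actions: concatenation over every agent, (batch, dim)
        return self.net(torch.cat([all_obs, all_actions], dim=-1))

# The matching actor is ordinary DDPG: a_i = mu_i(o_i), local observation only.
```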
Counterfactual Multi-Agent Policy Gradients (COMA)
Multi-agent policy-gradient algorithm that uses a counterfactual baseline to estimate each agent's marginal contribution: the agent's own action is marginalized out while the other agents' actions are held fixed. COMA thereby addresses the credit assignment problem in cooperative environments.
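The counterfactual advantage for agent a, with u^{-a} denoting the other agents' frozen actions:

\[
A^{a}(s, \mathbf{u}) \;=\; Q(s, \mathbf{u}) \;-\; \sum_{u'^{a}} \pi^{a}\!\left(u'^{a} \mid \tau^{a}\right) Q\!\left(s, (\mathbf{u}^{-a}, u'^{a})\right)
\]

The baseline term asks what the team value would have been, on average, had agent a acted differently; the difference isolates that agent's contribution.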
Decentralised Partially Observable MDP (Dec-POMDP)
Mathematical formalization of multi-agent decision making under partial observability, in which each agent acts on its own local observation history. Agents must cooperate to maximize a shared global reward.
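A Dec-POMDP is commonly written as the tuple

\[
\langle I, S, \{A_i\}, T, R, \{\Omega_i\}, O, \gamma \rangle,
\]

where I is the set of agents, S the state space, A_i agent i's action set, T the transition function over joint actions, R the shared reward function, Ω_i agent i's observation set, O the observation function, and γ the discount factor.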
Credit Assignment Problem
Fundamental challenge in MARL: correctly attributing the team reward to the individual contributions of each agent. Solving it effectively is crucial for learning coordinated, near-optimal policies.
Attention Mechanisms in Multi-Agent Systems
Technique allowing agents to selectively weight relevant information from other agents or the environment. Attention improves communication and coordination by focusing on the most important interactions.
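A minimal sketch, assuming each agent has already encoded the others into feature vectors; standard scaled dot-product attention then weights those features for the focal agent.

```python
import torch
import torch.nn.functional as F

def agent_attention(query, keys, values):
    """Scaled dot-product attention over other agents' features.

    query:  (batch, d)            focal agent's embedding
    keys:   (batch, n_others, d)  other agents' key embeddings
    values: (batch, n_others, d)  features to aggregate
    """
    d = query.size(-1)
    scores = torch.bmm(keys, query.unsqueeze(-1)).squeeze(-1) / d ** 0.5
    weights = F.softmax(scores, dim=-1)   # relevance of each other agent
    return torch.bmm(weights.unsqueeze(1), values).squeeze(1)
```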
Communication Protocols
Structured mechanisms for information exchange between agents, which can be learned or predefined to optimize coordination. Effective protocols reduce communication overhead while maintaining critical task information.
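One learned protocol in sketch form, loosely following the CommNet idea: each agent broadcasts a continuous message vector and conditions on the mean of the others' messages, so the channel itself is trained end-to-end by backpropagation.

```python
import torch
import torch.nn as nn

class CommLayer(nn.Module):
    """One round of differentiable communication among n agents."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.msg = nn.Linear(hidden_dim, hidden_dim)         # outgoing message
        self.update = nn.Linear(2 * hidden_dim, hidden_dim)  # state update

    def forward(self, h):
        # h: (batch, n_agents, hidden_dim) per-agent hidden states
        n = h.size(1)
        messages = self.msg(h)
        # Each receiver gets the mean of the *other* agents' messages.
        incoming = (messages.sum(dim=1, keepdim=True) - messages) / max(n - 1, 1)
        return torch.tanh(self.update(torch.cat([h, incoming], dim=-1)))
```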
Coordination Graphs
Graphical representation of dependencies between agents in which nodes represent agents and edges represent interactions that directly affect the joint payoff. This structure enables efficient decomposition of multi-agent decision problems.
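For a graph G = (V, E), the joint value decomposes into per-agent utilities and pairwise payoffs along the edges:

\[
Q(\mathbf{a}) \;=\; \sum_{i \in V} f_i(a_i) \;+\; \sum_{(i,j) \in E} f_{ij}(a_i, a_j)
\]

The joint maximizer can then be found with message-passing schemes such as variable elimination or max-plus, rather than by enumerating all joint actions.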
Team Q-learning
Variant of Q-learning where agents share a common value function and maximize collective team reward. Agents use local observations but optimize a shared global objective.
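A tabular sketch, assuming a small joint-action space that can be indexed directly; every agent applies the same update to the shared table, driven by the common team reward.

```python
import numpy as np

n_states, n_joint_actions = 10, 4 * 4      # e.g. two agents, 4 actions each
Q = np.zeros((n_states, n_joint_actions))  # shared team value function
alpha, gamma = 0.1, 0.99

def team_q_update(s, joint_a, r, s_next):
    """Standard Q-learning step on the shared joint-action table."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, joint_a] += alpha * (td_target - Q[s, joint_a])
```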
Multi-Agent Proximal Policy Optimization (MAPPO)
Extension of PPO to multi-agent environments that uses a centralized critic to evaluate the decentralized per-agent policies. MAPPO retains PPO's training stability while coping with multi-agent non-stationarity.
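The per-agent policy loss is PPO's clipped surrogate; a sketch assuming advantages precomputed from the centralized critic:

```python
import torch

def mappo_policy_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """PPO clipped surrogate, applied per agent; advantages come from a
    centralized value function shared across the team."""
    ratio = torch.exp(new_logp - old_logp)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```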
Individual-Global-Max (IGM) Principle
Theoretical principle requiring that the team's optimal joint action coincide with each agent's individually optimal action under the decomposed value functions. IGM is essential for consistency between individual and collective learning.
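In symbols, IGM requires the joint greedy action to factor across agents:

\[
\arg\max_{\mathbf{u}} Q_{\mathrm{tot}}(\boldsymbol{\tau}, \mathbf{u})
\;=\;
\left( \arg\max_{u_1} Q_1(\tau_1, u_1), \;\ldots,\; \arg\max_{u_n} Q_n(\tau_n, u_n) \right)
\]

VDN satisfies this through additivity, QMIX through the monotonicity of its mixing network.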
Parameter Sharing
Technique where agents share the same neural network parameters to exploit similarities in tasks and reduce complexity. Parameter sharing facilitates learning and generalization among homogeneous agents.
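A common sketch: one network serves every agent, with a one-hot agent ID appended to the observation so the shared parameters can still express agent-specific behavior (the ID input is a widespread convention, assumed here rather than required).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedPolicy(nn.Module):
    """Single policy network reused by all n_agents."""
    def __init__(self, obs_dim, n_actions, n_agents):
        super().__init__()
        self.n_agents = n_agents
        self.net = nn.Sequential(nn.Linear(obs_dim + n_agents, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs, agent_id):
        # agent_id: (batch,) integer tensor identifying each agent
        one_hot = F.one_hot(agent_id, self.n_agents).float()
        return self.net(torch.cat([obs, one_hot], dim=-1))
```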
Non-Stationarity Problem
Challenge in MARL whereby the environment perceived by each agent keeps changing as the other agents adapt their policies, breaking the stationarity assumption behind single-agent methods. Training paradigms such as CTDE are designed to keep learning targets stable despite this.