Centralized-Decentralized MARL

Terms

Centralised Training with Decentralised Execution (CTDE)

Training paradigm in MARL in which agents learn with access to global information (the full state and the other agents' observations and actions) but execute their policies independently, each using only its own local observations. This approach combines the learning efficiency of centralized training with the robustness of decentralized execution.

Value Decomposition Networks (VDN)

MARL architecture that decomposes the global team value into the sum of individual agent values, so that each agent acting greedily on its own value also selects its part of the greedy joint action. VDN keeps this simple additivity assumption to facilitate coordinated learning.
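
The additive decomposition can be illustrated with a minimal sketch. The per-agent values below are made-up numbers and `q_total` is a hypothetical helper, not part of any library; the point is only that per-agent greedy choices match the centralised greedy choice when the team value is a plain sum.

```python
from itertools import product

# Hypothetical per-agent utilities Q_i(action) for one fixed local
# observation; the numbers are invented for illustration.
agent_q = [
    {"left": 1.0, "right": 3.0},   # agent 0
    {"left": 2.5, "right": 0.5},   # agent 1
]

def q_total(joint_action):
    """VDN assumption: the team value is the plain sum of agent values."""
    return sum(q[a] for q, a in zip(agent_q, joint_action))

# Decentralised greedy choice: every agent maximises its own Q_i.
decentralised = tuple(max(q, key=q.get) for q in agent_q)

# Centralised greedy choice: search over all joint actions.
centralised = max(product(*(q.keys() for q in agent_q)), key=q_total)
```

Under these assumptions both selections pick the same joint action, which is why the sum is safe to use with decentralised execution.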

Q-MIX

Value decomposition algorithm that uses a nonlinear, monotonic mixing network to combine individual Q-values into a team Q-value. Because the team value is monotonically non-decreasing in each agent's Q-value, Q-MIX can represent richer interactions between agents than a plain sum while still satisfying IGM (Individual-Global-Max) consistency.
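
A rough sketch of a monotonic mixer follows. It assumes fixed, hand-picked weights for readability; in Q-MIX the mixing weights are produced by hypernetworks conditioned on the global state, which is omitted here. The names `mix`, `W1`, `B1`, `W2` and `B2` are illustrative only.

```python
import math

def elu(x):
    """ELU activation; it is monotonically increasing."""
    return x if x > 0 else math.exp(x) - 1.0

def mix(agent_qs, w1, b1, w2, b2):
    """Two-layer mixer. abs() keeps every weight non-negative, which makes
    the output monotonically non-decreasing in each agent's Q-value."""
    hidden = [
        elu(sum(abs(w) * q for w, q in zip(row, agent_qs)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(abs(w) * h for w, h in zip(w2, hidden)) + b2

# Hypothetical fixed weights (assumption: no hypernetwork in this sketch).
W1 = [[0.5, -1.2], [-0.3, 0.8]]
B1 = [0.1, -0.4]
W2 = [1.5, -0.7]
B2 = 0.2

low = mix([1.0, 2.0], W1, B1, W2, B2)
high = mix([1.5, 2.0], W1, B1, W2, B2)  # only agent 0's Q-value is raised
```

Raising a single agent's Q-value never lowers the mixed value, even though some raw weights are negative, because only their absolute values are used.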

Multi-Agent Deep Deterministic Policy Gradient (MADDPG)

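Extension of DDPG to multi-agent environments that follows the CTDE paradigm: each agent has a decentralized actor that acts on its own observation and a centralized critic that, during training, also receives the observations and actions of the other agents. Conditioning the critic on all agents' actions keeps its learning target stationary even as the other agents' policies change.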
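
The difference between the two inputs can be sketched with linear functions standing in for neural networks. All weights and observations are invented, and a single critic is shown for brevity, whereas MADDPG trains one centralised critic per agent.

```python
def actor(weights, own_obs):
    """Decentralised actor: a deterministic linear policy over the agent's
    own observation only."""
    return sum(w * o for w, o in zip(weights, own_obs))

def centralised_critic(weights, all_obs, all_actions):
    """Centralised critic: scores the concatenation of every agent's
    observation and action. Used during training only."""
    features = [x for obs in all_obs for x in obs] + list(all_actions)
    return sum(w * f for w, f in zip(weights, features))

observations = [[0.2, -1.0], [0.5, 0.3]]            # one row per agent
actor_weights = [[1.0, 0.5], [-0.4, 2.0]]           # hypothetical parameters
critic_weights = [0.1, 0.2, 0.3, 0.4, 1.0, -1.0]    # 4 obs dims + 2 actions

actions = [actor(w, o) for w, o in zip(actor_weights, observations)]
q_value = centralised_critic(critic_weights, observations, actions)
```

At execution time only `actor` is needed, so each agent can act without seeing the other agents' observations.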

Counterfactual Multi-Agent Policy Gradients (COMA)

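Multi-agent policy gradient algorithm that uses a centralized critic and a counterfactual baseline to estimate each agent's advantage: the value of the joint action actually taken is compared with the expected value obtained by marginalizing over that agent's alternative actions while the other agents' actions are held fixed. COMA targets the credit assignment problem in cooperative environments.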
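
A minimal sketch of the counterfactual advantage for a single state, assuming a tabular critic `Q` with made-up values and a hypothetical policy `pi_0` for agent 0:

```python
# Hypothetical centralised critic for one state: Q[(u0, u1)] for two agents
# with actions "a" and "b".
Q = {("a", "a"): 1.0, ("a", "b"): 0.0, ("b", "a"): 4.0, ("b", "b"): 2.0}

def counterfactual_advantage(agent, joint_action, policy):
    """A_i = Q(s, u) - sum over u'_i of pi_i(u'_i) * Q(s, (u_-i, u'_i)).

    The baseline averages over the agent's own alternatives while the other
    agents' actions stay fixed at what they actually did."""
    baseline = 0.0
    for alt, prob in policy.items():
        counterfactual = list(joint_action)
        counterfactual[agent] = alt
        baseline += prob * Q[tuple(counterfactual)]
    return Q[tuple(joint_action)] - baseline

pi_0 = {"a": 0.5, "b": 0.5}  # agent 0's current policy (assumed)
adv = counterfactual_advantage(0, ("b", "a"), pi_0)
```

A positive advantage here means agent 0's choice of "b" did better than its own average behaviour would have, given what agent 1 did.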

Decentralised Partially Observable MDP (Dec-POMDP)

Mathematical formalization of multi-agent decision problems with partial observability where each agent makes decisions based on its local observations. Agents must cooperate to maximize a shared global reward.
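
The ingredients (hidden state, per-agent local observations, joint action, shared reward) can be sketched with a toy one-step problem. The class `TwoDoorDecPOMDP` is invented for illustration and leaves out state transitions and discounting.

```python
from dataclasses import dataclass

@dataclass
class TwoDoorDecPOMDP:
    """Toy Dec-POMDP: the hidden state is which of two doors hides the prize.
    Agent 0 observes the state; agent 1 observes nothing useful."""
    state: int = 0  # 0 or 1, hidden from agent 1

    def observe(self):
        # One local observation per agent; None stands for "no information".
        return [self.state, None]

    def step(self, joint_action):
        # Shared team reward: 1 only if both agents pick the prize door.
        reward = 1.0 if all(a == self.state for a in joint_action) else 0.0
        return self.observe(), reward

env = TwoDoorDecPOMDP(state=1)
obs = env.observe()
_, team_reward = env.step((1, 1))
```

Agent 1 cannot solve this alone, which is the situation in which communication or a learned convention becomes necessary.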

Credit Assignment Problem

Fundamental challenge in MARL consisting of correctly attributing team reward to individual contributions of each agent. Effective resolution is crucial for learning coordinated and optimal policies.

Attention Mechanisms in Multi-Agent Systems

Technique allowing agents to selectively weight relevant information from other agents or the environment. Attention improves communication and coordination by focusing on the most important interactions.
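
One common form of this weighting is scaled dot-product attention. The sketch below assumes that form and uses invented two-dimensional embeddings; `attend` is a hypothetical helper written in plain Python.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context

# Hypothetical embeddings: agent 0 queries messages from agents 1, 2 and 3.
query = [1.0, 0.0]
keys = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = attend(query, keys, values)
```

The agent whose key aligns best with the query receives the largest weight, so its message dominates the aggregated context vector.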

Communication Protocols

Structured mechanisms for information exchange between agents, which can be learned or predefined to optimize coordination. Effective protocols reduce communication overhead while maintaining critical task information.

Coordination Graphs

Graphical representation of dependencies between agents where nodes represent agents and edges represent necessary interactions. This structure enables efficient decomposition of multi-agent decision problems.
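
The decomposition can be sketched on a three-agent chain, assuming the team value is a sum of pairwise payoffs on the edges. The payoff tables are made up, and the brute-force search is only for illustration; it grows exponentially with the number of agents, which is the cost that graph-based methods aim to avoid.

```python
from itertools import product

ACTIONS = (0, 1)

# Chain graph 0 - 1 - 2: one hypothetical payoff table per edge.
edge_payoffs = {
    (0, 1): {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 3.0},
    (1, 2): {(0, 0): 2.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0},
}

def team_value(joint_action):
    """Global value is the sum of local payoffs over the graph's edges."""
    return sum(
        table[(joint_action[i], joint_action[j])]
        for (i, j), table in edge_payoffs.items()
    )

# Brute force is fine for three agents; it is exponential in general.
best = max(product(ACTIONS, repeat=3), key=team_value)
```

Agents 0 and 2 never appear in the same payoff table, so they only need to coordinate indirectly through agent 1.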

Team Q-learning

Variant of Q-learning where agents share a common value function and maximize collective team reward. Agents use local observations but optimize a shared global objective.
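
A minimal tabular sketch of a shared value function, assuming a stateless two-agent task and a deterministic sweep over joint actions in place of exploration. The task, reward function and learning rate are invented.

```python
from itertools import product

ACTIONS = ("stay", "push")

def team_reward(joint_action):
    """Shared reward of a made-up one-shot task: both agents must push."""
    return 1.0 if joint_action == ("push", "push") else 0.0

# One common Q-table over joint actions, shared by the whole team.
q_table = {joint: 0.0 for joint in product(ACTIONS, repeat=2)}
ALPHA = 0.5  # learning rate (assumed)

for _ in range(20):
    for joint in q_table:  # deterministic sweep instead of exploration
        # Stateless task, so the target is just the immediate team reward.
        q_table[joint] += ALPHA * (team_reward(joint) - q_table[joint])

greedy_joint_action = max(q_table, key=q_table.get)
```

Because the table is indexed by joint actions, its size grows exponentially with the number of agents, which is one motivation for the decomposition methods in this glossary.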

Multi-Agent Proximal Policy Optimization (MAPPO)

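Extension of PPO to multi-agent environments in which a centralized critic, with access to global information during training, evaluates the agents' decentralized policies. MAPPO keeps the training stability of PPO's clipped updates, and the centralized critic helps mitigate multi-agent non-stationarity.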
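
The two pieces can be sketched for a single sample. This assumes a one-step advantage for brevity (practical implementations usually use a multi-step estimator such as GAE), and all numbers are hypothetical.

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """PPO clipped objective for one sample: min(r * A, clip(r) * A)."""
    clipped = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    return min(ratio * advantage, clipped * advantage)

def centralised_advantage(team_reward, value_next, value_now, gamma=0.99):
    """One-step advantage from a value function of the global state."""
    return team_reward + gamma * value_next - value_now

# Hypothetical numbers: V(s) and V(s') come from the centralised critic,
# the probability ratio comes from one agent's decentralised policy.
adv = centralised_advantage(team_reward=1.0, value_next=2.0, value_now=2.5)
objective = clipped_surrogate(ratio=1.5, advantage=adv)
```

The clip caps how much a single update can change an agent's policy, while the advantage itself is computed from information that the agent does not see at execution time.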

Individual-Global-Max (IGM) Principle

Principle stating that the joint action that maximizes the team value coincides with the tuple of actions that each agent obtains by maximizing its own decomposed value. IGM is what allows decentralized greedy execution to agree with the centralized value learned during training.
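
Written out, the condition is usually stated as follows. The notation is an assumption of this sketch: Q_tot is the team value, Q_i the value of agent i, tau_i its local action-observation history and u_i its action.

```latex
\arg\max_{\mathbf{u}} Q_{tot}(\boldsymbol{\tau}, \mathbf{u})
  = \Bigl( \arg\max_{u_1} Q_1(\tau_1, u_1),\; \dots,\; \arg\max_{u_n} Q_n(\tau_n, u_n) \Bigr)
```

Additive (VDN) and monotonic (Q-MIX) decompositions are two sufficient ways of satisfying this condition.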

Parameter Sharing

Technique where agents share the same neural network parameters to exploit similarities in tasks and reduce complexity. Parameter sharing facilitates learning and generalization among homogeneous agents.
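
A minimal sketch, using a linear scorer in place of a neural network. Appending a one-hot agent ID to the input is a common convention assumed here so that agents can still behave differently; it is not stated in the definition above, and all weights are invented.

```python
def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def shared_policy_logit(shared_weights, observation, agent_id, n_agents):
    """All agents call this with the SAME weights; the one-hot agent ID in
    the input lets the shared network still behave differently per agent."""
    features = list(observation) + one_hot(agent_id, n_agents)
    return sum(w * f for w, f in zip(shared_weights, features))

N_AGENTS = 3
shared_weights = [0.4, -0.2, 1.0, 0.0, -1.0]  # 2 obs dims + 3 ID dims (made up)
observation = [1.0, 2.0]

logits = [
    shared_policy_logit(shared_weights, observation, i, N_AGENTS)
    for i in range(N_AGENTS)
]
```

Every agent's experience updates the same weight vector, so the number of parameters does not grow with the number of agents.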

Non-Stationarity Problem

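Challenge in MARL where the environment perceived by each agent keeps changing as the other agents adapt their policies, which breaks the stationarity assumption behind single-agent convergence guarantees. Paradigms such as CTDE mitigate the problem by giving the learner access to the other agents' information during training.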
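
The effect can be shown with a two-agent coordination game in which the reward is 1 when both agents choose the same action. The game and the two policies below are made up for illustration.

```python
def expected_reward(my_action, other_policy):
    """Coordination game: reward 1 when both agents choose the same action.
    other_policy maps the other agent's actions to probabilities."""
    return other_policy.get(my_action, 0.0)

# The other agent's policy before and after it has learned (made-up values).
other_early = {"left": 0.9, "right": 0.1}
other_late = {"left": 0.2, "right": 0.8}

before = expected_reward("left", other_early)
after = expected_reward("left", other_late)
```

The value of the same action changes from one point in training to another although the game itself is unchanged, so an estimate learned early becomes stale.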
