AI Terminology
A Complete Dictionary of Artificial Intelligence
Centralized Q-learning
A variant of Q-learning in which all agents share a single Q-table to coordinate their actions in a cooperative environment. Because the shared table is defined over the global state of the system, it can represent an optimal joint policy.
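A minimal tabular sketch of the idea, with toy state and action sizes (all numbers are illustrative):

```python
import numpy as np

# Toy centralized Q-learning: one shared table over the global state and
# the joint action of all agents (sizes are illustrative).
n_states, n_agents, n_actions = 4, 2, 2
n_joint_actions = n_actions ** n_agents   # enumerate joint actions 0..3
Q = np.zeros((n_states, n_joint_actions))
alpha, gamma = 0.1, 0.9                   # learning rate, discount

def update(state, joint_action, reward, next_state):
    """One tabular Q-learning step on the shared table."""
    td_target = reward + gamma * Q[next_state].max()
    Q[state, joint_action] += alpha * (td_target - Q[state, joint_action])

# The agents jointly took joint action 3 in state 0 and received reward 1.
update(0, 3, 1.0, 1)
```

Because the table is indexed by the joint action, its size grows exponentially with the number of agents, which is the main scalability limit of this approach.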
Value Decomposition Networks (VDN)
A neural network architecture that decomposes the team value into the sum of individual values of each agent. This method maintains agent individuality while maximizing collective reward.
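A minimal sketch of the decomposition with illustrative per-agent utilities: because the team value is a plain sum, the greedy joint action is just each agent's individual argmax.

```python
import numpy as np

# VDN-style additive decomposition (utilities are illustrative).
q_agent1 = np.array([1.0, 3.0])   # agent 1's utility for actions 0 and 1
q_agent2 = np.array([2.0, 0.5])   # agent 2's utility for actions 0 and 1

def vdn_q_tot(a1, a2):
    """Team value is the sum of the individual utilities."""
    return q_agent1[a1] + q_agent2[a2]

# Decentralized greedy choice: each agent maximizes its own utility.
greedy = (int(q_agent1.argmax()), int(q_agent2.argmax()))
```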
QMIX
A multi-agent reinforcement learning algorithm that combines value decomposition with a monotonic mixing network. QMIX guarantees consistency between the individual values and the global value while allowing the mixing function to be arbitrarily complex, subject only to the monotonicity constraint.
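A shape-level sketch of the monotonic mixing idea, assuming a single linear hypernetwork layer with illustrative random weights: forcing the mixing weights to be nonnegative (here via np.abs) guarantees that raising any agent's utility can never lower the team value.

```python
import numpy as np

# Simplified QMIX-style mixer: a state-conditioned hypernetwork produces
# nonnegative mixing weights (weights and state are illustrative).
rng = np.random.default_rng(0)
W_hyper = rng.normal(size=(3, 2))   # maps a 3-dim state to 2 mixing weights

def q_tot(agent_qs, state):
    w = np.abs(state @ W_hyper)     # nonnegativity enforces monotonicity
    return float(agent_qs @ w)

state = np.array([1.0, 0.5, -0.2])
base   = q_tot(np.array([1.0, 2.0]), state)
raised = q_tot(np.array([1.5, 2.0]), state)   # agent 1's utility increased
```

The real algorithm uses multi-layer hypernetworks trained end to end; this only illustrates the monotonicity constraint.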
Counterfactual Regret Minimization (CFR)
An iterative optimization technique that minimizes each player's counterfactual regret in extensive-form games. It learns strong strategies by evaluating what the outcome would have been if a player had acted differently at each decision point.
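The core update inside CFR is regret matching: play each action with probability proportional to its positive cumulative regret. A minimal sketch with illustrative counterfactual values:

```python
# Regret matching, the inner update of CFR (numbers are illustrative).
cum_regret = [0.0, 0.0, 0.0]

def strategy(regrets):
    """Action probabilities proportional to positive cumulative regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    n = len(regrets)
    return [p / total for p in pos] if total > 0 else [1.0 / n] * n

# Counterfactual values of the three actions, and the expected value of
# the strategy actually played:
cf_values, ev = [1.0, 3.0, 0.0], 2.0
cum_regret = [r + (v - ev) for r, v in zip(cum_regret, cf_values)]
sigma = strategy(cum_regret)   # all weight moves to the regretted action
```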
Commutative Monotonicity
A mathematical property used in value decomposition algorithms: the order in which agents are combined does not affect the total value, so the collective value is consistent under any permutation of the agents.
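A quick numeric check of the permutation property for an additive decomposition (utilities are illustrative): every ordering of the agents yields the same team total.

```python
from itertools import permutations

# Per-agent utilities (illustrative); summing them in any agent order
# gives the same collective value.
utilities = [1.5, -0.5, 2.0]
totals = {sum(p) for p in permutations(utilities)}
```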
Individual-Global-Max (IGM)
A fundamental principle stating that the global maximum of the team value function is achieved when each agent chooses the action that maximizes its own individual value. This property ensures consistency between greedy local decisions and global optimality.
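A small check of IGM for an additive (VDN-style) decomposition, with illustrative utilities: the joint action built from each agent's individual argmax attains the brute-force global maximum of the team value.

```python
import numpy as np
from itertools import product

# Two agents, two actions each (utilities are illustrative).
qs = [np.array([0.2, 1.0]), np.array([0.7, 0.1])]

# Decentralized choice: each agent's individual argmax.
individual = tuple(int(q.argmax()) for q in qs)

# Centralized check: brute-force maximum over all joint actions.
joint_max = max(sum(q[a] for q, a in zip(qs, joint))
                for joint in product(range(2), repeat=2))
```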
Multi-Agent Deep Deterministic Policy Gradient (MADDPG)
An extension of DDPG to multi-agent environments that uses centralized critics during training: each critic has access to all agents' observations and actions, while each actor learns a decentralized policy. MADDPG thus lets agents exploit complete information while training, yet act on local information at execution time.
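A shape-level sketch of the information split, with illustrative random weights: each actor maps only its local observation to an action, while the centralized critic scores the concatenation of every agent's observation and action.

```python
import numpy as np

rng = np.random.default_rng(1)
obs_dim, act_dim, n_agents = 3, 2, 2
actor_W = [rng.normal(size=(obs_dim, act_dim)) for _ in range(n_agents)]
critic_W = rng.normal(size=(n_agents * (obs_dim + act_dim),))

def act(i, obs_i):
    """Decentralized actor: uses only agent i's local observation."""
    return np.tanh(obs_i @ actor_W[i])

def centralized_critic(all_obs, all_acts):
    """Centralized critic: sees every agent's observation and action."""
    joint = np.concatenate([*all_obs, *all_acts])
    return float(joint @ critic_W)

obs = [rng.normal(size=obs_dim) for _ in range(n_agents)]
acts = [act(i, o) for i, o in enumerate(obs)]
q_value = centralized_critic(obs, acts)
```

At deployment only `act` is needed, so the critic (and with it the global information requirement) disappears after training.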
Centralized Training with Decentralized Execution (CTDE)
A learning paradigm where agents train with access to global information but execute decentralized policies during deployment. This approach combines the advantages of centralized coordination during learning with the robustness of distributed execution.
Attention-based Communication
A communication mechanism in which each agent learns to attend selectively to the relevant messages from other agents. This approach optimizes information flow and reduces computational cost in agent teams.
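A minimal sketch of attention over teammate messages (all vectors are illustrative): the receiving agent scores each message against its own query and aggregates them with softmax weights.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

query = np.array([1.0, 0.0])            # receiving agent's query vector
messages = np.array([[1.0, 0.0],        # messages broadcast by teammates
                     [0.0, 1.0],
                     [0.9, 0.1]])
weights = softmax(messages @ query)     # learned relevance of each message
aggregated = weights @ messages         # attended summary of the traffic
```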
Mean Field Reinforcement Learning
A theoretical approach that models the behavior of a large population of agents as a mean field rather than through individual pairwise interactions. This method scales multi-agent learning to very large populations while still capturing emergent collective behavior.
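A minimal sketch of the key reduction, with illustrative neighbor actions: the interaction with all neighbors is summarized by a single mean-action vector whose size is independent of the population.

```python
import numpy as np

neighbor_actions = [0, 1, 1, 1]   # discrete actions of 4 neighbors
n_actions = 2

# The mean action: an empirical distribution over the action space.
mean_action = np.bincount(neighbor_actions, minlength=n_actions) / len(neighbor_actions)
# A mean-field Q-function then conditions on (state, own_action,
# mean_action) instead of on every individual neighbor's action.
```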
Team-Q
A Q-learning algorithm extended to team settings in which the Q-function is defined over the joint actions of all agents. Team-Q can learn optimal coordinated strategies in problems with discrete state and action spaces.
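A minimal sketch with a dictionary-backed table (sizes and values are illustrative): Q is indexed by the state and a tuple holding every agent's action.

```python
from itertools import product

n_agents, n_actions, n_states = 2, 2, 3
joint_actions = list(product(range(n_actions), repeat=n_agents))
Q = {(s, ja): 0.0 for s in range(n_states) for ja in joint_actions}

def greedy_joint(state):
    """Pick the joint action maximizing the team Q-value."""
    return max(joint_actions, key=lambda ja: Q[(state, ja)])

Q[(0, (1, 0))] = 2.5   # illustrative learned value
best = greedy_joint(0)
```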
Distributed Q-learning
A variant of Q-learning in which each agent maintains its own Q-table but periodically shares information with the other agents. This approach combines local autonomy with collective learning to achieve effective coordination.
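A minimal sketch of one synchronization round, assuming the shared information is a plain average of the local tables (table contents are illustrative):

```python
import numpy as np

Q_a = np.array([[1.0, 0.0], [0.0, 2.0]])   # agent A's local Q-table
Q_b = np.array([[0.0, 1.0], [2.0, 0.0]])   # agent B's local Q-table

def share(tables):
    """One sharing round: replace every local table with the team mean."""
    mean = np.mean(tables, axis=0)
    return [mean.copy() for _ in tables]

Q_a, Q_b = share([Q_a, Q_b])   # both agents now hold the averaged table
```

Between sharing rounds each agent keeps updating its own copy locally, which is what preserves the autonomy the definition mentions.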
Decentralized Partially Observable Markov Decision Process (Dec-POMDP)
A mathematical formalism for modeling multi-agent decision problems with partial, decentralized observation. Dec-POMDPs capture the difficulty of cooperative environments in which each agent has only a limited view of the global state.
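A skeletal encoding of the Dec-POMDP tuple as a data structure (field names and the toy instance are illustrative; the transition, reward, and observation functions are omitted for brevity):

```python
from dataclasses import dataclass

@dataclass
class DecPOMDP:
    agents: list          # the set of agents I
    states: list          # the global state space S (hidden from agents)
    actions: dict         # per-agent local action sets {A_i}
    observations: dict    # per-agent local observation sets {Omega_i}

toy = DecPOMDP(agents=["a1", "a2"],
               states=["s0", "s1"],
               actions={"a1": [0, 1], "a2": [0, 1]},
               observations={"a1": ["left"], "a2": ["right"]})
```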
Cooperative Inverse Reinforcement Learning
An extension of inverse reinforcement learning in which multiple agents collaborate to infer a common reward function from demonstrations. This approach lets agents collectively learn which behavior best serves the shared objective.
Shared Experience Replay
A technique in which agents share a common experience buffer to improve learning efficiency. Each agent learns faster by benefiting from the experiences of its teammates.
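A minimal sketch of a pooled buffer (the transition format is illustrative): every agent pushes into the same deque, and sampled minibatches may mix experience from different agents.

```python
import random
from collections import deque

shared_buffer = deque(maxlen=10_000)   # one buffer for the whole team

def store(agent_id, transition):
    shared_buffer.append((agent_id, transition))

def sample(batch_size):
    return random.sample(list(shared_buffer), batch_size)

store("agent_0", ("s0", 1, 0.5, "s1"))   # (state, action, reward, next)
store("agent_1", ("s2", 0, 1.0, "s3"))
batch = sample(2)
```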
Multi-Agent Actor-Critic
A learning architecture that combines decentralized actors with centralized critics for cooperative multi-agent environments. The actors make local decisions guided by the global evaluations provided by the critics.