Multi-Agent Deep RL
COMA (Counterfactual Multi-Agent Policy Gradients)
Algorithm that uses counterfactual baselines to estimate how individual actions affect the global reward by changing one agent's policy while keeping others fixed.
← Wstecz