Actor-Critic Methods
Advantage Actor-Critic
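Synchronous variant of A3C (commonly called A2C). Parallel workers step in lockstep and their rollouts are combined into one batched update, which maps well onto a GPU and makes training more stable. The policy gradient is weighted by an advantage estimate rather than the raw return, which reduces its variance.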
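As a rough illustration of the advantage estimate and the batched update, here is a minimal NumPy sketch. It assumes rollouts of shape (T, N), meaning T steps from N synchronized environments, and uses a bootstrapped discounted return minus the critic's value as the advantage. The function names, the coefficients `value_coef` and `entropy_coef`, and the `gamma` default are illustrative choices, not taken from any particular implementation.

```python
import numpy as np

def discounted_returns(rewards, dones, bootstrap_value, gamma=0.99):
    """Bootstrapped discounted returns for a batch of rollouts.

    rewards, dones: arrays of shape (T, N); dones[t] is 1.0 where the
    episode ended at step t. bootstrap_value: critic estimate, shape (N,),
    for the state reached after the last step.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    returns = np.zeros_like(rewards)
    running = np.asarray(bootstrap_value, dtype=np.float64)
    for t in reversed(range(rewards.shape[0])):
        # A finished episode cuts off the future return.
        running = rewards[t] + gamma * running * (1.0 - dones[t])
        returns[t] = running
    return returns

def a2c_losses(log_probs, values, returns, entropy,
               value_coef=0.5, entropy_coef=0.01):
    """Advantages and a combined scalar loss over the whole batch."""
    # Advantage: how much better the return was than the critic predicted.
    advantages = returns - values
    # In an autodiff framework the advantages would be treated as constants
    # (no gradient) inside the policy term.
    policy_loss = -(log_probs * advantages).mean()
    value_loss = (advantages ** 2).mean()
    total = policy_loss + value_coef * value_loss - entropy_coef * entropy.mean()
    return advantages, total
```

Because every array carries the environment dimension N, a single call covers all workers at once. That is the batched, synchronous update, in contrast to A3C, where each worker applies its gradients asynchronously.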