AI Glossary
The complete dictionary of Artificial Intelligence
Actor-Critic
Reinforcement learning architecture combining an actor network that learns a stochastic policy with a critic network that estimates the value function, which reduces the variance of the policy-gradient estimate.
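The interplay described above can be sketched as a minimal one-step actor-critic loop. The toy two-state environment, tabular critic, and softmax actor below are illustrative assumptions, not part of any library: the critic's TD error serves as the (low-variance) scaling factor for the actor's policy-gradient update.

```python
import numpy as np

# Minimal one-step actor-critic on a hypothetical 2-state, 2-action MDP.
rng = np.random.default_rng(0)
n_states, n_actions, gamma, lr = 2, 2, 0.99, 0.1
theta = np.zeros((n_states, n_actions))  # actor: softmax policy logits
V = np.zeros(n_states)                   # critic: tabular value function

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def step(s, a):
    # toy dynamics: action 1 in state 0 pays off; states alternate 0 -> 1 -> 0
    reward = 1.0 if (s == 0 and a == 1) else 0.0
    return (s + 1) % n_states, reward

s = 0
for _ in range(500):
    probs = softmax(theta[s])
    a = rng.choice(n_actions, p=probs)
    s_next, r = step(s, a)
    td_error = r + gamma * V[s_next] - V[s]   # critic's learning signal
    V[s] += lr * td_error                     # critic update
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0                     # grad of log softmax-policy
    theta[s] += lr * td_error * grad_log_pi   # actor update, scaled by TD error
    s = s_next

print(softmax(theta[0]))  # the policy in state 0 should come to favor action 1
```

With the seeded generator, the learned policy in state 0 puts most of its probability on the rewarding action.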
Value Function
Mathematical function estimating the expected cumulative discounted return from a state or state-action pair; it serves as the learning signal for the critic in the Actor-Critic architecture.
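The "expected cumulative discounted return" the value function estimates can be made concrete with a single trajectory. The reward sequence below is an arbitrary illustration; a real value function would average such returns over many trajectories.

```python
# Discounted return along one hypothetical trajectory from a state s:
# G = r0 + gamma*r1 + gamma^2*r2 + ...
gamma = 0.9
rewards = [1.0, 0.0, 2.0]  # rewards observed after leaving state s

G = sum(gamma ** t * r for t, r in enumerate(rewards))
print(G)  # ≈ 1 + 0 + 0.81*2 = 2.62
```

The value function V(s) is the expectation of this quantity G over trajectories starting in s under the current policy.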
Asynchronous Advantage Actor-Critic
Distributed Actor-Critic architecture (A3C) in which multiple worker agents train in parallel on independent copies of the environment, asynchronously applying their gradients to shared global parameters to accelerate and decorrelate learning.
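The gradient-sharing idea can be sketched with threads updating a shared parameter. This is a deliberately simplified illustration: the objective is a hypothetical scalar loss rather than an RL loop, and the lock is added for clarity even though A3C's updates are applied lock-free in the original design.

```python
import threading

# Sketch: each worker computes a local gradient and applies it to shared
# global parameters, as in A3C's asynchronous update scheme.
global_w = [0.0]           # shared global parameter
lock = threading.Lock()    # shown for clarity; A3C applies updates lock-free
lr, target = 0.1, 5.0

def worker(steps):
    for _ in range(steps):
        with lock:
            grad = global_w[0] - target   # gradient of 0.5 * (w - target)^2
            global_w[0] -= lr * grad      # apply update to the shared weights

threads = [threading.Thread(target=worker, args=(200,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(round(global_w[0], 3))  # → 5.0: all workers drive the shared parameter
```

Because every worker applies the same contraction, the shared parameter converges to the target regardless of thread interleaving.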
Deep Deterministic Policy Gradient
Actor-Critic algorithm for continuous action spaces that learns a deterministic policy with deep neural networks, using a replay buffer and target networks for stable off-policy learning.
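The replay buffer that enables DDPG's off-policy learning can be sketched in a few lines. The class below is a simplified illustration (uniform sampling, no prioritization), not the implementation from any particular library.

```python
import random
from collections import deque

# Sketch of the replay buffer used for off-policy learning in DDPG:
# stores transitions and samples uniform minibatches for decorrelated updates.
class ReplayBuffer:
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # old transitions are evicted

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer(capacity=1000)
for i in range(50):
    buf.add(i, 0.5, 1.0, i + 1, False)  # toy transitions
batch = buf.sample(8)
print(len(batch))  # 8
```

Sampling minibatches from past experience breaks the temporal correlation of consecutive transitions, which is a key source of DDPG's stability.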
Twin Delayed Deep Deterministic Policy Gradient
Improvement over DDPG using twin critics with a clipped double-Q target to reduce value overestimation, plus delayed updates of the actor and target networks for better stability.
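The twin-critic mechanism boils down to taking the minimum of two value estimates when forming the bootstrap target. The scalar values below stand in for trained critic outputs, purely for illustration.

```python
# TD3's clipped double-Q target: the minimum over twin critic estimates
# curbs the upward bias that a single critic's target would accumulate.
def td3_target(reward, gamma, q1_next, q2_next, done):
    bootstrap = 0.0 if done else gamma * min(q1_next, q2_next)
    return reward + bootstrap

y = td3_target(reward=1.0, gamma=0.99, q1_next=10.0, q2_next=8.0, done=False)
print(y)  # ≈ 1.0 + 0.99 * 8.0 = 8.92 (the lower of the two estimates is used)
```

A single critic would bootstrap from 10.0 here; the min over the pair pulls the target down, counteracting overestimation.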
Soft Actor-Critic
Off-policy Actor-Critic algorithm maximizing an entropy-augmented objective that combines expected return with policy entropy to encourage exploration, with stable and sample-efficient updates.
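The entropy-augmented objective can be illustrated on a discrete toy policy. The probabilities, rewards, and temperature below are arbitrary assumptions; the point is only that, rewards being equal, the objective prefers the more stochastic policy.

```python
import numpy as np

# Sketch of SAC's entropy-augmented objective:
# J(pi) = E[reward] + alpha * H(pi), where H is the policy entropy.
def soft_objective(probs, rewards, alpha):
    probs, rewards = np.asarray(probs), np.asarray(rewards)
    entropy = -np.sum(probs * np.log(probs))   # Shannon entropy of the policy
    return float(probs @ rewards + alpha * entropy)

uniform = soft_objective([0.5, 0.5], [1.0, 1.0], alpha=0.2)
greedy = soft_objective([0.999, 0.001], [1.0, 1.0], alpha=0.2)
print(uniform > greedy)  # True: with equal rewards, higher entropy wins
```

The temperature alpha trades off return against entropy; SAC tunes it so the policy stays exploratory only as long as that pays off.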
Advantage Actor-Critic
Synchronous variant of A3C using advantage estimates to reduce the variance of the policy gradient, with batched updates across parallel environments for better stability and GPU efficiency.
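The batched advantage estimation can be sketched directly. The rewards and critic values below are made-up numbers standing in for a synchronous batch of transitions collected from parallel environments.

```python
import numpy as np

# One-step advantage estimate used in A2C, computed over a batch:
# A(s, a) = r + gamma * V(s') - V(s)
gamma = 0.99
rewards = np.array([1.0, 0.0, 1.0])   # batch of observed rewards
v_s = np.array([0.5, 0.4, 0.6])       # critic values V(s)
v_next = np.array([0.4, 0.6, 0.0])    # critic values V(s')

advantages = rewards + gamma * v_next - v_s
print(advantages)  # ≈ [0.896, 0.194, 0.4]
```

Subtracting the critic's baseline V(s) from each return centers the policy-gradient signal, which is where the variance reduction comes from.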
Critic Network
Neural network estimating the value function V(s) or Q(s,a); it supplies the temporal-difference (TD) learning signal to the actor and is trained by minimizing its own TD prediction error.
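A minimal stand-in for the critic is a linear value function trained by semi-gradient TD(0), whose update follows the gradient of the squared TD prediction error. The two-state environment and one-hot features below are illustrative assumptions.

```python
import numpy as np

# Sketch: a linear critic V(s) = w . phi(s) trained on its TD prediction
# error via semi-gradient TD(0): w += lr * td_error * phi(s).
gamma, lr = 0.9, 0.1
w = np.zeros(2)
phi = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}  # one-hot features

for _ in range(2000):
    # fixed transition stream: s=0 -(r=1)-> s=1 -(r=0)-> s=0 -> ...
    for s, r, s_next in [(0, 1.0, 1), (1, 0.0, 0)]:
        td_error = r + gamma * w @ phi[s_next] - w @ phi[s]
        w += lr * td_error * phi[s]   # move V(s) toward the TD target

print(w)  # ≈ [5.263, 4.737]: the TD fixed point V(0)=1/(1-0.81), V(1)=0.9*V(0)
```

At convergence the TD error vanishes, i.e. the critic's predictions are self-consistent under the Bellman equation; in Actor-Critic, the same TD error also scales the actor's update.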