Policy Gradient Methods
Stochastic Policy
A policy represented by a probability distribution π(a|s) over actions a given state s. Because actions are sampled rather than chosen deterministically, it provides intrinsic exploration, and it is essential for policy gradient methods.
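To make the definition concrete, here is a minimal sketch assuming a softmax parameterization over a small discrete action set. The softmax form, the preference values, and the helper names (`softmax_policy`, `sample_action`, `grad_log_pi`) are illustrative assumptions, not part of the definition itself.

```python
import math
import random

def softmax_policy(preferences):
    """Turn per-action preferences (logits) into probabilities pi(a|s)."""
    m = max(preferences)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(probs, rng):
    """Draw an action index according to pi(a|s)."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def grad_log_pi(probs, action):
    """Gradient of log pi(action|s) w.r.t. the preferences: one_hot(action) - probs."""
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]

# Hypothetical preferences for three actions in a single state.
prefs = [2.0, 1.0, 0.1]
probs = softmax_policy(prefs)

# Sampling repeatedly visits every action, which is the intrinsic exploration.
rng = random.Random(0)
actions = [sample_action(probs, rng) for _ in range(1000)]
counts = [actions.count(a) for a in range(len(probs))]

# The score function used by policy gradient estimators such as REINFORCE.
score = grad_log_pi(probs, actions[0])
```

Every action keeps a nonzero probability, so even the least preferred one is tried occasionally. The `grad_log_pi` term is the quantity a policy gradient method would weight by the observed return when updating the preferences.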