Policy Gradient Methods
Policy Network
A neural network with parameters θ that represents the policy π(a|s; θ): given the current state s, it outputs a probability distribution over the available actions a.
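As a rough illustration of the definition, the sketch below builds a tiny policy network for a discrete action space in plain Python. The class name `PolicyNetwork`, the single tanh hidden layer, the layer sizes, and the example state are all assumptions made for this illustration, not details from the source. Here θ corresponds to the weights and biases, and a softmax over the output layer yields π(a|s; θ).

```python
import math
import random


class PolicyNetwork:
    """Hypothetical one-hidden-layer policy: state -> action probabilities."""

    def __init__(self, state_dim, hidden_dim, num_actions, seed=0):
        self.rng = random.Random(seed)
        # theta: weights and biases of both layers (small random init, an assumption).
        self.w1 = [[self.rng.gauss(0.0, 0.1) for _ in range(state_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[self.rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
                   for _ in range(num_actions)]
        self.b2 = [0.0] * num_actions

    def action_probs(self, state):
        """Return pi(a | s; theta) as a list of probabilities, one per action."""
        hidden = [math.tanh(sum(w * x for w, x in zip(row, state)) + b)
                  for row, b in zip(self.w1, self.b1)]
        logits = [sum(w * h for w, h in zip(row, hidden)) + b
                  for row, b in zip(self.w2, self.b2)]
        # Softmax, shifted by the max logit for numerical stability.
        max_logit = max(logits)
        exps = [math.exp(z - max_logit) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def sample_action(self, state):
        """Draw an action index according to the policy's distribution."""
        probs = self.action_probs(state)
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Example: a 4-dimensional state and 2 discrete actions (illustrative values).
policy = PolicyNetwork(state_dim=4, hidden_dim=8, num_actions=2)
state = [0.1, -0.2, 0.05, 0.3]
probs = policy.action_probs(state)
action = policy.sample_action(state)
print(probs, action)
```

Sampling from the distribution, rather than always taking the most probable action, is what makes the policy stochastic. In practice such a network would usually be written in an automatic differentiation framework so that θ can be updated by gradient ascent.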