Policy Gradient Methods
Policy Gradient
Direct optimization method that adjusts policy parameters by gradient ascent on the expected return, so it can learn stochastic policies without requiring a model of the environment.
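As a rough illustration of the definition above, the sketch below applies a REINFORCE-style update to a small multi-armed bandit. Everything in it is an assumption chosen for the example rather than something taken from this entry: the function name `reinforce_bandit`, the softmax parameterization, the running-average baseline, the Gaussian reward noise, and the hyperparameters. It only samples rewards and never queries a model of the environment, and the policy it learns is a probability distribution over actions.

```python
import numpy as np


def softmax(theta):
    """Turn action preferences into a probability distribution."""
    z = theta - np.max(theta)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def reinforce_bandit(arm_means, steps=2000, lr=0.1, noise=0.5, seed=0):
    """Hypothetical helper: learn a softmax policy over bandit arms.

    arm_means are the true mean rewards, used only to simulate the
    environment; the learner sees nothing but sampled rewards.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(arm_means)
    theta = np.zeros(n_arms)  # policy parameters (one preference per arm)
    baseline = 0.0            # running average reward, reduces variance

    for t in range(1, steps + 1):
        probs = softmax(theta)
        action = rng.choice(n_arms, p=probs)             # sample from the policy
        reward = rng.normal(arm_means[action], noise)    # observe a noisy reward

        # For a softmax policy, grad of log pi(action) w.r.t. theta
        # is one_hot(action) - probs.
        grad_log_pi = -probs
        grad_log_pi[action] += 1.0

        # Gradient ascent step on the estimated expected return.
        theta += lr * (reward - baseline) * grad_log_pi

        # Incremental update of the average-reward baseline.
        baseline += (reward - baseline) / t

    return softmax(theta)


if __name__ == "__main__":
    final_probs = reinforce_bandit([0.0, 1.0])
    print("learned action probabilities:", final_probs)
```

With these assumed settings the policy should shift most of its probability toward the arm with the higher mean reward. Subtracting the baseline does not change the expected gradient; it is included only because it tends to make the updates less noisy.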