Policy Gradient Methods
Monte Carlo Policy Gradient
A gradient estimation technique that computes returns from complete sampled trajectories, giving an unbiased but high-variance estimate of the policy gradient.
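To make the definition concrete, here is a minimal sketch of the estimator for a single finished episode. It assumes a tabular softmax policy with action preferences `theta[s][a]`, and the names `discounted_returns`, `softmax`, and `reinforce_gradient` are hypothetical, chosen only for illustration. Each step's score function, the gradient of log pi(a|s), is weighted by the return observed from that step onward. The extra gamma^t factor that some formulations include is omitted here.

```python
import math


def discounted_returns(rewards, gamma):
    """Return G_t = sum over k >= t of gamma^(k-t) * r_k for each step t.

    Needs the complete episode, which is what makes the method Monte Carlo.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_gradient(episode, theta, gamma):
    """Single-episode gradient estimate for a tabular softmax policy.

    episode: list of (state, action, reward) tuples (assumed format).
    theta:   theta[s][a] is the preference for action a in state s.
    """
    grad = [[0.0] * len(row) for row in theta]
    rewards = [r for _, _, r in episode]
    for (s, a, _), g in zip(episode, discounted_returns(rewards, gamma)):
        probs = softmax(theta[s])
        for b in range(len(probs)):
            # d/d theta[s][b] of log pi(a|s) is 1[a == b] - pi(b|s).
            indicator = 1.0 if b == a else 0.0
            grad[s][b] += g * (indicator - probs[b])
    return grad
```

In practice the estimate would be averaged over several sampled episodes and used for a gradient ascent step on `theta`. A single-episode estimate can vary widely from one episode to the next, which is the high variance mentioned above.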