Policy Gradient Methods
Entropy Regularization
Adding an entropy term to the objective function to encourage exploration by penalizing overly deterministic policies, improving the robustness of learning.
← Indietro