Policy Gradient Methods
Importance Sampling
A technique for reusing data collected under an old policy to update a new policy: each sample is weighted by the probability ratio of the two policies, i.e. the new policy's probability of the sampled action divided by the old policy's.
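As a rough illustration of the weighting, the sketch below estimates the expected reward under a new policy using only actions sampled from an old one. The two-action problem, its probabilities and rewards, and the helper names `importance_weights` and `is_estimate` are all assumptions made up for this example, not part of any particular library.

```python
import random


def importance_weights(actions, old_probs, new_probs):
    """Return the ratio pi_new(a) / pi_old(a) for each sampled action."""
    return [new_probs[a] / old_probs[a] for a in actions]


def is_estimate(actions, rewards, old_probs, new_probs):
    """Importance-sampled estimate of the expected reward under the new policy."""
    weights = importance_weights(actions, old_probs, new_probs)
    return sum(w * r for w, r in zip(weights, rewards)) / len(rewards)


# Hypothetical two-action problem (all numbers are illustrative assumptions).
old_probs = [0.5, 0.5]   # policy that collected the data
new_probs = [0.8, 0.2]   # policy we want to evaluate or update
reward_of = [1.0, 0.0]   # deterministic reward for each action

rng = random.Random(0)   # fixed seed so the run is repeatable
actions = rng.choices([0, 1], weights=old_probs, k=10_000)  # sampled from the OLD policy
rewards = [reward_of[a] for a in actions]

estimate = is_estimate(actions, rewards, old_probs, new_probs)
true_value = sum(p * r for p, r in zip(new_probs, reward_of))  # exact expectation under the new policy

print(f"importance-sampled estimate: {estimate:.3f}, exact value: {true_value:.3f}")
```

The estimate should land close to the exact value even though no action was ever drawn from the new policy. The weights grow large when the old policy rarely takes an action the new policy favors, which makes the estimate noisier the further apart the two policies are.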