Policy Gradient Methods
Return-to-Go
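Sum of discounted future rewards from a given time step onward. In policy gradient algorithms such as REINFORCE, it weights the log-probability gradient of the action taken at that step, serving as a sample estimate of that action's value.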
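A minimal sketch of computing the return-to-go for every step of one finished episode, assuming a finite episode of length T, rewards indexed so that r_k is the reward received after the action at step k, and a discount factor gamma. Under that convention G_t = sum over k from t to T-1 of gamma^(k-t) * r_k, which satisfies the recursion G_t = r_t + gamma * G_{t+1} with G_T = 0. A single backward sweep therefore computes every value in O(T). The function name `rewards_to_go` is illustrative, not taken from any particular library.

```python
from typing import List


def rewards_to_go(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = sum_{k=t}^{T-1} gamma**(k - t) * rewards[k] for each step t.

    Assumes `rewards` holds the rewards of one complete episode, in order.
    """
    returns = [0.0] * len(rewards)
    running = 0.0  # holds G_{t+1}; G_T = 0 past the final step
    # Sweep backwards, applying G_t = r_t + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


print(rewards_to_go([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

In a REINFORCE-style update, each term grad log pi(a_t | s_t) would then be multiplied by G_t. Using only rewards from step t onward, rather than the full episode return, is valid because an action cannot affect rewards that came before it, and dropping those terms reduces the variance of the gradient estimate.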