Learning through Model Differentiation
Policy Gradient Through Model
A method that computes policy gradients by backpropagating reward gradients through a differentiable model of the environment.
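As a rough illustration of the idea, the sketch below differentiates the total reward of a short rollout with respect to a single policy parameter by stepping backwards through the model. Everything in it is an assumption made for the example and is not taken from the entry: the one-dimensional linear model, the linear policy `u = k * s`, the quadratic reward, the constants `A`, `B`, `C`, `HORIZON`, and the function names. The backward pass is written by hand in plain Python; an automatic differentiation library would normally do this step.

```python
# Illustrative 1-D setup (all constants are assumptions, not from the source):
#   model:  s_next = A * s + B * u      (differentiable environment model)
#   policy: u = k * s                   (single parameter k)
#   reward: r = -(s**2 + C * u**2)
A, B, C = 0.9, 0.5, 0.1
HORIZON = 10


def rollout(k, s0=1.0):
    """Simulate the policy in the model; return total reward and trajectory."""
    s, total, traj = s0, 0.0, []
    for _ in range(HORIZON):
        u = k * s
        total += -(s * s + C * u * u)
        traj.append((s, u))
        s = A * s + B * u
    return total, traj


def policy_gradient(k, s0=1.0):
    """Return (J, dJ/dk), backpropagating reward gradients through the model."""
    total, traj = rollout(k, s0)
    grad_k = 0.0
    lam = 0.0  # dJ/ds for the state after the current step (zero past the horizon)
    for s, u in reversed(traj):
        dj_du = -2.0 * C * u + lam * B        # reward term + effect on next state
        grad_k += dj_du * s                   # u = k * s, so du/dk = s
        lam = -2.0 * s + lam * A + dj_du * k  # reward, model, and policy paths
    return total, grad_k


def improve_policy(k=0.0, lr=0.01, steps=200):
    """Plain gradient ascent on the return using the model-based gradient."""
    for _ in range(steps):
        _, g = policy_gradient(k)
        k += lr * g
    return k
```

Calling `improve_policy()` returns an updated `k`, and `rollout(k)[0]` gives the total reward it achieves in the model. One way to sanity-check the hand-written backward pass is to compare `policy_gradient(k)[1]` against a central finite difference of `rollout`.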