Learning through Model Differentiation
Policy Gradient Through Model
Method that computes policy gradients by backpropagating reward gradients through a differentiable model of the environment.
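As a minimal sketch of the idea, the example below assumes a toy one-dimensional setting that is not part of the original entry: a linear model `s' = a*s + b*u`, a linear policy `u = theta*s`, and a quadratic reward `-(s^2 + c*u^2)`. All names and constants (`rollout`, `policy_gradient_through_model`, `improve_policy`, `a`, `b`, `c`) are hypothetical and chosen for illustration. The backward pass is written out by hand so the flow of gradients through the model is visible; in practice an automatic differentiation library would typically do this step.

```python
def rollout(theta, s0=1.0, horizon=10, a=0.9, b=0.5, c=0.1):
    """Roll the policy u = theta * s through the assumed model s' = a*s + b*u."""
    states, actions, total = [s0], [], 0.0
    s = s0
    for _ in range(horizon):
        u = theta * s                  # policy
        total += -(s * s + c * u * u)  # reward for this step
        s = a * s + b * u              # differentiable model step
        states.append(s)
        actions.append(u)
    return total, states, actions


def policy_gradient_through_model(theta, s0=1.0, horizon=10, a=0.9, b=0.5, c=0.1):
    """Return (total reward, d total / d theta) by backpropagating through the model."""
    total, states, actions = rollout(theta, s0, horizon, a, b, c)
    grad_theta = 0.0
    grad_state = 0.0  # d total / d s_{t+1}; zero past the horizon
    for t in reversed(range(horizon)):
        s, u = states[t], actions[t]
        # The action affects the current reward and, via the model, the next state.
        grad_action = -2.0 * c * u + b * grad_state
        # theta enters at every step through u = theta * s.
        grad_theta += grad_action * s
        # The state affects the reward, the action (via the policy), and the next state.
        grad_state = -2.0 * s + theta * grad_action + a * grad_state
    return total, grad_theta


def improve_policy(theta=0.0, learning_rate=0.02, steps=200):
    """Plain gradient ascent on the policy parameter."""
    for _ in range(steps):
        _, grad = policy_gradient_through_model(theta)
        theta += learning_rate * grad
    return theta
```

Calling `improve_policy()` returns a policy parameter whose total reward under `rollout` is higher than that of the starting value `theta = 0.0`. Because the gradient is exact with respect to the model, any mismatch between the model and the real environment carries over directly into the policy update.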