Learning through Model Differentiation
Policy Gradient Through Model
A method that computes policy gradients by backpropagating rewards through a differentiable model of the environment.
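As a rough illustration of the idea, the sketch below differentiates the total reward of a rollout with respect to a single policy parameter by stepping backward through a known model. Everything in it is an assumption made for the example and not part of the definition: a one-dimensional linear model `s' = a*s + b*u`, a linear policy `u = -k*s`, a quadratic reward `-(s^2 + c*u^2)`, and the names `rollout_return` and `policy_gradient_through_model`.

```python
def rollout_return(k, s0=1.0, a=1.0, b=0.5, c=0.1, horizon=20):
    """Total reward of the policy u = -k*s under the assumed linear model."""
    s, total = s0, 0.0
    for _ in range(horizon):
        u = -k * s                      # policy (hypothetical linear form)
        total += -(s * s + c * u * u)   # reward (hypothetical quadratic form)
        s = a * s + b * u               # differentiable model step
    return total


def policy_gradient_through_model(k, s0=1.0, a=1.0, b=0.5, c=0.1, horizon=20):
    """Gradient of rollout_return with respect to k, via a backward pass."""
    # Forward pass: roll out the model and remember the states and actions.
    states, actions = [], []
    s = s0
    for _ in range(horizon):
        u = -k * s
        states.append(s)
        actions.append(u)
        s = a * s + b * u

    # Backward pass: lam holds d(future reward)/d(next state).
    lam, grad_k = 0.0, 0.0
    for s, u in zip(reversed(states), reversed(actions)):
        g_u = -2.0 * c * u + lam * b              # reward and model terms w.r.t. u
        grad_k += g_u * (-s)                      # du/dk = -s with the state held fixed
        lam = -2.0 * s + lam * a + g_u * (-k)     # propagate to the current state
    return grad_k


if __name__ == "__main__":
    k = 0.3
    g = policy_gradient_through_model(k)
    print("gradient at k=0.3:", g)
    print("return before step:", rollout_return(k))
    print("return after small ascent step:", rollout_return(k + 1e-3 * g))
```

The backward loop is reverse-mode differentiation written out by hand. With a learned or more complex model, an automatic differentiation library would typically take its place.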