Offline-to-Online Transfer Learning
Distributional Correction
A technique that corrects the mismatch between the state-action distribution visited in the offline dataset and the state-action distribution induced by the learned policy during online transfer.
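One common way to realize such a correction is importance weighting: each offline sample is reweighted by an estimate of the ratio between the online and offline state-action densities. The sketch below is illustrative only and rests on assumptions not stated in the entry: state-action pairs are discrete and hashable, the densities are estimated from smoothed counts, and the ratios are clipped to limit variance. The names `density_ratio_weights`, `weighted_mean`, `clip`, and `smoothing` are hypothetical.

```python
from collections import Counter


def density_ratio_weights(offline_pairs, online_pairs, clip=10.0, smoothing=1.0):
    """Estimate w(s, a) = d_online(s, a) / d_offline(s, a) for each offline sample.

    Assumption: (state, action) pairs are discrete and hashable, so both
    distributions can be estimated from add-`smoothing` counts.
    """
    offline_counts = Counter(offline_pairs)
    online_counts = Counter(online_pairs)
    # Smooth over the union of observed pairs so no probability is zero.
    support_size = len(set(offline_counts) | set(online_counts))
    n_off = len(offline_pairs) + smoothing * support_size
    n_on = len(online_pairs) + smoothing * support_size

    weights = []
    for pair in offline_pairs:
        p_off = (offline_counts[pair] + smoothing) / n_off
        p_on = (online_counts[pair] + smoothing) / n_on
        # Clipping bounds the variance introduced by large ratios.
        weights.append(min(p_on / p_off, clip))
    return weights


def weighted_mean(values, weights):
    """Self-normalized importance-weighted average of per-sample values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Toy data: the offline data favours action a0, the online policy favours a1.
offline = [("s0", "a0")] * 3 + [("s0", "a1")]
online = [("s0", "a1")] * 3 + [("s0", "a0")]
rewards = [0.0, 0.0, 0.0, 1.0]  # one reward per offline sample

w = density_ratio_weights(offline, online)
corrected = weighted_mean(rewards, w)
```

In this toy case the offline samples with action `a1` are up-weighted and those with `a0` are down-weighted, so the corrected reward estimate moves toward what the online policy would actually experience. With continuous states or actions, the count-based estimate would have to be replaced by a learned density-ratio model.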