Model-Based Deep RL
Model-Based Policy Optimization (MBPO)
Hybrid algorithm that uses a learned dynamics model to generate short synthetic rollouts branched from real states, while retaining a buffer of real environment data to stabilize policy learning.
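The data flow described above can be illustrated with a minimal sketch. It is not the reference MBPO implementation: the published algorithm uses an ensemble of probabilistic neural network models and trains the policy with Soft Actor-Critic, whereas this toy swaps in a least-squares linear model and a fixed hand-written policy. The environment `true_step`, the helper names, and every constant (rollout length `k`, `real_ratio`, batch size) are assumptions chosen purely for illustration.

```python
import numpy as np


def true_step(s, a):
    # Hypothetical toy environment: noise-free 1-D linear dynamics.
    return 0.9 * s + 0.5 * a


def fit_linear_model(S, A, S_next):
    # Stand-in for the learned dynamics model: least-squares fit of
    # s' ~ w[0] * s + w[1] * a on real transitions.
    X = np.stack([S, A], axis=1)
    w, *_ = np.linalg.lstsq(X, S_next, rcond=None)
    return w


def branched_rollouts(w, start_states, policy, k):
    # Roll the learned model forward for k steps, starting from states
    # sampled out of the real buffer (short, branched rollouts).
    chunks = []
    s = start_states.copy()
    for _ in range(k):
        a = policy(s)
        s_next = w[0] * s + w[1] * a
        chunks.append(np.stack([s, a, s_next], axis=1))
        s = s_next
    return np.concatenate(chunks, axis=0)


def sample_mixed_batch(real, synthetic, batch_size, real_ratio, rng):
    # Mix real and model-generated transitions in one training batch.
    n_real = int(round(batch_size * real_ratio))
    n_syn = batch_size - n_real
    real_idx = rng.integers(0, len(real), size=n_real)
    syn_idx = rng.integers(0, len(synthetic), size=n_syn)
    return np.concatenate([real[real_idx], synthetic[syn_idx]], axis=0)


rng = np.random.default_rng(0)

# 1) Collect real transitions (s, a, s') from the toy environment.
S = rng.normal(size=200)
A = rng.normal(size=200)
real = np.stack([S, A, true_step(S, A)], axis=1)

# 2) Fit the model on the real data.
w = fit_linear_model(real[:, 0], real[:, 1], real[:, 2])

# 3) Generate short synthetic rollouts branched from real states.
policy = lambda s: -0.5 * s  # hypothetical fixed policy
starts = real[rng.integers(0, len(real), size=50), 0]
synthetic = branched_rollouts(w, starts, policy, k=3)

# 4) Build a policy-update batch from both buffers.
batch = sample_mixed_batch(real, synthetic, batch_size=64, real_ratio=0.25, rng=rng)
```

Keeping `k` small limits how far model error can compound along a rollout, and drawing part of each batch from the real buffer anchors the policy update to data the model did not generate. In a full training loop, steps 1 through 4 would repeat as the policy changes.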