Batch Constrained Q-learning (BCQ)
Model-Based RL
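An approach that learns a model of the environment dynamics from offline data and uses it to generate synthetic experience. In the offline setting, the model must be used cautiously: its prediction errors compound over multi-step rollouts, and they are largest for state-action pairs that the dataset covers poorly.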
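As a rough illustration of the idea, the sketch below fits a dynamics model to logged transitions and then generates short synthetic rollouts from dataset states. Everything in it is an assumption made for the example: the linear least-squares model, the toy dynamics, and the helper names `fit_linear_dynamics` and `short_rollouts` are hypothetical. Practical methods typically use neural-network models and add some form of uncertainty handling.

```python
import numpy as np

def fit_linear_dynamics(states, actions, next_states):
    """Least-squares fit of s' ~ [s, a, 1] @ W from logged transitions."""
    X = np.hstack([states, actions, np.ones((len(states), 1))])
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return W

def short_rollouts(W, start_states, policy, horizon=2):
    """Generate synthetic transitions with the learned model.

    Rollouts start from states in the dataset and stay short, which
    limits how far model errors can compound.
    """
    s = start_states
    synthetic = []
    for _ in range(horizon):
        a = policy(s)
        X = np.hstack([s, a, np.ones((len(s), 1))])
        s_next = X @ W  # model prediction, not a real environment step
        synthetic.append((s, a, s_next))
        s = s_next
    return synthetic

# Toy offline dataset from assumed dynamics s' = 0.9 * s + 0.5 * a
rng = np.random.default_rng(0)
states = rng.normal(size=(500, 2))
actions = rng.normal(size=(500, 1))
next_states = 0.9 * states + 0.5 * actions

W = fit_linear_dynamics(states, actions, next_states)

def policy(s):
    # Hypothetical policy: act on the first state dimension only
    return -0.1 * s[:, :1]

rollouts = short_rollouts(W, states[:10], policy, horizon=2)
```

Keeping `horizon` small is one simple way to apply the caution described above: each extra model step feeds a predicted state back into the model, so errors accumulate with rollout length.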