Model-Based Offline RL
Conservative Policy Optimization
Algorithm that explicitly penalizes policies that deviate significantly from the behavior in the training data, to avoid extrapolation errors on state-action pairs the data does not cover.
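The penalty idea can be illustrated with a minimal sketch. The description above does not say which penalty is used, so this example assumes a KL-divergence penalty against a behavior policy estimated from logged action counts, in a single-state, discrete-action setting. The names `estimate_behavior_policy`, `kl_divergence`, `penalized_objective`, and the weight `alpha` are hypothetical, chosen for illustration only.

```python
import math
from collections import Counter


def estimate_behavior_policy(logged_actions, n_actions, smoothing=1e-3):
    """Estimate the data-collecting (behavior) policy from logged actions.

    A small smoothing term keeps unseen actions at a tiny nonzero
    probability so the KL penalty below stays finite.
    """
    counts = Counter(logged_actions)
    weights = [counts.get(a, 0) + smoothing for a in range(n_actions)]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def penalized_objective(policy, q_values, behavior, alpha=1.0):
    """Expected value under `policy` minus alpha * KL(policy || behavior).

    Assumption: a KL penalty stands in for whatever penalty the
    algorithm actually uses; alpha sets how conservative the result is.
    """
    expected_value = sum(pi * qv for pi, qv in zip(policy, q_values))
    return expected_value - alpha * kl_divergence(policy, behavior)


# Illustrative data: action 2 never appears in the log, so its high
# value estimate is an extrapolation rather than something the data supports.
logged_actions = [0, 0, 0, 1]
q_values = [1.0, 1.2, 5.0]

behavior = estimate_behavior_policy(logged_actions, n_actions=3)
greedy = [0.0, 0.0, 1.0]  # puts all mass on the unseen action

print("stay close to data:", penalized_objective(behavior, q_values, behavior))
print("greedy on unseen:  ", penalized_objective(greedy, q_values, behavior))
```

With these illustrative numbers, the policy that matches the data scores higher than the greedy policy that bets everything on the unseen action, even though the greedy policy has the larger unpenalized value estimate. Lowering `alpha` weakens the penalty and makes the objective less conservative.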