Conservative Q-Learning (CQL)
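Offline reinforcement learning method that penalizes Q-values of actions absent from the dataset while raising Q-values of dataset actions. The resulting conservative Q-function keeps the learned policy close to the behavior data distribution and prevents the value overestimation that otherwise makes offline training diverge.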
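The sketch below shows how the conservative penalty enters the loss. It is a minimal illustration that assumes discrete actions, a DQN-style max target from a separate target network, and the log-sum-exp form of the regularizer. Beyond standard Q-learning, the only added term is `alpha * (logsumexp(Q(s, ·)) - Q(s, a))`. The function names, argument layout, and default `alpha` are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) over one row of Q-values."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def cql_loss(
    q: List[List[float]],              # Q(s, .) for each state in the batch
    actions: List[int],                # dataset action taken in each state
    rewards: List[float],
    next_q_target: List[List[float]],  # target-network Q(s', .) (assumed DQN-style)
    dones: List[bool],
    gamma: float = 0.99,
    alpha: float = 1.0,                # penalty weight (hypothetical default)
) -> float:
    """Batch-averaged TD loss plus a conservative penalty (illustrative sketch)."""
    n = len(q)
    bellman = 0.0
    penalty = 0.0
    for i in range(n):
        q_sa = q[i][actions[i]]
        # Bootstrapped target; no bootstrapping past terminal states.
        target = rewards[i] + (0.0 if dones[i] else gamma * max(next_q_target[i]))
        bellman += 0.5 * (q_sa - target) ** 2
        # Push down a soft maximum over all actions, push up the dataset action.
        # This term is >= 0 because logsumexp(row) >= max(row) >= q_sa.
        penalty += logsumexp(q[i]) - q_sa
    return (bellman + alpha * penalty) / n
```

If the dataset action already has by far the largest Q-value in a state, that state contributes almost nothing to the penalty. Actions the data never shows but that the network scores highly are the ones pushed down. Raising `alpha` makes the Q-function more pessimistic about such out-of-distribution actions.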