Conservative Q-Learning (CQL)
Distribution correction
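Mechanism in CQL that adjusts Q-value estimates to correct the mismatch between the behavior policy's data distribution and the distribution of the learned (target) policy. It pushes down Q-values of actions the learned policy favors and pushes up Q-values of actions present in the dataset. This keeps overestimated out-of-distribution actions from dominating the learned policy.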
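The following is a minimal sketch of this correction term. It assumes discrete actions and the CQL(H) form of the regularizer, alpha * (logsumexp_a Q(s, a) - Q(s, a_data)), averaged over a batch. The helper names `logsumexp` and `cql_penalty` and the `alpha` default are hypothetical, chosen for illustration. In CQL this penalty is added to the usual Bellman error loss rather than used on its own.

```python
import math
from typing import Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(
    q_values: Sequence[Sequence[float]],
    dataset_actions: Sequence[int],
    alpha: float = 1.0,
) -> float:
    """Hypothetical helper: CQL(H)-style conservative penalty, discrete actions.

    q_values[i][a] is the Q-estimate for state i and action a.
    dataset_actions[i] is the action the behavior policy took in state i.
    The logsumexp term (a soft maximum over all actions) is pushed down,
    and the Q-value of the action actually present in the data is pushed up.
    """
    if len(q_values) != len(dataset_actions):
        raise ValueError("one dataset action is required per state")
    # Per-state gap between the soft maximum and the in-dataset action.
    gaps = [
        logsumexp(q_row) - q_row[a]
        for q_row, a in zip(q_values, dataset_actions)
    ]
    # Batch average, scaled by the conservatism weight alpha.
    return alpha * sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Two states, three actions each; the dataset took action 0, then 1.
    q = [[2.0, 0.5, -1.0], [0.3, 1.2, 0.0]]
    print(cql_penalty(q, [0, 1], alpha=0.5))
```

Because logsumexp is never below the largest Q-value, each gap is non-negative. The gap shrinks when the dataset action already has the highest Q-value. Minimizing the penalty therefore lowers Q-values on actions the data does not support.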