Batch-Constrained Q-learning (BCQ)
Distribution Shift
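A phenomenon in which the distribution of state-action pairs visited by the learned policy differs significantly from the distribution in the offline dataset, which was collected by a different (behavior) policy. Because the value function is trained only on pairs present in the data, its estimates for out-of-distribution actions are essentially guesses. Bootstrapped targets that query those actions can therefore produce biased, often overestimated, value estimates and degraded performance during deployment.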
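A minimal tabular sketch of how this shift corrupts value estimates, and how restricting the backup to actions seen in the data (a batch-constrained backup in the spirit of BCQ) avoids it. Everything here is a hypothetical toy: the two-state dataset, GAMMA = 0.9, and the initial value 10.0 for the never-observed pair (1, 1), which stands in for an arbitrary guess a function approximator might produce. The helper names are illustrative only. This is not the full BCQ algorithm, whose deep variant also uses a generative model of the dataset's actions.

```python
GAMMA = 0.9        # discount factor (assumed for the toy)
ACTIONS = (0, 1)   # full discrete action space of the toy problem

# Hypothetical offline dataset of (state, action, reward, next_state) tuples.
# next_state=None marks a terminal transition. Only action 0 ever appears,
# so action 1 is out of distribution in every state.
DATASET = [(0, 0, 0.0, 1), (1, 0, 1.0, None)]

# Initial Q-values. The 10.0 for the unseen pair (1, 1) is an assumed stand-in
# for an arbitrary extrapolated estimate; the data never corrects it.
INIT_Q = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 10.0}


def dataset_support(dataset):
    """Map each state to the set of actions observed for it in the dataset."""
    support = {}
    for s, a, _r, _s2 in dataset:
        support.setdefault(s, set()).add(a)
    return support


def fitted_q(dataset, init_q, constrained, sweeps=50):
    """Repeated tabular Bellman backups over a fixed offline dataset.

    constrained=True: the max in the target ranges only over actions seen in the
    data for the next state (batch-constrained backup).
    constrained=False: the max ranges over all actions, including unseen ones.
    """
    q = dict(init_q)
    support = dataset_support(dataset)
    for _ in range(sweeps):
        for s, a, r, s2 in dataset:
            if s2 is None:
                target = r
            else:
                candidates = support.get(s2, set()) if constrained else ACTIONS
                # default=0.0 covers a next state with no observed actions.
                target = r + GAMMA * max((q[(s2, b)] for b in candidates), default=0.0)
            q[(s, a)] = target  # full replacement: the toy is deterministic
    return q


def greedy(q, state, allowed=ACTIONS):
    """Greedy action of a Q-table, optionally restricted to an allowed action set."""
    return max(allowed, key=lambda b: q[(state, b)])


if __name__ == "__main__":
    plain = fitted_q(DATASET, INIT_Q, constrained=False)
    bc = fitted_q(DATASET, INIT_Q, constrained=True)
    print(plain[(0, 0)], greedy(plain, 1))
    print(bc[(0, 0)], greedy(bc, 1, allowed=dataset_support(DATASET)[1]))
```

The unconstrained backup bootstraps from the unseen pair (1, 1), so Q(0, 0) becomes 9.0 and the greedy policy in state 1 picks action 1, a state-action pair absent from the data. The constrained backup gives Q(0, 0) = 0.9, matching the actual return 0 + 0.9 · 1 in the dataset, and its policy stays on observed actions.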