Batch Constrained Q-learning (BCQ)
Bootstrapping Error
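Error that accumulates when Q-learning updates bootstrap from the learner's own value estimates, in particular estimates for actions that lie outside the support of the dataset. In the offline setting these estimates cannot be corrected by further interaction, so the error propagates through successive Bellman backups and pulls the learned policy away from the data. Offline methods such as BCQ control this by restricting the actions used in the backup to those close to the ones present in the batch.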
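The effect can be illustrated with a small tabular sketch. This is not BCQ itself, which works with a learned generative model over actions; it only mimics the idea of keeping the backup inside the data support. The function name `fitted_q`, the two-state dataset, and the inflated initial value for the unseen pair (state 1, action 1) are all assumptions made up for illustration.

```python
from collections import defaultdict


def fitted_q(dataset, q_init, gamma=0.9, sweeps=50, constrain=False):
    """Repeated Bellman backups over a fixed (offline) dataset.

    dataset: list of (state, action, reward, next_state, done) tuples.
    q_init:  dict mapping (state, action) -> initial value estimate.
    constrain: if True, the max in the backup only ranges over actions
               that were actually observed in the next state.
    """
    q = dict(q_init)
    actions = sorted({a for (_, a) in q})

    # Record which actions the dataset contains for each state.
    seen = defaultdict(set)
    for s, a, _, _, _ in dataset:
        seen[s].add(a)

    for _ in range(sweeps):
        for s, a, r, s2, done in dataset:
            if done:
                target = r
            else:
                # Fall back to all actions if the next state has no data.
                candidates = (seen[s2] or actions) if constrain else actions
                target = r + gamma * max(q[(s2, b)] for b in candidates)
            q[(s, a)] = target
    return q


# Hypothetical batch: action 1 is never taken in state 1.
dataset = [
    (0, 0, 0.0, 1, False),
    (1, 0, 1.0, 1, True),
]

# The unseen pair (1, 1) starts with an arbitrary, overly optimistic value.
q_init = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 10.0}

q_free = fitted_q(dataset, q_init, constrain=False)
q_constrained = fitted_q(dataset, q_init, constrain=True)

print("unconstrained Q(0, 0):", q_free[(0, 0)])
print("constrained   Q(0, 0):", q_constrained[(0, 0)])
```

In the unconstrained run, the value of (0, 0) inherits the inflated estimate of the never-observed pair (1, 1) and ends near 9.0, because no transition in the batch ever updates that pair. With the backup restricted to observed actions, the same value settles near 0.9, the discounted reward that the data actually supports.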