Batch Constrained Q-learning (BCQ)
Value Function Estimation
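The process of estimating Q-values from a fixed offline dataset while accounting for the bias that arises because the agent cannot explore: state-action pairs that the data covers poorly or not at all tend to receive overestimated values. Modern methods use conservative (pessimistic) estimates that deliberately lower the values of such actions, so that the learned policy does not over-optimize against estimation errors.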
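As a rough illustration of conservative underestimation, the sketch below runs tabular Q iteration on logged transitions and subtracts a count-based penalty from each estimated reward. It assumes a small discrete problem; the function name `conservative_q_estimate`, the penalty form `beta / sqrt(n(s, a))`, the pessimistic floor for unseen pairs, and the toy dataset are all hypothetical choices made for this example, not BCQ or any specific published method.

```python
import numpy as np

def conservative_q_estimate(transitions, n_states, n_actions,
                            gamma=0.9, beta=1.0, n_sweeps=200):
    """Tabular Q iteration on offline data with a count-based penalty.

    transitions: list of (state, action, reward, next_state, done) tuples.
    beta: penalty scale (hypothetical knob); beta=0 gives the naive estimate.
    """
    counts = np.zeros((n_states, n_actions))
    reward_sum = np.zeros((n_states, n_actions))
    next_counts = np.zeros((n_states, n_actions, n_states))
    for s, a, r, s_next, done in transitions:
        counts[s, a] += 1
        reward_sum[s, a] += r
        if not done:  # terminal transitions contribute no bootstrap term
            next_counts[s, a, s_next] += 1

    seen = counts > 0
    safe_counts = np.maximum(counts, 1)           # avoid division by zero
    r_hat = reward_sum / safe_counts              # empirical mean reward
    p_hat = next_counts / safe_counts[:, :, None]  # empirical transitions
    penalty = beta / np.sqrt(safe_counts)         # larger for rare pairs

    # Assumed floor for pairs never seen in the data: a lower bound on any
    # penalized return, so unseen actions are never preferred over seen ones.
    r_min = min(t[2] for t in transitions)
    floor = min(r_min - beta, 0.0) / (1.0 - gamma)

    q = np.full((n_states, n_actions), floor)
    for _ in range(n_sweeps):
        v = q.max(axis=1)                         # greedy state values
        target = r_hat - penalty + gamma * (p_hat @ v)
        q = np.where(seen, target, floor)
    return q

# Toy dataset: in state 0, action 0 is well covered (100 samples, reward 1.0),
# while action 1 was logged once with a lucky reward of 1.5.
dataset = [(0, 0, 1.0, 0, True)] * 100
dataset += [(0, 1, 1.5, 0, True)]
dataset += [(1, 0, 0.0, 0, False)] * 4            # state 1 leads to state 0

q_naive = conservative_q_estimate(dataset, n_states=2, n_actions=2, beta=0.0)
q_cons = conservative_q_estimate(dataset, n_states=2, n_actions=2, beta=1.0)
print("naive greedy action in state 0:", q_naive[0].argmax())
print("conservative greedy action in state 0:", q_cons[0].argmax())
```

With no penalty, the single lucky sample makes the rarely seen action look best; with the penalty, the estimate for that action is lowered more than the well-covered one, so the greedy choice moves to the action the data actually supports.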