Batch Constrained Q-learning (BCQ)
Policy Evaluation
The phase in which a policy's performance is estimated from offline data alone, with no interaction with the environment. This step is crucial for validating a learned policy before deployment.
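
As a rough illustration of estimating a policy's value from logged transitions only, here is a minimal tabular sketch of fitted Q evaluation (FQE), one common offline estimator. The source does not say which estimator is used, so this choice is an assumption. The function names, the `(state, action, reward, next_state, done)` tuple layout, and the toy dataset are all hypothetical.

```python
from collections import defaultdict


def fitted_q_evaluation(transitions, policy, gamma=0.9, n_iters=200):
    """Estimate Q^pi for a fixed policy from logged transitions (tabular sketch).

    transitions: list of (state, action, reward, next_state, done) tuples.
    policy: dict mapping each state to the action the evaluated policy takes.
    """
    q = defaultdict(float)  # assumption: unseen (state, action) pairs default to 0.0
    for _ in range(n_iters):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for s, a, r, s_next, done in transitions:
            # Bootstrap with the action the evaluated policy would pick in s_next,
            # not the action that the logging policy actually took there.
            target = r if done else r + gamma * q[(s_next, policy[s_next])]
            sums[(s, a)] += target
            counts[(s, a)] += 1
        # Each seen (state, action) entry becomes the mean of its targets.
        q = defaultdict(float, {k: sums[k] / counts[k] for k in sums})
    return q


def policy_value(q, policy, start_states):
    """Average estimated return of the policy over the given start states."""
    return sum(q[(s, policy[s])] for s in start_states) / len(start_states)


# Hypothetical two-state dataset: action 1 moves 0 -> 1, then ends with reward 1.
dataset = [
    (0, 1, 0.0, 1, False),
    (1, 1, 1.0, 1, True),
    (0, 0, 0.0, 0, False),
]
target_policy = {0: 1, 1: 1}

q_hat = fitted_q_evaluation(dataset, target_policy, gamma=0.9)
value = policy_value(q_hat, target_policy, start_states=[0])
```

No environment is queried at any point: every update uses only the logged tuples. The estimate is only as reliable as the data coverage, since any state-action pair the policy selects that never appears in the dataset keeps its default value of 0.0.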