Implicit Q-learning (IQL)
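An offline reinforcement learning method that learns the Q-function without ever evaluating out-of-distribution actions. Instead of maximizing over all actions in the Bellman backup, IQL fits a state-value function by expectile regression over the actions that appear in the dataset. A high expectile approximates the value of the best in-distribution action while ignoring actions the data does not cover.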
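The expectile regression idea can be illustrated with a minimal sketch in plain Python. The function names, the default `tau=0.7`, and the scalar gradient-descent fit are assumptions made for illustration, not taken from any reference implementation. The loss weights errors where Q exceeds V by `tau` and the rest by `1 - tau`, so a larger `tau` pulls the fitted value toward the upper end of the Q samples.

```python
def expectile_loss(q_values, v_values, tau=0.7):
    """Mean asymmetric squared error between Q(s, a) and V(s).

    tau is the expectile level (the default here is an illustrative assumption).
    With tau = 0.5 the loss is ordinary mean squared error scaled by 0.5.
    """
    if len(q_values) != len(v_values):
        raise ValueError("q_values and v_values must have the same length")
    total = 0.0
    for q, v in zip(q_values, v_values):
        diff = q - v
        # Under-estimates of Q (diff > 0) are weighted by tau, the rest by 1 - tau.
        weight = tau if diff > 0 else 1.0 - tau
        total += weight * diff * diff
    return total / len(q_values)


def fit_expectile(samples, tau, steps=2000, lr=0.1):
    """Fit a single scalar v to the samples by gradient descent on the expectile loss.

    Hypothetical helper: it stands in for the value network V(s) at one state.
    """
    v = sum(samples) / len(samples)  # start from the mean
    for _ in range(steps):
        grad = 0.0
        for q in samples:
            diff = q - v
            weight = tau if diff > 0 else 1.0 - tau
            grad += -2.0 * weight * diff  # derivative of weight * diff**2 w.r.t. v
        v -= lr * grad / len(samples)
    return v


if __name__ == "__main__":
    q_samples = [0.0, 1.0, 2.0, 10.0]  # made-up Q-values for dataset actions
    for tau in (0.5, 0.7, 0.9):
        print(tau, fit_expectile(q_samples, tau))
```

With `tau = 0.5` the fit recovers the sample mean. Raising `tau` moves the estimate toward the largest sample without ever querying an action outside the list, which is the property IQL relies on.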