Offline Multi-Task Reinforcement Learning
Conservative Multi-Task Policy Optimization
A method that constrains multi-task policies to stay close to the behavior observed in the batch dataset, so that they do not drift into state-action distributions outside the dataset's support.
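One common way to express this kind of constraint is a KL penalty toward the behavior policy. The sketch below is an illustration under that assumption, not necessarily the formulation this entry refers to: for a single state with discrete actions, it computes the policy that maximizes E_pi[Q] - alpha * KL(pi || behavior) separately for each task, which has the closed form pi proportional to behavior * exp(Q / alpha). The function name `conservative_policy`, the weight `alpha`, and the toy numbers are all hypothetical.

```python
import numpy as np


def conservative_policy(q_values, behavior_probs, alpha=1.0):
    """Per-task maximizer of E_pi[Q] - alpha * KL(pi || behavior) for one state.

    q_values, behavior_probs: arrays of shape (num_tasks, num_actions).
    Each row of behavior_probs is assumed to be a valid distribution
    estimated from the batch dataset for that task.
    """
    q = np.asarray(q_values, dtype=float)
    beta = np.asarray(behavior_probs, dtype=float)
    # Subtract the per-task max before exponentiating for numerical stability.
    weights = beta * np.exp((q - q.max(axis=1, keepdims=True)) / alpha)
    # Actions never seen in the data (beta == 0) keep zero probability.
    return weights / weights.sum(axis=1, keepdims=True)


# Toy example: two tasks, three actions (hypothetical values).
q = np.array([[1.0, 5.0, 2.0],
              [0.0, 1.0, 3.0]])
beta = np.array([[0.5, 0.0, 0.5],    # task 0 never took action 1
                 [0.2, 0.8, 0.0]])   # task 1 never took action 2

pi = conservative_policy(q, beta, alpha=1.0)
print(pi)
```

In this toy setup the highest-valued action in each task is one the dataset never contains, and the resulting policy assigns it zero probability. A larger `alpha` pulls the policy closer to the behavior distribution, while a smaller `alpha` lets it favor high-value actions more strongly, but only among actions present in the data.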