Model-based Meta-Learning
R2D2 (Recursive Reward Decomposition)
A meta-reinforcement learning method that decomposes rewards hierarchically to learn policies that can be reused across tasks.