Batch Constrained Q-learning (BCQ)
Batch RL
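A reinforcement learning setting in which the agent is given a fixed batch of transitions and must learn an optimal policy without any further interaction with the environment. Because the agent cannot collect new data to correct its value estimates, algorithms in this setting need specific constraints to prevent divergence, for example keeping the learned policy close to the state-action pairs that actually appear in the batch.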
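As a rough illustration of the setting, here is a minimal tabular sketch in Python. It is not the BCQ algorithm itself, only a simplified analogue: Q-learning run over a fixed batch, with the bootstrap maximum restricted to actions observed in the batch for the next state. The function names, the `(state, action, reward, next_state, done)` tuple layout, the hyperparameters, the toy batch, and the choice to bootstrap from zero when the next state never appears in the batch are all assumptions made for this example.

```python
from collections import defaultdict


def batch_constrained_q_learning(batch, gamma=0.9, lr=0.5, sweeps=200):
    """Tabular Q-learning over a fixed batch of transitions.

    Each transition is (state, action, reward, next_state, done).
    No new transitions are ever collected: the batch is simply replayed.
    """
    # Record which actions were actually observed in each state.
    seen = defaultdict(set)
    for s, a, _, _, _ in batch:
        seen[s].add(a)

    q = defaultdict(float)
    for _ in range(sweeps):
        for s, a, r, s_next, done in batch:
            if done or not seen[s_next]:
                # Terminal, or next state absent from the batch:
                # assumption for this sketch, bootstrap from zero.
                target = r
            else:
                # The constraint: maximize only over actions seen in s_next,
                # so unseen state-action pairs never feed the target.
                target = r + gamma * max(q[(s_next, b)] for b in seen[s_next])
            q[(s, a)] += lr * (target - q[(s, a)])
    return q, seen


def greedy_action(q, seen, s):
    """Pick the best action among those observed in state s."""
    return max(seen[s], key=lambda a: q[(s, a)])


# Hypothetical toy batch: a three-state chain where state 2 is terminal.
example_batch = [
    (0, "right", 0.0, 1, False),
    (1, "right", 1.0, 2, True),
    (0, "left", 0.0, 0, False),
]
q, seen = batch_constrained_q_learning(example_batch)
best_in_state_0 = greedy_action(q, seen, 0)
```

Restricting the maximum to observed actions is the design choice that matters here: an unconstrained maximum would also range over state-action pairs with no data behind them, whose values can never be corrected in the batch setting.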