Implicit Q-Learning (IQL)
Implicit Max Operator
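Technique in IQL that avoids computing the maximum over actions directly. Instead, the state-value function is fit by expectile regression on the Q-values of actions drawn from the behavior distribution. For an expectile level close to 1, this estimate approaches the maximum over actions within the dataset's support, so out-of-distribution actions are never evaluated.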
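In IQL the implicit maximum is obtained with an asymmetric squared loss (expectile regression) on the difference between Q and V. The sketch below illustrates this for a single state with a scalar value estimate. The function names `expectile_loss` and `fit_expectile`, the sample Q-values, and the plain gradient-descent loop are assumptions made for illustration, not part of any IQL reference implementation, where V would be a neural network trained on minibatches.

```python
import numpy as np

def expectile_loss(diff, tau):
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff^2, with diff = Q - V."""
    weight = np.where(diff < 0, 1.0 - tau, tau)
    return weight * diff ** 2

def fit_expectile(q_values, tau, steps=5000, lr=0.05):
    """Hypothetical helper: find the scalar v that minimises the mean
    expectile loss over q_values using plain gradient descent."""
    q_values = np.asarray(q_values, dtype=float)
    v = float(np.mean(q_values))  # start from the ordinary mean
    for _ in range(steps):
        diff = q_values - v
        weight = np.where(diff < 0, 1.0 - tau, tau)
        grad = np.mean(-2.0 * weight * diff)  # derivative of the loss w.r.t. v
        v -= lr * grad
    return v

# Illustrative Q-values of dataset actions in one state (made-up numbers).
q_samples = np.array([1.0, 2.0, 3.0, 10.0])

for tau in (0.5, 0.9, 0.99):
    print(tau, fit_expectile(q_samples, tau))
```

With `tau = 0.5` the loss is symmetric and the fit recovers the mean of the samples. As `tau` moves toward 1, the estimate moves toward the largest sampled Q-value while never exceeding it, which is why only actions present in the data influence the result.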