Dyna-Q Learning
Planning function
Algorithmic component of Dyna-Q that repeatedly replays stored experiences, drawn from a learned model of the environment, to refine action-value (Q) estimates without new interaction with the environment.
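As a rough illustration, a planning function of this kind might look like the sketch below. It assumes a tabular setting with a deterministic environment, where the stored experiences live in a dictionary mapping each visited (state, action) pair to the last observed (reward, next_state). The names `planning_step`, `model`, `n_updates`, `alpha`, and `gamma` are hypothetical and chosen only for this example.

```python
import random
from collections import defaultdict


def planning_step(Q, model, actions, n_updates=10, alpha=0.1, gamma=0.95, rng=random):
    """Replay stored transitions and apply the one-step Q-learning update.

    Q       : mapping (state, action) -> value estimate (e.g. defaultdict(float))
    model   : dict (state, action) -> (reward, next_state); assumed deterministic
    actions : list of all actions available in every state (an assumption here)
    """
    if not model:
        return  # nothing has been experienced yet, so there is nothing to replay

    visited = list(model.keys())
    for _ in range(n_updates):
        # Sample a previously seen state-action pair uniformly at random.
        state, action = rng.choice(visited)
        reward, next_state = model[(state, action)]

        # Same update as for real experience, but fed from the stored model.
        best_next = max(Q[(next_state, a)] for a in actions)
        td_error = reward + gamma * best_next - Q[(state, action)]
        Q[(state, action)] += alpha * td_error


# Usage: one stored transition, replayed twice without touching the environment.
Q = defaultdict(float)
model = {("s0", "right"): (1.0, "s1")}
planning_step(Q, model, ["left", "right"], n_updates=2, alpha=0.5, gamma=0.9)
```

In a full Dyna-Q loop, a function like this would typically be called after each real step, once the real transition has been used to update both `Q` and `model`. Larger values of `n_updates` trade extra computation for fewer real interactions.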