Dyna-Q Learning
Planning function
Algorithmic component of Dyna-Q that repeatedly applies value updates to stored (previously observed) experiences, refining value estimates without any new interaction with the environment.
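A minimal sketch of such a planning function, assuming a tabular setting with a deterministic model. The names `planning_step`, `Q`, `model`, `actions`, `n_updates`, `alpha`, `gamma`, and `rng` are illustrative choices, not taken from the source. The model is assumed to map each observed (state, action) pair to the (reward, next state) it last produced, and each replayed transition receives the standard one-step Q-learning update.

```python
import random
from collections import defaultdict


def planning_step(Q, model, actions, n_updates=10, alpha=0.1, gamma=0.95, rng=random):
    """Replay stored transitions and apply a Q-learning update to each.

    Q       -- dict mapping (state, action) to a value estimate (assumed: defaultdict(float))
    model   -- dict mapping (state, action) to (reward, next_state), filled from real experience
    actions -- list of all actions, used for the max over next-state values
    """
    if not model:
        return Q  # nothing has been experienced yet, so there is nothing to replay

    seen = list(model.keys())
    for _ in range(n_updates):
        # Sample a previously observed state-action pair uniformly at random.
        state, action = rng.choice(seen)
        reward, next_state = model[(state, action)]

        # One-step Q-learning update on the simulated transition.
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    return Q


# Hypothetical two-step chain: s0 -> s1 -> goal, reward 1.0 on reaching the goal.
Q = defaultdict(float)
model = {("s0", "right"): (0.0, "s1"), ("s1", "right"): (1.0, "goal")}
planning_step(Q, model, actions=["left", "right"], n_updates=50)
```

In a full Dyna-Q loop, a call like this would typically follow each real environment step, after the same transition has been used for a direct update and written into `model`. The environment itself is never queried inside `planning_step`.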