Dyna-Q Learning
Experience replay buffer
A data structure that stores transition tuples (state, action, reward, next_state) collected from real experience, so the agent can resample them and apply repeated Q-value updates during the planning phase.
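A minimal sketch of such a buffer and of how a planning phase might draw from it. The names `ReplayBuffer`, `q_update`, and `planning_phase`, the fixed capacity, and the values of `alpha`, `gamma`, and `n_steps` are all illustrative assumptions, not part of the definition above. The sketch also assumes a tabular Q-function and does not treat terminal states specially.

```python
import random
from collections import defaultdict, deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity=10_000, seed=None):
        # A bounded deque evicts the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self):
        # Uniformly pick one stored transition.
        return self.rng.choice(self.buffer)

    def __len__(self):
        return len(self.buffer)


def q_update(Q, actions, transition, alpha=0.1, gamma=0.95):
    """One tabular Q-learning update from a single transition."""
    state, action, reward, next_state = transition
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])


def planning_phase(Q, buffer, actions, n_steps=10):
    """Replay up to n_steps stored transitions, updating Q each time."""
    for _ in range(n_steps):
        if len(buffer) == 0:
            break
        q_update(Q, actions, buffer.sample())


# Usage: one real step followed by several planning updates.
Q = defaultdict(float)           # unseen (state, action) pairs default to 0.0
actions = [0, 1]
buffer = ReplayBuffer(capacity=100, seed=0)

transition = ("s0", 1, 1.0, "s1")
buffer.add(*transition)
q_update(Q, actions, transition)                # learning from real experience
planning_phase(Q, buffer, actions, n_steps=5)   # repeated updates from the buffer
```

Because the same stored transition can be sampled many times, each real interaction with the environment contributes to several value updates, which is the point of storing the tuples in the first place.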