Dyna-Q Learning
Experience replay buffer
Data structure storing tuples (state, action, reward, next_state) to allow repeated updates during the planning phase.
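The sketch below shows how such a buffer could feed the planning phase. It makes several assumptions. The Q-function is tabular, stored in a dict keyed by (state, action). Sampling is uniform with replacement. All names (`ReplayBuffer`, `planning_updates`, `capacity`, `alpha`, `gamma`) are hypothetical. The textbook tabular Dyna-Q stores a learned model that maps each observed (state, action) to its most recent (reward, next_state). A bounded replay buffer is a closely related variant and may not be the exact formulation intended here.

```python
import random
from collections import defaultdict, deque
from typing import Deque, Hashable, List, NamedTuple


class Transition(NamedTuple):
    """One observed step: (state, action, reward, next_state)."""
    state: Hashable
    action: Hashable
    reward: float
    next_state: Hashable


class ReplayBuffer:
    """Hypothetical fixed-capacity store of real transitions; the oldest are evicted first."""

    def __init__(self, capacity: int = 10_000) -> None:
        self._data: Deque[Transition] = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state) -> None:
        self._data.append(Transition(state, action, reward, next_state))

    def sample(self, k: int, rng: random.Random) -> List[Transition]:
        # Uniform sampling with replacement (an assumption), so k may exceed len(self).
        return [rng.choice(self._data) for _ in range(k)]

    def __len__(self) -> int:
        return len(self._data)


def planning_updates(q, buffer, actions, n, alpha=0.1, gamma=0.95, rng=None):
    """Hypothetical planning step: n tabular Q-learning updates on replayed transitions.

    Terminal states need no special case here: no stored transition starts from
    a terminal state, so its Q-values stay at the defaultdict's 0.0.
    """
    rng = rng or random.Random()
    if len(buffer) == 0:
        return
    for t in buffer.sample(n, rng):
        best_next = max(q[(t.next_state, a)] for a in actions)
        td_target = t.reward + gamma * best_next
        q[(t.state, t.action)] += alpha * (td_target - q[(t.state, t.action)])


if __name__ == "__main__":
    q = defaultdict(float)
    buf = ReplayBuffer(capacity=100)
    buf.add("s0", "right", 1.0, "s1")  # one real step, recorded once
    planning_updates(q, buf, actions=["left", "right"], n=5,
                     alpha=0.5, gamma=0.9, rng=random.Random(0))
    print(q[("s0", "right")])  # 0.96875: five replays of a single real step
```

The demo records one real transition, then replays it five times. Because the next state's Q-values are still 0, each update moves the estimate halfway toward the reward of 1.0. The sequence is 0.5, 0.75, 0.875, 0.9375, 0.96875. This illustrates the point of the buffer: one real interaction yields many value updates.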