Dyna-Q Learning
Algorithm convergence
Property guaranteeing that Dyna-Q's value estimates converge to the optimal values under certain conditions of an exact model and infinite visits.
← ZurückProperty guaranteeing that Dyna-Q's value estimates converge to the optimal values under certain conditions of an exact model and infinite visits.
← Zurück