Active Reinforcement Learning
State-Action-Value
The Q(s,a) function estimates the expected cumulative reward of selecting action a in state s and following the optimal policy thereafter.
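As a rough illustration, the sketch below shows how such an estimate might be maintained in a table and nudged toward observed rewards. It uses the tabular Q-learning update, which is one common choice and is not named in the source. The function name `q_update`, the learning rate `alpha`, the discount factor `gamma`, and the toy states and actions are all assumptions made for this example.

```python
from collections import defaultdict


def q_update(Q, s, a, reward, s_next, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning step to Q(s, a) and return the new value.

    The estimate moves a fraction `alpha` of the way toward the target
    reward + gamma * max over a' of Q(s_next, a').
    """
    # Value of the best action available in the next state (greedy estimate).
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    target = reward + gamma * best_next
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


# Hypothetical toy setup: two integer states and two actions, all estimates start at 0.
Q = defaultdict(float)
actions = ["left", "right"]

# The agent takes "right" in state 0, receives reward 1.0, and lands in state 1.
new_value = q_update(Q, s=0, a="right", reward=1.0, s_next=1, actions=actions)
```

Because the update takes the maximum over next actions, the table is pushed toward the value of acting optimally afterwards, which matches the definition above. An active learner would also need an exploration rule (for example epsilon-greedy) to decide which action to try; that part is left out here.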