Deep Deterministic Policy Gradient (DDPG)
Off-Policy Learning
Learning method where the agent learns an optimal policy while following another behavior policy, allowing for better exploration.
← 뒤로