Reinforcement Learning for Optimization
Policy Gradient Algorithm
Optimization method that directly adjusts policy parameters by following the gradient of the expected reward with respect to these parameters.
← Back