Policy Gradient Methods
KL Divergence Constraint
A constraint that limits the Kullback-Leibler divergence between successive policies to ensure stable updates and avoid overly drastic changes in behavior.
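As a rough illustration of the definition above, the sketch below computes the KL divergence between an old and a new policy over a small discrete action set and checks it against a threshold. The function names, the example probabilities, the threshold value of 0.01, and the direction KL(old || new) are all assumptions chosen for illustration and are not taken from any particular algorithm or library.

```python
import math

def kl_divergence(p_old, p_new, eps=1e-12):
    """KL(p_old || p_new) for two discrete action distributions.

    eps is a small constant (an assumption of this sketch) that guards
    against division by zero and log(0).
    """
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p, q in zip(p_old, p_new)
        if p > 0.0  # terms with p == 0 contribute nothing to the sum
    )

def within_kl_constraint(p_old, p_new, max_kl=0.01):
    """Return True if the policy change stays inside the KL limit.

    max_kl is a hypothetical threshold, not a recommended value.
    """
    return kl_divergence(p_old, p_new) <= max_kl

# Action probabilities of the current policy in a single state.
old_policy = [0.5, 0.3, 0.2]

# A cautious update and a drastic update of the same policy.
small_step = [0.52, 0.29, 0.19]
large_step = [0.9, 0.05, 0.05]

print(within_kl_constraint(old_policy, small_step))  # small change
print(within_kl_constraint(old_policy, large_step))  # drastic change
```

In this sketch the cautious update satisfies the constraint and the drastic one violates it. A training loop could use such a check to reject or shrink an update that moves the policy too far; in practice the divergence is usually averaged over many sampled states instead of being evaluated in a single one.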