Deep Optimization
AdaDelta
Extension of AdaGrad that limits the accumulation window of past gradients to a fixed size via a moving average, avoiding the aggressive decay of the learning rate.
← Zurück