Momentum-based Optimization
LARS
Layer-wise Adaptive Rate Scaling (LARS) sets a separate learning rate for each layer, scaling it by the ratio of the L2 norm of that layer's weights to the L2 norm of its gradients. It is intended for large-scale training.
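To make the norm ratio concrete, here is a minimal pure-Python sketch of one LARS-style momentum step for a single layer. It assumes the commonly cited formulation, where the layer-wise rate is `trust_coef * ||w|| / (||g|| + weight_decay * ||w||)`. The function names, the default values, the `eps` guard, and the fallback for zero norms are illustrative assumptions, not taken from any particular library.

```python
import math


def l2_norm(values):
    """Euclidean (L2) norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def lars_local_lr(weights, grads, trust_coef=0.001, weight_decay=0.0, eps=1e-12):
    """Layer-wise rate: trust_coef * ||w|| / (||g|| + weight_decay * ||w||)."""
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    if w_norm == 0.0 or g_norm == 0.0:
        # Assumption: degenerate layers fall back to the plain global rate.
        return 1.0
    return trust_coef * w_norm / (g_norm + weight_decay * w_norm + eps)


def lars_step(weights, grads, velocity, lr=0.1, momentum=0.9,
              trust_coef=0.001, weight_decay=0.0):
    """One momentum update for a single layer; returns (new_weights, new_velocity)."""
    local_lr = lars_local_lr(weights, grads, trust_coef, weight_decay)
    new_velocity = [
        momentum * v + lr * local_lr * (g + weight_decay * w)
        for w, g, v in zip(weights, grads, velocity)
    ]
    new_weights = [w - v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Usage: each layer is updated independently with its own weights and gradients.
layer_w = [3.0, 4.0]
layer_g = [0.6, 0.8]
layer_v = [0.0, 0.0]
layer_w, layer_v = lars_step(layer_w, layer_g, layer_v)
```

Because the step is normalized by the gradient norm, multiplying a layer's gradient by a constant leaves the size of its update essentially unchanged when `weight_decay` is zero. In this sketch, that is what keeps layers with very different gradient magnitudes moving at comparable relative speeds.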