Momentum-based Optimization

RMSprop

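Adaptive optimization technique that divides the learning rate by the square root of an exponential moving average of recent squared gradients. Each parameter's step is therefore normalized by the typical magnitude of its own gradients. This keeps large-magnitude gradients from producing oversized steps and, unlike Adagrad's ever-growing sum, lets the effective learning rate recover instead of decaying toward zero.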
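A minimal sketch of one RMSprop step on plain Python lists. The helper `rmsprop_step` is hypothetical, not a library API, and the defaults (`lr=0.01`, `rho=0.9`, `eps=1e-8`) are illustrative assumptions; `eps` only guards against division by zero.

```python
import math


def rmsprop_step(params, grads, sq_avg, lr=0.01, rho=0.9, eps=1e-8):
    """One RMSprop update on plain lists of floats.

    sq_avg holds the per-parameter moving average of squared gradients
    and is updated in place.
    """
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Exponential moving average of the squared gradient.
        sq_avg[i] = rho * sq_avg[i] + (1 - rho) * g * g
        # Divide the step by the root of that average (the gradient's RMS).
        new_params.append(p - lr * g / (math.sqrt(sq_avg[i]) + eps))
    return new_params


# Usage: f(x, y) = x**2 + 100 * y**2, whose gradient components differ
# in scale by a factor of 100.
params, sq_avg = [1.0, 1.0], [0.0, 0.0]
for _ in range(500):
    grads = [2 * params[0], 200 * params[1]]
    params = rmsprop_step(params, grads, sq_avg)
print(params)
```

Because each coordinate is divided by its own gradient RMS, both move at roughly `lr` per step even though their raw gradients differ a hundredfold.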

Adagrad

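Adaptive optimization algorithm that gives each parameter its own learning rate by dividing by the square root of the accumulated sum of that parameter's squared historical gradients. Parameters tied to infrequent features accumulate less and therefore keep larger effective learning rates; because the sum only grows, every effective learning rate decays monotonically over training.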
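A minimal sketch of Adagrad on plain Python lists; the helper `adagrad_step` and its defaults are illustrative assumptions. The usage loop compares a parameter that receives a gradient at every step with one that receives it only occasionally.

```python
import math


def adagrad_step(params, grads, sum_sq, lr=0.5, eps=1e-10):
    """One Adagrad update; sum_sq (running sum of squared gradients) is updated in place."""
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        sum_sq[i] += g * g  # the sum only grows, so the effective learning rate only shrinks
        new_params.append(p - lr * g / (math.sqrt(sum_sq[i]) + eps))
    return new_params


# Usage: parameter 0 receives a gradient at every step, parameter 1 only
# at every 10th step (an "infrequent" feature).
params, sum_sq = [0.0, 0.0], [0.0, 0.0]
for t in range(100):
    grads = [1.0, 1.0 if t % 10 == 0 else 0.0]
    params = adagrad_step(params, grads, sum_sq)

# Effective learning rate lr / sqrt(sum of squared gradients) per parameter.
effective_lr = [0.5 / math.sqrt(s) for s in sum_sq]
print(effective_lr)  # parameter 1 keeps a larger effective learning rate
```

After 100 steps the accumulated sums are 100 and 10, so the rarely updated parameter keeps an effective learning rate about three times larger.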

Adadelta

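Extension of Adagrad that addresses its drastic learning-rate decay by replacing the ever-growing sum of squared gradients with an exponential moving average, which effectively restricts accumulation to a window of recent gradients. It also removes the need for a global learning rate: each step is scaled by the ratio of the RMS of recent parameter updates to the RMS of recent gradients.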
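A minimal sketch of the Adadelta update on plain Python lists. The helper name and the defaults `rho=0.95`, `eps=1e-6` are illustrative assumptions; note that no learning rate appears anywhere.

```python
import math


def adadelta_step(params, grads, eg2, edx2, rho=0.95, eps=1e-6):
    """One Adadelta update on lists of floats.

    eg2  : moving average of squared gradients (updated in place)
    edx2 : moving average of squared parameter updates (updated in place)
    """
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        eg2[i] = rho * eg2[i] + (1 - rho) * g * g
        # Step = -(RMS of past updates / RMS of gradients) * gradient.
        dx = -math.sqrt(edx2[i] + eps) / math.sqrt(eg2[i] + eps) * g
        edx2[i] = rho * edx2[i] + (1 - rho) * dx * dx
        new_params.append(p + dx)
    return new_params


# Usage: 100 steps on f(x) = x**2 starting from x = 1.
x, eg2, edx2 = [1.0], [0.0], [0.0]
for _ in range(100):
    x = adadelta_step(x, [2 * x[0]], eg2, edx2)
print(x)  # x has moved toward 0
```

Because `edx2` starts at zero, the first steps are on the order of `sqrt(eps)`, so Adadelta tends to start slowly.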

Adamax

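Variant of Adam that replaces the second-moment estimate, which corresponds to an L2 norm of past gradients, with an exponentially weighted infinity norm (a decayed maximum of absolute gradient values). The resulting update is simpler and can be more numerically stable, with more robust convergence in some scenarios.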
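A minimal sketch of one Adamax step. The helper `adamax_step` is hypothetical, the defaults are illustrative, and adding `eps` inside the maximum is an assumption made here to avoid dividing by zero when the first gradient is 0.

```python
def adamax_step(params, grads, m, u, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adamax update at step t (1-based); m and u are updated in place."""
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g  # first moment, as in Adam
        # Exponentially weighted infinity norm instead of Adam's average of
        # squared gradients; eps only guards against division by zero.
        u[i] = max(beta2 * u[i], abs(g) + eps)
        # Only the first moment needs bias correction.
        new_params.append(p - (lr / (1 - beta1 ** t)) * m[i] / u[i])
    return new_params


# Usage: a large gradient followed by a small one.
m, u, params = [0.0], [0.0], [1.0]
for t, g in enumerate([5.0, 0.1], start=1):
    params = adamax_step(params, [g], m, u, t)
print(u)  # u still reflects the decayed earlier large gradient
```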

Nadam

Combination of Nesterov accelerated gradient and Adam that incorporates Nesterov's look-ahead momentum into Adam's first-moment estimate, within Adam's adaptive per-parameter scaling, for faster and more stable convergence.
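A simplified sketch of Nadam, assuming a constant momentum coefficient `beta1` (the original formulation also allows a momentum schedule). The helper name and defaults are illustrative; the usage loop minimizes f(x) = x² as a toy check.

```python
import math


def nadam_step(params, grads, m, v, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One simplified Nadam update at step t (1-based); m and v are updated in place."""
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        # Nesterov look-ahead: the bias-corrected next-step momentum plus the
        # bias-corrected contribution of the current gradient.
        m_hat = beta1 * m[i] / (1 - beta1 ** (t + 1)) + (1 - beta1) * g / (1 - beta1 ** t)
        v_hat = v[i] / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


# Usage: minimize f(x) = x**2 from x = 1.
x, m, v = [1.0], [0.0], [0.0]
for t in range(1, 2001):
    x = nadam_step(x, [2 * x[0]], m, v, t, lr=0.01)
print(x)
```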

AMSGrad

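Modification of Adam that restores a convergence guarantee (in the online convex setting analyzed by its authors) by dividing by the running maximum of all past second-moment estimates instead of the current one. The effective per-parameter step size therefore never increases, which avoids the cases in which Adam can diverge.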
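A minimal sketch of AMSGrad that follows Adam's moment updates but divides by the running maximum `v_max`. Bias correction is omitted to keep the example short; the helper name and defaults are illustrative.

```python
import math


def amsgrad_step(params, grads, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad update; state holds lists 'm', 'v', 'v_max' updated in place."""
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # Key difference from Adam: divide by the running maximum of v, so the
        # effective step size lr / sqrt(v_max) can never increase.
        state["v_max"][i] = max(state["v_max"][i], state["v"][i])
        new_params.append(p - lr * state["m"][i] / (math.sqrt(state["v_max"][i]) + eps))
    return new_params


# Usage: one large gradient followed by small ones.
state = {"m": [0.0], "v": [0.0], "v_max": [0.0]}
params = [1.0]
for g in [10.0, 0.1, 0.1, 0.1]:
    params = amsgrad_step(params, [g], state)
print(state["v"], state["v_max"])  # v has decayed, v_max has not
```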

AdamW

Variant of Adam that decouples weight decay from the adaptive update: instead of adding an L2 penalty term to the gradient, where Adam's per-parameter normalization would rescale it, it shrinks the weights directly at each step.
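A minimal sketch of one AdamW step, assuming the common formulation in which the decay factor is scaled by the learning rate (`p *= 1 - lr * weight_decay`); the helper name and defaults are illustrative.

```python
import math


def adamw_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update at step t (1-based); m and v are updated in place."""
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # The moments see only the loss gradient; no L2 term is added to g.
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        m_hat = m[i] / (1 - beta1 ** t)
        v_hat = v[i] / (1 - beta2 ** t)
        p = p * (1 - lr * weight_decay)                # decoupled weight decay
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)  # ordinary Adam step
        new_params.append(p)
    return new_params


# Usage: with a zero loss gradient, the weight just shrinks by (1 - lr * weight_decay).
print(adamw_step([1.0], [0.0], [0.0], [0.0], t=1, lr=0.1, weight_decay=0.5))
```

By contrast, Adam with an L2 penalty would add `weight_decay * p` to the gradient, so on the first step Adam's normalization would turn it into a step of about `lr` regardless of `weight_decay`.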

SGDW

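Extension of SGD with momentum that uses decoupled weight decay: the weights are shrunk directly, separately from the momentum-based gradient update, so that the weight-decay coefficient can be tuned more independently of the learning rate.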
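A minimal sketch of SGDW with momentum. It assumes the decay is scaled by a schedule multiplier `eta` but not by the base learning rate `lr`; other implementations multiply the decay by the learning rate instead. The helper name and defaults are illustrative.

```python
def sgdw_step(params, grads, buf, lr=0.1, momentum=0.9, weight_decay=1e-4, eta=1.0):
    """One SGDW update; buf (momentum buffer) is updated in place.

    eta is a learning-rate schedule multiplier (1.0 means no schedule).
    """
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # The momentum buffer accumulates only the loss gradient.
        buf[i] = momentum * buf[i] + eta * lr * g
        # Weight decay acts on the weight directly, scaled by the schedule
        # multiplier but not by lr (the assumption made in this sketch).
        new_params.append(p - buf[i] - eta * weight_decay * p)
    return new_params


# Usage: the buffer holds only lr * grad, never a decay term.
buf = [0.0]
params = sgdw_step([1.0], [0.5], buf)
print(params, buf)
```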

RAdam

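Rectified Adam, which addresses the high variance of the adaptive learning rate in the early phase of training, when few gradients have been observed. It multiplies Adam's adaptive step by a rectification term derived from an estimate of that variance and, while the estimate is not yet tractable, falls back to a plain momentum update.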
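A minimal sketch of RAdam. The hypothetical helper `radam_rectifier` computes the rectification term from ρ_t, the length of the approximated simple moving average, and returns `None` while ρ_t ≤ 4, in which case the update falls back to plain momentum. Helper names and defaults are illustrative.

```python
import math


def radam_rectifier(t, beta2=0.999):
    """Rectification term r_t, or None while the variance estimate is not tractable."""
    rho_inf = 2 / (1 - beta2) - 1
    rho_t = rho_inf - 2 * t * beta2 ** t / (1 - beta2 ** t)
    if rho_t <= 4:
        return None
    return math.sqrt((rho_t - 4) * (rho_t - 2) * rho_inf /
                     ((rho_inf - 4) * (rho_inf - 2) * rho_t))


def radam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One RAdam update at step t (1-based); m and v are updated in place."""
    r = radam_rectifier(t, beta2)
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        m_hat = m[i] / (1 - beta1 ** t)
        if r is None:
            # Early phase: plain (un-adapted) momentum step.
            new_params.append(p - lr * m_hat)
        else:
            # Rectified adaptive step.
            v_hat = math.sqrt(v[i] / (1 - beta2 ** t))
            new_params.append(p - lr * r * m_hat / (v_hat + eps))
    return new_params


# Usage: r_t is undefined at first, then grows toward 1 as t increases.
print([radam_rectifier(t) for t in (1, 5, 100, 1000)])
```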

YellowFin

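Optimizer that automatically tunes the learning rate and momentum coefficient of momentum SGD. Its tuning rule comes from an analysis of momentum on quadratic objectives, applied to a local quadratic approximation through running estimates of the curvature range, the gradient variance, and the distance to the optimum.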
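YellowFin's full tuner also estimates gradient variance and the distance to the optimum online and trades them off when choosing the momentum; the sketch below covers only the curvature-range ingredient. For a quadratic with curvatures in [h_min, h_max], it sets the momentum to mu = ((sqrt(k) - 1) / (sqrt(k) + 1))² with k = h_max / h_min and the learning rate to (1 - sqrt(mu))² / h_min, the classical heavy-ball tuning for that curvature range. The curvatures are assumed known here, and the helper name is hypothetical.

```python
import math


def robust_momentum_and_lr(h_min, h_max):
    """Heavy-ball momentum and learning rate for curvatures in [h_min, h_max]."""
    sqrt_k = math.sqrt(h_max / h_min)
    mu = ((sqrt_k - 1) / (sqrt_k + 1)) ** 2
    lr = (1 - math.sqrt(mu)) ** 2 / h_min
    return mu, lr


# Usage: momentum SGD on f(x, y) = 0.5 * (1 * x**2 + 100 * y**2).
curvatures = [1.0, 100.0]
mu, lr = robust_momentum_and_lr(min(curvatures), max(curvatures))
params, velocity = [1.0, 1.0], [0.0, 0.0]
for _ in range(200):
    for i, h in enumerate(curvatures):
        grad = h * params[i]                          # gradient of 0.5 * h * x**2
        velocity[i] = mu * velocity[i] - lr * grad    # heavy-ball momentum
        params[i] += velocity[i]
print(mu, lr, params)
```

With these values both coordinates contract asymptotically at a rate of about sqrt(mu) = 9/11 per step, whichever curvature they have.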

LARS

Layer-wise Adaptive Rate Scaling, used for large-batch training, which scales the learning rate of each layer by a trust ratio: the L2 norm of the layer's weights divided by the L2 norm of its gradients (optionally including a weight-decay term).
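A minimal sketch of one LARS step on per-layer lists of floats, assuming the trust-ratio form `trust_coef * ||w|| / (||g|| + weight_decay * ||w||)` combined with momentum. The helper names, defaults, and the fallback to 1.0 when a norm is zero are illustrative assumptions.

```python
import math


def l2_norm(xs):
    return math.sqrt(sum(x * x for x in xs))


def lars_step(layers, grads, velocities, lr=0.1, momentum=0.9,
              trust_coef=0.001, weight_decay=5e-4):
    """One LARS update. layers, grads, velocities: lists of per-layer float lists."""
    new_layers = []
    for w, g, vel in zip(layers, grads, velocities):
        w_norm, g_norm = l2_norm(w), l2_norm(g)
        # Layer-wise trust ratio; fall back to 1.0 if either norm is zero.
        if w_norm > 0 and g_norm > 0:
            local_lr = trust_coef * w_norm / (g_norm + weight_decay * w_norm)
        else:
            local_lr = 1.0
        new_w = []
        for i in range(len(w)):
            vel[i] = momentum * vel[i] + lr * local_lr * (g[i] + weight_decay * w[i])
            new_w.append(w[i] - vel[i])
        new_layers.append(new_w)
    return new_layers


# Usage: two layers with identical weights but gradients 1000x apart in scale.
layers = [[3.0, 4.0], [3.0, 4.0]]
grads = [[0.3, 0.4], [300.0, 400.0]]
velocities = [[0.0, 0.0], [0.0, 0.0]]
new_layers = lars_step(layers, grads, velocities, weight_decay=0.0)
print(new_layers)
```

The two layers receive the same update: the trust ratio cancels the 1000x difference in gradient scale, so the step length is `lr * trust_coef * ||w||` in both.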

LAMB

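Layer-wise Adaptive Moments optimizer for Batch training, which extends LARS by applying its layer-wise trust ratio to an Adam-style update (bias-corrected first and second moments plus weight decay), enabling efficient large-batch training of very large models.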
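A minimal sketch of one LAMB step, assuming the trust ratio is the plain ratio ||w|| / ||update|| (the scaling function applied to ||w|| is taken as the identity). Helper names, defaults, and the fallback to 1.0 when a norm is zero are illustrative.

```python
import math


def l2_norm(xs):
    return math.sqrt(sum(x * x for x in xs))


def lamb_step(layers, grads, ms, vs, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-6, weight_decay=0.01):
    """One LAMB update at step t (1-based); ms and vs are per-layer lists updated in place."""
    new_layers = []
    for w, g, m, v in zip(layers, grads, ms, vs):
        update = []
        for i in range(len(w)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] * g[i]
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            # Adam direction plus decoupled weight decay.
            update.append(m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w[i])
        w_norm, u_norm = l2_norm(w), l2_norm(update)
        # LARS-style trust ratio applied to the whole layer.
        trust = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
        new_layers.append([wi - lr * trust * ui for wi, ui in zip(w, update)])
    return new_layers


# Usage: measure how far one layer moves in a single step.
layers = [[3.0, 4.0]]
new_layers = lamb_step(layers, [[1.0, -2.0]], [[0.0, 0.0]], [[0.0, 0.0]], t=1,
                       lr=0.01, weight_decay=0.0)
step = l2_norm([a - b for a, b in zip(layers[0], new_layers[0])])
print(step)
```

Since the update is rescaled by ||w|| / ||update||, each layer moves a distance of `lr * ||w||` per step (here 0.01 * 5), independent of the gradient scale.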

Rprop

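Resilient Backpropagation, which gives each parameter its own step size and ignores the gradient's magnitude, using only its sign: the step size grows while the gradient keeps the same sign and shrinks when the sign flips, and the parameter moves by that step size against the gradient's sign.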
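A minimal sketch of Rprop in a common variant without weight backtracking: on a sign flip the step size shrinks and that parameter's update is skipped. The helper names and the defaults (`eta_plus=1.2`, `eta_minus=0.5`, step bounds) are illustrative.

```python
def rprop_step(params, grads, prev_grads, steps, eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One Rprop update; prev_grads and steps are per-parameter lists updated in place."""
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        if g * prev_grads[i] > 0:
            # Same sign as last time: accelerate.
            steps[i] = min(steps[i] * eta_plus, step_max)
        elif g * prev_grads[i] < 0:
            # Sign flip: we overshot, so shrink the step and skip this update.
            steps[i] = max(steps[i] * eta_minus, step_min)
            g = 0.0
        # Move by the step size against the gradient's sign; its magnitude is ignored.
        sign = (g > 0) - (g < 0)
        new_params.append(p - sign * steps[i])
        prev_grads[i] = g
    return new_params


def run(scale, n=50):
    """Minimize f(x) = scale * x**2 from x = 1 with an initial step size of 0.1."""
    x, prev, steps = [1.0], [0.0], [0.1]
    trajectory = []
    for _ in range(n):
        x = rprop_step(x, [2 * scale * x[0]], prev, steps)
        trajectory.append(x[0])
    return trajectory


# Usage: compare trajectories for gradients 1000x apart in scale.
print(run(1.0) == run(1000.0))
```

Because only the gradient's sign is used, multiplying the gradient by 1000 produces exactly the same trajectory.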

QHAdam

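Quasi-Hyperbolic Adam, which generalizes Adam in the same way that quasi-hyperbolic momentum (QHM) generalizes momentum SGD: quasi-hyperbolic parameters ν1 and ν2 weight the current gradient against the moving-average moment estimates in the numerator and denominator of the update, giving fine control over each moment's contribution. Setting ν1 = ν2 = 1 recovers Adam.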
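A minimal sketch of one QHAdam step. The helper name and defaults are illustrative assumptions, not tuned values; with `nu1 = nu2 = 1` the update reduces to Adam, and with `nu1 = nu2 = 0` it divides the raw gradient by its own magnitude.

```python
import math


def qhadam_step(params, grads, m, v, t, lr=0.01, beta1=0.9, beta2=0.999,
                nu1=0.7, nu2=1.0, eps=1e-8):
    """One QHAdam update at step t (1-based); m and v are updated in place."""
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        m_hat = m[i] / (1 - beta1 ** t)
        v_hat = v[i] / (1 - beta2 ** t)
        # Quasi-hyperbolic mixing: nu weights each moving average against the raw gradient.
        num = (1 - nu1) * g + nu1 * m_hat
        den = math.sqrt((1 - nu2) * g * g + nu2 * v_hat) + eps
        new_params.append(p - lr * num / den)
    return new_params


# Usage: two steps with the default mixing.
m, v = [0.0], [0.0]
params = qhadam_step([1.0], [2.0], m, v, t=1)
params = qhadam_step(params, [1.0], m, v, t=2)
print(params)
```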
