AI Glossary
A Complete Dictionary of Artificial Intelligence
RMSprop
Adaptive optimization technique that divides the learning rate by the square root of an exponential moving average of recent squared gradients, so that each parameter's step size stays on a similar scale whether its gradients are large or small.
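As a minimal sketch of the RMSprop update described above, assuming plain Python lists for the parameters: the function name `rmsprop_step` and the default values of `lr`, `rho`, and `eps` are illustrative choices, not taken from any particular library.

```python
import math

def rmsprop_step(w, grad, avg_sq, lr=0.01, rho=0.9, eps=1e-8):
    """Apply one RMSprop update to lists of floats; returns (new_w, new_avg_sq)."""
    new_w, new_avg = [], []
    for wi, gi, si in zip(w, grad, avg_sq):
        si = rho * si + (1.0 - rho) * gi * gi               # moving average of g^2
        new_avg.append(si)
        new_w.append(wi - lr * gi / (math.sqrt(si) + eps))  # divide by its square root
    return new_w, new_avg

# Two coordinates whose gradients differ by a factor of 100.
w, avg_sq = rmsprop_step([1.0, 1.0], [100.0, 1.0], [0.0, 0.0])
print(w)  # both coordinates move by about the same amount
```

Because the gradient is divided by a running estimate of its own magnitude, the two coordinates take nearly identical steps despite the 100x difference in gradient size.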
Adagrad
Adaptive optimization algorithm that scales each parameter's learning rate by the inverse square root of its accumulated squared gradients, so rarely updated parameters (such as those tied to sparse features) take larger steps than frequently updated ones.
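The accumulation described above can be sketched as follows. This is an illustrative version only: `adagrad_step`, its defaults, and the toy gradient pattern (one coordinate updated every step, the other every tenth step) are assumptions made for the example.

```python
import math

def adagrad_step(w, grad, sum_sq, lr=0.1, eps=1e-8):
    """Apply one Adagrad update to lists of floats; returns (new_w, new_sum_sq)."""
    new_w, new_sum = [], []
    for wi, gi, si in zip(w, grad, sum_sq):
        si = si + gi * gi                                   # accumulate g^2 over all steps
        new_sum.append(si)
        new_w.append(wi - lr * gi / (math.sqrt(si) + eps))
    return new_w, new_sum

# Coordinate 0 receives a gradient every step, coordinate 1 only every 10th step.
w, sum_sq = [0.0, 0.0], [0.0, 0.0]
steps = []
for t in range(20):
    grad = [1.0, 1.0 if t % 10 == 0 else 0.0]
    new_w, sum_sq = adagrad_step(w, grad, sum_sq)
    steps.append([abs(a - b) for a, b in zip(new_w, w)])
    w = new_w

print(steps[10])  # the infrequent coordinate takes the larger step
```

At step 10 both coordinates see the same gradient, but the rarely updated one has accumulated far less squared gradient, so its effective learning rate is higher.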
Adadelta
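Extension of Adagrad that counters the learning rate's monotonic decay by replacing the full sum of past squared gradients with an exponential moving average, which acts as a fixed-size window, and that scales each step by a running average of past updates so that no global learning rate is needed.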
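A scalar sketch of one Adadelta step, under the assumption that both running averages start at zero; the name `adadelta_step` and the defaults `rho=0.95` and `eps=1e-6` are illustrative.

```python
import math

def adadelta_step(w, grad, avg_sq_grad, avg_sq_delta, rho=0.95, eps=1e-6):
    """One Adadelta update for a single parameter.

    Returns (new_w, new_avg_sq_grad, new_avg_sq_delta).
    """
    # Moving average of squared gradients replaces Adagrad's ever-growing sum.
    avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad * grad
    # Step size is the ratio of two running RMS values, so no learning rate appears.
    delta = -math.sqrt(avg_sq_delta + eps) / math.sqrt(avg_sq_grad + eps) * grad
    # Moving average of squared updates, used by the next step.
    avg_sq_delta = rho * avg_sq_delta + (1.0 - rho) * delta * delta
    return w + delta, avg_sq_grad, avg_sq_delta

w, g2, d2 = adadelta_step(1.0, 2.0, 0.0, 0.0)
print(w, g2, d2)
```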
Adamax
Variant of Adam that replaces the L2-norm-based second-moment estimate with an exponentially weighted infinity norm (a decayed running maximum of gradient magnitudes), which bounds the size of each update and can be more stable in some scenarios.
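The infinity-norm update can be sketched for a single parameter as below. The function name `adamax_step` and its defaults are illustrative, and the small `eps` in the denominator is an added safeguard against division by zero rather than part of the definition above.

```python
def adamax_step(w, grad, m, u, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adamax update for a single parameter; t is the 1-based step count.

    Returns (new_w, new_m, new_u).
    """
    m = beta1 * m + (1.0 - beta1) * grad    # first moment, as in Adam
    u = max(beta2 * u, abs(grad))           # decayed running max replaces the L2 average
    w = w - (lr / (1.0 - beta1 ** t)) * m / (u + eps)
    return w, m, u

w, m, u = adamax_step(1.0, 5.0, 0.0, 0.0, t=1)
w, m, u = adamax_step(w, 0.1, m, u, t=2)
print(u)  # still dominated by the earlier large gradient, slightly decayed
```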
Nadam
Combination of Adam and Nesterov accelerated gradient that applies Nesterov's look-ahead to Adam's first-moment (momentum) term, which can speed up convergence.
AMSGrad
Modification of Adam that keeps the running maximum of the exponential moving average of squared gradients, so the effective per-parameter learning rate never increases; this addresses cases in which Adam can fail to converge.
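A scalar sketch of the running-maximum idea, assuming no bias correction; `amsgrad_step` and its defaults are illustrative names and values.

```python
import math

def amsgrad_step(w, grad, m, v, v_max, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad update for a single parameter (sketch without bias correction).

    Returns (new_w, new_m, new_v, new_v_max).
    """
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    v_max = max(v_max, v)                   # the denominator can never shrink
    w = w - lr * m / (math.sqrt(v_max) + eps)
    return w, m, v, v_max

# One large gradient followed by small ones.
w, m, v, v_max = 1.0, 0.0, 0.0, 0.0
history = []
for grad in [10.0, 0.1, 0.1, 0.1]:
    w, m, v, v_max = amsgrad_step(w, grad, m, v, v_max)
    history.append((v, v_max))

print(history[-1])  # v has decayed, v_max has not
```

In plain Adam the denominator would follow `v` downward after the large gradient passes, letting the effective step size grow again; here it is held at the maximum seen so far.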
AdamW
Variant of Adam that decouples weight decay from the adaptive update: the decay is applied directly to the weights instead of being added to the gradient as an L2 penalty, so it is not rescaled by the adaptive denominator.
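The decoupling can be sketched for a single parameter as follows. The name `adamw_step` and the defaults are illustrative; the point of the sketch is only that the decay line touches the weight and never enters `m` or `v`.

```python
import math

def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single parameter; t is the 1-based step count.

    Returns (new_w, new_m, new_v).
    """
    # Decoupled decay: shrink the weight directly, outside the adaptive update.
    w = w * (1.0 - lr * weight_decay)
    # Standard Adam moments, computed from the raw gradient only.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# With a zero gradient, only the decay acts on the weight.
w, m, v = adamw_step(2.0, 0.0, 0.0, 0.0, t=1)
print(w)
```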
SGDW
Extension of SGD with decoupled weight decay: the decay is applied directly to the weights as a separate step rather than being added to the gradient as an L2 penalty.
RAdam
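Rectified Adam: addresses the high variance of the adaptive learning rate in the earliest training steps, when few gradients have been seen, by applying a rectification term and falling back to a momentum-only update until that variance becomes tractable.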
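As a rough illustration of the rectification mechanism, the sketch below computes only the rectification factor as a function of the step count. The function name `radam_rectification` is hypothetical, and the threshold of 4 on the approximated moving-average length is an assumption of this sketch; some implementations use a slightly different cutoff.

```python
import math

def radam_rectification(t, beta2=0.999):
    """Return the RAdam rectification factor for 1-based step t.

    Returns None while the variance of the adaptive term is considered
    intractable, in which case the update falls back to momentum only.
    """
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** t
    rho_t = rho_inf - 2.0 * t * beta2_t / (1.0 - beta2_t)
    if rho_t <= 4.0:
        return None
    return math.sqrt(((rho_t - 4.0) * (rho_t - 2.0) * rho_inf)
                     / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t))

for t in (1, 5, 100, 100000):
    print(t, radam_rectification(t))
```

The factor is undefined for the first few steps, then rises from near zero toward 1, so the adaptive term is phased in gradually, much like an automatic warmup.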
YellowFin
Optimizer that automatically tunes the learning rate and momentum coefficient of momentum SGD, based on an analysis of momentum's convergence behavior on local quadratic approximations of the objective.
LARS
Layer-wise Adaptive Rate Scaling: sets a separate learning rate for each layer in proportion to the ratio of the L2 norm of the layer's weights to the L2 norm of its gradients, which stabilizes large-batch training.
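The layer-wise ratio can be sketched as below, with each layer represented as a flat list of floats. This simplified version omits momentum and weight decay; `lars_step`, `l2_norm`, and the `trust_coef` default are illustrative names and values.

```python
import math

def l2_norm(xs):
    return math.sqrt(sum(x * x for x in xs))

def lars_step(layers, grads, lr=1.0, trust_coef=0.001, eps=1e-12):
    """Apply one simplified LARS update; layers and grads are lists of float lists."""
    new_layers = []
    for w, g in zip(layers, grads):
        w_norm, g_norm = l2_norm(w), l2_norm(g)
        if w_norm > 0.0 and g_norm > 0.0:
            local_lr = trust_coef * w_norm / (g_norm + eps)  # per-layer scaling
        else:
            local_lr = 1.0                                   # fall back to the global rate
        new_layers.append([wi - lr * local_lr * gi for wi, gi in zip(w, g)])
    return new_layers

# Layer 0 has small gradients, layer 1 has gradients 100x larger than its weights.
layers = [[3.0, 4.0], [0.3, 0.4]]
grads = [[0.6, 0.8], [60.0, 80.0]]
updated = lars_step(layers, grads)

for old, new in zip(layers, updated):
    change = l2_norm([a - b for a, b in zip(old, new)])
    print(change / l2_norm(old))  # same relative change for both layers
```

Each layer's update ends up with a norm equal to a fixed fraction of that layer's weight norm, regardless of how large its raw gradients are.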
LAMB
Layer-wise Adaptive Moments optimizer for Batch training: extends the layer-wise scaling of LARS to Adam-style adaptive moment estimates, for large-batch training of large models.
Rprop
Resilient Backpropagation: uses only the sign of each gradient component and ignores its magnitude, growing a per-parameter step size while the sign stays the same and shrinking it when the sign flips.
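A scalar sketch of the sign-based rule, written in the style of the variant that skips the update when the sign flips (several Rprop variants exist, and this choice is an assumption of the example). The name `rprop_step` and the default factors and bounds are illustrative.

```python
def rprop_step(w, grad, prev_grad, step, eta_minus=0.5, eta_plus=1.2,
               step_min=1e-6, step_max=50.0):
    """One Rprop update for a single parameter.

    Returns (new_w, grad_to_remember, new_step).
    """
    if grad * prev_grad > 0:        # same sign as last time: grow the step
        step = min(step * eta_plus, step_max)
    elif grad * prev_grad < 0:      # sign flipped: shrink the step, skip this update
        step = max(step * eta_minus, step_min)
        grad = 0.0
    sign = (grad > 0) - (grad < 0)  # +1, 0, or -1; magnitude is discarded
    return w - sign * step, grad, step

# A tiny and a huge gradient of the same sign produce the same move.
print(rprop_step(1.0, 1e-6, 0.0, 0.1)[0], rprop_step(1.0, 1e6, 0.0, 0.1)[0])
```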
QHAdam
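Quasi-Hyperbolic Adam: replaces both of Adam's moment estimates with quasi-hyperbolic terms, each a weighted average of the current gradient (or squared gradient) and the corresponding moving average; the two extra weighting parameters give fine control over how much each moment contributes and recover Adam as a special case.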
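A scalar sketch of the quasi-hyperbolic weighting, where `nu1` and `nu2` are the two extra weighting parameters. The function name `qhadam_step` and the default values are illustrative assumptions; setting `nu1 = nu2 = 1` reduces the step to a bias-corrected Adam update.

```python
import math

def qhadam_step(w, grad, m, v, t, lr=1e-3, beta1=0.995, beta2=0.999,
                nu1=0.7, nu2=1.0, eps=1e-8):
    """One QHAdam update for a single parameter; t is the 1-based step count.

    Returns (new_w, new_m, new_v).
    """
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # Weighted averages of the raw gradient terms and the moment estimates.
    numer = (1.0 - nu1) * grad + nu1 * m_hat
    denom = math.sqrt((1.0 - nu2) * grad * grad + nu2 * v_hat) + eps
    return w - lr * numer / denom, m, v

# After a gradient of +3, a gradient of -3 arrives.
w1, m1, v1 = qhadam_step(1.0, 3.0, 0.0, 0.0, t=1)
w_qh, _, _ = qhadam_step(w1, -3.0, m1, v1, t=2)                    # nu1 = 0.7
w_adam, _, _ = qhadam_step(w1, -3.0, m1, v1, t=2, nu1=1.0, nu2=1.0)
print(w_qh - w1, w_adam - w1)
```

With `nu1 = 0.7` the current gradient contributes directly to the step, so the parameter reacts to the sign change much more strongly than in the Adam special case, where the nearly cancelled momentum dominates.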