AI Glossary
The Complete Dictionary of Artificial Intelligence
RMSprop
Adaptive optimization method that divides each parameter's update by the square root of an exponentially weighted moving average of its squared gradients, which damps oscillations along steep directions and accelerates convergence.
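As a rough illustration of the update described above, here is a minimal sketch for a single scalar parameter. The function name `rmsprop_step`, the hyperparameter values, and the toy objective f(x) = x^2 are assumptions chosen for the example, not part of the glossary.

```python
import math

def rmsprop_step(param, grad, avg_sq, lr=0.1, decay=0.9, eps=1e-8):
    """One RMSprop update for a single scalar parameter (illustrative helper)."""
    # Exponentially weighted moving average of squared gradients.
    avg_sq = decay * avg_sq + (1.0 - decay) * grad * grad
    # Divide the step by the root of that average; eps avoids division by zero.
    param = param - lr * grad / (math.sqrt(avg_sq) + eps)
    return param, avg_sq

# Toy problem (assumed for illustration): minimize f(x) = x^2, gradient 2x.
x, avg_sq = 5.0, 0.0
for _ in range(500):
    x, avg_sq = rmsprop_step(x, 2.0 * x, avg_sq)
print(x)  # ends in a small neighborhood of the minimum at 0
```

Because the step is normalized by the recent gradient magnitude, its size depends mostly on `lr` rather than on the raw scale of the gradient.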
AdaGrad
Optimization algorithm that adapts the learning rate of each parameter by dividing it by the square root of the accumulated sum of that parameter's squared gradients, so rarely updated parameters receive relatively larger steps.
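The effect on frequent versus infrequent parameters can be sketched as follows. The helper `adagrad_step` and the unit gradients are hypothetical, chosen only to make the step sizes easy to compare.

```python
import math

def adagrad_step(param, grad, sum_sq, lr=0.1, eps=1e-8):
    """One AdaGrad update for a single scalar parameter (illustrative helper)."""
    # The accumulator only grows, so the effective step only shrinks.
    sum_sq = sum_sq + grad * grad
    param = param - lr * grad / (math.sqrt(sum_sq) + eps)
    return param, sum_sq

# A parameter that has already received 99 gradients of size 1.0 ...
frequent_sum = 0.0
for _ in range(99):
    _, frequent_sum = adagrad_step(0.0, 1.0, frequent_sum)
frequent_next, _ = adagrad_step(0.0, 1.0, frequent_sum)  # its 100th update

# ... versus a parameter seeing its first gradient.
rare_next, _ = adagrad_step(0.0, 1.0, 0.0)

print(abs(frequent_next), abs(rare_next))  # the rare parameter moves about 10x further
```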
AdaDelta
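Extension of AdaGrad that addresses the monotonically decreasing learning rate by replacing the accumulated sum with an exponentially decaying average of past squared gradients, which approximates a sliding window. It also scales each step by a running average of past squared updates, so no global learning rate has to be set by hand.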
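A minimal sketch of one AdaDelta step for a scalar parameter, following the commonly cited form of the rule. The name `adadelta_step` and the default values of `rho` and `eps` are assumptions made for this example.

```python
import math

def adadelta_step(param, grad, avg_sq_grad, avg_sq_delta, rho=0.95, eps=1e-6):
    """One AdaDelta update for a single scalar parameter (illustrative helper)."""
    # Decaying average of squared gradients replaces AdaGrad's growing sum.
    avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad * grad
    # The ratio of the two RMS terms plays the role of the learning rate.
    delta = -math.sqrt(avg_sq_delta + eps) / math.sqrt(avg_sq_grad + eps) * grad
    # Decaying average of squared updates, used by the next step.
    avg_sq_delta = rho * avg_sq_delta + (1.0 - rho) * delta * delta
    return param + delta, avg_sq_grad, avg_sq_delta

x, g_avg, d_avg = 1.0, 0.0, 0.0
x, g_avg, d_avg = adadelta_step(x, 2.0 * x, g_avg, d_avg)  # gradient of x^2
print(x, g_avg, d_avg)
```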
Weight Decay
Regularization method that penalizes large weights by shrinking them toward zero at every update. With plain SGD this is equivalent to adding an L2 term to the loss function; it helps prevent overfitting and improves generalization.
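For plain SGD, the L2 formulation can be sketched as below: a penalty of (weight_decay / 2) * w^2 contributes weight_decay * w to the gradient. The function name `sgd_step_with_l2` and the numeric values are assumptions for illustration only.

```python
def sgd_step_with_l2(weights, grads, lr=0.1, weight_decay=0.01):
    """One SGD step where an L2 penalty adds weight_decay * w to each gradient."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]

# With a zero data gradient, the only effect is shrinkage toward zero.
weights = [1.0, -2.0]
weights = sgd_step_with_l2(weights, [0.0, 0.0], lr=0.1, weight_decay=0.5)
print(weights)  # each weight is scaled by (1 - lr * weight_decay)
```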
Beta Parameters (Adam)
Hyperparameters β1 and β2 that control the exponential decay rates of the moving averages of the gradient (first moment) and of the squared gradient (uncentered second moment), respectively.
Bias Correction
Mechanism in Adam that compensates for the moment estimates being initialized at zero and therefore biased toward zero. The correction matters most in the early steps of training.
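The roles of β1, β2, and the bias correction can be seen together in a minimal sketch of one Adam step for a scalar parameter. The helper name `adam_step` is hypothetical; the default values shown are the ones commonly used for Adam.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; t is the 1-based step count."""
    # Moving averages of the gradient and of the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction: undo the shrinkage caused by starting m and v at zero.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# At t = 1 the corrected estimates equal grad and grad^2,
# so the first step has magnitude close to lr regardless of the gradient scale.
x, m, v = adam_step(1.0, 4.0, 0.0, 0.0, t=1)
print(x, m, v)
```

Without the two division lines, the first step would be scaled by (1 - β1) / sqrt(1 - β2), which is far from the intended size.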
Exponential Moving Average (EMA)
Smoothing technique that assigns more weight to recent observations, used in adaptive optimizers to estimate gradient moments.
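A minimal sketch of the recurrence, assuming a zero initial value and a hypothetical helper named `ema`:

```python
def ema(values, beta=0.9):
    """Exponential moving average, starting from 0 (illustrative helper)."""
    avg = 0.0
    out = []
    for x in values:
        # The newest value gets weight (1 - beta); older ones decay by beta per step.
        avg = beta * avg + (1.0 - beta) * x
        out.append(avg)
    return out

print(ema([1.0, 1.0, 1.0, 1.0], beta=0.5))  # rises toward 1.0 from the zero start
```

The slow rise from the zero start is the initial bias that Adam's bias correction removes.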
YOGI
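Variant of Adam that updates the second-moment estimate additively, using the sign of the difference between the current estimate and the squared gradient, instead of Adam's multiplicative decay. This limits how quickly the effective learning rate can grow and stabilizes training, particularly when data has non-stationary distributions or gradients are noisy.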
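The difference from Adam lies in the second-moment update, sketched below in its commonly cited form. The function names are hypothetical, and only this one part of the optimizer is shown.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)

def adam_second_moment(v, grad, beta2=0.999):
    """Adam: multiplicative decay toward grad^2."""
    return beta2 * v + (1.0 - beta2) * grad * grad

def yogi_second_moment(v, grad, beta2=0.999):
    """Yogi: additive change whose size depends only on grad^2."""
    g2 = grad * grad
    return v - (1.0 - beta2) * sign(v - g2) * g2

# Large estimate, small new gradient: Yogi lets v fall more slowly than Adam,
# so the effective learning rate (proportional to 1 / sqrt(v)) grows more slowly.
v, grad = 1.0, 0.1
print(adam_second_moment(v, grad), yogi_second_moment(v, grad))
```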
Cyclical Learning Rates
Strategy that varies the learning rate cyclically between a minimum and a maximum bound, helping the model escape local minima and explore different basins of attraction.
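One simple way to realize such a schedule is a triangular wave, sketched below. The function name `triangular_lr`, the bounds, and the half-cycle length `step_size` are assumptions for illustration; other cycle shapes are possible.

```python
def triangular_lr(step, base_lr=0.001, max_lr=0.01, step_size=100):
    """Triangular cyclical learning rate; one full cycle lasts 2 * step_size steps."""
    cycle = step // (2 * step_size)
    # x runs 1 -> 0 -> 1 within each cycle, so (1 - x) runs 0 -> 1 -> 0.
    x = abs(step / step_size - 2 * cycle - 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

for step in (0, 50, 100, 150, 200):
    print(step, triangular_lr(step))  # rises to max_lr at step 100, back to base_lr at 200
```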