AI Glossary
The Complete Dictionary of Artificial Intelligence
Weight Decay
L2 regularization adding a penalty proportional to the sum of squared weights in the loss function. Constrains weights toward zero to reduce model complexity and prevent overfitting.
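A minimal sketch of how the decay term enters the loss, in plain NumPy; the function and coefficient names (`lam`) are illustrative, not from any particular library:

```python
import numpy as np

def loss_with_weight_decay(data_loss, weights, lam=1e-4):
    """Add an L2 penalty (weight decay) to a base training loss.

    data_loss : scalar loss computed on the current batch
    weights   : list of the model's weight arrays
    lam       : weight-decay coefficient (illustrative value)
    """
    l2_penalty = sum(np.sum(w ** 2) for w in weights)
    return data_loss + lam * l2_penalty
```

In practice, frameworks expose the same idea as an optimizer argument, e.g. `weight_decay` in PyTorch's SGD.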
Early Stopping
Stopping training early when performance on the validation set stops improving. Prevents overfitting by monitoring the validation loss and saving the best weights.
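A minimal sketch of the patience-based loop, assuming user-supplied `train_epoch` and `evaluate` callables (both placeholders):

```python
def train_with_early_stopping(train_epoch, evaluate, max_epochs=100, patience=5):
    """Stop when validation loss has not improved for `patience` consecutive epochs.

    train_epoch() runs one training epoch; evaluate() returns (val_loss, weights).
    """
    best_loss, best_weights, epochs_without_improvement = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_epoch()
        val_loss, weights = evaluate()
        if val_loss < best_loss:
            best_loss, best_weights = val_loss, weights   # keep the best checkpoint
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break   # validation loss stopped improving
    return best_weights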
L1 Regularization
Penalty added to the loss function equal to the sum of absolute values of weights. Promotes sparsity by pushing some weights exactly to zero, performing automatic feature selection.
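A small NumPy sketch of the penalty and of the soft-thresholding (proximal) step that produces exact zeros; names are illustrative:

```python
import numpy as np

def l1_penalty(weights, lam=1e-3):
    """Sum of absolute weight values, scaled by the L1 coefficient."""
    return lam * sum(np.sum(np.abs(w)) for w in weights)

def soft_threshold(w, step_lam):
    """Proximal step for L1: shrinks weights and sets small ones exactly to zero."""
    return np.sign(w) * np.maximum(np.abs(w) - step_lam, 0.0)
```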
L2 Regularization
Quadratic penalty on the weights that shrinks their magnitude without driving them exactly to zero. Stabilizes learning and improves generalization by limiting model complexity.
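A sketch of how the L2 term shows up in the gradient update, shrinking weights toward zero at every step; the step function is illustrative:

```python
import numpy as np

def sgd_step_with_l2(w, grad, lr=0.01, lam=1e-4):
    """One gradient step where the L2 penalty contributes 2*lam*w to the gradient,
    pulling weights toward zero without zeroing them exactly."""
    return w - lr * (grad + 2.0 * lam * w)
```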
Gradient Clipping
Limiting the gradient norm to prevent gradient explosion in deep networks. Maintains numerical stability by capping gradients to a predefined maximum value.
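A minimal sketch of global-norm clipping in NumPy; the helper name is illustrative:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale all gradients so their combined L2 norm does not exceed max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-12))
    return [g * scale for g in grads]
```

PyTorch offers the same operation as `torch.nn.utils.clip_grad_norm_`.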
Learning Rate Scheduling
Dynamic adjustment of the learning rate during training according to predefined strategies. Includes exponential decay, reduction on plateau, and cosine annealing to optimize convergence.
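Two of the mentioned schedules as simple formulas in Python; the parameter values are illustrative defaults:

```python
import math

def exponential_decay(lr0, epoch, gamma=0.95):
    """Multiply the initial learning rate by gamma once per epoch."""
    return lr0 * gamma ** epoch

def cosine_annealing(lr0, epoch, total_epochs, lr_min=0.0):
    """Cosine schedule decaying from lr0 to lr_min over total_epochs."""
    return lr_min + 0.5 * (lr0 - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))
```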
Momentum Optimization
Acceleration technique that accumulates a velocity vector to help carry the optimizer past local minima and plateaus. Introduces inertia into gradient descent for more stable and faster convergence.
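A sketch of the classic momentum update; `beta` is the momentum coefficient and the function name is illustrative:

```python
def momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    """Accumulate a velocity vector from past gradients, then move along it."""
    velocity = beta * velocity - lr * grad
    return w + velocity, velocity
```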
Adam Optimizer
Adaptive optimization algorithm combining momentum and RMSprop for per-parameter learning rates. Automatically adjusts learning rates based on first- and second-order moment estimates of the gradients.
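A single Adam update written out in NumPy to show the two moment estimates and the bias correction; default hyperparameters follow the original paper:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step; t is the 1-based step count used for bias correction."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)              # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```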
Spatial Dropout
Variant of dropout that deactivates entire feature maps rather than individual neurons. Particularly effective for CNNs where correlated spatial features need to be regularized together.
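A NumPy sketch that drops whole channels of an NCHW batch, with inverted-dropout scaling; the function name is illustrative:

```python
import numpy as np

def spatial_dropout(x, p=0.5, training=True):
    """Drop entire feature maps of a (batch, channels, height, width) array."""
    if not training or p == 0.0:
        return x
    keep = (np.random.rand(x.shape[0], x.shape[1], 1, 1) >= p).astype(x.dtype)
    return x * keep / (1.0 - p)   # rescale so the expected activation is unchanged
```

In PyTorch, `torch.nn.Dropout2d` applies the same channel-wise dropout.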
Stochastic Depth
Technique that randomly skips layers during training in very deep networks. Reduces overfitting and improves gradient propagation by varying the effective depth of the network.
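A sketch of a residual block with stochastic depth; `residual_fn` is a placeholder for the block's transformation and `survival_prob` is illustrative:

```python
import numpy as np

def stochastic_depth_block(x, residual_fn, survival_prob=0.8, training=True):
    """Residual block that is randomly skipped during training."""
    if not training:
        return x + survival_prob * residual_fn(x)   # expected-value scaling at test time
    if np.random.rand() < survival_prob:
        return x + residual_fn(x)                   # block is active this step
    return x                                        # block is skipped entirely
```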
Label Smoothing
Regularization of target labels by distributing a small probability to incorrect classes. Prevents model overconfidence and improves prediction calibration by smoothing the target distribution.
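A sketch of the standard smoothing of one-hot targets; `epsilon` is the smoothing strength:

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Blend hard 0/1 targets with a uniform distribution over all classes."""
    num_classes = one_hot.shape[-1]
    return one_hot * (1.0 - epsilon) + epsilon / num_classes
```

Recent versions of PyTorch expose this directly via `nn.CrossEntropyLoss(label_smoothing=0.1)`.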
Mixup
Regularization technique creating new samples through linear interpolation of pairs of inputs (e.g., images) and their labels. Encourages the model to behave linearly between examples, improving robustness and generalization.
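A sketch that mixes a batch with a shuffled copy of itself, drawing the mixing coefficient from a Beta distribution as in the original method; the batch layout is assumed to be (batch, ...) with one-hot labels:

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2):
    """Interpolate inputs and one-hot labels with the same coefficient lam."""
    lam = np.random.beta(alpha, alpha)
    perm = np.random.permutation(x.shape[0])
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y + (1.0 - lam) * y[perm]
    return x_mixed, y_mixed
```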
CutMix
Data augmentation that cuts and pastes patches between images, with labels mixed in proportion to the patch area. Effectively combines spatial features while preserving object localization.
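A sketch for NCHW image batches with one-hot labels; the patch placement and area-based label mixing follow the description above, and all names are illustrative:

```python
import numpy as np

def cutmix_batch(x, y, alpha=1.0):
    """Paste a random rectangle from shuffled images; mix labels by patch area."""
    n, _, h, w = x.shape
    lam = np.random.beta(alpha, alpha)
    perm = np.random.permutation(n)
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, h)
    x1, x2 = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, w)
    x_cut = x.copy()
    x_cut[:, :, y1:y2, x1:x2] = x[perm][:, :, y1:y2, x1:x2]
    lam_adj = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)   # label weight from actual patch area
    y_mixed = lam_adj * y + (1.0 - lam_adj) * y[perm]
    return x_cut, y_mixed
```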
Elastic Net Regularization
Convex combination of L1 and L2 regularizations to benefit from the advantages of both approaches. Maintains the sparsity of L1 while preserving the numerical stability of L2.
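A sketch of the combined penalty, with `l1_ratio` controlling the balance between the two terms:

```python
import numpy as np

def elastic_net_penalty(weights, lam=1e-3, l1_ratio=0.5):
    """Convex combination of L1 and L2 penalties controlled by l1_ratio."""
    l1 = sum(np.sum(np.abs(w)) for w in weights)
    l2 = sum(np.sum(w ** 2) for w in weights)
    return lam * (l1_ratio * l1 + (1.0 - l1_ratio) * 0.5 * l2)
```

For linear models, scikit-learn's `ElasticNet` implements this with the analogous `alpha` and `l1_ratio` parameters.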
RMSprop
Adaptive optimizer that divides the gradient by a moving average of its squared magnitude. Particularly effective for non-stationary objectives and complex optimization problems.
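A single RMSprop update written out in NumPy; `rho` is the decay rate of the squared-gradient average and the defaults are illustrative:

```python
import numpy as np

def rmsprop_step(w, grad, sq_avg, lr=1e-3, rho=0.9, eps=1e-8):
    """Divide the gradient by the root of a moving average of its squared magnitude."""
    sq_avg = rho * sq_avg + (1 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(sq_avg) + eps)
    return w, sq_avg
```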