AI Glossary
The Complete Dictionary of Artificial Intelligence
Weight Decay
L2 regularization adding a penalty proportional to the sum of squared weights in the loss function. Constrains weights toward zero to reduce model complexity and prevent overfitting.
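A minimal sketch of how the decay term enters the loss, in plain NumPy; the function and coefficient names (`lam`) are illustrative, not from any particular library:

```python
import numpy as np

def loss_with_weight_decay(data_loss, weights, lam=1e-4):
    """Add an L2 penalty (weight decay) to a base training loss.

    data_loss : scalar loss computed on the current batch
    weights   : list of the model's weight arrays
    lam       : weight-decay coefficient (illustrative value)
    """
    l2_penalty = sum(np.sum(w ** 2) for w in weights)
    return data_loss + lam * l2_penalty
```

In practice, frameworks expose the same idea as an optimizer argument, e.g. `weight_decay` in PyTorch's SGD.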
Early Stopping
Stopping training early when performance on the validation set stops improving. Prevents overfitting by monitoring the validation loss and saving the best weights.
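A minimal sketch of the patience-based loop, assuming user-supplied `train_epoch` and `evaluate` callables (both placeholders):

```python
def train_with_early_stopping(train_epoch, evaluate, max_epochs=100, patience=5):
    """Stop when validation loss has not improved for `patience` consecutive epochs.

    train_epoch() runs one training epoch; evaluate() returns (val_loss, weights).
    """
    best_loss, best_weights, epochs_without_improvement = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_epoch()
        val_loss, weights = evaluate()
        if val_loss < best_loss:
            best_loss, best_weights = val_loss, weights   # keep the best checkpoint
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break   # validation loss stopped improving
    return best_weights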
L1 Regularization
Penalty added to the loss function equal to the sum of absolute values of weights. Promotes sparsity by pushing some weights exactly to zero, performing automatic feature selection.
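A small NumPy sketch of the penalty and of the soft-thresholding (proximal) step that produces exact zeros; names are illustrative:

```python
import numpy as np

def l1_penalty(weights, lam=1e-3):
    """Sum of absolute weight values, scaled by the L1 coefficient."""
    return lam * sum(np.sum(np.abs(w)) for w in weights)

def soft_threshold(w, step_lam):
    """Proximal step for L1: shrinks weights and sets small ones exactly to zero."""
    return np.sign(w) * np.maximum(np.abs(w) - step_lam, 0.0)
```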
L2 Regularization
Quadratic penalty on the weights that shrinks their magnitude without driving them exactly to zero. Stabilizes learning and improves generalization by limiting model complexity.
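A sketch of how the L2 term shows up in the gradient update, shrinking weights toward zero at every step; the step function is illustrative:

```python
import numpy as np

def sgd_step_with_l2(w, grad, lr=0.01, lam=1e-4):
    """One gradient step where the L2 penalty contributes 2*lam*w to the gradient,
    pulling weights toward zero without zeroing them exactly."""
    return w - lr * (grad + 2.0 * lam * w)
```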
Gradient Clipping
Limiting the gradient norm to prevent gradient explosion in deep networks. Maintains numerical stability by capping gradients to a predefined maximum value.
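A minimal sketch of global-norm clipping in NumPy; the helper name is illustrative:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale all gradients so their combined L2 norm does not exceed max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-12))
    return [g * scale for g in grads]
```

PyTorch offers the same operation as `torch.nn.utils.clip_grad_norm_`.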
Learning Rate Scheduling
Dynamic adjustment of the learning rate during training according to predefined strategies. Includes exponential decay, reduction on plateau, and cosine annealing to optimize convergence.
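Two of the mentioned schedules as simple formulas in Python; the parameter values are illustrative defaults:

```python
import math

def exponential_decay(lr0, epoch, gamma=0.95):
    """Multiply the initial learning rate by gamma once per epoch."""
    return lr0 * gamma ** epoch

def cosine_annealing(lr0, epoch, total_epochs, lr_min=0.0):
    """Cosine schedule decaying from lr0 to lr_min over total_epochs."""
    return lr_min + 0.5 * (lr0 - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))
```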
Momentum Optimization
Acceleration technique that accumulates a velocity vector to help carry the optimizer past local minima and plateaus. Introduces inertia into gradient descent for more stable and faster convergence.
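A sketch of the classic momentum update; `beta` is the momentum coefficient and the function name is illustrative:

```python
def momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    """Accumulate a velocity vector from past gradients, then move along it."""
    velocity = beta * velocity - lr * grad
    return w + velocity, velocity
```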
Adam Optimizer
Adaptive optimization algorithm combining momentum and RMSprop for per-parameter learning rates. Automatically adjusts learning rates based on first- and second-order moment estimates of the gradients.
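A single Adam update written out in NumPy to show the two moment estimates and the bias correction; default hyperparameters follow the original paper:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step; t is the 1-based step count used for bias correction."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)              # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```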
Spatial Dropout
Variant of dropout that deactivates entire feature maps rather than individual neurons. Particularly effective for CNNs where correlated spatial features need to be regularized together.
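A NumPy sketch that drops whole channels of an NCHW batch, with inverted-dropout scaling; the function name is illustrative:

```python
import numpy as np

def spatial_dropout(x, p=0.5, training=True):
    """Drop entire feature maps of a (batch, channels, height, width) array."""
    if not training or p == 0.0:
        return x
    keep = (np.random.rand(x.shape[0], x.shape[1], 1, 1) >= p).astype(x.dtype)
    return x * keep / (1.0 - p)   # rescale so the expected activation is unchanged
```

In PyTorch, `torch.nn.Dropout2d` applies the same channel-wise dropout.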
Stochastic Depth
Technique that randomly skips layers during training in very deep networks. Reduces overfitting and improves gradient propagation by varying the effective depth of the network.
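A sketch of a residual block with stochastic depth; `residual_fn` is a placeholder for the block's transformation and `survival_prob` is illustrative:

```python
import numpy as np

def stochastic_depth_block(x, residual_fn, survival_prob=0.8, training=True):
    """Residual block that is randomly skipped during training."""
    if not training:
        return x + survival_prob * residual_fn(x)   # expected-value scaling at test time
    if np.random.rand() < survival_prob:
        return x + residual_fn(x)                   # block is active this step
    return x                                        # block is skipped entirely
```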
Label Smoothing
Regularization of target labels by distributing a small probability to incorrect classes. Prevents model overconfidence and improves prediction calibration by smoothing the target distribution.
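A sketch of the standard smoothing of one-hot targets; `epsilon` is the smoothing strength:

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Blend hard 0/1 targets with a uniform distribution over all classes."""
    num_classes = one_hot.shape[-1]
    return one_hot * (1.0 - epsilon) + epsilon / num_classes
```

Recent versions of PyTorch expose this directly via `nn.CrossEntropyLoss(label_smoothing=0.1)`.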
Mixup
Regularization technique creating new samples through linear interpolation of pairs of inputs (e.g., images) and their labels. Encourages the model to behave linearly between examples, improving robustness and generalization.
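A sketch that mixes a batch with a shuffled copy of itself, drawing the mixing coefficient from a Beta distribution as in the original method; the batch layout is assumed to be (batch, ...) with one-hot labels:

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2):
    """Interpolate inputs and one-hot labels with the same coefficient lam."""
    lam = np.random.beta(alpha, alpha)
    perm = np.random.permutation(x.shape[0])
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y + (1.0 - lam) * y[perm]
    return x_mixed, y_mixed
```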
CutMix
Data augmentation that cuts and pastes patches between images, with labels mixed in proportion to the patch area. Effectively combines spatial features while preserving object localization.
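A sketch for NCHW image batches with one-hot labels; the patch placement and area-based label mixing follow the description above, and all names are illustrative:

```python
import numpy as np

def cutmix_batch(x, y, alpha=1.0):
    """Paste a random rectangle from shuffled images; mix labels by patch area."""
    n, _, h, w = x.shape
    lam = np.random.beta(alpha, alpha)
    perm = np.random.permutation(n)
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, h)
    x1, x2 = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, w)
    x_cut = x.copy()
    x_cut[:, :, y1:y2, x1:x2] = x[perm][:, :, y1:y2, x1:x2]
    lam_adj = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)   # label weight from actual patch area
    y_mixed = lam_adj * y + (1.0 - lam_adj) * y[perm]
    return x_cut, y_mixed
```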
Elastic Net Regularization
Convex combination of L1 and L2 regularizations to benefit from the advantages of both approaches. Maintains the sparsity of L1 while preserving the numerical stability of L2.
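A sketch of the combined penalty, with `l1_ratio` controlling the balance between the two terms:

```python
import numpy as np

def elastic_net_penalty(weights, lam=1e-3, l1_ratio=0.5):
    """Convex combination of L1 and L2 penalties controlled by l1_ratio."""
    l1 = sum(np.sum(np.abs(w)) for w in weights)
    l2 = sum(np.sum(w ** 2) for w in weights)
    return lam * (l1_ratio * l1 + (1.0 - l1_ratio) * 0.5 * l2)
```

For linear models, scikit-learn's `ElasticNet` implements this with the analogous `alpha` and `l1_ratio` parameters.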
RMSprop
Adaptive optimizer that divides the gradient by a moving average of its squared magnitude. Particularly effective for non-stationary objectives and complex optimization problems.
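A single RMSprop update written out in NumPy; `rho` is the decay rate of the squared-gradient average and the defaults are illustrative:

```python
import numpy as np

def rmsprop_step(w, grad, sq_avg, lr=1e-3, rho=0.9, eps=1e-8):
    """Divide the gradient by the root of a moving average of its squared magnitude."""
    sq_avg = rho * sq_avg + (1 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(sq_avg) + eps)
    return w, sq_avg
```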