AI Glossary
A complete dictionary of Artificial Intelligence
Post-LN Transformer
Original transformer architecture in which layer normalization is applied after the residual connection of each attention and feed-forward sublayer. It usually needs careful learning rate tuning, often with a warmup schedule, to train stably.
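To make the ordering concrete, here is a minimal sketch of one Post-LN sublayer in plain Python. The names `layer_norm`, `post_ln_sublayer`, and `toy_sublayer` are illustrative assumptions, and the toy sublayer is only a stand-in for real attention or feed-forward computation.

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize one feature vector to zero mean and (near) unit variance.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def post_ln_sublayer(x, sublayer):
    # Post-LN ordering: add the residual first, then normalize the sum.
    y = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)])

def toy_sublayer(x):
    # Hypothetical stand-in for attention or a feed-forward network.
    return [2.0 * v for v in x]

out = post_ln_sublayer([1.0, 2.0, 3.0, 4.0], toy_sublayer)
out_mean = sum(out) / len(out)
out_var = sum((v - out_mean) ** 2 for v in out) / len(out)
```

Because normalization is the last step, the sublayer output always leaves with roughly zero mean and unit variance, whatever the scale of the residual sum.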
Gamma and Beta
Learnable parameters of layer normalization: gamma scales and beta shifts the normalized values, which preserves the network's representational power.
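A small sketch of how gamma and beta act on the normalized values, assuming one gamma and one beta per feature. The function name `layer_norm_affine` and the sample numbers are illustrative, not taken from any library.

```python
import math

def layer_norm_affine(x, gamma, beta, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    x_hat = [(v - mean) / math.sqrt(var + eps) for v in x]
    # gamma scales and beta shifts each normalized feature independently.
    return [g * h + b for g, h, b in zip(gamma, x_hat, beta)]

x = [1.0, 2.0, 3.0, 4.0]
# gamma = 1 and beta = 0 leave the normalized values unchanged.
identity = layer_norm_affine(x, gamma=[1.0] * 4, beta=[0.0] * 4)
# Other values rescale and shift them.
shifted = layer_norm_affine(x, gamma=[2.0] * 4, beta=[0.5] * 4)
```

Here `shifted` equals `2 * identity + 0.5` element by element, which is the sense in which the two parameters let the network undo or adjust the normalization where useful.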
Zero Centering
Subtraction of the mean of the activations in layer normalization so that values are centered around zero, which eases gradient-based optimization.
Unit Variance
Standardization of activations to unit variance in layer normalization, which improves numerical stability and keeps activation scales comparable across layers.
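The centering and unit-variance steps can be sketched together as follows. The helper `standardize` is a hypothetical name, and it omits the epsilon term on the assumption that the input variance is nonzero.

```python
import math

def standardize(x):
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]               # zero centering
    var = sum(c * c for c in centered) / len(x)    # assumed nonzero here
    return [c / math.sqrt(var) for c in centered]  # unit variance

z = standardize([2.0, 4.0, 6.0, 8.0])
z_mean = sum(z) / len(z)
z_var = sum((v - z_mean) ** 2 for v in z) / len(z)
```

For this input the mean is 5 and the variance is 5, so the result has mean 0 and variance 1 up to floating-point rounding.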
Gradient Stability
Property of layer normalization that helps keep gradients well scaled during backpropagation, reducing the risk of exploding or vanishing gradients in deep transformers.
Epsilon Parameter
Small constant added to the variance in the denominator of layer normalization to prevent division by zero and ensure numerical stability when the variance is close to zero.
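The role of epsilon is easiest to see on an input whose variance is exactly zero. The sketch below assumes epsilon is added to the variance inside the square root, a common convention; the value `1e-5` is only an example.

```python
import math

def layer_norm(x, eps):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

constant = [3.0, 3.0, 3.0, 3.0]  # every feature equal, so variance is 0

# With a small epsilon the denominator stays positive.
safe = layer_norm(constant, eps=1e-5)

# Without epsilon the denominator is zero and Python raises an error.
try:
    layer_norm(constant, eps=0.0)
    failed = False
except ZeroDivisionError:
    failed = True
```

With epsilon the constant input maps to all zeros; with `eps=0.0` the same call fails on a division by zero.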
Activation Distribution
Distribution of activation values in a layer, whose mean and variance layer normalization keeps roughly fixed, facilitating convergence and optimization of transformer networks.
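As a rough illustration, the sketch below pushes the same vector through ten copies of a toy layer, once without and once with layer normalization. The layer `toy_layer`, which simply triples its input, is a deliberately artificial assumption chosen to make activation growth visible.

```python
import math

def layer_norm(x, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def toy_layer(x):
    # Hypothetical layer that triples every activation.
    return [3.0 * v for v in x]

def rms(x):
    # Root mean square, used here as a simple measure of activation scale.
    return math.sqrt(sum(v * v for v in x) / len(x))

x_plain = [1.0, -2.0, 0.5, 3.0]
x_norm = list(x_plain)
for _ in range(10):
    x_plain = toy_layer(x_plain)            # scale grows by 3x per layer
    x_norm = layer_norm(toy_layer(x_norm))  # scale is reset after each layer

plain_rms = rms(x_plain)
norm_rms = rms(x_norm)
```

Without normalization the activation scale grows by a factor of 3 per layer, while the normalized path stays at a root mean square close to 1.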
Scale Invariance
Property of layer normalization whereby rescaling the input features by a positive constant leaves the normalized output essentially unchanged, improving robustness to variations in data scale.
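This property can be checked numerically. The sketch below normalizes a vector and a copy scaled by 100; the sample values and the factor are arbitrary assumptions. The two outputs agree only approximately, because epsilon is not rescaled along with the input.

```python
import math

def layer_norm(x, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

x = [0.5, -1.0, 2.0, 3.5]
scaled = [100.0 * v for v in x]  # same direction, 100x larger scale

a = layer_norm(x)
b = layer_norm(scaled)

# Largest elementwise gap between the two normalized vectors.
max_diff = max(abs(p - q) for p, q in zip(a, b))
```

The gap `max_diff` is tiny and shrinks further as epsilon becomes small relative to the input variance.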
Training Speed
Acceleration of transformer training attributed to layer normalization, which typically permits higher learning rates and faster convergence.
Hidden State Normalization
Application of layer normalization to transformer hidden states, normalizing each token's feature vector, to keep activations stable across encoder and decoder layers.
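A minimal sketch of normalizing hidden states, assuming they are stored as one feature vector per token position. The `hidden` values are made up for illustration; each token vector is normalized on its own, over the feature axis.

```python
import math

def layer_norm(x, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

# Hypothetical hidden states: 3 token positions, model width 4.
hidden = [
    [0.1, 0.2, 0.3, 0.4],
    [10.0, 20.0, 30.0, 40.0],
    [-1.0, 0.0, 1.0, 2.0],
]

# Statistics are computed per token, never across tokens.
normalized = [layer_norm(h) for h in hidden]
row_means = [sum(r) / len(r) for r in normalized]
```

Tokens whose raw activations differ widely in magnitude all end up with zero-mean feature vectors, since no statistics are shared between positions.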