AI Glossary

The complete dictionary of artificial intelligence

162 categories, 2,032 subcategories, 23,060 terms

Terms

Post-LN Transformer

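The layer arrangement of the original Transformer, in which layer normalization is applied after each residual connection, i.e. to x + Sublayer(x) once the attention or feed-forward sublayer has run. Compared with the Pre-LN variant, it is less stable to train in deep stacks and typically requires learning-rate warmup and more careful learning-rate tuning.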
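
A minimal NumPy sketch of one Post-LN sublayer. The names layer_norm and post_ln_block, and the random linear map standing in for attention or the feed-forward network, are hypothetical illustrations rather than any library's API. The key point is the order of operations: residual addition first, normalization afterwards.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each row (token) across its features, then scale and shift.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def post_ln_block(x, sublayer, gamma, beta):
    # Post-LN: add the residual first, then apply layer normalization.
    return layer_norm(x + sublayer(x), gamma, beta)

rng = np.random.default_rng(0)
d_model = 8
W = rng.normal(size=(d_model, d_model))  # hypothetical stand-in for attention/FFN weights
x = rng.normal(size=(4, d_model))        # 4 tokens, 8 features each

out = post_ln_block(x, lambda h: h @ W, np.ones(d_model), np.zeros(d_model))
print(out.shape)  # (4, 8)
```

By contrast, a Pre-LN block computes x + sublayer(layer_norm(x)), so the residual path itself is never normalized.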

Gamma and Beta

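The learnable parameters of layer normalization: γ (gamma) scales and β (beta) shifts the normalized values, y = γ · x̂ + β. They let the network rescale and re-offset the normalized activations where useful, so normalization does not limit its representational power. Both are vectors with one entry per feature, typically initialized to γ = 1 and β = 0.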
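
A short sketch of where γ and β enter, assuming a single illustrative activation vector. The learned values 2.0 and 3.0 are made up for the demonstration; with the common initialization γ = 1, β = 0 the output is just the normalized vector.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, approximately unit variance
    return gamma * x_hat + beta            # learnable scale (gamma) and shift (beta)

d = 6
x = np.array([[1.0, 4.0, 2.0, 8.0, 5.0, 7.0]])  # mean 4.5, standard deviation 2.5

# Common initialization: gamma = 1, beta = 0 leaves the normalized values unchanged.
y_init = layer_norm(x, np.ones(d), np.zeros(d))

# Hypothetical learned values: every feature scaled by 2 and shifted by 3.
y_learned = layer_norm(x, np.full(d, 2.0), np.full(d, 3.0))
print(y_learned.mean(), y_learned.std())  # approximately 3.0 and 2.0
```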

Zero Centering

The step of layer normalization that subtracts the mean μ of a vector's activations, computed across its features, so that the result is centered around zero. This facilitates gradient-based optimization.
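
A minimal sketch of the centering step, assuming two tokens with four illustrative feature values each. In layer normalization the mean is taken per token across the feature axis (axis=-1), not across the batch.

```python
import numpy as np

# Activations for 2 tokens with 4 features each (illustrative values).
x = np.array([[2.0, 4.0, 6.0, 8.0],
              [-1.0, 0.0, 1.0, 10.0]])

# One mean per token, computed across that token's features.
mu = x.mean(axis=-1, keepdims=True)  # shape (2, 1); row means 5.0 and 2.5
centered = x - mu                    # broadcasting subtracts each row's own mean

print(centered[0])               # values -3, -1, 1, 3
print(centered.mean(axis=-1))    # both rows now have mean 0
```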

Unit Variance

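The step of layer normalization that divides the zero-centered activations by their standard deviation, √(σ² + ε), so that each vector has (approximately) unit variance across its features. This keeps activation scales consistent from layer to layer, which supports numerical stability and more uniform gradient magnitudes across layers.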
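
A small sketch of the scaling step, assuming two illustrative tokens where the second is simply the first multiplied by 50. After dividing by the standard deviation, both end up with approximately unit variance and essentially identical values. Layer normalization uses the population (biased) variance, which is NumPy's default (ddof=0).

```python
import numpy as np

eps = 1e-5
x = np.array([[2.0, 4.0, 6.0, 8.0],
              [100.0, 200.0, 300.0, 400.0]])  # second token = first token * 50

mu = x.mean(axis=-1, keepdims=True)
var = x.var(axis=-1, keepdims=True)    # population variance; row values 5.0 and 12500.0
x_hat = (x - mu) / np.sqrt(var + eps)  # divide by the standard deviation

print(x_hat.var(axis=-1))  # both very close to 1.0
print(x_hat)               # the two rows are nearly identical
```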

Gradient Stability

The property of layer normalization of helping to keep gradients stable during backpropagation by controlling the scale of activations, which reduces the risk of exploding or vanishing gradients in deep transformers.
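
This forward-pass sketch does not compute gradients; it illustrates the scale control behind the property, assuming a hypothetical stack of 30 random linear layers whose weights are deliberately 1.5 times too large. In a purely linear stack, backpropagated gradients pass through the same (transposed) weight matrices, so without rescaling they tend to compound in the same way.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # gamma = 1, beta = 0 for simplicity
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
d, depth = 64, 30
# Hypothetical weights, scaled so each layer multiplies the activation norm by about 1.5.
weights = [rng.normal(scale=1.5 / np.sqrt(d), size=(d, d)) for _ in range(depth)]

x0 = rng.normal(size=(1, d))
plain, normed = x0.copy(), x0.copy()
for W in weights:
    plain = plain @ W                # no normalization: the scale compounds layer by layer
    normed = layer_norm(normed @ W)  # layer norm resets the scale after every layer

print(np.linalg.norm(plain))   # grows roughly like 1.5**30 times the input norm
print(np.linalg.norm(normed))  # stays near sqrt(d) = 8
```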

Epsilon Parameter

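A small constant (commonly 1e-5, the default eps in PyTorch's torch.nn.LayerNorm) added to the variance inside the square root in layer normalization's denominator, √(σ² + ε). It prevents division by zero when the variance is zero or near zero and keeps the computation numerically stable.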
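
A minimal sketch of the failure ε guards against, assuming a constant activation vector (variance exactly 0). Without ε the normalization computes 0 / 0 and yields NaN; with ε the result is a well-defined vector of zeros. The helper name normalize is illustrative.

```python
import numpy as np

def normalize(x, eps):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)  # eps sits inside the square root

x = np.full((1, 4), 3.0)  # constant activations: variance is exactly 0

with np.errstate(invalid="ignore"):  # silence NumPy's 0/0 warning for the demo
    no_eps = normalize(x, eps=0.0)
with_eps = normalize(x, eps=1e-5)

print(no_eps)    # all NaN: 0 / sqrt(0)
print(with_eps)  # all zeros: 0 / sqrt(1e-5)
```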

Activation Distribution

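The distribution of activation values in a layer. Layer normalization keeps its mean and variance fixed (before the γ and β scale and shift are applied), so each layer receives inputs on a consistent scale, which facilitates convergence and optimization of transformer networks.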
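
A small sketch of what is and is not fixed, assuming exponentially distributed activations chosen only because they are visibly skewed, and a hypothetical second layer whose activations have the same shape at a different offset and scale. Layer normalization brings both to mean 0 and variance close to 1, while the shape of the distribution, measured here by skewness, passes through unchanged.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def skewness(v):
    # Third standardized moment: unchanged by positive scaling and shifting.
    c = v - v.mean()
    return (c**3).mean() / (c**2).mean() ** 1.5

rng = np.random.default_rng(0)
base = rng.exponential(size=512)
layer_a = base               # hypothetical activations of one layer
layer_b = 50.0 * base + 7.0  # same shape, very different offset and scale

for acts in (layer_a, layer_b):
    y = layer_norm(acts[None, :])[0]
    # mean ~ 0 and variance ~ 1 for both layers; the skewness matches the input's
    print(round(y.mean(), 6), round(y.var(), 4), round(skewness(y), 3))
```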