AI Glossary
The Complete Dictionary of Artificial Intelligence
Feed-Forward Network
Fully connected neural network applied independently to each position after the attention mechanism in the Transformer architecture.
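The position-wise behavior described above can be sketched as follows; the dimensions (`d_model`, `d_ff`) and weight names are illustrative assumptions, not from the original:

```python
import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    # The same two-layer MLP is applied to every position independently.
    h = np.maximum(0, x @ W1 + b1)  # ReLU expansion to the inner dimension
    return h @ W2 + b2              # projection back to the model dimension

rng = np.random.default_rng(0)
d_model, d_ff = 4, 16               # assumed toy sizes
x = rng.standard_normal((3, d_model))           # 3 positions
W1 = rng.standard_normal((d_model, d_ff))
b1 = np.zeros(d_ff)
W2 = rng.standard_normal((d_ff, d_model))
b2 = np.zeros(d_model)
y = feed_forward(x, W1, b1, W2, b2)
```

Because the network has no interaction across positions, applying it to a single position yields the same row as applying it to the whole sequence.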
Layer Normalization
Normalization technique applied to each layer to stabilize training by normalizing activations across feature dimensions.
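A minimal sketch of normalizing across the feature dimension, with learnable scale `gamma` and shift `beta` (the epsilon value is a common default, assumed here):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each position's activations across its feature dimension,
    # then rescale and shift with learned parameters.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.array([[1.0, 2.0, 3.0, 4.0]])
y = layer_norm(x, gamma=np.ones(4), beta=np.zeros(4))
```

With identity `gamma` and zero `beta`, each position comes out with roughly zero mean and unit variance, which is what stabilizes training.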
Residual Connection
Skip connections that preserve information across deep layers by adding the input of a layer to its output.
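The add-input-to-output idea is one line; the sketch below uses a deliberately degenerate sublayer to show how the input survives the block:

```python
import numpy as np

def residual_block(x, sublayer):
    # y = x + F(x): the layer's input is added back to its output.
    return x + sublayer(x)

x = np.array([1.0, 2.0, 3.0])
# Even if the sublayer learns nothing (outputs zeros), the input passes through.
y = residual_block(x, lambda v: np.zeros_like(v))
```

This identity path is what lets gradients flow through very deep stacks without vanishing.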
Causal Attention
Variant of masked attention where each token can only attend to previous tokens, essential for auto-regressive generative models.
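A minimal single-head sketch of the causal mask, assuming scaled dot-product attention; the `-inf` entries zero out future positions after the softmax:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def causal_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    n = scores.shape[0]
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)  # strictly future positions
    scores = np.where(mask, -np.inf, scores)          # masked out before softmax
    return softmax(scores) @ V

rng = np.random.default_rng(0)
n, d = 4, 3
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out1 = causal_attention(Q, K, V)

# Perturbing the last (future) value must not change earlier outputs.
V2 = V.copy()
V2[-1] += 10.0
out2 = causal_attention(Q, K, V2)
```

The check at the end is exactly the property autoregressive generation relies on: token *i* is computed without looking at tokens after *i*.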
Token Embedding
Dense vector representation of each token in a high-dimensional space, serving as the initial input to the Transformer architecture.
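In practice the embedding is a learned lookup table indexed by token id; the vocabulary size and dimension below are assumed toy values:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 100, 8
embedding = rng.standard_normal((vocab_size, d_model))  # learned during training

token_ids = np.array([5, 17, 5])
vectors = embedding[token_ids]  # one dense row per token; same id, same vector
```

These vectors (usually combined with positional information) are the first input the Transformer stack sees.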
Transformer-XL
Extension of the Transformer architecture introducing a segment-level recurrence mechanism to capture longer-term dependencies.
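The recurrence idea can be sketched as attending over cached hidden states from the previous segment concatenated with the current one. This is a simplified illustration only: it omits Transformer-XL's relative positional encodings, causal masking, and the stop-gradient on the memory, and all names are assumptions:

```python
import numpy as np

def segment_attention(h_seg, memory, Wq, Wk, Wv):
    # Keys/values span the cached previous segment plus the current segment,
    # so the current tokens can see beyond their own segment boundary.
    ctx = np.concatenate([memory, h_seg], axis=0)
    Q, K, V = h_seg @ Wq, ctx @ Wk, ctx @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(0)
d = 4
memory = rng.standard_normal((3, d))  # cached hidden states, previous segment
h_seg = rng.standard_normal((2, d))   # current segment
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = segment_attention(h_seg, memory, Wq, Wk, Wv)
```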
Attention Score
Numerical value calculated before normalization representing the compatibility between a query and each key in the attention mechanism.
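A worked example of raw scores, assuming the common scaled dot-product form (query-key dot product divided by the square root of the key dimension), before any softmax normalization:

```python
import numpy as np

d_k = 4
q = np.array([1.0, 0.0, 1.0, 0.0])                  # one query
keys = np.array([[1.0, 0.0, 1.0, 0.0],              # key aligned with q
                 [0.0, 1.0, 0.0, 1.0]])             # orthogonal key
scores = keys @ q / np.sqrt(d_k)  # raw compatibility values, pre-softmax
```

The aligned key gets a score of 1.0 (dot product 2 divided by sqrt(4)), the orthogonal key 0.0; only after softmax do these become attention weights.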
Position Embedding
Alternative to sinusoidal positional encoding using learned embeddings to represent absolute positions in the sequence.
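A sketch of learned absolute position embeddings: instead of a fixed sinusoidal formula, a table of `max_len` vectors is trained and added to the token embeddings (sizes below are assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
max_len, d_model = 512, 8
pos_embedding = rng.standard_normal((max_len, d_model))  # learned, one row per position

seq_len = 4
tok = rng.standard_normal((seq_len, d_model))  # stand-in for token embeddings
x = tok + pos_embedding[:seq_len]              # inject absolute position information
```

Unlike the sinusoidal scheme, a learned table cannot represent positions beyond `max_len` without retraining or extrapolation tricks.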