AI Glossary
The Complete Dictionary of Artificial Intelligence
Self-Attention
Fundamental mechanism that allows transformers to dynamically compute the importance of each element in a sequence relative to every other element.
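A minimal NumPy sketch of scaled dot-product self-attention; the function and weight names are illustrative, not a reference implementation.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model); Wq, Wk, Wv project tokens to queries, keys, values
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # pairwise importance of every element w.r.t. every other
    weights = softmax(scores, axis=-1)        # one attention distribution per query position
    return weights @ V                        # each output is a weighted mix of the values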
Multi-Head Attention
Extension of self-attention where multiple attention heads operate in parallel to capture different types of relationships in the data.
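A hedged sketch of running several heads in parallel and concatenating their outputs; it reuses the self_attention helper from the Self-Attention entry above, and the projection names are assumptions.

import numpy as np

def multi_head_attention(X, heads, Wo):
    # heads: list of (Wq, Wk, Wv) triples, one per attention head
    outputs = [self_attention(X, Wq, Wk, Wv) for Wq, Wk, Wv in heads]
    concat = np.concatenate(outputs, axis=-1)  # stack head outputs along the feature axis
    return concat @ Wo                         # final linear projection back to the model dimension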
Positional Encoding
Technique that incorporates sequential position information into embeddings to compensate for the absence of recurrence in transformers.
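A sketch of the sinusoidal scheme from "Attention Is All You Need"; learned positional embeddings are an equally common alternative.

import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # Each position receives a unique pattern of sines and cosines at different frequencies.
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.where(i % 2 == 0, np.sin(angle), np.cos(angle))
    return pe  # added to the token embeddings before the first transformer layer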
Encoder-Decoder Architecture
Fundamental structure of the original transformer, combining an encoder that processes the input with a decoder that generates the output.
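A compact structural sketch, reusing the self_attention and softmax helpers from the Self-Attention entry; all weight names are illustrative. The encoder turns the input into a "memory" that the decoder consults while generating.

import numpy as np

def encode(X, Wq, Wk, Wv):
    # Encoder: self-attention over the full input sequence produces the memory.
    return self_attention(X, Wq, Wk, Wv)

def decode_step(Y, memory, self_W, cross_W):
    # Decoder: attend over the generated prefix, then over the encoder memory.
    H = self_attention(Y, *self_W)
    Q, K, V = H @ cross_W[0], memory @ cross_W[1], memory @ cross_W[2]
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)
    return weights @ V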
BERT (Bidirectional Encoder Representations from Transformers)
Family of pre-trained models based on the encoder-only architecture with bidirectional context understanding.
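A small usage sketch, assuming the Hugging Face transformers library is available; bert-base-uncased is just one commonly used checkpoint in the family. The masked word is predicted from context on both sides, which is the bidirectional behaviour described above.

from transformers import pipeline

# Fill-mask relies on BERT's bidirectional encoder: the [MASK] token is
# predicted from the words to its left and to its right.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("The capital of France is [MASK]."))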
GPT (Generative Pre-trained Transformer)
Decoder-only architecture optimized for autoregressive text generation, forming the basis of large language models.
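A toy greedy-decoding loop to illustrate autoregressive generation; next_token_logits is a hypothetical callable standing in for the model, not a real API.

import numpy as np

def generate(next_token_logits, prompt_ids, max_new_tokens, eos_id=None):
    # next_token_logits: maps the token-id prefix to logits over the vocabulary
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)   # the decoder-only model sees the whole prefix
        next_id = int(np.argmax(logits))  # greedy choice; sampling is also common
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids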
Vision Transformers (ViT)
Application of transformer architectures to image processing by dividing images into patches and treating them as sequences.
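A minimal sketch of the patching step; the sizes are illustrative and the image dimensions are assumed to be divisible by the patch size.

import numpy as np

def image_to_patches(image, patch_size):
    # image: (H, W, C); split into non-overlapping patches and flatten each one,
    # turning the image into a sequence the transformer can attend over.
    H, W, C = image.shape
    p = patch_size
    patches = image.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, p * p * C)  # (num_patches, patch_dim)

# Example: a 224x224 RGB image with 16x16 patches becomes a sequence of 196 tokens.
tokens = image_to_patches(np.zeros((224, 224, 3)), 16)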
Sparse Attention Mechanisms
Variants of attention reducing computational complexity by limiting connections between sequence elements.
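One simple variant, sketched here, is a local (banded) pattern in which each position attends only to nearby positions; many other sparsity patterns exist.

import numpy as np

def local_attention_mask(seq_len, window):
    # Each position may only attend to neighbours within `window` steps, turning
    # the dense O(n^2) score matrix into a sparse band.
    idx = np.arange(seq_len)
    mask = np.abs(idx[:, None] - idx[None, :]) <= window
    return mask  # boolean mask; disallowed scores are set to -inf before the softmax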
Cross-Attention
Attention mechanism where queries come from one sequence while keys and values come from a different sequence.
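A NumPy sketch making the asymmetry explicit: queries are projected from one sequence, keys and values from another. The names are illustrative.

import numpy as np

def cross_attention(query_seq, context_seq, Wq, Wk, Wv):
    # query_seq: e.g. decoder states; context_seq: e.g. encoder outputs
    Q = query_seq @ Wq
    K = context_seq @ Wk
    V = context_seq @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V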
Transformer Scaling Laws
Empirical principles describing how transformer performance evolves with model size, data, and computation.
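A sketch of the generic power-law form such laws typically take; the constants are empirical and must be fit to measured losses, so none are assumed here.

def power_law_loss(n_params, n_c, alpha):
    # Generic scaling-law form L(N) = (N_c / N)**alpha relating loss to parameter count;
    # analogous forms are used for dataset size and compute.
    return (n_c / n_params) ** alpha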
Attention Head Analysis
Study of the specialized roles of different attention heads in transformers to understand their internal functioning.
Hierarchical Attention
Attention architecture organized across multiple levels (for example, words, then sentences, then documents) to process complex structured data.