AI Glossary
The complete glossary of Artificial Intelligence
Attention Mechanism
Allows the model to weigh the importance of different parts of the input during processing.
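As a concrete illustration, here is a minimal attention sketch in NumPy (a hand-rolled toy, not any particular library's API): scores between queries and keys become softmax weights that mix the value vectors.

```python
import numpy as np

def attention(Q, K, V):
    """Weigh the value vectors V by the relevance of keys K to queries Q."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])                   # query-key relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)            # softmax over keys
    return weights @ V                                        # weighted mix of values

# toy example: 3 queries attending over 4 key/value pairs, dimension 8
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(attention(Q, K, V).shape)  # (3, 8)
```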
Self-Attention
Mechanism where each element of the sequence attends to all other elements of the same sequence.
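A minimal self-attention sketch, where queries, keys, and values are all projections of the same sequence X (the projection matrices here are illustrative random weights, not trained parameters):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Each position in X attends to every position of the same X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # all three come from the same sequence
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                  # sequence of 5 tokens
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (5, 16)
```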
Multi-Head Attention
Extension of self-attention using multiple attention heads in parallel to capture different types of relationships.
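A simplified sketch of the head-splitting idea, assuming d_model divides evenly into n_heads slices; real implementations also learn per-head Q/K/V projections and a final output projection, which are omitted here for brevity:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, n_heads):
    """Split the model dimension into n_heads slices, attend in each, concatenate."""
    seq, d_model = X.shape
    d_head = d_model // n_heads
    heads = []
    for h in range(n_heads):
        sl = X[:, h * d_head:(h + 1) * d_head]   # per-head slice (projections omitted)
        scores = sl @ sl.T / np.sqrt(d_head)
        heads.append(softmax(scores) @ sl)
    return np.concatenate(heads, axis=-1)        # back to (seq, d_model)

X = np.random.default_rng(0).normal(size=(5, 32))
print(multi_head_attention(X, n_heads=4).shape)  # (5, 32)
```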
Positional Encoding
Technique for incorporating position information into embeddings without recurrence, needed because attention itself is order-invariant.
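A sketch of the sinusoidal encoding from the original Transformer paper, where even dimensions get sines and odd dimensions get cosines of geometrically spaced frequencies:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(same angle)."""
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]          # even dimension indices
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_positional_encoding(seq_len=50, d_model=64)
print(pe.shape)  # (50, 64); added to token embeddings before the first layer
```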
Encoder-Decoder Architecture
Fundamental structure of the original Transformer: an encoder that builds a representation of the input and a decoder that generates the output.
Scaled Dot-Product Attention
Core mathematical form of attention in Transformers, in which query-key dot products are scaled by the square root of the key dimension before the softmax.
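The standard form, as given in "Attention Is All You Need": Attention(Q, K, V) = softmax(Q Kᵀ / √d_k) V, where d_k is the key dimension; the √d_k scaling keeps dot products from growing so large that the softmax saturates.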
Feed-Forward Networks
Position-wise fully connected networks applied after each attention sub-layer in a Transformer block.
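A sketch of the position-wise form from the original paper, FFN(x) = max(0, x·W1 + b1)·W2 + b2, with an inner dimension conventionally about 4x the model dimension:

```python
import numpy as np

def feed_forward(X, W1, b1, W2, b2):
    """Two-layer ReLU network applied to each position independently."""
    return np.maximum(0, X @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
d_model, d_ff = 64, 256                        # inner layer is typically 4x wider
X = rng.normal(size=(10, d_model))
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
print(feed_forward(X, W1, b1, W2, b2).shape)   # (10, 64)
```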
Layer Normalization
Normalization technique applied in Transformers to stabilize training.
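A minimal sketch: each token's feature vector is normalized to zero mean and unit variance, then rescaled by learned gamma and beta parameters:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize each token's features, then apply a learned affine transform."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.random.default_rng(0).normal(size=(5, 64))
out = layer_norm(x, gamma=np.ones(64), beta=np.zeros(64))
print(out.mean(axis=-1).round(6), out.std(axis=-1).round(2))  # ~0 and ~1 per token
```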
Attention Masks
Mechanism to control which tokens may attend to which others, e.g. causal masks for decoding or padding masks for batched inputs.
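A sketch of a causal (look-ahead) mask: disallowed positions are set to -inf before the softmax, so they receive zero attention weight.

```python
import numpy as np

seq_len = 5
# causal mask: position i may attend only to positions <= i
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)

scores = np.random.default_rng(0).normal(size=(seq_len, seq_len))
scores = np.where(mask, -np.inf, scores)    # masked pairs get zero weight after softmax
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(np.round(weights, 2))                 # upper triangle is all zeros
```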
Vision Transformers (ViT)
Application of Transformer architecture to image processing by dividing images into patches.
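A sketch of the patch-extraction step, using the ViT-Base configuration (224x224 images, 16x16 patches) as the example; each flattened patch becomes one input token:

```python
import numpy as np

def image_to_patches(img, patch):
    """Cut an (H, W, C) image into flattened non-overlapping patch vectors."""
    H, W, C = img.shape
    patches = img.reshape(H // patch, patch, W // patch, patch, C)
    patches = patches.transpose(0, 2, 1, 3, 4)        # (rows, cols, patch, patch, C)
    return patches.reshape(-1, patch * patch * C)     # one token per patch

img = np.random.default_rng(0).normal(size=(224, 224, 3))
tokens = image_to_patches(img, patch=16)
print(tokens.shape)  # (196, 768): a 14x14 grid of patch tokens, as in ViT-Base
```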
BERT Architecture
Encoder-only Transformer pre-trained with a masked language modeling objective.
GPT Architecture
Decoder-only Transformer optimized for autoregressive text generation.
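A sketch of greedy autoregressive decoding; `toy_model` here is a hypothetical stand-in for a trained decoder that returns next-token logits:

```python
import numpy as np

def generate(model, prompt_ids, n_new):
    """Greedy autoregressive decoding: append the most likely next token, repeat."""
    ids = list(prompt_ids)
    for _ in range(n_new):
        logits = model(ids)                  # model scores every vocabulary item
        ids.append(int(np.argmax(logits)))   # greedy choice; sampling is the alternative
    return ids

# toy stand-in for a trained decoder: random but deterministic next-token logits
toy_model = lambda ids: np.random.default_rng(len(ids)).normal(size=1000)
print(generate(toy_model, prompt_ids=[1, 2, 3], n_new=5))
```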
Cross-Attention
Attention mechanism between two different sequences in encoder-decoder models.
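A sketch where decoder states supply the queries and encoder outputs supply the keys and values (learned projections omitted for brevity):

```python
import numpy as np

def cross_attention(decoder_X, encoder_X):
    """Queries come from the decoder; keys and values come from the encoder."""
    Q, K, V = decoder_X, encoder_X, encoder_X   # projections omitted for brevity
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V                                # one output per decoder position

rng = np.random.default_rng(0)
enc = rng.normal(size=(7, 32))    # encoder output: 7 source tokens
dec = rng.normal(size=(4, 32))    # decoder states: 4 target tokens
print(cross_attention(dec, enc).shape)  # (4, 32)
```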
Sparse Attention
Attention variant that reduces computational cost by scoring only a selected subset of query-key pairs.
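A sketch of one common sparsity pattern, a sliding window in which each token attends only to its nearby neighbors:

```python
import numpy as np

def local_attention_mask(seq_len, window):
    """Sliding-window sparsity: token i attends only to tokens within `window` of i."""
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window   # True = allowed pair

mask = local_attention_mask(seq_len=8, window=1)
print(mask.astype(int))
# each row has at most 3 ones: O(n * window) score computations instead of O(n^2)
```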
Hierarchical Attention
Multi-level architectures that apply attention at different scales of granularity, e.g. word level and then sentence level.
Attention Visualization
Techniques to interpret and visualize attention weights in Transformers.
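A sketch of the most common visualization, a token-by-token heatmap of an attention weight matrix; the weights here are random stand-ins for a model's actual output:

```python
import numpy as np
import matplotlib.pyplot as plt

tokens = ["the", "cat", "sat", "on", "the", "mat"]
# stand-in attention weights; in practice these come out of a trained model
rng = np.random.default_rng(0)
w = rng.random((len(tokens), len(tokens)))
w /= w.sum(axis=-1, keepdims=True)          # rows sum to 1, like softmax output

plt.imshow(w, cmap="viridis")
plt.xticks(range(len(tokens)), tokens, rotation=45)
plt.yticks(range(len(tokens)), tokens)
plt.colorbar(label="attention weight")
plt.title("Which token (row) attends to which token (column)")
plt.tight_layout()
plt.show()
```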
Transformer Optimization
Techniques for training large Transformer models effectively, such as learning-rate warmup and gradient clipping.
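One concrete example is the warmup-then-decay learning-rate schedule from "Attention Is All You Need":

```python
def transformer_lr(step, d_model=512, warmup=4000):
    """lrate = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)."""
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

for step in (1, 1000, 4000, 16000):
    print(step, round(transformer_lr(step), 6))
# rises linearly for the first 4000 steps, then decays as 1/sqrt(step)
```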
Multi-Modal Transformers
Extended Transformer architecture to process multiple types of data simultaneously.
Efficient Transformers
Optimized variants of Transformers to reduce computational complexity.
Attention Mechanism Variants
Different approaches to and improvements on the attention mechanism beyond scaled dot-product attention.
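One well-known alternative is additive (Bahdanau-style) attention, where scores come from a small feed-forward network rather than a dot product; a sketch with illustrative random weights:

```python
import numpy as np

def additive_attention_scores(q, keys, W1, W2, v):
    """Additive (Bahdanau-style) scoring: score(q, k) = v . tanh(W1 q + W2 k)."""
    return np.array([v @ np.tanh(W1 @ q + W2 @ k) for k in keys])

rng = np.random.default_rng(0)
d, d_att = 16, 32
q = rng.normal(size=d)
keys = rng.normal(size=(6, d))
W1, W2 = rng.normal(size=(d_att, d)), rng.normal(size=(d_att, d))
v = rng.normal(size=d_att)
print(additive_attention_scores(q, keys, W1, W2, v).shape)  # one score per key
```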