Attention Mechanism Variants

📖

Concepts

Relative Position Encoding

Positional encoding technique that represents the relative distance between tokens rather than their absolute positions. Improves the model's ability to generalize to sequence lengths not seen during training.
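One way to picture this is a bias added to each attention score and looked up by the offset between query and key. The sketch below is a minimal illustration, not a specific model's implementation: the clipping distance `max_distance`, the helper names, and the `bias_per_offset` table (which would normally be learned) are all assumptions made for the example.

```python
def relative_position_index(seq_len, max_distance):
    """Table of clipped offsets (j - i), shifted into [0, 2 * max_distance]."""
    table = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            offset = max(-max_distance, min(max_distance, j - i))
            row.append(offset + max_distance)
        table.append(row)
    return table


def add_relative_bias(scores, bias_per_offset, max_distance):
    """Add one bias per relative offset to a square matrix of attention scores.

    bias_per_offset has 2 * max_distance + 1 entries (hypothetical values here;
    in a trained model they would be learned parameters).
    """
    n = len(scores)
    idx = relative_position_index(n, max_distance)
    return [[scores[i][j] + bias_per_offset[idx[i][j]] for j in range(n)]
            for i in range(n)]
```

Because the lookup depends only on `j - i`, the same table applies unchanged to sequences longer than any seen in training, which is where the length generalization comes from.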

Rotary Position Embedding (RoPE)

Positional encoding method that rotates query and key vectors by position-dependent angles, so that their dot product depends only on the relative offset between the two positions. Naturally integrates positional information into the attention mechanism without adding parameters.
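The rotation can be sketched in a few lines of plain Python. This is an illustrative version under stated assumptions: consecutive coordinates are paired, the frequency base is 10000, and the function names `rope` and `dot` are chosen for this example only.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate each pair (vec[i], vec[i + 1]) by the angle position * base**(-i / d).

    Assumes len(vec) is even; i steps over 0, 2, 4, ... so every pair gets
    its own rotation frequency.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))
```

With this construction, `dot(rope(q, m), rope(k, n))` depends on the positions only through `n - m`, and the rotation leaves vector norms unchanged.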

Linear Attention

Family of attention mechanisms with O(n) complexity in the sequence length n, using matrix decompositions or kernel feature maps to avoid explicitly computing the n × n attention matrix. Makes very long sequences tractable in both time and memory.
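A minimal sketch of the idea, assuming the feature map elu(x) + 1 (one common choice, not the only one): the keys and values are first compressed into a small d × d_v summary, and each query is then answered from that summary, so no n × n matrix is ever built. The names `phi` and `linear_attention` are illustrative.

```python
import math


def phi(x):
    # elu(x) + 1, a strictly positive feature map (assumed choice)
    return x + 1.0 if x > 0 else math.exp(x)


def linear_attention(Q, K, V):
    """Non-causal linear attention over lists of row vectors."""
    d, dv, n = len(Q[0]), len(V[0]), len(K)
    fk = [[phi(x) for x in row] for row in K]
    # Summary over all keys: S[a][b] = sum_j phi(K_j)[a] * V_j[b]
    S = [[sum(fk[j][a] * V[j][b] for j in range(n)) for b in range(dv)]
         for a in range(d)]
    # Normalizer: z[a] = sum_j phi(K_j)[a]
    z = [sum(fk[j][a] for j in range(n)) for a in range(d)]
    out = []
    for q in Q:
        fq = [phi(x) for x in q]
        denom = sum(fq[a] * z[a] for a in range(d))
        out.append([sum(fq[a] * S[a][b] for a in range(d)) / denom
                    for b in range(dv)])
    return out
```

The cost is one pass over the keys to build `S` and `z`, then one pass over the queries, which is linear in n for a fixed feature dimension.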

Longformer Attention

Hybrid attention pattern combining local sliding-window attention with global attention for a small set of selected tokens. Enables processing documents of several thousand tokens with complexity linear in the sequence length.
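The pattern is easiest to see as a boolean mask. The sketch below is a simplified illustration, assuming a symmetric window and a hand-picked list of global positions; the function name and parameters are hypothetical, and a real implementation would avoid materializing the dense mask.

```python
def longformer_mask(seq_len, window, global_tokens):
    """mask[i][j] is True when query i may attend to key j.

    window: total local window size (window // 2 positions on each side).
    global_tokens: positions that attend to, and are attended by, every token.
    """
    half = window // 2
    g = set(global_tokens)
    return [[abs(i - j) <= half or i in g or j in g for j in range(seq_len)]
            for i in range(seq_len)]
```

For a fixed window and a fixed number of global tokens, each additional token adds only a constant number of allowed pairs, which is the source of the linear complexity.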

BigBird Attention

Sparse attention mechanism combining three types of connections (random, local, and global) to approximate full attention. Shown theoretically to be a universal approximator of sequence-to-sequence functions, and Turing complete, while keeping complexity linear in the sequence length.
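The three connection types can be combined into a single mask, sketched here at the level of individual tokens. This is a simplification: block-level sparsity, the seed handling, and the convention that the first `num_global` positions are the global tokens are assumptions of this example.

```python
import random


def bigbird_mask(seq_len, window, num_global, num_random, seed=0):
    """Token-level sketch of a random + local + global sparse attention mask."""
    rng = random.Random(seed)
    half = window // 2
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        # Local connections: a sliding window around position i
        for j in range(max(0, i - half), min(seq_len, i + half + 1)):
            mask[i][j] = True
        # Random connections: a few keys drawn uniformly for each query
        for j in rng.sample(range(seq_len), num_random):
            mask[i][j] = True
    # Global connections: the first num_global tokens see and are seen by all
    for g in range(num_global):
        for t in range(seq_len):
            mask[g][t] = True
            mask[t][g] = True
    return mask
```

Every non-global row allows at most `2 * (window // 2) + 1 + num_random + num_global` keys, regardless of `seq_len`.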

Reformer Attention

Efficient attention using LSH (locality-sensitive hashing) to restrict each token's attention to tokens that hash to the same bucket, that is, tokens likely to be similar. Reduces complexity from O(n²) to O(n log n) while preserving the most important semantic relationships.
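The bucketing step can be illustrated with random-hyperplane hashing, a simpler LSH scheme than the one used in Reformer itself and swapped in here only for clarity. The function names, the number of planes, and the seed are assumptions of this sketch.

```python
import random


def lsh_buckets(vectors, num_planes, seed=0):
    """Assign each vector a bucket id from the signs of random projections."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_planes)]
    buckets = []
    for v in vectors:
        code = 0
        for p in planes:
            bit = sum(a * b for a, b in zip(v, p)) >= 0
            code = (code << 1) | int(bit)  # one bit per hyperplane
        buckets.append(code)
    return buckets


def allowed_pairs(buckets):
    """Attention is restricted to pairs of tokens that share a bucket."""
    return [[bi == bj for bj in buckets] for bi in buckets]
```

Vectors pointing in the same direction always share a bucket, while vectors pointing in opposite directions do not, so attention is spent on pairs that are likely to have large dot products.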

Linformer Attention

Projects the key and value matrices along the sequence dimension, from length n down to a fixed length k, reducing complexity from O(n²) to O(n). Based on the hypothesis that attention matrices are approximately low-rank in many practical scenarios.
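A small sketch of the projection, under stated assumptions: the projection matrices `E` and `F` (shape k × n) would be learned in practice but are filled with random values here, and all helper names are illustrative.

```python
import math
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def linformer_attention(Q, K, V, E, F):
    """Q, K, V are n x d; E and F are k x n projections over the sequence axis."""
    d = len(Q[0])
    K_proj = matmul(E, K)  # k x d
    V_proj = matmul(F, V)  # k x d
    K_proj_t = [list(col) for col in zip(*K_proj)]  # d x k
    scores = matmul(Q, K_proj_t)  # n x k instead of n x n
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, V_proj), weights


# Example with random inputs (hypothetical sizes: n=6 tokens, d=4, k=2)
rng = random.Random(0)
n, d, k = 6, 4, 2
rand = lambda r, c: [[rng.gauss(0, 1) for _ in range(c)] for _ in range(r)]
Q, K, V = rand(n, d), rand(n, d), rand(n, d)
E, F = rand(k, n), rand(k, n)
out, weights = linformer_attention(Q, K, V, E, F)
```

The score matrix has shape n × k, so with k held fixed the cost grows linearly in n.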

Kernel Attention

Approach replacing the softmax with positive kernel functions (feature maps applied to queries and keys) to achieve linear complexity. Enables efficient approximations while keeping attention weights non-negative and normalized.
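To see which properties survive the substitution, the implied attention weights can be written out explicitly. The sketch below does this in quadratic time purely for inspection; an efficient implementation would never materialize this matrix. The feature map elu(x) + 1 and the function names are assumptions of the example.

```python
import math


def elu_plus_one(x):
    # A strictly positive feature map (assumed choice)
    return x + 1.0 if x > 0 else math.exp(x)


def kernel_attention_weights(Q, K, feature_map=elu_plus_one):
    """Explicit weights w[i][j] = phi(q_i) . phi(k_j) / sum_j phi(q_i) . phi(k_j)."""
    fq = [[feature_map(x) for x in q] for q in Q]
    fk = [[feature_map(x) for x in k] for k in K]
    weights = []
    for q in fq:
        sims = [sum(a * b for a, b in zip(q, k)) for k in fk]
        total = sum(sims)
        weights.append([s / total for s in sims])
    return weights
```

Because the feature map is positive, every weight is positive and each row sums to one, just as with softmax attention.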

Adaptive Attention Span

Mechanism in which each attention head dynamically learns its optimal span during training. Optimizes compute usage by concentrating attention where it is needed, according to the learned patterns.
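One way to make the span learnable is a soft mask that ramps from 1 to 0 as the distance to the key exceeds the span. The sketch below assumes a linear ramp of width `ramp`; the `span` value would be a learned per-head parameter in training but is a plain number here, and the function names are illustrative.

```python
def span_mask(distance, span, ramp):
    """Soft mask: 1 inside the span, linear decay over `ramp`, then 0."""
    return min(1.0, max(0.0, (ramp + span - distance) / ramp))


def apply_span(weights, span, ramp):
    """weights[k] is the attention weight on the key k steps back.

    Masks the weights by distance and re-normalizes them to sum to 1.
    """
    masked = [w * span_mask(k, span, ramp) for k, w in enumerate(weights)]
    total = sum(masked)
    return [m / total for m in masked]
```

Keys beyond `span + ramp` receive exactly zero weight, so a head with a short learned span can skip them entirely.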
