
AI Glossary

A complete dictionary of Artificial Intelligence terms

162 categories · 2,032 subcategories · 23,060 terms

Longformer

Transformer architecture that combines local sliding-window attention with global attention on a few designated tokens, processing very long sequences with complexity that grows linearly in sequence length.

BigBird

Model implementing sparse attention through three combined patterns (local window, global, and random attention), handling sequences of up to 4,096 tokens while theoretically preserving the expressive power of full attention, such as universal approximation.

Sliding Window Attention

Technique where each token only attends to a fixed number of neighbors in a sliding window, reducing complexity to O(n*w) where w is the window size.
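A minimal NumPy sketch of such a banded mask (function names are illustrative, not from any library): each row keeps only the 2w + 1 positions inside the window.

```python
import numpy as np

def sliding_window_mask(n, w):
    """Boolean attention mask: token i may attend to token j iff |i - j| <= w."""
    idx = np.arange(n)
    return np.abs(idx[:, None] - idx[None, :]) <= w

# Only O(n * w) entries are kept instead of the full n^2.
mask = sliding_window_mask(8, 2)
```

In practice the O(n*w) cost comes from computing only the banded score entries rather than masking a full n x n matrix, but the attended pattern is the same.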

Dilated Sliding Window

Variant of sliding window attention using jumps (dilation) to increase the receptive field without increasing computational complexity.
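A toy sketch of the dilated pattern (illustrative, assuming a single dilation rate d): each row still attends to 2w + 1 positions, but strided d apart, so the receptive field stretches to w*d.

```python
import numpy as np

def dilated_window_mask(n, w, d):
    """Token i attends to i + k*d for k in [-w, w]: same number of positions
    per row as a plain window of size w, but a receptive field of w*d."""
    diff = np.arange(n)[:, None] - np.arange(n)[None, :]
    return (np.abs(diff) <= w * d) & (diff % d == 0)
```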

Global Attention

Mechanism where certain predefined tokens (like [CLS] tokens) can attract attention from all other tokens, allowing information propagation across the entire sequence.
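A sketch of adding global tokens to a sparse mask (hypothetical helper, not from any library): the designated rows and columns are opened up fully, so information can hop through the global tokens.

```python
import numpy as np

def global_attention_mask(n, global_idx, base_mask=None):
    """Make the tokens in global_idx attend to, and be attended by, every position."""
    m = np.zeros((n, n), dtype=bool) if base_mask is None else base_mask.copy()
    m[global_idx, :] = True  # global tokens see the whole sequence
    m[:, global_idx] = True  # every token sees the global tokens
    return m
```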

Random Attention

Approach where each token randomly attends to a subset of distant tokens, preserving long-distance connections with low computational overhead.
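A sketch of the random component (illustrative; real implementations such as BigBird sample per attention block, not per token): each row picks r positions uniformly at random.

```python
import numpy as np

def random_attention_mask(n, r, seed=None):
    """Each token attends to r positions drawn uniformly at random (no repeats)."""
    rng = np.random.default_rng(seed)
    m = np.zeros((n, n), dtype=bool)
    for i in range(n):
        m[i, rng.choice(n, size=r, replace=False)] = True
    return m
```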

Pattern-based Attention

Strategy applying predefined sparse attention patterns (like fixed or learned patterns) to determine which query-key pairs to compute.

Linear Complexity Attention

Class of attention methods reducing algorithmic complexity from O(n²) to O(n), enabling scaling for very long sequences.

Kernel-based Attention

Approach using kernels to approximate softmax attention, enabling linear complexity calculations through techniques like FAVOR+ (Fast Attention Via Positive Orthogonal Random Features).
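A minimal sketch of the kernel trick, using the simple positive feature map elu(x) + 1 rather than FAVOR+'s random features (which are more involved): because the softmax is replaced by a kernel, the matrix product can be reassociated to avoid ever forming the n x n score matrix.

```python
import numpy as np

def feature_map(x):
    """A positive feature map, elu(x) + 1; FAVOR+ uses random features instead."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def kernel_linear_attention(Q, K, V):
    """Associativity trick: phi(Q) @ (phi(K)^T V) costs O(n * d^2), not O(n^2)."""
    q, k = feature_map(Q), feature_map(K)
    kv = k.T @ V                 # (d, d_v) summary whose size is independent of n
    z = q @ k.sum(axis=0)        # per-query normalizer
    return (q @ kv) / z[:, None]
```

Each output row is still a convex combination of value rows, as in softmax attention; only the weighting kernel differs.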

Low-rank Approximation

Technique approximating the attention matrix through low-rank decomposition, significantly reducing memory and computational requirements.
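A Linformer-style sketch of the idea (the projections E and F are learned in the real model; here they are just arbitrary matrices): projecting keys and values along the sequence dimension shrinks the score matrix from (n, n) to (n, k).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def low_rank_attention(Q, K, V, E, F):
    """Project K and V from n rows down to k rows before attention,
    so the score matrix is (n, k) instead of (n, n)."""
    scores = Q @ (E @ K).T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ (F @ V)
```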

Clustering-based Attention

Method that first groups tokens into similar clusters then applies attention at the cluster level, reducing the number of required computations.
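A toy sketch of cluster-level attention, assuming cluster assignments are already given (real methods learn them, e.g. with online k-means): queries attend to c centroids instead of n keys, so only O(n*c) scores are computed.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cluster_attention(Q, K, V, assign, c):
    """Attend to c cluster centroids instead of n individual keys.
    `assign` maps each key position to a cluster id in [0, c)."""
    Kc = np.stack([K[assign == j].mean(axis=0) for j in range(c)])
    Vc = np.stack([V[assign == j].mean(axis=0) for j in range(c)])
    return softmax(Q @ Kc.T / np.sqrt(Q.shape[-1])) @ Vc
```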

Routing Attention

Mechanism that learns to route queries to the most relevant keys using content-based routing functions, avoiding unnecessary computations.

Reformer

Architecture using locality-sensitive hashing (LSH) to restrict attention computation to the most similar query-key pairs, achieving near-linear O(n log n) complexity in sequence length.
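A sketch of only the bucketing step of Reformer's angular LSH (attention is then restricted to sorted buckets, which this sketch omits): project with a random matrix R and take the argmax over the concatenation [XR, -XR], so vectors pointing in similar directions land in the same bucket.

```python
import numpy as np

def lsh_bucket(X, n_buckets, seed=None):
    """Assign each row of X to one of n_buckets via random rotations."""
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(X.shape[1], n_buckets // 2))
    h = X @ R
    return np.argmax(np.concatenate([h, -h], axis=1), axis=1)
```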

Performer

Model based on FAVOR+ attention that efficiently approximates softmax attention through positive orthogonal random features, enabling linear complexity.

Linformer

Architecture that projects the keys and values along the sequence dimension into a lower-dimensional space, reducing complexity from O(n²) to O(n*k) where k << n.

Routing Transformer

Model using k-means based routing to group tokens and apply attention selectively, optimizing computations for long-distance dependencies.

Sinkhorn Sorting

Algorithm using Sinkhorn iteration to transform attention into a differentiable permutation, applied in sparse attention architectures.

Efficient Attention

Paradigm encompassing all attention variants aimed at reducing computational complexity while preserving the modeling capabilities of Transformers.
