Sparse Attention
Sliding Window Attention
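A sparse attention technique in which each token attends only to a fixed number of neighboring tokens inside a sliding window rather than to the whole sequence. This reduces the cost of attention from O(n^2) to O(n*w), where n is the sequence length and w is the window size.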
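As a rough illustration of the definition, here is a minimal pure-Python sketch. It assumes a causal window in which token i attends to itself and the w - 1 tokens before it; a symmetric window around i is an equally valid reading of the definition. The names `sliding_window_attention` and `attended_pairs` are hypothetical helpers made up for this example, not part of any library.

```python
import math


def sliding_window_attention(q, k, v, window):
    """Sketch: query i attends only to keys j with i - window < j <= i.

    q, k, v are lists of equal-length float vectors (one per token).
    The causal window is an assumption made for this illustration.
    """
    d = len(q[0])
    out = []
    for i in range(len(q)):
        lo = max(0, i - window + 1)          # first position inside the window
        positions = range(lo, i + 1)         # at most `window` positions
        # Scaled dot-product scores, computed only inside the window.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            for j in positions
        ]
        # Numerically stable softmax over the windowed scores.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        # Weighted average of the value vectors inside the window.
        out.append([
            sum(w * v[j][c] for w, j in zip(weights, positions)) / total
            for c in range(len(v[0]))
        ])
    return out


def attended_pairs(n, window):
    """Number of (query, key) pairs scored for a sequence of length n."""
    return sum(min(i + 1, window) for i in range(n))
```

For a sequence of 8 tokens and a window of 3, `attended_pairs(8, 3)` counts 21 scored pairs, which stays below the n*w bound of 24, while a window covering the whole sequence scores all 36 causal pairs.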