Sparse Attention
Random Attention
An approach in which each token attends to a randomly chosen subset of distant tokens, preserving long-range connections at low computational cost.
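As a rough illustration of the idea, the sketch below builds a random attention pattern in plain Python. The function names, the `num_random` and `min_distance` parameters, and the choice to let each token also attend to itself are assumptions made for this example, not part of any particular model's definition.

```python
import random


def random_attention_pattern(seq_len, num_random, min_distance=2, seed=0):
    """For each query position, sample key positions at least `min_distance` away.

    All parameter names here are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)  # seeded so the pattern is reproducible
    pattern = []
    for i in range(seq_len):
        # Candidate keys are the "distant" positions relative to query i.
        candidates = [j for j in range(seq_len) if abs(i - j) >= min_distance]
        k = min(num_random, len(candidates))
        pattern.append(sorted(rng.sample(candidates, k)))
    return pattern


def to_mask(pattern, seq_len):
    """Turn the sampled pattern into a boolean mask (True = may attend)."""
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i, keys in enumerate(pattern):
        mask[i][i] = True  # assumption: every token also attends to itself
        for j in keys:
            mask[i][j] = True
    return mask


pattern = random_attention_pattern(seq_len=8, num_random=2)
mask = to_mask(pattern, seq_len=8)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each row of the mask has only `num_random + 1` allowed positions, so the number of attention scores grows roughly as `seq_len * num_random` rather than `seq_len ** 2`.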