Sparse Attention
Random Attention
An approach in which each token attends to a small, randomly chosen subset of distant tokens, preserving long-range connections at low computational cost.
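The idea can be illustrated with a minimal sketch using only the Python standard library. The function names `random_attention_pattern` and `sparse_attention`, the `num_random` parameter, and the choice to let every token also attend to itself are assumptions made for illustration. They are not taken from any particular library or paper. The sketch also samples uniformly from all other positions rather than enforcing a minimum distance.

```python
import math
import random


def random_attention_pattern(seq_len, num_random, seed=0):
    """For each query position, return the sorted key positions it may attend to.

    Assumption: each token attends to itself plus `num_random` other
    positions drawn uniformly without replacement.
    """
    rng = random.Random(seed)
    pattern = []
    for i in range(seq_len):
        others = [j for j in range(seq_len) if j != i]
        chosen = rng.sample(others, min(num_random, len(others)))
        pattern.append(sorted([i] + chosen))
    return pattern


def sparse_attention(q, k, v, pattern):
    """Scaled dot-product attention evaluated only on the allowed (query, key) pairs."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, keys in enumerate(pattern):
        # Score only the keys selected for query i.
        scores = [scale * sum(a * b for a, b in zip(q[i], k[j])) for j in keys]
        top = max(scores)  # subtract the maximum for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[j][d] for w, j in zip(weights, keys)) / total
            for d in range(len(v[0]))
        ])
    return out
```

With sequence length n and r random keys per token, this evaluates n * (r + 1) scores instead of the n * n scores of full attention, which is where the low overhead comes from.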