Sparse Attention
Global Attention
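Mechanism in which a small set of predefined tokens (such as the [CLS] token) attend to every token in the sequence and are attended to by every token in return. Because every token has a direct path to these global tokens, information can propagate across the entire sequence even when all other tokens use only a sparse attention pattern.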
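The definition can be illustrated with a small attention mask. The following is a minimal sketch, not a reference implementation: the function name `global_attention_mask`, its parameters, and the choice of a sliding local window as the sparse pattern for the non-global tokens are all assumptions made for illustration.

```python
def global_attention_mask(seq_len, global_positions, window=1):
    """Build a seq_len x seq_len boolean mask (hypothetical helper).

    mask[i][j] is True when token i is allowed to attend to token j.
    Assumption: non-global tokens use a sliding local window of the
    given radius; global tokens attend everywhere and are attended to
    by every token.
    """
    global_set = set(global_positions)
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            in_local_window = abs(i - j) <= window
            touches_global = i in global_set or j in global_set
            mask[i][j] = in_local_window or touches_global
    return mask


# Position 0 plays the role of a [CLS]-style global token.
mask = global_attention_mask(seq_len=6, global_positions=[0], window=1)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In the printed grid, the first row and the first column are fully filled in (the global token), while the remaining entries form a narrow band around the diagonal (the local window).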