Multi-Head Attention
Linearized Attention
A family of attention mechanisms that reformulate the attention computation so the full attention matrix is never materialized, giving time and memory cost that is linear rather than quadratic in the sequence length.
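One common way to get this behavior is kernelized attention. The following is a minimal sketch, not a reference implementation. It assumes a single head, no causal masking, and a feature map of elu(x) + 1 in place of the softmax, so it is a substitute for softmax attention rather than an exact reproduction of it. The function names `feature_map`, `linear_attention`, and `quadratic_reference` are illustrative. The idea is that with similarity phi(q)·phi(k), the product can be regrouped as phi(Q) (phi(K)^T V), which only needs a d × d_v summary instead of an n × n matrix.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a positive feature map (an assumed choice for this sketch).
    # For x > 0 this is x + 1; for x <= 0 it is exp(x).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(q, k, v):
    """Non-causal kernelized attention. q, k: (n, d); v: (n, d_v)."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v           # (d, d_v) summary of keys and values
    z = phi_k.sum(axis=0)      # (d,) normalizer summary
    # Each query reads from the summaries; no (n, n) matrix is ever built.
    return (phi_q @ kv) / (phi_q @ z)[:, None]

def quadratic_reference(q, k, v):
    """Same computation, but materializing the (n, n) weights for comparison."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    scores = phi_q @ phi_k.T   # (n, n)
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n, d, d_v = 6, 4, 3
q = rng.normal(size=(n, d))
k = rng.normal(size=(n, d))
v = rng.normal(size=(n, d_v))

out = linear_attention(q, k, v)
ref = quadratic_reference(q, k, v)
```

Both functions compute the same result up to floating-point error, since they differ only in the order of the matrix products. The linear version costs on the order of n·d·d_v operations, while the reference version costs on the order of n²·(d + d_v).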