Cross-Attention
Sparse Cross-Attention
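An optimization of cross-attention that restricts each query's attentional connections to a predefined or learned subset of relevant key positions instead of the full key sequence. For long sequences, this reduces computational complexity from O(n²) (more precisely O(n·m) for n queries and m keys) to O(n log n) or O(n), depending on the sparsity pattern.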
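The following is a minimal pure-Python sketch, not a reference implementation. It assumes a fixed local-window pattern as the predefined subset, which is only one of several possible choices; learned patterns are not shown. The names `sparse_cross_attention` and `local_window` are hypothetical and do not come from any specific library. Each query computes scores only for the keys listed in its allowed set, so total work grows with n·k for a window of k keys rather than with n·m.

```python
import math
from typing import List, Sequence

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def sparse_cross_attention(
    queries: List[List[float]],
    keys: List[List[float]],
    values: List[List[float]],
    allowed: List[List[int]],
) -> List[List[float]]:
    """Scaled dot-product cross-attention where query i only sees keys in allowed[i]."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for q, idx in zip(queries, allowed):
        if not idx:
            raise ValueError("each query needs at least one allowed key")
        # Score only the permitted keys; skipped keys cost nothing.
        scores = [dot(q, keys[j]) * scale for j in idx]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out = [0.0] * len(values[0])
        for w, j in zip(weights, idx):
            for c, v in enumerate(values[j]):
                out[c] += (w / total) * v
        outputs.append(out)
    return outputs

def local_window(n_queries: int, n_keys: int, radius: int) -> List[List[int]]:
    """Hypothetical predefined pattern: query i sees keys near its proportional position."""
    pattern = []
    for i in range(n_queries):
        center = (i * n_keys) // n_queries  # map query position onto the key axis
        lo = max(0, center - radius)
        hi = min(n_keys, center + radius + 1)
        pattern.append(list(range(lo, hi)))
    return pattern

if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    values = [[1.0], [2.0], [3.0], [4.0]]
    queries = [[1.0, 0.0], [0.0, 1.0]]
    allowed = local_window(len(queries), len(keys), radius=1)
    print(allowed)
    print(sparse_cross_attention(queries, keys, values, allowed))
```

With two queries, four keys, and `radius=1`, the first query attends to keys 0 and 1 and the second to keys 1, 2, and 3, so seven scores are computed instead of the eight a dense layer would need. The gap widens as sequences grow, because the window size stays fixed.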