Sparse Attention
Kernel-based Attention
An approach that approximates softmax attention with kernel feature maps, so that attention can be computed in time linear, rather than quadratic, in the sequence length. A well-known technique is FAVOR+ (Fast Attention Via Positive Orthogonal Random Features).
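The idea can be illustrated with a minimal sketch in plain Python. It is not a reference implementation of FAVOR+: the function names and parameters (`positive_random_features`, `kernel_attention`, `num_features`, `seed`) are hypothetical, and the random projections are drawn as independent Gaussians rather than orthogonalized, which is a simplification. The point it shows is that once queries and keys are mapped to positive features, the sums over keys can be computed once and reused for every query.

```python
import math
import random


def positive_random_features(x, omegas):
    """Map x to positive features whose dot products estimate exp(x . y)."""
    sq_norm = sum(x_i * x_i for x_i in x)
    m = len(omegas)
    return [
        math.exp(sum(w_i * x_i for w_i, x_i in zip(w, x)) - sq_norm / 2.0)
        / math.sqrt(m)
        for w in omegas
    ]


def kernel_attention(queries, keys, values, num_features=64, seed=0):
    """Approximate softmax attention without forming the n x n weight matrix."""
    d = len(queries[0])
    d_v = len(values[0])
    rng = random.Random(seed)
    # Assumption: i.i.d. Gaussian projections (FAVOR+ would orthogonalize them).
    omegas = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(num_features)]

    # Scale inputs so that the approximated kernel is exp(q . k / sqrt(d)).
    scale = d ** -0.25
    phi_q = [positive_random_features([scale * a for a in q], omegas) for q in queries]
    phi_k = [positive_random_features([scale * a for a in k], omegas) for k in keys]

    # Key summaries, computed once: kv = sum_j phi(k_j) v_j^T, z = sum_j phi(k_j).
    kv = [[0.0] * d_v for _ in range(num_features)]
    z = [0.0] * num_features
    for f_k, v in zip(phi_k, values):
        for r in range(num_features):
            z[r] += f_k[r]
            for c in range(d_v):
                kv[r][c] += f_k[r] * v[c]

    # Each query reuses the summaries, so the cost grows linearly with length.
    outputs = []
    for f_q in phi_q:
        denom = sum(f_q[r] * z[r] for r in range(num_features))
        outputs.append(
            [
                sum(f_q[r] * kv[r][c] for r in range(num_features)) / denom
                for c in range(d_v)
            ]
        )
    return outputs
```

Because every feature is positive, each output row is a weighted average of the value rows, just as in exact softmax attention; only the weights are approximate, and the approximation generally tightens as `num_features` grows.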