Sparse Attention Mechanisms
Block Sparse Attention
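A sparse attention approach in which the attention matrix is partitioned into fixed-size blocks and only selected blocks are computed. Because each computed block is a small dense matrix multiplication rather than a set of scattered entries, the pattern maps well onto parallel hardware such as GPUs.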
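The idea can be illustrated with a minimal NumPy sketch. It is a reference loop for clarity, not an optimized kernel; the function name `block_sparse_attention`, the boolean `block_mask` layout (one entry per query-block/key-block pair), and the single-head, unbatched shapes are all assumptions made for this example.

```python
import numpy as np

def block_sparse_attention(q, k, v, block_mask, block_size):
    """Illustrative single-head block sparse attention (hypothetical helper).

    q, k, v:     arrays of shape (seq_len, d)
    block_mask:  bool array of shape (n_blocks, n_blocks); True at [i, j]
                 means query block i attends to key block j
    block_size:  number of tokens per block; must divide seq_len
    """
    seq_len, d = q.shape
    if seq_len % block_size != 0:
        raise ValueError("seq_len must be a multiple of block_size")
    n_blocks = seq_len // block_size
    out = np.zeros((seq_len, v.shape[1]), dtype=np.float64)
    scale = 1.0 / np.sqrt(d)

    for i in range(n_blocks):
        rows = slice(i * block_size, (i + 1) * block_size)
        active = np.flatnonzero(block_mask[i])
        if active.size == 0:
            continue  # no active key blocks: this query block's output stays zero
        # Gather only the key/value blocks that this query block attends to.
        k_sel = np.concatenate(
            [k[j * block_size:(j + 1) * block_size] for j in active])
        v_sel = np.concatenate(
            [v[j * block_size:(j + 1) * block_size] for j in active])
        # Dense scores over the selected blocks only; skipped blocks are
        # never computed.
        scores = (q[rows] @ k_sel.T) * scale
        scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        out[rows] = weights @ v_sel
    return out
```

With a mask that is all True, this reduces to ordinary dense attention; with a banded or block-diagonal mask, the work per query block scales with the number of active blocks rather than with the full sequence length. Production implementations typically fuse these steps into a single GPU kernel instead of looping in Python.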