Sparse Attention
Longformer
A Transformer architecture that combines local sliding-window attention with a small number of task-specific global attention tokens, letting it process very long sequences with time and memory that scale linearly in sequence length rather than quadratically.
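As a rough illustration of the attention pattern (a minimal sketch, not the paper's implementation), the NumPy snippet below builds the combined local-plus-global mask; the sequence length, window size, and single global token are illustrative assumptions.

```python
import numpy as np

def longformer_mask(seq_len: int, window: int, global_idx: list) -> np.ndarray:
    """Boolean mask: True where position i is allowed to attend to position j."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    # Local sliding window: each token attends to neighbors within +/- window//2.
    mask = np.abs(i - j) <= window // 2
    # Global attention: chosen tokens attend everywhere and are attended to by all.
    mask[global_idx, :] = True
    mask[:, global_idx] = True
    return mask

# Hypothetical sizes; one global token at position 0 (e.g. a [CLS]-style token).
mask = longformer_mask(seq_len=4096, window=512, global_idx=[0])
print(f"attended pairs: {mask.sum()} of {mask.size} ({mask.mean():.2%})")
# Nonzero entries grow roughly as seq_len * (window + num_global), i.e. linearly
# in sequence length, versus seq_len**2 for full self-attention.
```

Printing the mask density makes the complexity claim concrete: only a few percent of the full quadratic attention matrix is ever computed.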