Sparse Attention
Dilated Sliding Window
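Variant of sliding window attention in which each token attends to positions spaced a fixed number of steps apart (the dilation) instead of to adjacent ones. The receptive field grows by the dilation factor while the number of attended positions per token, and therefore the computational cost, stays the same.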
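The definition above can be illustrated with a minimal sketch that builds the attention mask. The function name `dilated_window_mask` and its parameters `seq_len`, `window`, and `dilation` are hypothetical and chosen only for this example. The sketch assumes a symmetric window with `window // 2` attended positions on each side of the query, and it materializes a dense mask for readability, which an efficient implementation would avoid.

```python
import numpy as np

def dilated_window_mask(seq_len, window, dilation):
    """Return a boolean mask where mask[i, j] is True if query i attends to key j.

    Assumption for this sketch: each query attends to itself plus
    window // 2 positions on each side, spaced `dilation` steps apart.
    """
    idx = np.arange(seq_len)
    offset = idx[None, :] - idx[:, None]        # offset[i, j] = j - i
    half = window // 2
    within = np.abs(offset) <= half * dilation  # inside the dilated span
    on_grid = offset % dilation == 0            # only every `dilation`-th position
    return within & on_grid

# Same number of attended positions, wider span.
dense = dilated_window_mask(seq_len=9, window=4, dilation=1)
dilated = dilated_window_mask(seq_len=9, window=4, dilation=2)
print(np.flatnonzero(dense[4]))    # positions attended by token 4 without dilation
print(np.flatnonzero(dilated[4]))  # positions attended by token 4 with dilation 2
```

With `dilation=1` this reduces to an ordinary sliding window. With `dilation=2` the middle token still attends to five positions, but they span nine tokens instead of five.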