Efficient Transformers
Dilated Attention
An extension of sliding-window attention that uses dilated (gapped) attention patterns to capture longer-range dependencies without increasing per-token cost: each query still attends to the same number of keys, but the gaps spread them over a wider span. Stacking layers with increasing dilation rates therefore expands the receptive field exponentially with depth.
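A minimal sketch of the idea, assuming a boolean attention mask formulation (the function name and parameters are illustrative, not from any specific library): each query attends only to keys within `window * dilation` positions whose offset is a multiple of the dilation rate, so the number of attended keys per query is constant while the covered span grows with the dilation.

```python
import numpy as np

def dilated_attention_mask(seq_len, window, dilation):
    """Boolean mask: query i may attend to key j iff |i - j| <= window * dilation
    and the offset is a multiple of the dilation rate. Per-query key count stays
    O(window) regardless of dilation, so cost does not grow with the span."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    dist = np.abs(i - j)
    return (dist <= window * dilation) & (dist % dilation == 0)

# Same number of attended keys per query, but the dilated pattern
# reaches 4x further: this is how stacking layers with growing
# dilation rates widens the receptive field without extra compute.
dense = dilated_attention_mask(32, window=2, dilation=1)
dilated = dilated_attention_mask(32, window=2, dilation=4)
print(dense[16].sum(), dilated[16].sum())      # equal key counts
print(np.flatnonzero(dilated[16]))             # keys spaced 4 apart
```

Composing such layers (e.g. dilations 1, 4, 16, ...) lets information hop across gaps layer by layer, which is the source of the exponential receptive-field growth.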