Efficient Transformers
Axial Attention
Decomposes multidimensional attention into one-dimensional attentions applied sequentially along each axis. For an input of N elements arranged along d axes (e.g. N = H·W pixels for d = 2), axial attention reduces the per-layer cost from O(N²) for full attention to O(d·N^((d+1)/d)), a saving of a factor of O(N^((d-1)/d)).
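A minimal sketch of the idea for a 2-D input, assuming a PyTorch environment; the names (attend_along_axis, axial_attention) are illustrative, and learned projections and multiple heads are omitted for brevity:

```python
import torch
import torch.nn.functional as F


def attend_along_axis(x: torch.Tensor, axis: int) -> torch.Tensor:
    """Single-head self-attention along one axis of x.

    x: (batch, height, width, channels); axis is 1 (height) or 2 (width).
    """
    # Move the chosen axis next to the channel dim, then fold every other
    # axis into the batch, so attention runs over that one axis only.
    x = x.movedim(axis, -2)                    # (..., axis_len, channels)
    *lead, n, c = x.shape
    flat = x.reshape(-1, n, c)                 # (batch', axis_len, channels)

    # Plain scaled dot-product attention over the single axis: the score
    # matrix is n x n per row/column, never N x N for the whole input.
    scores = flat @ flat.transpose(1, 2) / c ** 0.5   # (batch', n, n)
    out = F.softmax(scores, dim=-1) @ flat            # (batch', n, c)

    return out.reshape(*lead, n, c).movedim(-2, axis)


def axial_attention(x: torch.Tensor) -> torch.Tensor:
    """Attend sequentially along each axis: first height, then width."""
    x = attend_along_axis(x, axis=1)  # rows: cost O(H·W·H)
    x = attend_along_axis(x, axis=2)  # columns: cost O(H·W·W)
    return x


if __name__ == "__main__":
    feats = torch.randn(2, 16, 16, 32)   # (batch, H, W, C)
    print(axial_attention(feats).shape)  # torch.Size([2, 16, 16, 32])
```

Because information flows along one axis at a time, a point only sees the full input after attending over every axis, which is why the one-dimensional attentions are applied sequentially rather than in isolation.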