Efficient Transformers
Linear Transformer
An architecture that uses a kernelized decomposition of attention to achieve linear time and memory complexity in the sequence length. The Linear Transformer replaces the softmax with a positive kernel feature map, which lets the attention computation be reordered associatively: key-value products are aggregated once and then shared across all queries.
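A minimal sketch of this reordering in NumPy, assuming the elu(x) + 1 feature map proposed in the original Linear Transformer paper; the function names here are illustrative, not a reference implementation. Because the kernel values are positive, the normalized attention weights are positive and sum to one for each query.

```python
import numpy as np

def feature_map(x):
    # Positive feature map phi(x) = elu(x) + 1, as in the original paper.
    return np.where(x > 0, x + 1, np.exp(x))

def linear_attention(Q, K, V):
    """Kernelized attention: softmax(QK^T)V replaced by
    phi(Q) (phi(K)^T V) / (phi(Q) phi(K)^T 1).
    Q, K: (n, d), V: (n, d_v)."""
    Qp = feature_map(Q)           # (n, d)
    Kp = feature_map(K)           # (n, d)
    # Associative reordering: compute phi(K)^T V first,
    # giving O(n * d * d_v) cost -- linear in sequence length n.
    KV = Kp.T @ V                 # (d, d_v), aggregated once
    Z = Kp.sum(axis=0)            # (d,), normalizer accumulator
    num = Qp @ KV                 # (n, d_v)
    den = Qp @ Z                  # (n,)
    return num / den[:, None]
```

For the causal (autoregressive) variant, `KV` and `Z` become running prefix sums updated one position at a time, which is what enables constant-memory recurrent inference.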