Sparse Attention
Routing Transformer
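A Transformer variant that uses k-means-based routing to group tokens into clusters and applies attention only among tokens in the same cluster. Restricting attention this way reduces computation while still allowing a token to attend to distant tokens that share its cluster, so long-distance dependencies remain reachable.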
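The routing idea can be illustrated with a minimal NumPy sketch. It is a simplified illustration and not the published implementation. It assumes plain batch k-means rather than an online centroid update, a single attention head, queries and keys that are the same vectors, and no causal masking. The function names `kmeans_assign` and `routed_attention` are hypothetical.

```python
import numpy as np

def kmeans_assign(x, num_clusters, num_iters=10, seed=0):
    """Cluster the rows of x with plain batch k-means and return a label per row.

    Assumption: batch k-means is used here only for illustration.
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from randomly chosen, distinct rows of x.
    centroids = x[rng.choice(len(x), size=num_clusters, replace=False)].copy()
    for _ in range(num_iters):
        # Squared Euclidean distance from every token to every centroid: (n, k).
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(num_clusters):
            members = x[labels == c]
            if len(members) > 0:  # leave empty clusters where they are
                centroids[c] = members.mean(axis=0)
    return labels

def routed_attention(q, k, v, labels):
    """Softmax attention restricted to tokens that share a cluster label."""
    d = q.shape[1]
    out = np.zeros_like(v)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        # Scaled dot-product scores among the members of this cluster only.
        scores = q[idx] @ k[idx].T / np.sqrt(d)
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=1, keepdims=True)
        out[idx] = weights @ v[idx]
    return out

# Usage: 16 tokens of dimension 8, routed into 4 clusters.
rng = np.random.default_rng(1)
x = rng.normal(size=(16, 8))
labels = kmeans_assign(x, num_clusters=4)
out = routed_attention(x, x, x, labels)
```

Each cluster computes attention only over its own members, so the score matrices are much smaller than the full 16 x 16 matrix that dense attention would build. Cluster membership depends on content rather than position, which is why two tokens far apart in the sequence can still attend to each other.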