Efficient Transformers
Sparse Transformer
Variant using predefined sparse attention patterns to reduce the number of attention connections while still capturing long-range dependencies. The architecture factorizes full attention into several operations that each attend to only a subset of positions, cutting the cost of dense attention from O(n²) to roughly O(n·√n).
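A minimal NumPy sketch of the idea, assuming a strided factorization (local window plus every stride-th earlier position); function names are illustrative, and the dense mask is for clarity only, a real implementation computes just the allowed entries:

```python
import numpy as np

def strided_sparse_mask(seq_len: int, stride: int) -> np.ndarray:
    """Boolean mask where True marks allowed (query, key) pairs.

    Combines two factorized patterns:
    - local: each position attends to the previous `stride` positions,
    - strided: each position attends to every `stride`-th earlier position.
    """
    i = np.arange(seq_len)[:, None]   # query index
    j = np.arange(seq_len)[None, :]   # key index
    causal = j <= i
    local = (i - j) < stride
    strided = ((i - j) % stride) == 0
    return causal & (local | strided)

def sparse_attention(q, k, v, stride):
    """Scaled dot-product attention restricted to the sparse mask.

    Materializes the full score matrix and masks it for readability;
    with stride ~ sqrt(n), only O(n * sqrt(n)) entries are actually needed.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    mask = strided_sparse_mask(len(q), stride)
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because the strided pattern reaches positions arbitrarily far back in a fixed number of hops, stacking such layers still propagates information across the whole sequence.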