Sparse Attention
BigBird
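Transformer model that implements sparse attention by combining three patterns: local (sliding-window) attention, global attention, and random attention. This makes attention cost grow linearly rather than quadratically with sequence length, allowing sequences of up to 4096 tokens. The authors also show that the sparse model keeps the theoretical properties of full attention: it remains a universal approximator of sequence functions and is Turing complete.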
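To make the three patterns concrete, here is a minimal sketch that builds a BigBird-style attention mask and applies it in a plain NumPy attention step. It is an illustration under stated assumptions, not the reference implementation: the function names (`bigbird_style_mask`, `masked_attention`) and the parameters `window`, `num_global`, and `num_random` are hypothetical, and the mask works on individual tokens, whereas the actual model groups tokens into blocks.

```python
import numpy as np

def bigbird_style_mask(seq_len, window=3, num_global=2, num_random=2, seed=0):
    """Boolean (seq_len, seq_len) mask; True means query i may attend to key j.

    Hypothetical token-level simplification of the three sparse patterns.
    """
    rng = np.random.default_rng(seed)
    idx = np.arange(seq_len)
    # Local: each token attends to neighbours within window // 2 positions.
    mask = np.abs(idx[:, None] - idx[None, :]) <= window // 2
    # Global: the first num_global tokens attend to all tokens,
    # and all tokens attend to them.
    mask[:num_global, :] = True
    mask[:, :num_global] = True
    # Random: each query also attends to num_random randomly chosen keys.
    for i in range(seq_len):
        picks = rng.choice(seq_len, size=min(num_random, seq_len), replace=False)
        mask[i, picks] = True
    return mask

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention restricted to positions where mask is True."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)       # block disallowed pairs
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

if __name__ == "__main__":
    n, d = 64, 16
    rng = np.random.default_rng(1)
    q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
    mask = bigbird_style_mask(n)
    out = masked_attention(q, k, v, mask)
    print(out.shape, f"fraction of pairs attended: {mask.mean():.3f}")
```

The dense mask above still uses quadratic memory, so it only shows which query-key pairs are allowed. An efficient implementation would compute just the allowed pairs, typically in blocks, which is where the linear scaling comes from.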