Vision Transformers (ViT)
Shifted Window Attention
Technique, introduced in the Swin Transformer, in which the partition of the feature map into attention windows is shifted between consecutive layers. Tokens that sit on a window boundary in one layer share a window in the next, enabling cross-window connections and improving the model's ability to capture long-range dependencies.
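A minimal NumPy sketch of the idea: one layer attends within regular non-overlapping windows, and the next applies a cyclic shift of half the window size before partitioning, so border tokens end up in the same window as their former neighbours. The helper names (`window_partition`, `window_attention`) are illustrative, projections are omitted (q = k = v), and the attention mask that real Swin applies to the wrapped-around regions after the cyclic shift is left out for brevity.

```python
import numpy as np

def window_partition(x, w):
    # Split an (H, W, C) feature map into non-overlapping (w*w, C) windows.
    H, W, C = x.shape
    x = x.reshape(H // w, w, W // w, w, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, w * w, C)

def window_attention(windows):
    # Softmax self-attention computed independently inside each window
    # (toy version: no learned q/k/v projections).
    q = k = v = windows
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(windows.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v

w = 4
x = np.random.rand(8, 8, 16)          # toy 8x8 feature map, 16 channels

# Layer l: attention inside regular windows.
regular = window_attention(window_partition(x, w))

# Layer l+1: cyclically shift by w//2 before partitioning, so each new
# window straddles four of the previous windows.
shifted_in = np.roll(x, shift=(-w // 2, -w // 2), axis=(0, 1))
shifted = window_attention(window_partition(shifted_in, w))
```

Because attention is computed only inside each w x w window, the cost stays linear in the number of tokens, while the alternating shift lets information propagate across the whole map over successive layers.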