Vision Transformers (ViT)
Windowed Attention
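An attention mechanism restricted to local, non-overlapping windows of image patches. Each patch attends only to the other patches in its own window, so for a fixed window of M × M patches the cost drops from O(n²) for global self-attention to O(n·M²), which is linear in n, where n is the number of patches.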
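A minimal pure-Python sketch of the idea, assuming single-head attention with identity query/key/value projections (real models use learned projections and multiple heads) and a patch grid that tiles evenly into windows. The helper names `attend` and `windowed_attention` are hypothetical. The code counts how many query-key scores each variant computes, which makes the O(n²) versus O(n·M²) difference concrete.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(tokens):
    """Single-head scaled dot-product self-attention.
    Assumption: Q = K = V = tokens (identity projections).
    Returns (outputs, number of query-key scores computed)."""
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    num_scores = 0
    for q in tokens:
        # Every query is scored against every key in the same token set.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        num_scores += len(scores)
        weights = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return outputs, num_scores

def windowed_attention(grid, window):
    """Apply attention independently inside non-overlapping window x window blocks.
    grid[r][c] is the feature vector of the patch at row r, column c."""
    rows, cols = len(grid), len(grid[0])
    assert rows % window == 0 and cols % window == 0, "grid must tile evenly into windows"
    out = [[None] * cols for _ in range(rows)]
    total_scores = 0
    for r0 in range(0, rows, window):
        for c0 in range(0, cols, window):
            # Gather this window's patches in row-major order.
            coords = [(r, c) for r in range(r0, r0 + window) for c in range(c0, c0 + window)]
            tokens = [grid[r][c] for r, c in coords]
            attended, n_scores = attend(tokens)
            total_scores += n_scores
            # Scatter the results back to their grid positions.
            for (r, c), vec in zip(coords, attended):
                out[r][c] = vec
    return out, total_scores

# Usage: an 8 x 8 grid of patches (n = 64) with 4-dim features, 4 x 4 windows (M = 4).
grid = [[[float((r * 8 + c + i) % 5) for i in range(4)] for c in range(8)] for r in range(8)]
flat = [vec for row in grid for vec in row]
global_out, global_scores = attend(flat)
windowed_out, windowed_scores = windowed_attention(grid, window=4)
print(global_scores)    # 4096 = 64 * 64
print(windowed_scores)  # 1024 = 64 * 16
```

With the window size fixed, doubling the number of patches doubles the windowed cost but quadruples the global cost. The trade-off is that a patch cannot see anything outside its window in a single layer.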