Vision Transformers (ViT)
Patch Merging
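Downsampling operation in hierarchical vision transformers (e.g., Swin Transformer). The features of each group of 2x2 adjacent patches are concatenated and linearly projected, so an H x W x C feature map becomes an (H/2) x (W/2) x 2C map. Halving the spatial resolution while increasing the channel dimension builds a multi-scale feature hierarchy and enlarges the effective receptive field of later layers.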
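A minimal pure-Python sketch of the operation, written for illustration only. It assumes a channels-last H x W x C grid stored as nested lists, concatenates each 2x2 neighborhood in the order top-left, bottom-left, top-right, bottom-right (a convention choice), and applies a bias-free linear projection from 4C to 2C. It omits the LayerNorm that Swin applies before the projection. The function names `merge_patches`, `linear`, and `patch_merging` are hypothetical.

```python
from typing import List

Grid = List[List[List[float]]]  # H x W x C feature map (channels last)
Matrix = List[List[float]]      # weight matrix, shape (in_features, out_features)


def merge_patches(x: Grid) -> Grid:
    """Concatenate each 2x2 neighborhood: (H, W, C) -> (H/2, W/2, 4C)."""
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    out: Grid = []
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            # Order: top-left, bottom-left, top-right, bottom-right.
            row.append(x[i][j] + x[i + 1][j] + x[i][j + 1] + x[i + 1][j + 1])
        out.append(row)
    return out


def linear(x: Grid, weight: Matrix) -> Grid:
    """Apply the same bias-free linear map to every token: 4C -> out_features."""
    n_out = len(weight[0])
    return [
        [[sum(v * weight[k][o] for k, v in enumerate(tok)) for o in range(n_out)]
         for tok in row]
        for row in x
    ]


def patch_merging(x: Grid, weight: Matrix) -> Grid:
    """Hypothetical patch merging: (H, W, C) -> (H/2, W/2, 2C) when weight is 4C x 2C."""
    return linear(merge_patches(x), weight)


# Usage: a 4x4 grid with C = 1 becomes a 2x2 grid with 2 channels.
x = [[[float(i * 4 + j)] for j in range(4)] for i in range(4)]
weight = [[0.25, 1.0], [0.25, 0.0], [0.25, 0.0], [0.25, 0.0]]  # 4C x 2C
y = patch_merging(x, weight)
print(len(y), len(y[0]), len(y[0][0]))  # 2 2 2
```

The token count drops by a factor of 4 at each merging step, which is what lets hierarchical models keep attention cost manageable while the channel width grows stage by stage.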