Vision Transformers (ViT)
Hybrid Architecture
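An approach that combines an initial convolutional network for feature extraction with a transformer for global processing. Instead of cutting raw image patches, the CNN produces a feature map whose spatial positions are flattened and linearly projected into the transformer's token sequence. It was used in early ViT implementations (the original ViT paper paired it with a ResNet backbone) to reduce data requirements.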
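The data flow can be illustrated with a minimal NumPy sketch of a single forward pass. It assumes a toy backbone (one strided convolution with ReLU standing in for a ResNet), flattens every feature-map position into a token (the 1×1-patch case), and uses one pre-norm transformer block with single-head attention and a ReLU MLP. All function and parameter names (`conv_stem`, `hybrid_vit_forward`, `init_params`, etc.) are hypothetical, and the weights are random and untrained, so only the shapes and wiring are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_stem(image, kernels, stride):
    """Toy CNN backbone (assumption: stands in for a ResNet).
    One strided 'valid' convolution + ReLU. image: (H, W, C_in),
    kernels: (k, k, C_in, C_out). Returns a feature map (H', W', C_out)."""
    k = kernels.shape[0]
    H, W, _ = image.shape
    out_h, out_w = (H - k) // stride + 1, (W - k) // stride + 1
    out = np.empty((out_h, out_w, kernels.shape[3]))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i * stride:i * stride + k, j * stride:j * stride + k, :]
            out[i, j] = np.tensordot(window, kernels, axes=([0, 1, 2], [0, 1, 2]))
    return np.maximum(out, 0.0)

def feature_map_to_tokens(fmap, w_proj, cls_token, pos_emb):
    """Flatten each spatial position of the CNN feature map into a token,
    project it to the transformer width, prepend a class token, add positions."""
    tokens = fmap.reshape(-1, fmap.shape[-1]) @ w_proj   # (N, D)
    tokens = np.vstack([cls_token[None, :], tokens])     # (N + 1, D)
    return tokens + pos_emb

def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: every token attends to all others."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

def transformer_block(x, p):
    """Pre-norm block: attention + residual, then MLP + residual (ReLU here; ViT uses GELU)."""
    h, attn = self_attention(layer_norm(x), p["w_q"], p["w_k"], p["w_v"])
    x = x + h @ p["w_o"]
    x = x + np.maximum(layer_norm(x) @ p["w_1"], 0.0) @ p["w_2"]
    return x, attn

def init_params(image_size=32, c_in=3, c_feat=16, d=32, k=4, stride=4, n_classes=10):
    side = (image_size - k) // stride + 1
    n_tokens = side * side + 1                            # feature-map tokens + class token
    g = lambda *shape: rng.normal(0.0, 0.02, shape)
    return {
        "kernels": rng.normal(0.0, 0.1, (k, k, c_in, c_feat)), "stride": stride,
        "w_proj": g(c_feat, d), "cls": np.zeros(d), "pos": g(n_tokens, d),
        "w_q": g(d, d), "w_k": g(d, d), "w_v": g(d, d), "w_o": g(d, d),
        "w_1": g(d, 4 * d), "w_2": g(4 * d, d), "w_head": g(d, n_classes),
    }

def hybrid_vit_forward(image, params):
    fmap = conv_stem(image, params["kernels"], params["stride"])       # local features
    tokens = feature_map_to_tokens(fmap, params["w_proj"], params["cls"], params["pos"])
    out, attn = transformer_block(tokens, params)                      # global processing
    logits = out[0] @ params["w_head"]                                 # classify from class token
    return logits, fmap, attn

image = rng.random((32, 32, 3))
params = init_params(image_size=32)
logits, fmap, attn = hybrid_vit_forward(image, params)
print(fmap.shape, attn.shape, logits.shape)  # (8, 8, 16) (65, 65) (10,)
```

The split of responsibilities is the point of the design: the convolution only ever sees a local 4×4 window, while the attention matrix of shape (65, 65) lets every feature-map token interact with every other one in a single layer.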