Vision Transformers (ViT)
Multi-Head Self-Attention
A mechanism that lets the model compute several attention representations in parallel, each head capturing a different kind of relationship between image patches.
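A minimal PyTorch sketch of the idea (the dimensions below, 768-dim patch embeddings and 12 heads as in ViT-Base, are assumptions for illustration, not values given here):

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Minimal multi-head self-attention over a sequence of patch embeddings."""
    def __init__(self, embed_dim=768, num_heads=12):  # ViT-Base-like sizes (assumed)
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.scale = self.head_dim ** -0.5
        # One linear layer produces queries, keys and values for all heads at once
        self.qkv = nn.Linear(embed_dim, embed_dim * 3)
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):
        # x: (batch, num_patches, embed_dim)
        B, N, D = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # pairwise patch similarities per head
        attn = attn.softmax(dim=-1)                    # one attention map per head
        out = attn @ v                                 # weighted sum of patch values
        out = out.transpose(1, 2).reshape(B, N, D)     # concatenate the heads
        return self.proj(out)                          # mix information across heads

# Example: 196 patches (a 14x14 grid) with 768-dim embeddings
x = torch.randn(2, 196, 768)
y = MultiHeadSelfAttention()(x)
print(y.shape)  # torch.Size([2, 196, 768])
```

Because each head works in its own lower-dimensional subspace, the heads can attend to different patch relationships at the same cost as a single full-width attention.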