Vision Transformers (ViT)
Pre-training on Large Datasets
Initial training phase on a large dataset of millions of images, such as ImageNet-21k, in which the model learns general visual representations before being fine-tuned on a smaller downstream task.
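The handoff from pre-training to fine-tuning can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a real training pipeline: the parameter names (`backbone.weight`, `head.weight`, `head.bias`), the tiny dimensions, and the helpers `make_pretrained_params` and `prepare_for_finetuning` are all hypothetical, and the "pre-trained" values are random stand-ins. It shows one common choice: keep the backbone weights learned during pre-training and replace the classification head with a zero-initialized one sized for the downstream classes.

```python
import copy
import random


def make_pretrained_params(hidden_dim, num_pretrain_classes, seed=0):
    """Stand-in for a pre-trained checkpoint (hypothetical layout, random values)."""
    rng = random.Random(seed)
    return {
        # Represents everything learned during pre-training except the head.
        "backbone.weight": [
            [rng.gauss(0.0, 0.02) for _ in range(hidden_dim)]
            for _ in range(hidden_dim)
        ],
        # Classification head sized for the pre-training label set.
        "head.weight": [
            [rng.gauss(0.0, 0.02) for _ in range(num_pretrain_classes)]
            for _ in range(hidden_dim)
        ],
        "head.bias": [0.0] * num_pretrain_classes,
    }


def prepare_for_finetuning(pretrained, num_downstream_classes):
    """Keep the backbone, swap in a zero-initialized head for the new task."""
    params = copy.deepcopy(pretrained)  # leave the original checkpoint untouched
    hidden_dim = len(params["head.weight"])
    params["head.weight"] = [
        [0.0] * num_downstream_classes for _ in range(hidden_dim)
    ]
    params["head.bias"] = [0.0] * num_downstream_classes
    return params


# Toy sizes only; a real model would be far larger.
pretrained = make_pretrained_params(hidden_dim=8, num_pretrain_classes=21)
finetune_init = prepare_for_finetuning(pretrained, num_downstream_classes=10)
```

Fine-tuning would then continue training from `finetune_init` on the downstream dataset, so the general representations in the backbone are reused rather than learned from scratch.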