Vision Transformers (ViT)
Masked Autoencoder (MAE)
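A self-supervised pretraining approach for ViT: a large random subset of the image patches is masked (typically 75%), and the model learns to reconstruct the missing patches from the visible ones. Even though it sees only about a quarter of each image, the model learns representations that transfer well to downstream tasks.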
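The masking step described above can be sketched in a few lines. This is a minimal illustration that assumes uniform random sampling over patch indices; the function name `random_patch_mask`, the 224x224 image size, and the 16x16 patch size are hypothetical choices for the example, not taken from the source.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked) by uniform random sampling.

    Illustrative helper (not from any library): shuffles the patch indices
    and marks the first `mask_ratio` fraction of them as masked.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


# Assumed setup: a 224x224 image cut into 16x16 patches gives a 14x14 grid.
num_patches = (224 // 16) ** 2  # 196 patches
visible, masked = random_patch_mask(num_patches, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 49 147
```

In a full model, only the visible patches would be passed to the encoder, and the reconstruction loss would be computed on the masked ones; both parts are omitted here.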