Vision Transformers (ViT)
Token Labeling
Training strategy where each patch receives a supervised label instead of a single label per image, forcing the model to learn richer and more localized representations.
← Indietro