Vision Transformers for Detection
Token-to-Token ViT
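A Vision Transformer variant that tokenizes the image progressively: the token sequence is reshaped back into an image-like grid, and neighboring tokens are recombined into new tokens, so that local structural information is preserved.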
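The reshaping and recombination step can be illustrated with a small sketch. This is a minimal pure-Python illustration, not the reference implementation: the function names `restructure` and `soft_split`, the parameters `kernel`, `stride`, and `padding`, and the zero padding at the borders are all assumptions made for the example. The attention layers that a full model would apply between these steps are omitted.

```python
from typing import List

Token = List[float]


def restructure(tokens: List[Token], height: int, width: int) -> List[List[Token]]:
    """Reshape a flat token sequence back into a height x width grid."""
    if len(tokens) != height * width:
        raise ValueError("token count must equal height * width")
    return [tokens[r * width:(r + 1) * width] for r in range(height)]


def soft_split(
    grid: List[List[Token]], kernel: int = 3, stride: int = 2, padding: int = 1
) -> List[Token]:
    """Concatenate each overlapping kernel x kernel neighborhood into one token.

    Hypothetical helper: positions outside the grid are filled with zeros.
    """
    height, width = len(grid), len(grid[0])
    channels = len(grid[0][0])
    zero = [0.0] * channels
    merged_tokens: List[Token] = []
    for top in range(-padding, height + padding - kernel + 1, stride):
        for left in range(-padding, width + padding - kernel + 1, stride):
            merged: Token = []
            for r in range(top, top + kernel):
                for c in range(left, left + kernel):
                    inside = 0 <= r < height and 0 <= c < width
                    merged.extend(grid[r][c] if inside else zero)
            merged_tokens.append(merged)
    return merged_tokens


# 16 one-dimensional tokens laid out as a 4 x 4 grid
tokens = [[float(i)] for i in range(16)]
merged = soft_split(restructure(tokens, 4, 4), kernel=3, stride=2, padding=1)
print(len(merged), len(merged[0]))  # 4 tokens, each of length 9
```

Because the stride is smaller than the kernel, adjacent windows overlap, so each new token shares some of its inputs with its neighbors. The sequence gets shorter while each token gets longer, which is one way to read the "progressive transition" in the description.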