Detection with Transformer Architectures
Vision Transformer (ViT) Backbone
Pre-trained ViTs serve as feature extractors (backbones) for transformer-based detectors, supplying image representations in which each patch feature carries context from the whole image through self-attention.
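As a rough illustration of what using a ViT as a backbone involves, the sketch below shows only the shape bookkeeping: a ViT turns an image into a flat sequence of patch tokens, and a detector typically needs those tokens arranged back into a 2D grid. The function names `patch_grid` and `tokens_to_feature_map`, the 224x224 input, the 16-pixel patch size, and the leading class token are assumptions chosen for illustration, not details from the text above. The dummy tokens stand in for the output of a real pre-trained model.

```python
from typing import List, Sequence, Tuple


def patch_grid(image_h: int, image_w: int, patch: int) -> Tuple[int, int]:
    """Return (rows, cols) of the patch grid for an image cut into patch x patch tiles."""
    if image_h % patch or image_w % patch:
        raise ValueError("image size must be divisible by the patch size")
    return image_h // patch, image_w // patch


def tokens_to_feature_map(
    tokens: Sequence[Sequence[float]],
    grid: Tuple[int, int],
    has_cls_token: bool = True,
) -> List[List[Sequence[float]]]:
    """Drop the optional class token and lay the patch tokens out row by row."""
    rows, cols = grid
    patch_tokens = list(tokens[1:]) if has_cls_token else list(tokens)
    if len(patch_tokens) != rows * cols:
        raise ValueError("token count does not match the patch grid")
    # Row-major layout: token i lands at (i // cols, i % cols).
    return [patch_tokens[r * cols:(r + 1) * cols] for r in range(rows)]


# Hypothetical setup: 224x224 image, 16x16 patches, 4-dimensional embeddings.
grid = patch_grid(224, 224, 16)
embed_dim = 4
# Dummy tokens in place of real ViT outputs: one class token, then one token per patch.
tokens = [[0.0] * embed_dim] + [[float(i)] * embed_dim for i in range(grid[0] * grid[1])]
feature_map = tokens_to_feature_map(tokens, grid)
```

Here `feature_map[r][c]` is the embedding of the patch at row `r`, column `c`, which is the spatial layout a detection head or decoder can attend over.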