Transformers for detection
Transformer Encoder-Decoder
An architecture in which the encoder processes image features into a context-rich representation, and the decoder uses object queries to decode that representation into the final box and class predictions.
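The data flow described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: it uses a single attention step per stage with no learned projections, positional encodings, feed-forward layers, or normalization, and all weights are random. The names `detect`, `w_cls`, and `w_box`, the sizes, and the extra "no object" class slot are hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention: q is (n, d), k and v are (m, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def detect(features, queries, w_cls, w_box):
    # Encoder: self-attention lets every image token see every other token,
    # producing the context-rich representation ("memory").
    memory = features + attention(features, features, features)
    # Decoder: each object query cross-attends to the encoder memory.
    decoded = queries + attention(queries, memory, memory)
    # Prediction heads: one class score vector and one box per query.
    class_logits = decoded @ w_cls
    boxes = 1.0 / (1.0 + np.exp(-(decoded @ w_box)))  # sigmoid, values in [0, 1]
    return class_logits, boxes

# Hypothetical sizes: a 7x7 feature map flattened to 49 tokens, 5 queries.
d_model, num_tokens, num_queries, num_classes = 32, 49, 5, 3
features = rng.normal(size=(num_tokens, d_model))
queries = rng.normal(size=(num_queries, d_model))
w_cls = rng.normal(size=(d_model, num_classes + 1))  # +1: assumed "no object" slot
w_box = rng.normal(size=(d_model, 4))                # assumed (cx, cy, w, h) layout

class_logits, boxes = detect(features, queries, w_cls, w_box)
print(class_logits.shape, boxes.shape)
```

Each row of `class_logits` and `boxes` belongs to one object query, so the number of queries sets an upper bound on how many objects this sketch can report per image.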