Transformers for detection

DETR (DEtection TRansformer)

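Pioneering architecture that treats object detection as a direct set prediction problem, removing the need for hand-designed anchors and non-maximum suppression. A Transformer encoder-decoder models relationships between objects and the global image context, and a bipartite matching loss assigns each prediction to at most one ground-truth object.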
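
As a rough illustration of what set prediction means at inference time, the sketch below turns per-query outputs into detections without any NMS step. The function name `decode_predictions`, the toy probabilities, and the 0.5 threshold are assumptions made for illustration, not part of any reference implementation.

```python
def decode_predictions(class_probs, boxes, no_object_index, threshold=0.5):
    """Keep each query whose top class is a real class with enough confidence.

    class_probs: one probability list per query (last slot here is "no object").
    boxes: one box per query, in whatever format the model emits.
    """
    detections = []
    for probs, box in zip(class_probs, boxes):
        # Index of the most likely class for this query.
        label = max(range(len(probs)), key=probs.__getitem__)
        if label != no_object_index and probs[label] >= threshold:
            detections.append((label, probs[label], box))
    return detections


# Hypothetical outputs for three queries and two real classes (index 2 = "no object").
class_probs = [[0.05, 0.90, 0.05], [0.10, 0.10, 0.80], [0.60, 0.20, 0.20]]
boxes = [[0.5, 0.5, 0.2, 0.2], [0.3, 0.3, 0.1, 0.1], [0.8, 0.2, 0.1, 0.3]]
detections = decode_predictions(class_probs, boxes, no_object_index=2)
```

Because each query is trained to own at most one object, the second query simply reports "no object" and is dropped; no overlap-based suppression is involved.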

Bipartite Transformer

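Informal name for the DETR-style Transformer design in which cross-attention is applied between image features and a small, fixed set of learnable object queries, so that all objects are predicted in parallel. The word "bipartite" refers to the one-to-one matching between predictions and ground truths used during training, not to the attention mechanism itself.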

Object Queries

Learnable embedding vectors, one per prediction slot, that act as positional queries in the decoder. Each query attends to the image features through cross-attention to extract the information needed to predict one object (or "no object").
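
The interaction between queries and image features can be sketched as single-head cross-attention in plain Python. This is a simplified illustration: it omits the learned projection matrices and multiple heads of a real decoder, and the toy `queries` and `features` values are made up.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, features):
    """Each query attends over all feature vectors (keys and values are the features)."""
    dim = len(features[0])
    outputs, weights = [], []
    for q in queries:
        # Scaled dot-product score between this query and every feature vector.
        scores = [sum(qi * fi for qi, fi in zip(q, f)) / math.sqrt(dim) for f in features]
        w = softmax(scores)
        # Weighted sum of the feature vectors.
        out = [sum(w[j] * features[j][d] for j in range(len(features))) for d in range(dim)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical data: three image feature vectors and two object queries.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[2.0, 0.0], [0.0, 2.0]]
slots, attn = cross_attention(queries, features)
```

Each row of `attn` sums to one, and the two queries place their weight on different features, which is the sense in which queries act as separate slots.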

Bipartite Matching Loss

Loss function that uses the Hungarian algorithm to find an optimal one-to-one matching between model predictions and ground-truth objects, then computes classification and box losses on the matched pairs. This resolves the permutation ambiguity of an unordered set of predictions.
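
The matching step can be sketched on a toy example. To stay dependency-free, the code below finds the optimal assignment by brute force over permutations, which gives the same result as the Hungarian algorithm on small inputs but does not scale; in practice an optimized solver such as SciPy's `linear_sum_assignment` would normally be used. The cost (negative class probability plus L1 box distance) is a simplified assumption, and all names and numbers are illustrative.

```python
from itertools import permutations


def l1(a, b):
    """L1 distance between two boxes given as equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match(pred_boxes, pred_probs, gt_boxes, gt_labels):
    """Return (assignment, cost), where assignment[g] is the prediction matched to ground truth g.

    Brute-force stand-in for the Hungarian algorithm; assumes at least as many
    predictions as ground truths.
    """
    def cost(p, g):
        # Simplified matching cost: confident, well-localized predictions are cheap.
        return -pred_probs[p][gt_labels[g]] + l1(pred_boxes[p], gt_boxes[g])

    best, best_cost = None, float("inf")
    for perm in permutations(range(len(pred_boxes)), len(gt_boxes)):
        total = sum(cost(p, g) for g, p in enumerate(perm))
        if total < best_cost:
            best, best_cost = perm, total
    return list(best), best_cost


# Hypothetical example: three predictions, two ground-truth objects, two classes.
pred_boxes = [[0.5, 0.5, 0.2, 0.2], [0.1, 0.1, 0.1, 0.1], [0.8, 0.8, 0.3, 0.3]]
pred_probs = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
gt_boxes = [[0.1, 0.1, 0.1, 0.1], [0.5, 0.5, 0.2, 0.2]]
gt_labels = [0, 1]
assignment, total_cost = match(pred_boxes, pred_probs, gt_boxes, gt_labels)
```

Here ground truth 0 is matched to prediction 1 and ground truth 1 to prediction 0; the unmatched prediction 2 would be trained toward the "no object" class.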

Transformer Encoder-Decoder

Structure in which the encoder applies self-attention to the image features to build a representation with global context, and the decoder uses object queries to decode this representation into final box and class predictions.

Multi-Scale Multi-head Attention (MSA)

Attention mechanism that operates on features from multiple levels of the feature pyramid, allowing the model to capture both fine local detail and global context and so better detect objects of various sizes. In much of the literature, the abbreviation MSA instead stands for multi-head self-attention.

DETR-ResNet

Variant of DETR that uses a ResNet convolutional neural network as the backbone feature extractor, combining the local feature extraction of CNNs with the global reasoning of Transformers.

Mask2Former

Unified architecture for panoptic, instance, and semantic segmentation. Its key component is masked attention, which restricts each query's cross-attention to the foreground region of the mask predicted at the previous decoder layer; the model then directly predicts a set of masks, each with a class label.
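
The idea of restricting attention to a mask can be sketched as a softmax that ignores positions outside the mask. This is a minimal single-query illustration with made-up scores; the fallback to full attention when the mask is empty is an assumption of this sketch, included so the softmax stays well defined.

```python
import math


def masked_attention_weights(scores, mask):
    """Softmax over `scores`, restricted to positions where `mask` is True."""
    if not any(mask):
        # Assumed fallback for this sketch: an empty mask means attend everywhere.
        mask = [True] * len(scores)
    top = max(s for s, m in zip(scores, mask) if m)
    # Masked-out positions contribute exactly zero weight.
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical attention scores over four pixels; only the first two lie inside the mask.
weights = masked_attention_weights([2.0, 1.0, 0.5, 3.0], [True, True, False, False])
```

Even though the last pixel has the highest raw score, it receives zero weight because it lies outside the query's current mask.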

Positional Embeddings

Vectors added to image features to provide spatial information to the Transformer. Because attention is otherwise insensitive to the order of its inputs, they are essential for the model to understand scene geometry and correctly locate objects.
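
One common choice is a fixed 2D sinusoidal encoding, where half of the channels encode the row and half encode the column. The sketch below is a simplified version under that assumption; the function names and the temperature of 10000 are illustrative, and learned embeddings are an equally valid alternative.

```python
import math


def sine_encoding_1d(pos, dim, temperature=10000.0):
    """Sinusoidal code of length `dim` (must be even) for a scalar position."""
    code = []
    for i in range(dim // 2):
        freq = temperature ** (2 * i / dim)
        code.append(math.sin(pos / freq))
        code.append(math.cos(pos / freq))
    return code


def sine_encoding_2d(height, width, dim):
    """Return grid[y][x] = code(y) + code(x); `dim` must be divisible by 4."""
    half = dim // 2
    return [
        [sine_encoding_1d(y, half) + sine_encoding_1d(x, half) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical 2 x 3 feature map with 8 positional channels per location.
grid = sine_encoding_2d(height=2, width=3, dim=8)
```

Every location gets a distinct vector, so swapping two feature-map positions changes the input the Transformer sees.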

Conditional DETR

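Improvement of DETR that accelerates convergence by learning a conditional spatial query from the decoder embedding for use in cross-attention. This lets each attention head focus on a distinct region of the object, such as a box extremity, leading to better query specialization and more accurate localization.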

Deformable DETR

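Variant of DETR that replaces dense attention with deformable attention modules, in which each query attends only to a small set of sampling points around a reference point, across multiple feature scales. This significantly improves convergence speed and performance, especially for small objects.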
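
The sampling idea can be sketched for a single query on a single-scale, single-channel feature map. In the real module the offsets and weights are predicted from the query by learned linear layers and several scales and heads are combined; here they are passed in by hand, and all names and values are illustrative assumptions.

```python
import math


def bilinear_sample(fmap, x, y):
    """Sample a 2D scalar map at fractional (x, y); out-of-bounds cells read as 0."""
    h, w = len(fmap), len(fmap[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xi, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yi < h and 0 <= xi < w:
                value += wy * wx * fmap[yi][xi]
    return value


def deformable_attention(fmap, ref_point, offsets, weights):
    """Weighted sum of a few sampled points around a reference point."""
    rx, ry = ref_point
    return sum(
        w * bilinear_sample(fmap, rx + dx, ry + dy)
        for (dx, dy), w in zip(offsets, weights)
    )


# Hypothetical 3 x 3 feature map; the query looks at only three points, not all nine cells.
fmap = [[0.0, 1.0, 2.0],
        [3.0, 4.0, 5.0],
        [6.0, 7.0, 8.0]]
out = deformable_attention(
    fmap,
    ref_point=(1.0, 1.0),
    offsets=[(0.0, 0.0), (0.5, 0.0), (0.0, -1.0)],
    weights=[0.5, 0.25, 0.25],
)
```

The cost per query depends on the number of sampling points rather than on the size of the feature map, which is what makes high-resolution, multi-scale features affordable.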

Sparse R-CNN

Fully sparse detection approach that uses a fixed set of learnable proposal boxes and proposal features, refined over a cascade of dynamic interaction heads, eliminating the need for dense candidates such as anchors and for NMS post-processing.

Query-to-Attention

Mechanism in which object queries guide the model's attention to the relevant regions of the image instead of attending uniformly over the whole image, improving prediction efficiency and query specialization.

DINO (DETR with Improved deNoising Anchor Boxes)

DETR-like detector that combines contrastive denoising training, mixed query selection for anchor initialization, and a look-forward-twice scheme for box refinement. It reported state-of-the-art results on the COCO benchmark at the time of publication, without requiring NMS.

Focal Loss for Transformers

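Classification loss that down-weights well-classified easy samples so that training focuses on hard ones. It was originally introduced to handle foreground-background class imbalance in dense detectors, and is used as the classification loss in several DETR variants such as Deformable DETR.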
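
The down-weighting can be seen directly in the binary form of the loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The sketch below evaluates it for a single prediction; the defaults alpha = 0.25 and gamma = 2 are the commonly quoted values and are used here as assumptions.

```python
import math


def focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Binary focal loss for one predicted probability `p` and a 0/1 target."""
    p_t = p if target == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy positive (confidently correct) versus a hard positive (confidently wrong).
easy = focal_loss(0.95, 1)
hard = focal_loss(0.10, 1)
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a quick sanity check on the formula.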

Panoptic Segmentation by Transformer

Application of Transformer architectures to the unified task of panoptic segmentation, predicting masks for both countable objects ("things") and amorphous background regions ("stuff") with a single end-to-end model.

Mamba-DETR

Detection architecture that replaces attention mechanisms with state space model (SSM) blocks inspired by Mamba, giving a cost that grows linearly with sequence length while aiming for competitive accuracy in real-time object detection.
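
The linear cost comes from processing the sequence with a recurrence instead of comparing every pair of tokens. The sketch below shows a generic scalar state space recurrence, h_t = a * h_{t-1} + b * x_t and y_t = c * h_t, purely as an illustration of that idea. It is not the actual Mamba block, whose parameters depend on the input, and the function name and values are assumptions.

```python
def ssm_scan(xs, a, b, c):
    """Run a scalar linear state space recurrence over a sequence in one pass."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x   # update the hidden state from the previous state and input
        ys.append(c * h)    # read out the current state
    return ys


# Hypothetical impulse input: the response decays geometrically with factor a.
ys = ssm_scan([1.0, 0.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0)
```

Each token is visited once, so the work grows linearly with sequence length, in contrast to the quadratic pairwise comparisons of full attention.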
