AI Glossary
The Complete Artificial Intelligence Dictionary
DETR (DEtection TRansformer)
Pioneering architecture that eliminates the need for anchors and non-maximum suppression by treating object detection as a direct set prediction problem, using a transformer encoder-decoder to model relationships between objects and the global image context, and bipartite matching to assign predictions to ground truths.
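A minimal PyTorch sketch of this set-prediction pipeline (the class name, dimensions, and single-conv "backbone" are illustrative stand-ins, not the reference implementation; positional embeddings, covered below, are omitted for brevity):

```python
import torch
import torch.nn as nn

class MiniDETR(nn.Module):
    """Toy DETR-style detector: CNN features -> transformer -> fixed set of predictions."""
    def __init__(self, d_model=256, num_queries=100, num_classes=91):
        super().__init__()
        self.backbone = nn.Conv2d(3, d_model, kernel_size=16, stride=16)  # stand-in for a real CNN
        self.transformer = nn.Transformer(d_model, nhead=8, batch_first=True)
        self.queries = nn.Embedding(num_queries, d_model)       # learnable object queries
        self.class_head = nn.Linear(d_model, num_classes + 1)   # +1 for the "no object" class
        self.bbox_head = nn.Linear(d_model, 4)                  # (cx, cy, w, h), normalized

    def forward(self, images):                    # images: (B, 3, H, W)
        feats = self.backbone(images)             # (B, d, H/16, W/16)
        src = feats.flatten(2).transpose(1, 2)    # (B, HW, d) token sequence
        tgt = self.queries.weight.unsqueeze(0).expand(images.size(0), -1, -1)
        hs = self.transformer(src, tgt)           # (B, num_queries, d)
        return self.class_head(hs), self.bbox_head(hs).sigmoid()

logits, boxes = MiniDETR()(torch.randn(2, 3, 224, 224))  # (2, 100, 92), (2, 100, 4)
```

Every image yields exactly 100 (class, box) pairs; assigning them to ground truths is the job of the bipartite matching loss described below.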
Bipartite Transformer
Variant of the Transformer architecture in which cross-attention is computed between image features and a small, fixed set of learnable object queries, enabling all objects to be predicted in parallel.
Object Queries
Learnable positional embedding vectors that serve as slots for each potential object prediction, interacting with image features through the attention mechanism to extract relevant information.
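A sketch of how a set of learnable queries attends over flattened image features (shapes are illustrative; in DETR this step happens inside each decoder layer):

```python
import torch
import torch.nn as nn

B, HW, Q, d = 2, 196, 100, 256
image_tokens = torch.randn(B, HW, d)          # flattened encoder features
object_queries = nn.Embedding(Q, d)           # one learnable slot per potential object

cross_attn = nn.MultiheadAttention(d, num_heads=8, batch_first=True)
q = object_queries.weight.unsqueeze(0).expand(B, -1, -1)

# Each query attends over all image tokens and pools evidence for "its" object.
out, weights = cross_attn(query=q, key=image_tokens, value=image_tokens)
print(out.shape, weights.shape)               # (2, 100, 256), (2, 100, 196)
```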
Bipartite Matching Loss
Loss function based on the Hungarian algorithm that finds an optimal one-to-one matching between model predictions and ground truths, resolving the assignment ambiguity that arises because the predicted set is unordered (permutation-invariant).
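A simplified single-image matcher built on SciPy's Hungarian solver (the cost here mixes only class probability and L1 box distance; DETR's actual matcher also adds a generalized IoU term):

```python
import torch
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_logits, pred_boxes, gt_labels, gt_boxes, l1_weight=5.0):
    """One-to-one matching between predictions and ground truths (single image)."""
    prob = pred_logits.softmax(-1)                       # (num_queries, num_classes+1)
    cls_cost = -prob[:, gt_labels]                       # (num_queries, num_gt)
    box_cost = torch.cdist(pred_boxes, gt_boxes, p=1)    # pairwise L1 distance
    cost = (cls_cost + l1_weight * box_cost).detach().numpy()
    pred_idx, gt_idx = linear_sum_assignment(cost)       # Hungarian algorithm
    return pred_idx, gt_idx

pred_idx, gt_idx = hungarian_match(
    torch.randn(100, 92), torch.rand(100, 4),
    torch.tensor([3, 17]), torch.rand(2, 4))
print(list(zip(pred_idx, gt_idx)))  # e.g. [(12, 0), (57, 1)]
```

Unmatched queries are supervised toward the "no object" class, which is how NMS becomes unnecessary.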
Transformer Encoder-Decoder
Structure where the encoder processes image features to create a context-rich representation, and the decoder uses object queries to decode this representation into final box and class predictions.
Multi-Scale Multi-head Attention (MSA)
Attention mechanism that operates on fused features drawn from multiple levels of the feature pyramid, letting the model capture local and global information simultaneously and detect objects across a wide range of sizes.
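One simple way to realize this (a dense sketch; Deformable DETR's multi-scale attention, covered below, is much sparser) is to flatten each pyramid level into tokens, tag them with a learnable level embedding, and attend over the concatenated sequence:

```python
import torch
import torch.nn as nn

d = 256
levels = [torch.randn(1, d, s, s) for s in (32, 16, 8)]  # feature pyramid (3 scales)
level_embed = nn.Parameter(torch.randn(3, d))            # tells tokens which scale they came from

# Flatten every level to tokens and concatenate into one multi-scale sequence.
tokens = torch.cat(
    [f.flatten(2).transpose(1, 2) + level_embed[i] for i, f in enumerate(levels)], dim=1)

attn = nn.MultiheadAttention(d, num_heads=8, batch_first=True)
out, _ = attn(tokens, tokens, tokens)   # joint attention across all scales
print(tokens.shape, out.shape)          # (1, 1344, 256) = 32*32 + 16*16 + 8*8 tokens
```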
DETR-ResNet
Variant of DETR that uses a ResNet convolutional neural network as the main feature extractor, combining the power of CNNs for feature extraction with the global reasoning of Transformers.
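A sketch of the backbone side of this pairing, using torchvision's ResNet-50 with its classification head removed and a 1x1 projection into the transformer width (the layer choices are illustrative):

```python
import torch
import torchvision
from torch import nn

# ResNet-50 trunk as a feature extractor: drop the classification head,
# keep everything up to the final stride-32 feature map, then project to d_model.
resnet = torchvision.models.resnet50(weights=None)
backbone = nn.Sequential(*list(resnet.children())[:-2])  # remove avgpool + fc
proj = nn.Conv2d(2048, 256, kernel_size=1)               # 1x1 conv into transformer width

x = torch.randn(1, 3, 800, 800)
feats = proj(backbone(x))        # (1, 256, 25, 25): grid of tokens fed to the encoder
print(feats.shape)
```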
Mask2Former
Unified architecture for panoptic, instance, and semantic segmentation that uses masked attention, restricting each query's cross-attention to the foreground of its currently predicted mask, and directly predicts masks with a transformer decoder, outperforming previous specialized approaches in both accuracy and simplicity.
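A toy version of that masked-attention step (shapes and the 0.5 threshold are illustrative; the real model also falls back to full attention when a predicted mask is empty):

```python
import torch
import torch.nn as nn

B, Q, HW, d = 1, 10, 196, 256
queries = torch.randn(B, Q, d)
pixel_feats = torch.randn(B, HW, d)
prev_masks = torch.rand(B, Q, HW)                 # mask predictions from the previous layer

# Masked attention: a query may only attend where its current mask is foreground.
attn_mask = (prev_masks < 0.5)                    # True = position is blocked
attn_mask = attn_mask.repeat_interleave(8, dim=0) # expand to (B*num_heads, Q, HW)

attn = nn.MultiheadAttention(d, num_heads=8, batch_first=True)
out, _ = attn(queries, pixel_feats, pixel_feats, attn_mask=attn_mask)
print(out.shape)                                  # (1, 10, 256)
```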
Positional Embeddings
Vectors added to image features to provide spatial information to the Transformer, essential for the model to understand scene geometry and correctly locate objects.
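A compact 2D sine/cosine variant in PyTorch (the channel ordering differs slightly from the official DETR implementation; this is a sketch of the idea, with half the channels encoding row position and half encoding column position):

```python
import torch

def sine_position_embedding(h, w, d_model=256, temperature=10000.0):
    """2D sine/cosine position encoding: half the channels encode y, half encode x."""
    half = d_model // 2
    freqs = temperature ** (torch.arange(0, half, 2) / half)   # geometric frequency ladder
    y = torch.arange(h).float()[:, None] / freqs               # (h, half/2)
    x = torch.arange(w).float()[:, None] / freqs               # (w, half/2)
    y_emb = torch.cat([y.sin(), y.cos()], dim=-1)              # (h, half)
    x_emb = torch.cat([x.sin(), x.cos()], dim=-1)              # (w, half)
    pos = torch.cat([y_emb[:, None].expand(h, w, half),
                     x_emb[None].expand(h, w, half)], dim=-1)  # (h, w, d_model)
    return pos.flatten(0, 1)                                   # (h*w, d_model), added to tokens

pos = sine_position_embedding(14, 14)
print(pos.shape)  # torch.Size([196, 256])
```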
Conditional DETR
Improvement of DETR that accelerates convergence by deriving a conditional spatial query from each decoder embedding, so queries specialize on distinct regions and produce more accurate predictions with far fewer training epochs.
Deformable DETR
Variant of DETR that integrates deformable attention modules to focus on a small set of key points, significantly improving convergence speed and performance, especially for small objects.
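A single-head, single-scale toy of deformable attention (the offset scaling, point count, and names are illustrative; the real module is multi-head, multi-scale, and backed by a custom CUDA kernel): each query samples the feature map at a few predicted offsets around its reference point, with cost O(K) per query instead of O(HW).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDeformableAttention(nn.Module):
    """Each query samples n_points learned offsets around its reference point."""
    def __init__(self, d_model=256, n_points=4):
        super().__init__()
        self.offsets = nn.Linear(d_model, n_points * 2)   # per-query sampling offsets
        self.weights = nn.Linear(d_model, n_points)       # per-point attention weights
        self.n_points = n_points

    def forward(self, queries, ref_points, feat_map):
        # queries: (B, Q, d); ref_points: (B, Q, 2) in [-1, 1]; feat_map: (B, d, H, W)
        B, Q, d = queries.shape
        offs = self.offsets(queries).view(B, Q, self.n_points, 2).tanh() * 0.1
        locs = ref_points[:, :, None, :] + offs           # (B, Q, K, 2) sample positions
        sampled = F.grid_sample(feat_map, locs, align_corners=False)  # (B, d, Q, K)
        w = self.weights(queries).softmax(-1)             # (B, Q, K)
        return (sampled.permute(0, 2, 3, 1) * w[..., None]).sum(2)    # (B, Q, d)

attn = ToyDeformableAttention()
out = attn(torch.randn(2, 100, 256), torch.rand(2, 100, 2) * 2 - 1,
           torch.randn(2, 256, 32, 32))
print(out.shape)  # torch.Size([2, 100, 256])
```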
Sparse R-CNN
Fully sparse detection approach that starts from a small, fixed set of learnable proposal boxes and proposal features and refines them through an iterative cascade of dynamic instance-interaction heads, eliminating hand-crafted heuristics such as anchors and NMS.
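A heavily reduced sketch of the "learnable proposals refined in stages" idea (it omits RoI feature extraction and the dynamic interaction with image features entirely; every name, dimension, and step size is illustrative):

```python
import torch
import torch.nn as nn

class ToySparseRCNN(nn.Module):
    """Learnable proposals refined over a small cascade of stages (no anchors, no NMS)."""
    def __init__(self, num_proposals=100, d=256, num_stages=3):
        super().__init__()
        self.boxes = nn.Parameter(torch.rand(num_proposals, 4))   # (cx, cy, w, h) in [0, 1]
        self.feats = nn.Parameter(torch.randn(num_proposals, d))  # per-proposal features
        self.stages = nn.ModuleList(nn.Linear(d, 4) for _ in range(num_stages))

    def forward(self):
        boxes = self.boxes
        for head in self.stages:                 # each stage predicts a box refinement
            boxes = (boxes + 0.1 * head(self.feats)).clamp(0, 1)
        return boxes                             # final refined proposal boxes

print(ToySparseRCNN()().shape)  # torch.Size([100, 4])
```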
Query-to-Attention
Mechanism in which object queries steer the model's attention toward relevant regions of the image, in contrast to unconstrained global attention, improving the efficiency and specialization of each prediction.
DINO (DETR with Improved deNoising Anchor Boxes)
Model that combines denoising training on deliberately noised anchor boxes with a Transformer architecture, achieving state-of-the-art results on detection benchmarks without requiring NMS.
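A toy version of the denoising idea (the 0.2 noise scale and function name are illustrative; DINO additionally uses label noise and contrastive negative queries): ground-truth boxes are jittered into extra queries the decoder must reconstruct, which stabilizes bipartite matching early in training.

```python
import torch

def make_denoising_queries(gt_boxes, box_noise=0.2):
    """Jitter ground-truth boxes into extra 'denoising' queries for the decoder."""
    noise = (torch.rand_like(gt_boxes) * 2 - 1) * box_noise   # uniform in [-0.2, 0.2]
    # Scale the noise by each box's width/height so small boxes get small jitter.
    noisy = (gt_boxes + noise * gt_boxes[..., 2:].repeat(1, 2)).clamp(0, 1)
    return noisy            # fed to the decoder alongside the regular object queries

gt = torch.tensor([[0.5, 0.5, 0.2, 0.3]])       # one box: (cx, cy, w, h)
print(make_denoising_queries(gt))
```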
Focal Loss for Transformers
Loss function used in DETR-style detectors to counter the extreme imbalance between object and "no object" predictions: it down-weights well-classified easy samples (overwhelmingly background) and focuses the gradient on hard ones, which also helps convergence.
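The standard binary focal loss in a few lines (alpha and gamma are the common defaults from the RetinaNet paper):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy examples by (1 - p_t)^gamma."""
    p = logits.sigmoid()
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)      # prob assigned to the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

logits = torch.randn(100, 91)                 # per-query, per-class logits
targets = torch.zeros(100, 91)                # mostly background = many easy negatives
targets[0, 3] = 1.0
print(focal_loss(logits, targets))
```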
Panoptic Segmentation by Transformer
Application of Transformer architectures to the unified task of panoptic segmentation, simultaneously predicting masks for countable objects (things) and amorphous background regions (stuff) with a single end-to-end model.
Mamba-DETR
Detection architecture that replaces attention mechanisms with state-space model (SSM) blocks inspired by Mamba, offering complexity linear in sequence length and competitive performance for real-time object detection.
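A toy scalar state-space recurrence showing where the linear complexity comes from (purely illustrative; Mamba makes A, B, C input-dependent and computes the scan in parallel on GPU):

```python
import torch

def ssm_scan(x, A, B, C):
    """Toy state-space layer: h_t = A*h_{t-1} + B*x_t, y_t = C.h_t.

    One pass over the sequence -> O(L) in sequence length, versus O(L^2)
    for self-attention over the same tokens.
    """
    h = torch.zeros(A.shape[0])
    ys = []
    for x_t in x:                    # sequential form; real Mamba uses a parallel scan
        h = A * h + B * x_t          # elementwise (diagonal A) state update
        ys.append(C @ h)
    return torch.stack(ys)

L, d_state = 196, 16                 # e.g. 196 flattened image tokens
x = torch.randn(L)                   # one feature channel of the token sequence
A = torch.rand(d_state) * 0.9        # stable diagonal dynamics (|A| < 1)
B, C = torch.randn(d_state), torch.randn(d_state)
print(ssm_scan(x, A, B, C).shape)    # torch.Size([196])
```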