AI Glossary

A comprehensive dictionary of Artificial Intelligence terms

162 categories · 2,032 subcategories · 23,060 terms

Vision Transformer (ViT)

Neural architecture that applies the Transformer's attention mechanism to images by splitting each image into a sequence of patches, which are then processed like a sequence of tokens.


Patch Embedding

Process of converting image patches into fixed-dimensional embedding vectors through linear projection to feed into the Transformer.
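The two steps above (flatten each patch, then apply one shared linear projection) can be sketched in a few lines of numpy. The sizes are assumptions matching the common ViT-Base configuration (224×224 input, 16×16 patches, 768-dimensional embeddings); the random projection matrix stands in for a learned parameter.

```python
import numpy as np

H = W = 224               # image height and width (assumed)
P = 16                    # patch size (assumed)
C = 3                     # RGB channels
D = 768                   # embedding dimension (assumed)
N = (H // P) * (W // P)   # number of patches: 14 * 14 = 196

image = np.random.rand(H, W, C)

# Cut the image into non-overlapping P x P patches and flatten each one.
patches = image.reshape(H // P, P, W // P, P, C).transpose(0, 2, 1, 3, 4)
patches = patches.reshape(N, P * P * C)          # (196, 768)

# Linear projection: one weight matrix maps every flattened patch to a D-dim vector.
W_proj = np.random.rand(P * P * C, D) * 0.02     # learned in practice; random here
embeddings = patches @ W_proj                    # (196, 768)
```

Every patch shares the same projection matrix, which is why this step is equivalent to a strided convolution with kernel size and stride both equal to the patch size.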


Class Token

Special token added to the embedding sequence whose final representation after passing through the Transformer is used for image classification.
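A minimal sketch of prepending the class token, assuming 196 patch embeddings of dimension 768 (ViT-Base sizes). The zero vector stands in for what is a learned parameter in a real model.

```python
import numpy as np

D = 768                          # embedding dimension (assumed)
N = 196                          # number of patch embeddings (assumed)

patch_embeddings = np.random.rand(N, D)
cls_token = np.zeros((1, D))     # learned parameter in practice; zeros for illustration

# Prepend the class token so the sequence has N + 1 positions.
sequence = np.concatenate([cls_token, patch_embeddings], axis=0)  # (197, 768)

# After the Transformer encoder, only position 0 feeds the classification head.
cls_output = sequence[0]
```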


Multi-Head Self-Attention

Mechanism allowing the model to simultaneously compute multiple attention representations to capture different relationships between image patches.
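A self-contained numpy sketch of this mechanism: the input is projected into per-head queries, keys, and values, each head computes its own softmax attention over all positions, and the heads are concatenated and projected back. Random matrices stand in for learned weights; the sizes (197 tokens, dimension 768, 12 heads) are assumed ViT-Base values.

```python
import numpy as np

def multi_head_self_attention(x, num_heads, rng):
    """Minimal multi-head self-attention sketch (random matrices replace
    the learned Q/K/V and output projections)."""
    n, d = x.shape
    dh = d // num_heads                                         # per-head dimension
    Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) * 0.02 for _ in range(4))
    q = (x @ Wq).reshape(n, num_heads, dh).transpose(1, 0, 2)   # (heads, n, dh)
    k = (x @ Wk).reshape(n, num_heads, dh).transpose(1, 0, 2)
    v = (x @ Wv).reshape(n, num_heads, dh).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)             # (heads, n, n)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)                   # softmax over keys
    out = (weights @ v).transpose(1, 0, 2).reshape(n, d)        # merge heads
    return out @ Wo

rng = np.random.default_rng(0)
x = rng.standard_normal((197, 768))      # class token + 196 patch embeddings
y = multi_head_self_attention(x, num_heads=12, rng=rng)
```

Each head sees only a 64-dimensional slice of the representation, which is what lets different heads specialize in different relationships between patches.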


Transformer Encoder

Fundamental block composed of self-attention layers and feed-forward networks alternating with normalization and residual connections.


Image Patch Tokenization

Process of cutting an image into non-overlapping fixed-size patches (typically 16×16 pixels), which are then converted into a sequence of tokens.
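The resulting token count follows directly from the sizes: a 224×224 image with 16×16 patches yields (224 / 16)² = 14 × 14 = 196 tokens. A tiny helper makes the arithmetic explicit (square images assumed for simplicity):

```python
def num_patches(image_size, patch_size):
    """Number of non-overlapping patches for a square image."""
    assert image_size % patch_size == 0, "image must divide evenly into patches"
    per_side = image_size // patch_size
    return per_side * per_side

print(num_patches(224, 16))  # 196
print(num_patches(384, 16))  # 576
```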


Attention Map Visualization

Interpretability technique visualizing attention weights between patches to understand which image regions the model focuses on.


Pre-training on Large Datasets

Initial training phase on datasets of millions of images, such as ImageNet-21k, to learn general visual representations before fine-tuning on a downstream task.


Patch Size Hyperparameter

Crucial hyperparameter defining the side length of image patches; it directly controls the sequence length and therefore the computational cost and model performance.


Token-to-Patch Reconstruction

Reverse process in generative tasks where tokens are converted back into image patches to reconstruct the original image.


Hierarchical Vision Transformer

Variant of ViT (e.g., the Swin Transformer) that uses a pyramidal structure, progressively merging patches across stages to capture multi-scale features.


Self-Supervised ViT Pre-training

Self-supervised training methods such as DINO or MAE that leverage the Transformer structure to learn visual representations without manual annotations.


Cross-Attention in Multi-Modal ViT

Mechanism extending ViT to jointly process images and text using attention between different modalities.


Computational Complexity O(n²)

Quadratic complexity of self-attention with respect to the number of patches; this is the main practical limitation of Vision Transformers on high-resolution images.
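The quadratic growth is easy to see in numbers: the attention matrix has one entry per pair of tokens, so halving the patch size quadruples the token count and multiplies the attention cost by sixteen. A small illustration (square images and patches assumed):

```python
def attention_entries(image_size, patch_size):
    """Size of the n x n self-attention matrix for a square image."""
    n = (image_size // patch_size) ** 2      # number of patches
    return n * n

print(attention_entries(224, 16))  # 196^2 = 38416
print(attention_entries(224, 8))   # 784^2 = 614656
```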
