AI Glossary

Complete Dictionary of Artificial Intelligence

162 categories · 2,032 subcategories · 23,060 terms

Vision-Language Model (VLM)

Subclass of multimodal models specialized in the joint understanding of text and images, capable of tasks such as image captioning, visual question answering, and visual reasoning. Some usages of the term also cover image generation from text.


Visual Tokenization

Technique that splits an image into a sequence of patches or discrete tokens, often with a neural network such as a Vision Transformer (ViT), so that the image can be processed by a transformer architecture originally designed for text.
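As a rough illustration of the patch-splitting step only, the sketch below cuts a small single-channel image into non-overlapping patches and flattens each one. The function name `patchify` and the toy 4x4 image are assumptions made for this example; a real ViT would additionally project each flattened patch to an embedding vector.

```python
def patchify(image, patch_size):
    """Split an H x W grid (list of rows) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    # Walk the image in row-major order, one patch at a time.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Toy 4x4 "image" whose pixel values are 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
tokens = patchify(image, patch_size=2)
print(tokens)  # four patches of four pixels each
```

Each inner list plays the role of one visual token; the sequence order (left to right, top to bottom) is what a transformer would receive.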


Alignment Model

Model, often trained with a contrastive objective as in CLIP, that learns from very large corpora of (image, text) pairs to project both modalities into a shared vector space where cosine similarity reflects how well an image and a text match.
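To make the shared-space idea concrete, here is a minimal sketch that computes the cosine similarity of every image against every text. The embedding values are invented for illustration and stand in for the outputs of trained encoders; nothing here reproduces CLIP itself.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_matrix(image_embeddings, text_embeddings):
    """Cosine similarity of every image (rows) against every text (columns)."""
    images = [normalize(v) for v in image_embeddings]
    texts = [normalize(v) for v in text_embeddings]
    return [[sum(a * b for a, b in zip(img, txt)) for txt in texts] for img in images]


# Hypothetical embeddings: image i is assumed to be paired with text i.
image_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.0, 0.8]]
text_embeddings = [[0.8, 0.2, 0.1], [0.0, 0.1, 0.9]]

sims = similarity_matrix(image_embeddings, text_embeddings)
for row in sims:
    print([round(value, 3) for value in row])
```

In a well-aligned space the matched pairs on the diagonal score higher than the mismatched ones, which is what a contrastive training objective encourages.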


Multimodal Conditional Generation

Generation task where the output (e.g., text, image) is produced based on one or more inputs of different modalities, such as describing an image or creating an image from text.


Multimodal Chain-of-Thought Reasoning

Ability of a model to combine information from multiple modalities into a sequence of intermediate reasoning steps that leads to a conclusion, for example by analyzing a chart together with its accompanying text to answer a question.


Multimodal Perceptron

Theoretical concept or primitive architecture in which inputs from different modalities are combined, often by concatenation or another fusion operation, before being processed by fully connected neural layers.
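A minimal sketch of this idea, assuming fusion by concatenation followed by a single fully connected layer with a ReLU activation. The feature values, weights, and biases are made up for the example.

```python
def dense(inputs, weights, biases):
    """Fully connected layer with ReLU; weights holds one row per output unit."""
    outputs = []
    for row, bias in zip(weights, biases):
        pre_activation = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(max(0.0, pre_activation))
    return outputs


image_features = [0.5, 1.0]  # hypothetical image feature vector
text_features = [2.0]        # hypothetical text feature vector

# Fusion by concatenation: the layer sees one combined vector.
fused = image_features + text_features

weights = [[1.0, 0.0, 0.5], [-1.0, -1.0, 0.0]]
biases = [0.0, 0.5]
hidden = dense(fused, weights, biases)
print(hidden)  # [1.5, 0.0]
```

Because the modalities are merged before any learned layer, every hidden unit can mix image and text features freely.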


Multimodal Diffusion Model

Generation architecture that learns to reverse a gradual noising process: starting from noise, it iteratively denoises to create data (e.g., images) conditioned on another modality (e.g., a text description), with the conditioning information guiding each denoising step.
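The sketch below illustrates the arithmetic around a single diffusion step, under two assumptions that the definition does not state: the forward mix follows the common DDPM form x_t = sqrt(a) * x_0 + sqrt(1 - a) * noise, and the conditioning is applied with classifier-free guidance, which is only one of several guidance schemes. No neural network is involved; the true noise stands in for a model's prediction.

```python
import math
import random


def add_noise(x0, alpha_bar, rng):
    """Forward step: mix clean data with Gaussian noise at level alpha_bar."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    keep, spread = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    xt = [keep * x + spread * n for x, n in zip(x0, noise)]
    return xt, noise


def guided_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: move the estimate toward the conditional one."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def estimate_clean(xt, eps, alpha_bar):
    """Invert the forward mix, given an estimate of the noise."""
    keep, spread = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - spread * e) / keep for x, e in zip(xt, eps)]


rng = random.Random(0)
x0 = [1.0, -1.0]
xt, true_noise = add_noise(x0, alpha_bar=0.5, rng=rng)

# With a perfect noise estimate, the clean sample is recovered.
recovered = estimate_clean(xt, true_noise, alpha_bar=0.5)

# Guidance scale 2.0 extrapolates past the conditional estimate.
guided = guided_noise([0.0], [1.0], scale=2.0)
```

In a trained model, `eps_uncond` and `eps_cond` would come from the same denoising network evaluated without and with the text condition.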


Separate Encoding vs Unified Encoding

Two architectural strategies for multimodal models: separate encoding processes each modality with a dedicated encoder before fusion, while unified encoding uses a single transformer to process a sequence of mixed tokens.
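The contrast between the two strategies can be sketched with a stand-in encoder. Here `mean_pool` is a deliberately trivial placeholder for a real encoder, and the token vectors are invented; the point is only where the modalities meet, not how they are encoded.

```python
def mean_pool(vectors):
    """Stand-in encoder: average a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def separate_encoding(image_tokens, text_tokens):
    image_vec = mean_pool(image_tokens)  # dedicated image encoder (stub)
    text_vec = mean_pool(text_tokens)    # dedicated text encoder (stub)
    return image_vec + text_vec          # fusion happens after encoding


def unified_encoding(image_tokens, text_tokens):
    mixed = image_tokens + text_tokens   # one sequence of mixed tokens
    return mean_pool(mixed)              # a single shared encoder (stub)


image_tokens = [[1.0, 0.0], [3.0, 0.0]]
text_tokens = [[0.0, 2.0]]

late_fused = separate_encoding(image_tokens, text_tokens)
jointly_encoded = unified_encoding(image_tokens, text_tokens)
print(late_fused)  # [2.0, 0.0, 0.0, 2.0]
```

Note that the unified path requires image and text tokens to share one embedding dimension, since they travel through the same sequence.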


Multimodal Zero-Shot Learning

Ability of a model to perform a task on one modality (e.g., classifying an image) without having been explicitly trained for it, by leveraging knowledge transferred from another modality (e.g., the text of class labels).
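One common way to realize this, sketched here under stated assumptions, is to embed each class label as a short text prompt and pick the label whose embedding best matches the image embedding. The prompt template, the lookup table `TOY_TEXT_EMBEDDINGS`, and the function `toy_embed_text` are hypothetical stand-ins for a pretrained text encoder.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def zero_shot_classify(image_embedding, labels, embed_text):
    """Pick the label whose prompt embedding best matches the image embedding."""
    scores = {}
    for label in labels:
        prompt = f"a photo of a {label}"  # assumed prompt template
        scores[label] = dot(image_embedding, embed_text(prompt))
    return max(scores, key=scores.get), scores


# Stand-in for a pretrained text encoder: a fixed table of unit vectors.
TOY_TEXT_EMBEDDINGS = {
    "a photo of a cat": [1.0, 0.0],
    "a photo of a truck": [0.0, 1.0],
}


def toy_embed_text(prompt):
    return TOY_TEXT_EMBEDDINGS[prompt]


predicted, scores = zero_shot_classify([0.2, 0.9], ["cat", "truck"], toy_embed_text)
print(predicted)  # truck
```

No image classifier was trained on "cat" or "truck" here; the class knowledge enters only through the text side. Because the text vectors are unit length, ranking by dot product matches ranking by cosine similarity.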


Audio-Visual-Text Model

An advanced form of multimodal model integrating three data streams (audio, image, text) for complex tasks like video description, where the model must synchronize and interpret visual and auditory information to produce a textual narration.


Latent Projection

A neural network layer, often a simple linear transformation, used to map the embedding vectors of each modality into a common latent space before their fusion or comparison.
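As a small sketch of the linear case, the example below maps a 4-dimensional image embedding and a 2-dimensional text embedding into the same 2-dimensional latent space. All weights and embedding values are made up; in practice they would be learned.

```python
def linear(vec, weight, bias):
    """Compute y = W x + b, with weight stored as one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]


image_embedding = [1.0, 2.0, 3.0, 4.0]  # hypothetical 4-d image embedding
text_embedding = [0.5, -0.5]            # hypothetical 2-d text embedding

# One projection per modality, both targeting a 2-d shared latent space.
W_image = [[0.25, 0.25, 0.25, 0.25], [1.0, 0.0, 0.0, -1.0]]
b_image = [0.0, 0.0]
W_text = [[2.0, 0.0], [0.0, 2.0]]
b_text = [1.0, 1.0]

image_latent = linear(image_embedding, W_image, b_image)
text_latent = linear(text_embedding, W_text, b_text)
print(image_latent, text_latent)  # [2.5, -3.0] [2.0, 0.0]
```

Once both vectors have the same dimensionality they can be compared directly or passed to a fusion layer.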


Multimodal Foundation Model

A very large-scale model, pre-trained on massive amounts of heterogeneous data, that serves as a base for adaptation (fine-tuning) to a multitude of specific multimodal tasks.


Modularity in Multimodal Models

A design principle in which the encoders for each modality are distinct, interchangeable modules, so that a component (e.g., the vision encoder) can be updated or replaced without retraining the entire model.
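The interchangeability can be sketched with a shared interface. The class names, the toy encoders, and the fixed `head_weights` are all hypothetical; both encoders deliberately emit vectors of the same size so the downstream component needs no change. With real models, a swapped encoder typically still needs a small adapter or projection layer to be fitted.

```python
from typing import List, Protocol


class VisionEncoder(Protocol):
    """Interface every interchangeable vision encoder is assumed to satisfy."""

    def encode(self, image: List[List[float]]) -> List[float]: ...


class MinMaxEncoder:
    def encode(self, image):
        pixels = [p for row in image for p in row]
        return [min(pixels), max(pixels)]


class MeanRangeEncoder:
    def encode(self, image):
        pixels = [p for row in image for p in row]
        return [sum(pixels) / len(pixels), max(pixels) - min(pixels)]


class MultimodalModel:
    def __init__(self, vision_encoder: VisionEncoder):
        self.vision_encoder = vision_encoder
        self.head_weights = [0.5, 0.5]  # downstream part, untouched by swaps

    def score(self, image):
        features = self.vision_encoder.encode(image)
        return sum(w * f for w, f in zip(self.head_weights, features))


image = [[0.0, 2.0], [4.0, 6.0]]
model = MultimodalModel(MinMaxEncoder())
first = model.score(image)

model.vision_encoder = MeanRangeEncoder()  # swap only the vision module
second = model.score(image)
print(first, second)  # 3.0 4.5
```

The rest of the model only depends on the `encode` contract, which is what makes the replacement local.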


Multimodal Prompting

An interaction technique with a model where the input (the 'prompt') is composed of multiple modalities, for example, an image accompanied by a textual question, to guide the model towards a specific response.
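A multimodal prompt can be represented as an ordered list of typed parts. The sketch below uses hypothetical `ImagePart` and `TextPart` classes and an invented file name; it does not correspond to any particular model's API.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextPart:
    text: str


@dataclass
class ImagePart:
    path: str  # hypothetical path to an image file


Prompt = List[Union[TextPart, ImagePart]]


def modalities(prompt: Prompt) -> List[str]:
    """List the modality of each part, in prompt order."""
    return ["image" if isinstance(part, ImagePart) else "text" for part in prompt]


# An image accompanied by a textual question, as in the definition above.
prompt: Prompt = [
    ImagePart(path="chart.png"),
    TextPart(text="Which month had the highest sales?"),
]
print(modalities(prompt))  # ['image', 'text']
```

Keeping the parts ordered matters because many models interpret a question relative to the image that precedes it.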
