AI Glossary

The complete glossary of artificial intelligence

162 categories, 2,032 subcategories, 23,060 terms
📖 Terms

Cross-modality

Ability of a system to understand and relate information from different modalities, such as text and images, to enrich contextual understanding.

Vision-Language Transformer (VLT)

Transformer architecture pre-trained on large corpora of paired images and texts, designed for multimodal comprehension and generation tasks.

Visual Reasoning

Ability of a question-answering (QA) system to infer information that is not explicitly stated by analyzing spatial relationships, object attributes, or complex scenes in an image.

Visual Grounding

The act of anchoring linguistic concepts (words, phrases) to specific entities or regions in an image or video, creating a tangible semantic link.

Modality-to-Modality Alignment

Learning process that matches segments of one modality (e.g., a sentence) with relevant segments of another (e.g., an image region).
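
A minimal sketch of the matching step only, assuming the segment embeddings have already been learned and live in a shared space. The function name `align`, the dot-product similarity, and the toy vectors are illustrative assumptions; the training procedure itself is not shown.

```python
def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def align(text_segments, image_regions):
    """For each text-segment embedding, return the index of the best-matching region."""
    return [
        max(range(len(image_regions)), key=lambda j: dot(t, image_regions[j]))
        for t in text_segments
    ]

# Hypothetical toy embeddings (not taken from any real model).
phrases = [[1.0, 0.1, 0.0], [0.0, 0.2, 1.0]]
regions = [[0.1, 0.0, 0.9], [0.9, 0.2, 0.1], [0.0, 1.0, 0.0]]

alignment = align(phrases, regions)
print(alignment)  # phrase i is paired with region alignment[i]
```

In practice the embeddings would be trained so that matching segment pairs score higher than mismatched ones; the hard argmax here could equally be a soft attention weighting.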

Vector Quantized Codebook (VQ)

Technique used in multimodal models to discretize continuous representations (e.g., of images) by mapping each vector to its nearest entry in a finite, learned codebook, yielding discrete tokens that language models can process.
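
The lookup at the heart of this technique can be sketched as a nearest-neighbor search. This is an illustration under assumptions: the codebook is taken as given (how it is learned is not shown), and `quantize`, `dequantize`, and the toy values are hypothetical names and data.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def quantize(vectors, codebook):
    """Replace each continuous vector by the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
        for v in vectors
    ]

def dequantize(indices, codebook):
    """Recover the (lossy) continuous approximation from token indices."""
    return [codebook[k] for k in indices]

# Hypothetical 3-entry codebook and three continuous feature vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
features = [[0.1, -0.1], [0.9, 0.8], [0.2, 0.7]]

tokens = quantize(features, codebook)
print(tokens)
print(dequantize(tokens, codebook))
```

The token indices are what a language model would consume; the codebook size fixes the vocabulary of the non-text modality.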

Multimodal Perceptron

Neural network, typically a multilayer perceptron (MLP), that takes fused features from multiple modalities as input to perform a final classification or regression task.

Two-Stream Fusion Model

Architecture where each modality is processed by a separate neural network (a stream) before their representations are combined for joint decision-making.
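
A rough sketch of the forward pass, assuming one dense layer per stream, fusion by concatenation, and a linear decision head. The weights are hand-picked placeholders rather than trained values, and `two_stream_forward` and the `params` layout are hypothetical.

```python
def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def two_stream_forward(image_feats, text_feats, params):
    # Each modality is processed by its own stream first.
    img_repr = relu(linear(image_feats, *params["image_stream"]))
    txt_repr = relu(linear(text_feats, *params["text_stream"]))
    # Fusion by concatenation, then a joint head produces the scores.
    fused = img_repr + txt_repr
    return linear(fused, *params["head"])

# Placeholder parameters: (weights, bias) per layer.
params = {
    "image_stream": ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    "text_stream": ([[1.0, -1.0]], [0.0]),
    "head": ([[1.0, 1.0, 1.0], [-1.0, 0.0, 2.0]], [0.0, 0.5]),
}

scores = two_stream_forward([0.5, -2.0], [3.0, 1.0], params)
print(scores)
```

Concatenation is only one fusion choice; element-wise sums, products, or attention between the streams are common alternatives.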

Multimodal Information Retrieval

Task of retrieving relevant documents (e.g., images) from a query in another modality (e.g., text), based on their similarity in a shared embedding space.
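
A minimal sketch of text-to-image retrieval by cosine similarity, assuming both the query and the images have already been embedded into the same space by some encoder. The file names, vectors, and the `retrieve` helper are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query_vec, index, top_k=2):
    """Rank indexed items by similarity to the query and return the top_k names."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Hypothetical image embeddings in the shared space.
image_index = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.1],
    "city.jpg": [0.0, 0.2, 0.9],
}
text_query = [0.8, 0.3, 0.0]  # hypothetical embedding of a text query

results = retrieve(text_query, image_index)
print(results)
```

A linear scan like this is fine for a handful of items; large collections usually rely on an approximate nearest-neighbor index instead.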

Conditional Response Generation

Process in which a language model generates a textual response conditioned on information extracted from a non-textual modality, such as an image.

Image Tokenization

Process of converting an image into a sequence of discrete tokens, often via a discrete VAE or VQ-VAE, to make it compatible with Transformer-type architectures.
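
The end-to-end idea can be sketched on a tiny grayscale image: cut it into patches, then assign each patch the index of its nearest codebook entry. This is a simplification under stated assumptions: raw pixel patches stand in for the encoder latents a real VQ-VAE would quantize, the image sides are assumed divisible by the patch size, and `patchify`, `tokenize`, and the codebook are hypothetical.

```python
def patchify(image, patch):
    """Split a 2D image (list of rows) into flattened patches in row-major order."""
    height, width = len(image), len(image[0])
    return [
        [image[r + dr][c + dc] for dr in range(patch) for dc in range(patch)]
        for r in range(0, height, patch)
        for c in range(0, width, patch)
    ]

def tokenize(image, codebook, patch=2):
    """Turn an image into a sequence of codebook indices, one per patch."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [
        min(range(len(codebook)), key=lambda k: sq_dist(p, codebook[k]))
        for p in patchify(image, patch)
    ]

# Hypothetical codebook of 2x2 patterns: dark, bright, vertical edge.
codebook = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 1]]
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 0],
]

tokens = tokenize(image, codebook)
print(tokens)  # one token per 2x2 patch, left to right, top to bottom
```

The resulting sequence has the same shape as a sentence of word IDs, which is what allows a Transformer to process it alongside text.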
