AI Glossary

The complete dictionary of artificial intelligence

162 categories, 2,032 subcategories, 23,060 terms

Multimodal fusion

Process of integrating representations from different modalities (audio and video) to create a unified and enriched understanding of content.
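
As a rough illustration, two commonly described fusion strategies can be sketched in plain Python: feature-level ("early") fusion by concatenation and decision-level ("late") fusion by a weighted average of per-modality scores. The function names, the toy feature vectors, and the `audio_weight` parameter are assumptions made for this example and do not come from the glossary.

```python
from typing import List


def early_fusion(audio_feat: List[float], video_feat: List[float]) -> List[float]:
    """Feature-level fusion: concatenate the per-modality feature vectors."""
    return audio_feat + video_feat


def late_fusion(audio_score: float, video_score: float,
                audio_weight: float = 0.5) -> float:
    """Decision-level fusion: weighted average of per-modality scores."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    return audio_weight * audio_score + (1.0 - audio_weight) * video_score


# Toy inputs (hypothetical): a 2-d audio feature and a 3-d video feature.
fused = early_fusion([0.2, 0.7], [0.1, 0.9, 0.4])

# Toy per-modality confidence scores for the same clip.
score = late_fusion(0.8, 0.6, audio_weight=0.25)
```

In practice the fused vector or score would feed a downstream model; learned fusion layers are common but are outside the scope of this sketch.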

Joint representation

Shared vector space where audio and video features are projected to capture their common semantic relationships.
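
A minimal sketch of the idea, assuming linear projections: each modality has its own projection matrix that maps its features into the same low-dimensional space, where cosine similarity can compare them. The matrices `W_AUDIO` and `W_VIDEO` are hand-picked, hypothetical values; in a real system they would be learned.

```python
import math
from typing import List

Matrix = List[List[float]]


def project(matrix: Matrix, vec: List[float]) -> List[float]:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical projections into a shared 2-d space.
W_AUDIO = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0]]        # maps 3-d audio features to 2-d
W_VIDEO = [[0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0]]   # maps 4-d video features to 2-d

z_audio = project(W_AUDIO, [0.5, 1.0, 0.3])
z_video = project(W_VIDEO, [0.9, 1.0, 2.0, 0.1])
similarity = cosine(z_audio, z_video)
```

Both projected vectors have the same dimensionality even though the raw features do not, which is what makes the direct comparison possible.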

Temporal alignment

Process of precisely matching audio and video events in time so that causal and semantic relationships between them can be established.
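
One simple way to estimate a constant offset between the two streams is raw (unnormalized) cross-correlation of per-frame activity signals. The sketch below assumes both signals are sampled at the same frame rate; `best_lag`, the toy signals, and `max_lag` are illustrative names and values, not part of the glossary.

```python
from typing import Sequence


def best_lag(audio: Sequence[float], video: Sequence[float], max_lag: int) -> int:
    """Return the lag (in frames) such that video[t + lag] best matches audio[t].

    Scores each candidate lag by the dot product of the overlapping samples
    and keeps the first lag with the highest score.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for t, a in enumerate(audio):
            j = t + lag
            if 0 <= j < len(video):
                score += a * video[j]
        if score > best_score:
            best, best_score = lag, score
    return best


# Toy signals: audio onsets at frames 2 and 5, visual events at frames 4 and 7.
audio_energy = [0, 0, 1, 0, 0, 1, 0, 0]
video_motion = [0, 0, 0, 0, 1, 0, 0, 1]
lag = best_lag(audio_energy, video_motion, max_lag=3)  # video trails audio by 2 frames
```

Real streams usually need normalization and sometimes time-varying alignment (for example dynamic time warping), which this sketch does not attempt.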

Multimodal transformer model

Neural architecture based on attention mechanisms specifically designed to simultaneously process and integrate audio and video data.
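
The core operation such architectures typically rely on is cross-attention, where tokens of one modality query tokens of the other. The following is a stripped-down sketch of scaled dot-product cross-attention in plain Python; it omits the learned query, key, and value projections, multiple heads, and everything else a real model would have, and the token values are made up.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """For each query, return the attention-weighted average of the values."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


# Hypothetical tokens: audio tokens act as queries over video tokens.
audio_tokens = [[1.0, 0.0], [0.0, 1.0]]
video_tokens = [[10.0, 0.0], [0.0, 10.0], [0.0, 0.0]]
attended = cross_attention(audio_tokens, video_tokens, video_tokens)
```

Each output row is an audio token re-expressed as a mixture of video tokens, weighted by how strongly the two match.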

Joint feature extraction

Process of identifying and extracting attributes that exist only when audio and video modalities are considered together.

Cross-modal correlation

Statistical measure of the dependencies between audio and video signals, used to quantify their degree of semantic association.
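
The Pearson correlation coefficient is one simple instance of such a measure. The sketch below computes it between two per-frame signals; the signal names and values (audio loudness and video motion magnitude) are invented for illustration.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy per-frame signals: motion is exactly twice the loudness here.
loudness = [0.1, 0.4, 0.9, 0.3, 0.2]
motion = [0.2, 0.8, 1.8, 0.6, 0.4]
r = pearson(loudness, motion)
```

Pearson correlation only captures linear dependence between the raw signals; measures over learned features (for example canonical correlation analysis) are often used when the relationship is more abstract.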

Audio-video segmentation

Joint division of audio and video streams into coherent temporal segments based on shared semantic or thematic changes.
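
A very small sketch of the idea, assuming each stream has already been reduced to one scalar feature per frame: a segment boundary is placed wherever both modalities change by more than a threshold at the same frame. The function name, the scalar features, and the threshold are hypothetical choices for this example.

```python
from typing import List, Sequence


def segment_boundaries(audio_feats: Sequence[float],
                       video_feats: Sequence[float],
                       threshold: float) -> List[int]:
    """Return frame indices where a new segment starts.

    A boundary is declared at frame i only if BOTH modalities change by more
    than `threshold` between frame i - 1 and frame i (a shared change).
    """
    if len(audio_feats) != len(video_feats):
        raise ValueError("streams must have the same number of frames")
    boundaries = []
    for i in range(1, len(audio_feats)):
        audio_change = abs(audio_feats[i] - audio_feats[i - 1])
        video_change = abs(video_feats[i] - video_feats[i - 1])
        if audio_change > threshold and video_change > threshold:
            boundaries.append(i)
    return boundaries


# Toy streams with shared changes at frames 2 and 5.
audio = [0.1, 0.1, 0.9, 0.9, 0.9, 0.2]
video = [0.2, 0.2, 0.8, 0.8, 0.8, 0.1]
cuts = segment_boundaries(audio, video, threshold=0.5)
```

A change in only one modality (for instance a camera cut during continuous speech) does not produce a boundary under this rule.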

Multimodal reconstruction

Task of generating a missing modality (audio or video) from the available modality, using jointly learned representations.
