AI Glossary
The Complete Dictionary of Artificial Intelligence
Contrastive Multimodal Learning
Self-supervised learning technique that learns representations by maximizing the similarity between positive multimodal pairs (e.g., an image and its own caption) and minimizing it for negative pairs (e.g., an image and a caption of a different image).
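One common way to formalize this objective is InfoNCE applied symmetrically in both directions, as in CLIP. The notation here is an illustrative assumption: a batch holds N paired samples, x_i and y_i are the L2-normalized embeddings of sample i in the two modalities, and τ is a temperature. The pairs (i, i) are the positives, and every j ≠ i in the batch serves as a negative.

```latex
s_{ij} = \frac{\mathbf{x}_i^{\top}\mathbf{y}_j}{\tau},
\qquad
\mathcal{L} = -\frac{1}{2N}\sum_{i=1}^{N}\left[
  \log\frac{\exp(s_{ii})}{\sum_{j=1}^{N}\exp(s_{ij})}
  + \log\frac{\exp(s_{ii})}{\sum_{j=1}^{N}\exp(s_{ji})}
\right]
```

The first term ranks the correct y for each x, and the second ranks the correct x for each y.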
Multimodal Representation Learning
Learning shared representations that capture correlations and complementarities between multiple data modalities (text, image, audio).
Modality Fusion
Integration of information from different modalities, for example by combining features early (before a joint model) or combining per-modality predictions late, to create a richer and more robust unified representation.
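The sketch below shows two common fusion strategies. It assumes that per-modality feature vectors and per-modality class probabilities have already been computed. The function names and the equal default weight are illustrative choices, not a prescribed method.

```python
import numpy as np

def early_fusion(image_feat: np.ndarray, text_feat: np.ndarray) -> np.ndarray:
    """Early fusion: concatenate per-modality features into one joint vector."""
    return np.concatenate([image_feat, text_feat], axis=-1)

def late_fusion(image_probs: np.ndarray, text_probs: np.ndarray, weight: float = 0.5) -> np.ndarray:
    """Late fusion: weighted average of per-modality class probabilities."""
    return weight * image_probs + (1.0 - weight) * text_probs

# Toy inputs (made up for illustration).
joint = early_fusion(np.array([0.2, 0.8, 0.1]), np.array([0.5, 0.4]))  # shape (5,)
fused = late_fusion(np.array([0.7, 0.3]), np.array([0.4, 0.6]))         # [0.55, 0.45]
```

Early fusion lets a downstream model learn cross-modal interactions. Late fusion keeps the modalities independent until the decision, which makes it easier to handle a missing modality.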
Multimodal Data Augmentation
Techniques for generating new training samples by applying consistent transformations that preserve inter-modal correlations.
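Here is a minimal sketch of a correlation-preserving augmentation. It assumes an H×W(×C) NumPy image and a lowercase, whitespace-tokenized caption. If the image were flipped horizontally without also swapping "left" and "right" in the caption, the pair would no longer describe the same scene. The swap table is a hypothetical, deliberately tiny example.

```python
import numpy as np

# Hypothetical table of direction words whose meaning inverts under a horizontal flip.
SWAP = {"left": "right", "right": "left"}

def hflip_pair(image: np.ndarray, caption: str) -> tuple[np.ndarray, str]:
    """Flip the image horizontally and adjust the caption to match."""
    flipped = image[:, ::-1]  # reverse the width axis
    words = [SWAP.get(w, w) for w in caption.split()]
    return flipped, " ".join(words)
```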
Self-supervised Learning Objectives
Loss functions designed to exploit the intrinsic structure of multimodal data as a supervisory signal, replacing human-provided labels.
Cross-modal Correspondence
Automatic establishment of semantic relationships between segments or elements of different modalities without manual annotation.
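Once the embeddings are learned, correspondences can be established without labels by nearest-neighbor search in the shared space. This sketch assumes that image and text embeddings already live in a common space, and it matches each image to the text with the highest cosine similarity. `match_cross_modal` is a hypothetical helper.

```python
import numpy as np

def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-12) -> np.ndarray:
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def match_cross_modal(image_emb: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """For each image row, return the index of the most similar text row (cosine)."""
    sim = l2_normalize(image_emb) @ l2_normalize(text_emb).T  # (n_images, n_texts)
    return sim.argmax(axis=1)
```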
Multimodal Feature Extraction
Process of automatically extracting discriminative features from multiple data sources simultaneously.
Self-supervised Task Design
Design of pretext tasks, generated automatically from unlabeled data, that force the model to understand inter-modal relationships.
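One well-known pretext task is audio-visual correspondence: the model must decide whether a video clip and an audio clip belong together. The sketch below builds that binary task from N aligned pairs. It assumes N ≥ 2 precomputed feature rows per modality, and `make_correspondence_task` is a hypothetical name. The labels come for free from the data's natural alignment.

```python
import numpy as np

def make_correspondence_task(video_feats: np.ndarray, audio_feats: np.ndarray,
                             rng: np.random.Generator):
    """Return (video, audio, label) arrays: label 1 = same clip, 0 = mismatched."""
    n = len(video_feats)
    # Shift audio indices by a random nonzero offset so that every negative
    # pairs a video with audio taken from a different clip.
    offset = rng.integers(1, n)
    neg_idx = (np.arange(n) + offset) % n
    pairs_v = np.concatenate([video_feats, video_feats])
    pairs_a = np.concatenate([audio_feats, audio_feats[neg_idx]])
    labels = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])
    return pairs_v, pairs_a, labels
```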
Cross-modal Knowledge Transfer
Transfer of learned information from a data-rich modality to modalities with less available data.
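One way to realize this transfer is cross-modal distillation. A teacher trained on the data-rich modality (say, RGB images) produces soft predictions on paired samples, and a student that sees only the other modality (say, depth) is trained to match them. The sketch below shows only the loss. It assumes that both models output class logits for the same paired batch, and the temperature value is an illustrative choice.

```python
import numpy as np

def softmax(z: np.ndarray, t: float = 1.0) -> np.ndarray:
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits: np.ndarray, student_logits: np.ndarray,
                      temperature: float = 2.0) -> float:
    """Mean KL(teacher || student) over a batch of paired samples."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))
```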
Multimodal Contrastive Loss
Loss function that pulls positive multimodal pairs closer together while pushing negative pairs apart in the embedding space.
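A minimal NumPy sketch of one widely used instance, a symmetric InfoNCE loss in the style of CLIP. It assumes a batch in which row i of each matrix comes from the same sample, so the diagonal of the similarity matrix holds the positive pairs and every other entry is a negative. The temperature of 0.07 is an illustrative default.

```python
import numpy as np

def log_softmax(z: np.ndarray, axis: int) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def symmetric_info_nce(img: np.ndarray, txt: np.ndarray, temperature: float = 0.07) -> float:
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (N, N); diagonal = positives
    idx = np.arange(len(img))
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # each image vs. all texts
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # each text vs. all images
    return float((loss_i2t + loss_t2i) / 2)
```

The loss is near zero when every image is most similar to its own text, and it grows when the embeddings of mismatched pairs are closer than those of matched pairs.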
Cross-modal Prediction
Self-supervised task consisting of predicting part of one modality from another correlated modality.
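As a deliberately simple stand-in for a neural predictor, the sketch below predicts target-modality features (for example, audio) from source-modality features (for example, video) with closed-form ridge regression. It assumes that paired feature matrices are already available. `fit_cross_modal_predictor` and the ridge value are hypothetical choices, and real systems typically use a learned network instead of a linear map.

```python
import numpy as np

def fit_cross_modal_predictor(src: np.ndarray, tgt: np.ndarray, ridge: float = 1e-3) -> np.ndarray:
    """Return W minimizing ||src @ W - tgt||^2 + ridge * ||W||^2."""
    d = src.shape[1]
    return np.linalg.solve(src.T @ src + ridge * np.eye(d), src.T @ tgt)

# Synthetic paired data: the "audio" features are a linear function of the "video" features.
rng = np.random.default_rng(0)
src = rng.standard_normal((100, 5))
w_true = rng.standard_normal((5, 3))
w_hat = fit_cross_modal_predictor(src, src @ w_true)
```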
Multimodal Self-supervision
Learning paradigm that exploits natural correlations between modalities as a source of supervisory signal, without human annotation.
Cross-modal Reconstruction
Learning objective that reconstructs the complete input of one modality from a partial (e.g., masked) version of it or from another modality.
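When reconstruction starts from a partial input, the loss is usually computed only on the hidden parts, so the model cannot score well by copying visible values. This minimal sketch assumes that the target, the reconstruction, and a boolean mask share one shape, with `True` marking positions hidden from the model. The helper name is hypothetical.

```python
import numpy as np

def masked_reconstruction_loss(target: np.ndarray, reconstruction: np.ndarray,
                               mask: np.ndarray) -> float:
    """Mean squared error over masked positions only (mask == True)."""
    mask = mask.astype(bool)
    if not mask.any():
        return 0.0
    diff = (reconstruction - target)[mask]
    return float(np.mean(diff ** 2))
```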
Multimodal Representation Alignment
Technique aimed at harmonizing the distributions of representations from different modalities in a common latent space.
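One concrete way to harmonize distributions is to penalize differences in their second-order statistics. The sketch below borrows the CORAL loss, which comes from domain adaptation, as an example of such a penalty: it is the squared Frobenius distance between the feature covariances of the two modalities, scaled by 1/(4d²). It assumes that both modalities are already projected to the same dimension d. This is one option among several, such as contrastive alignment or canonical correlation analysis.

```python
import numpy as np

def coral_alignment_loss(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """CORAL-style loss between (n_a, d) and (n_b, d) feature matrices."""
    d = feat_a.shape[1]
    cov_a = np.cov(feat_a, rowvar=False)  # rows are samples, columns are features
    cov_b = np.cov(feat_b, rowvar=False)
    return float(np.sum((cov_a - cov_b) ** 2) / (4 * d * d))
```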