AI Glossary
The complete glossary of AI
Multimodal Model
Artificial intelligence architecture capable of simultaneously processing and integrating multiple types of data such as text, images, audio, and video within a unified framework.
Early Fusion
Multimodal integration strategy where different modalities are combined at the raw feature level before processing by the main model.
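As a minimal sketch of early fusion, the example below concatenates two per-modality feature vectors and passes the result through a single linear layer. The helper names (early_fusion, linear), the feature values, and the weights are hypothetical and chosen by hand for illustration; in a real model the weights would be learned.

```python
def early_fusion(image_features, text_features):
    """Concatenate per-modality feature vectors into one input vector."""
    return list(image_features) + list(text_features)


def linear(weights, bias, x):
    """Apply a dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


# Hypothetical low-level features for one sample.
image_features = [0.2, 0.8]
text_features = [1.0, 0.0, 0.5]

# The modalities are merged BEFORE the main model sees them.
fused = early_fusion(image_features, text_features)

# Hand-picked weights standing in for the "main model" (assumption).
weights = [
    [1.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0, 2.0],
]
bias = [0.0, 0.0]
output = linear(weights, bias, fused)
```

Because the layer sees the concatenated vector, each output unit can mix image and text features from the very first operation.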
Late Fusion
Multimodal approach where each modality is processed independently up to the final layers of the model, and the resulting representations or predictions are merged only for the final decision.
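One simple form of late fusion averages the class probabilities produced by separate per-modality classifiers. The sketch below assumes each classifier has already produced logits; the function name late_fusion, the logits, and the equal default weighting are illustrative assumptions, not a prescribed method.

```python
import math


def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(per_modality_logits, weights=None):
    """Weighted average of per-modality class probabilities."""
    probs = [softmax(logits) for logits in per_modality_logits]
    if weights is None:
        # Assumption: every modality contributes equally.
        weights = [1.0 / len(probs)] * len(probs)
    n_classes = len(probs[0])
    return [sum(w * p[c] for w, p in zip(weights, probs)) for c in range(n_classes)]


# Hypothetical outputs of two independent classifiers over 3 classes.
image_logits = [2.0, 0.5, 0.1]
audio_logits = [0.3, 1.5, 0.2]

fused = late_fusion([image_logits, audio_logits])
predicted = max(range(len(fused)), key=fused.__getitem__)
```

The two branches never exchange information before the averaging step, which is what distinguishes this from fusion at earlier stages.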
Cross-modal Alignment
Learning process aimed at establishing semantic correspondences between different modalities in a common representation space.
Vision-Language Encoding
Mechanism that simultaneously transforms visual and textual inputs into compatible vector representations for joint processing.
Cross-modal Attention
Attention mechanism allowing the model to dynamically weight the importance of information from one modality relative to another.
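A common way to realize cross-modal attention is scaled dot-product attention in which queries come from one modality and keys and values from another. The sketch below assumes text tokens act as queries over image regions; all vectors are toy values and the function name cross_modal_attention is hypothetical.

```python
import math


def softmax(scores):
    """Normalize scores into attention weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention: queries from one modality, keys/values from another."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        attn = softmax(scores)
        # Each output is a weighted mix of the other modality's value vectors.
        out = [sum(w * v[j] for w, v in zip(attn, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(attn)
    return outputs, all_weights


# Toy data (assumption): 2 text tokens attend over 3 image regions.
text_queries = [[1.0, 0.0], [0.0, 1.0]]
image_keys = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
image_values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

attended, attention_weights = cross_modal_attention(text_queries, image_keys, image_values)
```

With these values, the first text token places most of its weight on the first image region and the second token on the second region.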
Multimodal Embeddings
Dense vector representations that encode information from multiple modalities in a shared semantic space.
Multimodal Zero-shot Learning
Ability of a multimodal model to generalize to new tasks or modality combinations without task-specific training examples.
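One widely used form of zero-shot behavior is classifying an image by comparing its embedding with the embeddings of text prompts in a shared space, with no classifier trained on those labels. The sketch below assumes such embeddings already exist; the vectors, prompts, and the function name zero_shot_classify are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, label_embeddings):
    """Pick the text prompt whose embedding is closest to the image embedding."""
    scores = {
        label: cosine_similarity(image_embedding, emb)
        for label, emb in label_embeddings.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical embeddings, as if produced by pre-trained text and image encoders.
label_embeddings = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_embedding = [0.2, 0.8, 0.1]

best_label, scores = zero_shot_classify(image_embedding, label_embeddings)
```

New labels can be added simply by embedding another prompt, which is why no additional training examples are needed in this setup.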
Multimodal Tokenization
Process of converting different modalities (image, audio, video) into token sequences compatible with the Transformer architecture.
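For images, one common tokenization approach splits the image into fixed-size patches and flattens each patch into a vector, in the style of Vision Transformers. The sketch below assumes a single-channel image stored as a nested list; the function name patchify is hypothetical, and a real model would additionally project each patch to an embedding.

```python
def patchify(image, patch_size):
    """Split a 2-D image into non-overlapping patches, each flattened to one token."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    tokens = []
    # Scan patches row by row, left to right, to get a 1-D token sequence.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ]
            tokens.append(patch)
    return tokens


# Toy 4x4 grayscale image with pixel values 0..15 (assumption).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = patchify(image, 2)
```

The 4x4 image becomes a sequence of four tokens of length four, which can then be handled like any other token sequence.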
Multimodal Contrastive Pre-training
Self-supervised method that maximizes similarity between positive (matching) multimodal pairs, such as an image and its caption, while minimizing similarity between negative (mismatched) pairs.
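A frequently used objective of this kind is a symmetric InfoNCE-style loss over a batch of image and text embeddings, where the i-th image and i-th text form the positive pair. The sketch below is one such formulation under stated assumptions: the function name contrastive_loss, the temperature of 0.1, and the toy embeddings are all illustrative.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def log_softmax(row):
    """Numerically stable log-softmax."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_loss(image_embs, text_embs, temperature=0.1):
    """Symmetric InfoNCE-style loss; pair k of images and texts is the positive."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    # Pairwise cosine similarities, sharpened by the temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts] for i in imgs]
    loss_i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


# Toy batch (assumption): correctly paired vs. deliberately mismatched texts.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(images, [[0.9, 0.1], [0.1, 0.9]])
shuffled = contrastive_loss(images, [[0.1, 0.9], [0.9, 0.1]])
```

The loss is lower when matching pairs are the most similar entries in their row and column, so minimizing it pulls positives together and pushes negatives apart.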
Common Latent Space Projection
Linear or non-linear transformation aligning representation spaces of different modalities into a unified vector space.
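The linear case can be sketched as one projection matrix per modality, each mapping its own feature dimension to the same target dimension so that the results become directly comparable. The matrices and feature values below are hand-picked assumptions; in practice the projections are learned.

```python
import math


def project(matrix, vector):
    """Linear projection: multiply a weight matrix by a feature vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical projection weights (normally learned during training).
image_projection = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # 4-d -> 2-d
text_projection = [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]  # 3-d -> 2-d

image_features = [2.0, 0.0, 5.0, 1.0]  # 4-d image feature vector
text_features = [1.0, 1.0, 0.0]  # 3-d text feature vector

# After projection both modalities live in the same 2-d space.
image_latent = project(image_projection, image_features)
text_latent = project(text_projection, text_features)
similarity = cosine_similarity(image_latent, text_latent)
```

Before projection the two vectors have different dimensions and cannot be compared; afterwards a single similarity measure applies to both.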
Hybrid Encoder-Decoder Architecture
Structure combining specialized encoders per modality with a unified decoder for generating multimodal outputs.
Multimodal Fine-tuning
Process of adapting a pre-trained multimodal model to specific tasks while preserving its cross-modal processing capabilities.
Multimodal Prompt Engineering
Technique for optimizing inputs combining text and other modalities to effectively guide multimodal models toward desired outputs.
Multimodal Chain-of-Thought Reasoning
Model's ability to generate explicit reasoning steps by integrating evidence from multiple modalities.
Multimodal Conditioned Generation
Process of creating content in a target modality based on conditions or constraints provided in other modalities.
Intermediate Fusion
Multimodal integration strategy where modalities are merged at multiple intermediate levels of the neural network.
Multimodal Transformers
Extension of the Transformer architecture capable of simultaneously processing sequences from different modalities with adapted attention mechanisms.