Multimodal QA
Modality-to-Modality Alignment
Learning process that matches segments of one modality (e.g., a sentence) with relevant segments of another (e.g., an image region).
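A minimal sketch of the matching step, assuming both modalities have already been embedded into a shared vector space. The toy vectors, segment names, and the `align` helper are hypothetical and for illustration only. In a real system the embeddings would be learned so that related segments score highly; here they are fixed by hand.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def align(text_segments, image_regions):
    """Map each text segment to the image region with the highest similarity."""
    alignment = {}
    for t_name, t_vec in text_segments.items():
        best = max(image_regions, key=lambda r: cosine(t_vec, image_regions[r]))
        alignment[t_name] = best
    return alignment

# Hand-made toy embeddings (assumption: a shared 3-dimensional space).
text_segments = {"a dog": [0.9, 0.1, 0.0], "a red ball": [0.1, 0.8, 0.3]}
image_regions = {"region_0": [0.2, 0.9, 0.2], "region_1": [1.0, 0.0, 0.1]}

result = align(text_segments, image_regions)
print(result)
```

Greedy nearest-neighbour matching is only one possible choice. It lets several text segments map to the same region, which may or may not be desirable depending on the task.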