Multimodal Transformers
Cross-Modal Representation
A shared vector space in which embeddings from different modalities are semantically aligned, so that related content lands close together regardless of its source modality. This alignment enables cross-modal interactions, knowledge transfer, and unified understanding across text, images, audio, and video.
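As a rough illustration of what alignment in a shared space makes possible, the sketch below retrieves the image closest to a text query by cosine similarity. The three-dimensional vectors, the file names, and the `retrieve` helper are all hypothetical and hand-picked for the example; in practice the embeddings would come from trained modality-specific encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: assumed to have been produced by separate text and
# image encoders that project into the same 3-dimensional space.
text_embeddings = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a car": [0.0, 0.2, 0.9],
}
image_embeddings = {
    "dog.jpg": [0.8, 0.2, 0.1],
    "car.jpg": [0.1, 0.1, 0.95],
}


def retrieve(query_vec, candidates):
    """Return the candidate name whose embedding is most similar to the query."""
    return max(
        candidates,
        key=lambda name: cosine_similarity(query_vec, candidates[name]),
    )


if __name__ == "__main__":
    for caption, vec in text_embeddings.items():
        print(caption, "->", retrieve(vec, image_embeddings))
```

Because both modalities live in one space, the same `retrieve` call works in the other direction (image query, text candidates) without any change.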