Multi-Modal Transformers
Modality Embedding
Encoding vectors, typically learned and one per modality (text, image, audio), that are added to token embeddings to mark each token's source modality, allowing the Transformer to distinguish and process each data type differently.
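The definition above can be made concrete with a minimal sketch in plain Python. The names `MODALITIES`, `D_MODEL`, `modality_table`, and `add_modality_embedding` are hypothetical and chosen only for illustration, and the randomly initialised table stands in for parameters that a real model would presumably learn during training.

```python
import random

# Hypothetical modality ids; any fixed assignment would do.
MODALITIES = {"text": 0, "image": 1, "audio": 2}
D_MODEL = 4  # tiny embedding width, for illustration only

rng = random.Random(0)

# One vector per modality. Random values stand in for trained parameters.
modality_table = [
    [rng.gauss(0.0, 0.02) for _ in range(D_MODEL)] for _ in MODALITIES
]


def add_modality_embedding(token_embeddings, modality):
    """Add the modality's vector to every token embedding in the sequence."""
    m = modality_table[MODALITIES[modality]]
    return [[x + b for x, b in zip(tok, m)] for tok in token_embeddings]


# Two text tokens and one image patch, assumed already projected to D_MODEL.
text_tokens = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
image_patches = [[0.1, 0.2, 0.3, 0.4]]

# Tag each segment with its modality, then concatenate into one sequence.
sequence = (
    add_modality_embedding(text_tokens, "text")
    + add_modality_embedding(image_patches, "image")
)
```

Here the first text token and the image patch start from identical input vectors, so any difference between them in `sequence` comes only from the modality vector that was added. That offset is the signal the Transformer can use to tell the two apart.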