Multi-Modal Transformers
BEiT-3
BEiT-3 (Bidirectional Encoder representation from Image Transformers, version 3) uses a Multiway Transformer with modality-specific embeddings to process images, text, and image-text pairs in a unified manner.
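To make "unified manner" concrete, here is a toy sketch in plain Python of the general idea: each modality has its own embedding table, and a single shared encoder handles whatever sequence results. It is not the BEiT-3 implementation. The names (`DIM`, `TEXT_EMBED`, `IMAGE_EMBED`, `shared_encoder`, `encode`), the table sizes, and the averaging step that stands in for the Transformer layers are all assumptions made for illustration.

```python
import random

DIM = 4  # hypothetical width of the shared embedding space


def make_table(n_rows, dim, seed):
    """Build a small random embedding table (a stand-in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_rows)]


# Modality-specific embeddings: one table per modality (toy sizes, assumed).
TEXT_EMBED = make_table(10, DIM, seed=0)   # one row per text token id
IMAGE_EMBED = make_table(6, DIM, seed=1)   # one row per image patch id


def embed(modality, ids):
    """Map ids to vectors using the table that belongs to the given modality."""
    table = {"text": TEXT_EMBED, "image": IMAGE_EMBED}[modality]
    return [table[i] for i in ids]


def shared_encoder(seq):
    """Placeholder for the shared Transformer layers.

    Each vector is blended with the sequence mean, so every position
    receives information from every other position, whatever its modality.
    """
    n = len(seq)
    mean = [sum(vec[d] for vec in seq) / n for d in range(DIM)]
    return [[0.5 * vec[d] + 0.5 * mean[d] for d in range(DIM)] for vec in seq]


def encode(image_ids=None, text_ids=None):
    """Encode an image, a text, or an image-text pair with the same encoder."""
    seq = []
    if image_ids:
        seq += embed("image", image_ids)
    if text_ids:
        seq += embed("text", text_ids)
    if not seq:
        raise ValueError("provide image_ids, text_ids, or both")
    return shared_encoder(seq)


image_only = encode(image_ids=[0, 1, 2])
text_only = encode(text_ids=[3, 4])
image_text = encode(image_ids=[0, 1], text_ids=[3, 4, 5])
```

The same `encode` call covers all three input types. Only the embedding step depends on the modality, which is the property the description above points at.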