Multimodal Transformers
ALIGN
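Contrastive dual-encoder image-text model trained on over one billion noisy image alt-text pairs, collected with only simple frequency-based filtering rather than expensive curation. Demonstrates that the scale of a training corpus can compensate for its noise in large-scale multimodal learning.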
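The contrastive objective can be illustrated with a minimal sketch of a symmetric InfoNCE-style loss over a batch of paired embeddings. This is an assumption-laden toy, not ALIGN's implementation: the embeddings are hand-written vectors rather than encoder outputs, the temperature is fixed (ALIGN learns it during training), and the function names `l2_normalize`, `log_softmax_at` and `contrastive_loss` are hypothetical. Each matched image-text pair is treated as the positive and every other pair in the batch as a negative, in both the image-to-text and text-to-image directions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax_at(logits, index):
    """Numerically stable log-softmax of `logits`, evaluated at position `index`."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[index] - log_sum


def contrastive_loss(image_embs, text_embs, temperature=0.1):
    """Symmetric contrastive loss; pair i is the positive, other pairs are in-batch negatives.

    Hypothetical helper: the fixed temperature is an assumption (ALIGN learns it).
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # sim[i][j] = cosine(image_i, text_j) / temperature
    sim = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
            for j in range(n)] for i in range(n)]
    # Image-to-text: row i is a classification over texts; the target is the diagonal entry.
    i2t = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    # Text-to-image: column i is a classification over images; the target is again the diagonal.
    t2i = -sum(log_softmax_at([sim[j][i] for j in range(n)], i) for i in range(n)) / n
    return (i2t + t2i) / 2


# Toy usage: correctly matched pairs score a lower loss than mismatched ones.
aligned = contrastive_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])
shuffled = contrastive_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]])
print(aligned < shuffled)  # True
```

Because negatives come from the rest of the batch, larger batches supply more negatives per positive, which is one reason contrastive models of this kind are typically trained with large batch sizes.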