Multimodal Models
Alignment Model
A model, often based on a contrastive architecture such as CLIP, trained on very large corpora of (image, text) pairs to project both modalities into a shared vector space, where the cosine similarity between an image embedding and a text embedding reflects how relevant they are to each other.
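The role of cosine similarity in the shared space can be illustrated with a minimal sketch. The embeddings here are hand-picked, hypothetical 3-dimensional vectors, not output from CLIP or any real encoder, and the helper names `cosine_similarity` and `rank_captions` are assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_captions(image_vec, captions):
    """Sort candidate captions by similarity to the image, best first."""
    scores = {text: cosine_similarity(image_vec, vec)
              for text, vec in captions.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical embeddings: in a real alignment model these would come
# from the image encoder and the text encoder respectively.
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a bowl of soup": [0.1, 0.9, 0.3],
}

ranking = rank_captions(image_embedding, text_embeddings)
best_caption = ranking[0][0]
print(ranking)
```

In this toy setup the caption whose vector points in nearly the same direction as the image vector ranks first. A trained alignment model aims for the same behaviour, with matching (image, text) pairs pulled together and mismatched pairs pushed apart during training.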