Multimodal interpretability
Foundational Vision-Language Model
A large-scale model pre-trained on massive corpora of textual and visual data, capable of understanding and generating content in both modalities. Its intrinsic complexity makes interpretability a major challenge.