Multimodal Models
Vision-Language Model (VLM)
Subclass of multimodal models that jointly process text and images, handling tasks such as image captioning, visual question answering, and visual reasoning; some variants also generate images from text.