Multimodal QA
Visual Grounding
The act of anchoring linguistic concepts (words, phrases) to the specific entities or regions in an image or video that they refer to, so that each expression is explicitly linked to its visual counterpart.
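As a rough illustration of the definition above, a grounding can be represented as a mapping from phrases to image regions. The sketch below is only one possible representation: the `Box` class, the `(x1, y1, x2, y2)` pixel layout, the caption, and all coordinates are made up for the example. The intersection-over-union (IoU) helper is a common way to compare a predicted region with a reference one, and is included here as an assumption about how such links are typically checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned image region; the (x1, y1, x2, y2) pixel layout is an assumption."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Clamp to zero so inverted (empty) boxes have no area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes, in [0, 1]."""
    inter = Box(max(a.x1, b.x1), max(a.y1, b.y1),
                min(a.x2, b.x2), min(a.y2, b.y2)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


# Hypothetical caption and reference regions: each phrase is linked to one box.
caption = "a dog chasing a red ball"
grounding = {
    "a dog": Box(40, 60, 220, 200),
    "a red ball": Box(260, 150, 300, 190),
}

# A made-up model prediction for the phrase "a dog".
predicted = Box(50, 60, 220, 200)
score = iou(predicted, grounding["a dog"])
print(f"IoU for 'a dog': {score:.3f}")
```

Here the dictionary is the "link" from the definition: looking up a phrase returns the region it refers to. For video, the same idea would need a frame index or time span alongside each box.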