Multimodal interpretability
Visuo-Linguistic Grounding
The process of anchoring linguistic symbols (words, phrases) to concrete entities or concepts in visual data. It is fundamental to ensuring that model explanations linking text and image are semantically correct and understandable.
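As a rough illustration of what anchoring a phrase to a visual entity can look like in practice, the sketch below scores a phrase embedding against a set of image-region embeddings and picks the closest region. It assumes the phrase and the regions have already been embedded into a shared vector space. The function names (`cosine`, `ground_phrase`), the region labels, and the toy vectors are all hypothetical and chosen only for illustration. This is one simple way to think about grounding, not a description of any particular model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ground_phrase(phrase_vec, region_vecs):
    """Return the best-matching region name and all similarity scores.

    phrase_vec: embedding of a word or phrase (assumed precomputed).
    region_vecs: dict mapping a region name to its embedding (assumed precomputed).
    """
    scores = {name: cosine(phrase_vec, vec) for name, vec in region_vecs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy, hand-written embeddings (hypothetical values, not from a real model).
regions = {
    "box_0": [0.9, 0.1, 0.0],
    "box_1": [0.1, 0.8, 0.2],
}
phrase = [0.2, 0.9, 0.1]  # stands in for the embedding of some phrase

best, scores = ground_phrase(phrase, regions)
print(best, scores)
```

Under this view, the per-region scores themselves can act as a simple explanation: they show which part of the image the phrase was tied to, and how strongly compared with the alternatives.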