Multimodal Translation
Visual Question Answering
A system that answers textual questions about the content of an image, a task that requires joint understanding of vision and language. VQA typically combines object detection, spatial reasoning, and linguistic comprehension.
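The following is a minimal rule-based sketch in Python that illustrates how the three abilities fit together. It assumes the object detector has already run: the hand-written `Detection` boxes stand in for real detector output. The question parsing uses a handful of hypothetical regex templates, and `answer` is an illustrative name rather than a library API. Practical VQA systems usually learn these steps jointly with neural networks instead of hand-coding them.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Detection:
    """One object from a (hypothetical, precomputed) detector: label plus box."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center_x(self) -> float:
        # Image coordinates: x grows to the right.
        return (self.x_min + self.x_max) / 2


def answer(question: str, detections: list[Detection]) -> str:
    """Answer a few templated questions over detected objects (toy example)."""
    # Linguistic step: normalize and match the question against fixed templates.
    q = question.lower().strip().rstrip("?").strip()

    # Counting question, e.g. "How many cats?" (naive plural stripping).
    if m := re.fullmatch(r"how many (\w+?)s?", q):
        return str(sum(d.label == m.group(1) for d in detections))

    # Existence question, e.g. "Is there a dog?"
    if m := re.fullmatch(r"is there an? (\w+)", q):
        return "yes" if any(d.label == m.group(1) for d in detections) else "no"

    # Spatial question, e.g. "What is left of the dog?"
    if m := re.fullmatch(r"what is (left|right) of the (\w+)", q):
        side, ref_label = m.groups()
        refs = [d for d in detections if d.label == ref_label]
        if not refs:
            return f"there is no {ref_label}"
        ref = refs[0]  # toy simplification: use the first matching object
        if side == "left":
            candidates = [d for d in detections if d.center_x < ref.center_x]
        else:
            candidates = [d for d in detections if d.center_x > ref.center_x]
        if not candidates:
            return "nothing"
        # Spatial reasoning: pick the horizontally nearest object on that side.
        nearest = min(candidates, key=lambda d: abs(d.center_x - ref.center_x))
        return nearest.label

    return "unsupported question"


if __name__ == "__main__":
    scene = [
        Detection("cat", 10, 40, 60, 90),
        Detection("dog", 120, 30, 200, 100),
        Detection("cat", 220, 50, 260, 90),
    ]
    for q in ["How many cats?", "Is there a bird?", "What is left of the dog?"]:
        print(q, "->", answer(q, scene))
```

Keeping the stages separate makes the decomposition visible: detection supplies labels and boxes, the regex templates stand in for language understanding, and the box comparisons perform the spatial reasoning. A learned model replaces each hand-written rule with representations trained on image and question pairs.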