Multimodal QA
Conditional Response Generation
The process by which a language model generates a textual response whose content is conditioned on, and guided by, information extracted from a non-textual modality such as an image.
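As a rough illustration of the definition, the toy sketch below separates the two stages: extracting information from the non-textual input, then generating text conditioned on it. Everything here is a hypothetical stand-in: `extract_image_features` plays the role of a vision encoder, `generate_response` plays the role of the language model, and the "image" is just a grid of color names. A real system would use learned models for both stages.

```python
from collections import Counter

# Hypothetical stand-in for a vision encoder: the "image" is a grid of
# color names, and the extracted information is a color histogram.
def extract_image_features(image):
    counts = Counter(pixel for row in image for pixel in row)
    return dict(counts)

# Hypothetical stand-in for a language model: the question selects the
# kind of answer, but the answer's content comes from the image features.
def generate_response(question, features):
    if not features:
        return "I cannot answer without image information."
    if "color" in question.lower():
        dominant = max(features, key=features.get)
        return f"The dominant color is {dominant}."
    return f"The image contains {len(features)} distinct colors."

image = [["red", "red", "blue"],
         ["red", "green", "blue"]]
features = extract_image_features(image)
answer = generate_response("What is the main color?", features)
print(answer)
```

The same question asked about a different image yields a different answer, which is the sense in which the response is conditioned on the visual input.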