Multimodal QA
Visual Reasoning
The ability of a QA system to infer information that is not explicitly given by analyzing spatial relationships, object attributes, or complex scenes in an image.
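As a rough illustration of inferring a spatial relationship that is never stated outright, the sketch below answers a yes/no "is X left of Y?" question from object bounding boxes. It assumes an upstream detector has already produced labelled boxes. The `Box` type, the `left_of` rule, and the `answer_left_of` helper are hypothetical names chosen for this example, not part of any particular QA system, and real systems typically learn such relations rather than hard-coding them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical detector output: a label plus pixel coordinates."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def left_of(a: Box, b: Box) -> bool:
    """Assumed rule: a is left of b if a ends before b begins on the x-axis."""
    return a.x_max <= b.x_min


def answer_left_of(scene: list[Box], subject: str, reference: str) -> str:
    """Answer 'Is <subject> left of <reference>?' from box geometry alone."""
    boxes = {box.label: box for box in scene}
    return "yes" if left_of(boxes[subject], boxes[reference]) else "no"


# Made-up scene: the relation "cup is left of laptop" is not stored anywhere;
# it is derived from the coordinates.
scene = [
    Box("cup", 10, 40, 30, 60),
    Box("laptop", 50, 20, 120, 80),
]
print(answer_left_of(scene, "cup", "laptop"))
```

The answer here comes only from comparing coordinates, which is the simplest form of the spatial inference the definition describes. Attribute or whole-scene reasoning would need richer inputs than boxes.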