Multimodal interpretability

Multimodal Attribute Fusion

Process of combining features from different modalities (text, image, sound) into a unified representation for a learning model, aiming to capture complex interactions between data sources.
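
As a rough illustration of the definition above, the sketch below fuses per-modality feature vectors in two ways: plain concatenation, and a flattened outer product that exposes pairwise interactions between two modalities. The function names and the toy vectors are assumptions made for this example, not part of any particular library.

```python
def concat_fusion(text_vec, image_vec, audio_vec):
    """Fuse by concatenation: one vector that holds every modality's features."""
    return list(text_vec) + list(image_vec) + list(audio_vec)


def outer_product_fusion(a, b):
    """Fuse two modalities as all pairwise products a_i * b_j, flattened."""
    return [x * y for x in a for y in b]


# Toy feature vectors (hypothetical values).
text_vec = [0.2, 0.8]
image_vec = [1.0, 0.0, 0.5]
audio_vec = [0.3]

fused = concat_fusion(text_vec, image_vec, audio_vec)
interactions = outer_product_fusion(text_vec, image_vec)
print(len(fused), len(interactions))
```

Concatenation leaves interaction modeling to later layers, while the outer product makes every text-image feature pair explicit at the cost of a larger representation.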

Explanation by Projection

Interpretability method that projects the contribution of a complex modality (e.g., an image) onto a simpler, interpretable space (e.g., keywords or concepts) to explain its influence on the model's prediction.

Multimodal Salience Map

Visualization that highlights the most influential regions or segments of each modality (pixels of an image, words of a text, audio segments) for a specific model decision, often by overlaying contributions on the original data.

Inter-modality Semantic Alignment

Technique aimed at establishing semantic correspondences between elements of different modalities (e.g., linking a word to an image region or a sound to an action), crucial for the model to understand relationships and provide coherent explanations.

Modality-wise Decomposition

Explainability approach that isolates and quantifies the individual contribution of each input modality to the final prediction, showing whether a decision is driven primarily by text, image, or sound.
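
One possible way to quantify per-modality contributions is to treat each modality as a player and compute exact Shapley values over all subsets of modalities. The sketch below assumes a hypothetical `toy_model` that receives the set of unmasked modalities; a real setup would instead mask inputs of a trained model with some baseline.

```python
from itertools import combinations
from math import factorial

MODALITIES = ("text", "image", "audio")


def toy_model(present):
    """Hypothetical scorer: `present` is the set of modalities left unmasked."""
    score = 0.0
    if "text" in present:
        score += 0.5
    if "image" in present:
        score += 0.25
    if "text" in present and "image" in present:
        score += 0.25  # text-image interaction term
    return score  # this toy model ignores audio entirely


def shapley_by_modality(model, modalities):
    """Exact Shapley value of each modality (feasible for a handful of modalities)."""
    n = len(modalities)
    values = {}
    for m in modalities:
        others = [x for x in modalities if x != m]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without = set(subset)
                total += weight * (model(without | {m}) - model(without))
        values[m] = total
    return values


values = shapley_by_modality(toy_model, MODALITIES)
print(values)
```

The values sum to the difference between the score with all modalities and the score with none, and the interaction term is split evenly between text and image.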

Multimodal Concept Bottleneck

Model architecture in which the final prediction is conditioned on a set of interpretable concepts, themselves inferred from the fused modalities, offering clear traceability from raw data to concepts and then to the decision.
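
The two-stage structure can be sketched as follows: fused features are mapped to concept probabilities, and the label depends only on those concepts. The concept names, weights, and bias are hypothetical values chosen for illustration; a trained model would learn them.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters standing in for learned weights.
CONCEPT_WEIGHTS = {
    "smoke_visible": [2.0, 0.0, 0.0],
    "alarm_audible": [0.0, 0.0, 2.0],
}
LABEL_WEIGHTS = {"smoke_visible": 3.0, "alarm_audible": 2.0}
LABEL_BIAS = -2.5


def predict_concepts(fused):
    """Stage 1: fused multimodal features -> interpretable concept probabilities."""
    return {
        name: sigmoid(sum(w * x for w, x in zip(weights, fused)))
        for name, weights in CONCEPT_WEIGHTS.items()
    }


def predict_label(concepts):
    """Stage 2: the label is computed from the concepts only (the bottleneck)."""
    logit = LABEL_BIAS + sum(LABEL_WEIGHTS[n] * p for n, p in concepts.items())
    return sigmoid(logit)


fused = [1.5, 0.3, -1.0]  # toy fused feature vector
concepts = predict_concepts(fused)
baseline = predict_label(concepts)
# Intervention: overwrite one concept and observe the effect on the decision.
intervened = predict_label({**concepts, "smoke_visible": 0.0})
print(concepts, baseline, intervened)
```

Because the label sees only the concepts, overwriting a concept value is a direct way to test how much the decision depends on it.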

Orthogonality Regularization

Constraint applied during training that pushes the representations of different modalities in the shared latent space toward mutual orthogonality (minimal linear overlap), reducing redundancy and making modality-specific explanations clearer.
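
One common form of such a constraint, shown here as an assumption rather than the only option, penalizes the squared Frobenius norm of the cross-product matrix between two batches of modality embeddings. The function name and the toy batches are illustrative.

```python
def orthogonality_penalty(text_batch, image_batch):
    """Squared Frobenius norm of T^T I, where rows of T and I are per-sample embeddings.

    The penalty is zero when every text dimension is orthogonal to every
    image dimension across the batch.
    """
    d_text = len(text_batch[0])
    d_image = len(image_batch[0])
    penalty = 0.0
    for a in range(d_text):
        for b in range(d_image):
            dot = sum(t[a] * v[b] for t, v in zip(text_batch, image_batch))
            penalty += dot * dot
    return penalty


# Two samples, one embedding dimension per modality (toy values).
redundant = orthogonality_penalty([[1.0], [2.0]], [[1.0], [2.0]])
orthogonal = orthogonality_penalty([[1.0], [0.0]], [[0.0], [1.0]])
print(redundant, orthogonal)
```

During training, this term would typically be added to the task loss with a small weight, so that it shapes the latent space without dominating the main objective.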

Multimodal Counterfactual Explanation

Generation of modified examples (obtained by changing one or more modalities) that are sufficient to flip the model's prediction, helping to identify the minimal conditions and the interactions between modalities that a decision requires.
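
A minimal sketch of the idea, assuming a hypothetical rule-based classifier and one replacement value per modality: the search tries the smallest sets of modalities first and returns those whose replacement flips the prediction.

```python
from itertools import combinations


def toy_classifier(sample):
    """Hypothetical stand-in for a trained model: 'alert' if two or more cues agree."""
    score = 0
    score += 1 if sample["text"] == "fire reported" else 0
    score += 1 if sample["image"] == "smoke" else 0
    score += 1 if sample["audio"] == "alarm" else 0
    return "alert" if score >= 2 else "normal"


def smallest_counterfactuals(model, sample, alternatives):
    """Return the smallest modality subsets whose replacement flips the prediction."""
    original = model(sample)
    names = list(sample)
    for size in range(1, len(names) + 1):
        found = []
        for subset in combinations(names, size):
            candidate = dict(sample)
            for modality in subset:
                candidate[modality] = alternatives[modality]
            if model(candidate) != original:
                found.append(subset)
        if found:
            return found
    return []


sample = {"text": "fire reported", "image": "smoke", "audio": "silence"}
alternatives = {"text": "all quiet", "image": "clear sky", "audio": "alarm"}
flips = smallest_counterfactuals(toy_classifier, sample, alternatives)
print(flips)
```

Here, changing either the text or the image alone is enough to flip the decision, while changing the audio is not, which suggests the prediction rests on the agreement between text and image.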

Late Fusion for Interpretability

Strategy where each modality is processed by a specialized model up to an intermediate decision, with results then being fused. This approach facilitates interpretation by isolating the logic of each modality before the final combination.
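
The sketch below illustrates the strategy with a weighted average of per-modality probabilities. The probabilities and fusion weights are made-up values; in practice each probability would come from a separate modality-specific model.

```python
def late_fusion(modality_probs, weights):
    """Combine per-modality probabilities and keep each modality's share."""
    contributions = {m: weights[m] * p for m, p in modality_probs.items()}
    return sum(contributions.values()), contributions


# Hypothetical outputs of three independent modality-specific models.
modality_probs = {"text": 0.9, "image": 0.4, "audio": 0.5}
weights = {"text": 0.5, "image": 0.25, "audio": 0.25}

fused_prob, contributions = late_fusion(modality_probs, weights)
print(fused_prob, contributions)
```

Because the combination is a simple sum, each modality's share of the final score can be read directly from `contributions`.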

Foundational Vision-Language Model

Large-scale model pre-trained on massive corpora of textual and visual data, capable of understanding and generating content from these two modalities, whose interpretability is a major challenge due to its intrinsic complexity.

Modal Role Analysis

Systematic evaluation of the role played by each modality in different tasks or contexts, determining whether a modality acts as contextual support, a primary information source, or a modifier for others.

Visuo-Linguistic Grounding

Process of anchoring linguistic symbols (words, phrases) to concrete entities or concepts in visual data, fundamental for model explanations linking text and image to be semantically correct and understandable.

Fusion Node Interpretability

Method that focuses on analyzing the specific neurons or layers where the fusion of multimodal information occurs, to understand how interactions are encoded and how they influence the model's output.

Cross-Modal Gradient Explanation

Interpretability technique that calculates the gradient of the model's output with respect to the features of one modality, while conditioning this calculation on the features of another modality, thus revealing inter-modal dependencies.
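
To make the dependency concrete, the sketch below uses a hypothetical bilinear scorer and estimates the gradient with respect to the image features by central finite differences, once for each of two different text inputs. The model, weight matrix, and inputs are assumptions for illustration; with a deep learning framework, automatic differentiation would replace the finite differences.

```python
def bilinear_score(text, image, W):
    """Toy model: score = sum_ij W[i][j] * text[i] * image[j]."""
    return sum(
        W[i][j] * text[i] * image[j]
        for i in range(len(text))
        for j in range(len(image))
    )


def grad_wrt_image(model, text, image, eps=1e-6):
    """Central finite-difference gradient of the score w.r.t. image features,
    evaluated with the text features held fixed."""
    grads = []
    for j in range(len(image)):
        up = list(image)
        down = list(image)
        up[j] += eps
        down[j] -= eps
        grads.append((model(text, up) - model(text, down)) / (2 * eps))
    return grads


W = [[1.0, 0.0], [0.0, 2.0]]  # hypothetical interaction weights


def model(text, image):
    return bilinear_score(text, image, W)


image = [0.5, 0.5]
grad_given_text_a = grad_wrt_image(model, [1.0, 0.0], image)
grad_given_text_b = grad_wrt_image(model, [0.0, 1.0], image)
print(grad_given_text_a, grad_given_text_b)
```

The same image receives different gradients depending on the accompanying text, which is the inter-modal dependency this kind of explanation aims to reveal.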
