AI Glossary
The complete dictionary of Artificial Intelligence
Teacher Model
Large and complex pre-trained neural model that serves as a knowledge source to train a more compact model through the distillation process.
Student Model
Smaller neural model trained to imitate the behavior of the teacher model, inheriting much of its generalization ability while being more computationally efficient.
Soft Targets
Output probability distribution produced by the teacher model (rather than the single class selected by argmax), carrying information about inter-class relationships that hard labels do not capture.
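A minimal numeric sketch of the idea in plain Python with NumPy; the logit values and the three-class setup are made up for illustration:

```python
import numpy as np

# Hypothetical teacher logits for one sample over three classes.
logits = np.array([4.0, 2.5, -1.0])

# Soft targets: the full softmax distribution over all classes.
soft_targets = np.exp(logits) / np.exp(logits).sum()
print(soft_targets)                     # ~[0.81, 0.18, 0.005]

# Hard label: only the argmax survives; the inter-class structure is lost.
hard_label = np.eye(3)[np.argmax(logits)]
print(hard_label)                       # [1., 0., 0.]
```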
Temperature Scaling
Technique of adjusting logits by dividing by a temperature parameter to soften the probability distribution and reveal inter-class relationships during distillation.
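A small sketch of the softened distribution, continuing the NumPy example above; the temperature values are illustrative:

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    # Divide the logits by T before the softmax; larger T flattens the distribution.
    z = np.asarray(logits) / T
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [4.0, 2.5, -1.0]
print(softmax_with_temperature(logits, T=1.0))   # sharp:  ~[0.81, 0.18, 0.005]
print(softmax_with_temperature(logits, T=4.0))   # softer: ~[0.51, 0.35, 0.15]
```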
Hard Targets
Traditional ground truth labels (one-hot encoded) used together with soft targets to maintain prediction accuracy during distillation.
Dark Knowledge
Subtle information contained in the teacher model's output probabilities that reveals similarities between classes and is not present in hard labels.
Distillation Loss
Combined loss function that weighs the divergence between the student's and teacher's soft predictions together with the standard cross-entropy against the hard labels.
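A hedged PyTorch sketch of the classic combined objective (Hinton-style distillation); `alpha` and `T` are illustrative hyperparameters, and the T² factor keeps the gradient magnitudes of the two terms comparable:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_labels, T=4.0, alpha=0.5):
    # Soft term: KL divergence between temperature-softened distributions.
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)

    # Hard term: standard cross-entropy with the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, hard_labels)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Toy usage with random logits for a batch of 8 samples and 10 classes.
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```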
Feature Distillation
Variant of distillation where the student learns to reproduce the teacher's intermediate representations (features) rather than just the final predictions.
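A minimal PyTorch sketch of a feature-matching term; the layer widths are hypothetical, and a small learnable projection maps the student's feature space to the teacher's when their dimensions differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical intermediate activations: the teacher is wider than the student.
teacher_feat = torch.randn(8, 512)
student_feat = torch.randn(8, 128)

# Learnable projection so the two feature spaces can be compared directly.
proj = nn.Linear(128, 512)

# Feature-distillation term: match the teacher's intermediate representation.
feature_loss = F.mse_loss(proj(student_feat), teacher_feat.detach())
print(feature_loss)
```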
Relational Knowledge Distillation
Approach where the student learns the structural relationships among training samples as captured by the teacher, rather than only its individual predictions.
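A sketch of one common instantiation (pairwise-distance matching, in the spirit of RKD); the normalization and the choice of loss are illustrative:

```python
import torch
import torch.nn.functional as F

def pairwise_distances(embeddings):
    # Euclidean distance between every pair of samples in the batch.
    d = torch.cdist(embeddings, embeddings, p=2)
    # Normalize by the mean non-zero distance so teacher and student scales are comparable.
    return d / (d[d > 0].mean() + 1e-8)

teacher_emb = torch.randn(8, 512)   # hypothetical penultimate-layer embeddings
student_emb = torch.randn(8, 128)

# The student reproduces the teacher's relational structure, not its raw
# features, so differing embedding dimensions are not a problem.
relational_loss = F.smooth_l1_loss(pairwise_distances(student_emb),
                                   pairwise_distances(teacher_emb).detach())
print(relational_loss)
```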
Self-Knowledge Distillation
Technique where a model distills knowledge from itself, using its own predictions from earlier training stages or from auxiliary branches to improve its performance.
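One hedged sketch of the "earlier training stages" variant: a frozen snapshot of the same model provides the soft targets. The tiny classifier and the hyperparameters are placeholders:

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(32, 10)                 # stand-in for any classifier
snapshot = copy.deepcopy(model).eval()    # frozen copy from an earlier epoch

x = torch.randn(8, 32)
labels = torch.randint(0, 10, (8,))
T = 2.0

with torch.no_grad():
    past_soft = F.softmax(snapshot(x) / T, dim=-1)   # the model's own past predictions

logits = model(x)
loss = F.cross_entropy(logits, labels) + \
       F.kl_div(F.log_softmax(logits / T, dim=-1), past_soft,
                reduction="batchmean") * T * T
print(loss)
```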
Multi-Teacher Distillation
Strategy using multiple teacher models to transfer diversified knowledge to a single student, combining their respective expertise.
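A minimal sketch in which the teachers' softened predictions are simply averaged before the KL term; weighting schemes used in practice are often more elaborate:

```python
import torch
import torch.nn.functional as F

T = 4.0
student_logits = torch.randn(8, 10)
teacher_logits = [torch.randn(8, 10) for _ in range(3)]   # three hypothetical teachers

# Combine the teachers: here a plain average of their softened distributions.
ensemble_soft = torch.stack(
    [F.softmax(t / T, dim=-1) for t in teacher_logits]).mean(dim=0)

multi_teacher_loss = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                              ensemble_soft, reduction="batchmean") * T * T
print(multi_teacher_loss)
```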
Online Distillation
Method where teacher and student models are trained simultaneously, allowing dynamic and adaptive knowledge transfer during the learning process.
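A compact sketch of the mutual-learning flavor of online distillation, where two peer models are updated together and each uses the other's current predictions as soft targets; the architectures and hyperparameters are placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

net_a, net_b = nn.Linear(32, 10), nn.Linear(32, 10)   # two peers trained together
opt = torch.optim.SGD(list(net_a.parameters()) + list(net_b.parameters()), lr=0.1)

x = torch.randn(8, 32)
labels = torch.randint(0, 10, (8,))

logits_a, logits_b = net_a(x), net_b(x)

# Each peer fits the labels and, at the same time, the other's soft predictions.
def mutual_term(own, other):
    return F.kl_div(F.log_softmax(own, dim=-1),
                    F.softmax(other.detach(), dim=-1), reduction="batchmean")

loss = (F.cross_entropy(logits_a, labels) + mutual_term(logits_a, logits_b) +
        F.cross_entropy(logits_b, labels) + mutual_term(logits_b, logits_a))
opt.zero_grad()
loss.backward()
opt.step()
```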
Zero-Shot Knowledge Distillation
Approach that distills knowledge from a teacher without access to the original training data, typically by synthesizing surrogate inputs from the pre-trained model's weights alone.
Attention-Based Distillation
Specific technique where the student learns to reproduce the teacher's attention maps, thus transferring knowledge about the important parts of the input data.
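A sketch of activation-based attention transfer (in the spirit of Zagoruyko & Komodakis): the attention map is the channel-wise mean of squared activations, L2-normalized before matching. The tensor shapes are hypothetical:

```python
import torch
import torch.nn.functional as F

def attention_map(feature_map):
    # Collapse channels into a spatial attention map, then L2-normalize it.
    att = feature_map.pow(2).mean(dim=1).flatten(1)     # (batch, H*W)
    return F.normalize(att, dim=1)

# Hypothetical convolutional activations with matching spatial size.
teacher_act = torch.randn(8, 256, 14, 14)
student_act = torch.randn(8, 64, 14, 14)

attention_loss = F.mse_loss(attention_map(student_act),
                            attention_map(teacher_act).detach())
print(attention_loss)
```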
Structural Knowledge Distillation
Method that preserves the teacher's structure and architecture in the student, maintaining the relationships between layers and original information flows.
Progressive Knowledge Distillation
Multi-step strategy where an intermediate model serves as a teacher for the final student, allowing a smooth transition of knowledge.
Knowledge Purification
Process of filtering noisy or incorrect knowledge from the teacher before distillation, ensuring a higher-quality knowledge transfer to the student.
Heterogeneous Knowledge Distillation
Approach where teacher and student have different architectures (CNN to Transformer, for example), requiring specific adaptation techniques for knowledge transfer.