AI Glossary
The complete dictionary of Artificial Intelligence
Teacher Model
Large and complex pre-trained neural model that serves as a knowledge source to train a more compact model through the distillation process.
Student Model
Smaller neural model trained to imitate the behavior of the teacher model, inheriting much of its generalization ability while being more computationally efficient.
Soft Targets
Output probability distribution produced by the teacher model (rather than the single class selected by argmax), carrying information about inter-class relationships that hard labels do not capture.
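A minimal numeric sketch of the idea in plain Python with NumPy; the logit values and the three-class setup are made up for illustration:

```python
import numpy as np

# Hypothetical teacher logits for one sample over three classes.
logits = np.array([4.0, 2.5, -1.0])

# Soft targets: the full softmax distribution over all classes.
soft_targets = np.exp(logits) / np.exp(logits).sum()
print(soft_targets)                     # ~[0.81, 0.18, 0.005]

# Hard label: only the argmax survives; the inter-class structure is lost.
hard_label = np.eye(3)[np.argmax(logits)]
print(hard_label)                       # [1., 0., 0.]
```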
Temperature Scaling
Technique of adjusting logits by dividing by a temperature parameter to soften the probability distribution and reveal inter-class relationships during distillation.
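A small sketch of the softened distribution, continuing the NumPy example above; the temperature values are illustrative:

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    # Divide the logits by T before the softmax; larger T flattens the distribution.
    z = np.asarray(logits) / T
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [4.0, 2.5, -1.0]
print(softmax_with_temperature(logits, T=1.0))   # sharp:  ~[0.81, 0.18, 0.005]
print(softmax_with_temperature(logits, T=4.0))   # softer: ~[0.51, 0.35, 0.15]
```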
Hard Targets
Traditional ground truth labels (one-hot encoded) used together with soft targets to maintain prediction accuracy during distillation.
Dark Knowledge
Subtle information contained in the teacher model's output probabilities that reveals similarities between classes and is not present in hard labels.
Distillation Loss
Combined loss function that weighs the divergence between the student's and teacher's soft predictions together with the standard cross-entropy against the hard labels.
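A hedged PyTorch sketch of the classic combined objective (Hinton-style distillation); `alpha` and `T` are illustrative hyperparameters, and the T² factor keeps the gradient magnitudes of the two terms comparable:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_labels, T=4.0, alpha=0.5):
    # Soft term: KL divergence between temperature-softened distributions.
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)

    # Hard term: standard cross-entropy with the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, hard_labels)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Toy usage with random logits for a batch of 8 samples and 10 classes.
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```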
Feature Distillation
Variant of distillation where the student learns to reproduce the teacher's intermediate representations (features) rather than just the final predictions.
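A minimal PyTorch sketch of a feature-matching term; the layer widths are hypothetical, and a small learnable projection maps the student's feature space to the teacher's when their dimensions differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical intermediate activations: the teacher is wider than the student.
teacher_feat = torch.randn(8, 512)
student_feat = torch.randn(8, 128)

# Learnable projection so the two feature spaces can be compared directly.
proj = nn.Linear(128, 512)

# Feature-distillation term: match the teacher's intermediate representation.
feature_loss = F.mse_loss(proj(student_feat), teacher_feat.detach())
print(feature_loss)
```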
Relational Knowledge Distillation
Approach where the student learns the structural relationships among training samples as captured by the teacher, rather than only its individual predictions.
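A sketch of one common instantiation (pairwise-distance matching, in the spirit of RKD); the normalization and the choice of loss are illustrative:

```python
import torch
import torch.nn.functional as F

def pairwise_distances(embeddings):
    # Euclidean distance between every pair of samples in the batch.
    d = torch.cdist(embeddings, embeddings, p=2)
    # Normalize by the mean non-zero distance so teacher and student scales are comparable.
    return d / (d[d > 0].mean() + 1e-8)

teacher_emb = torch.randn(8, 512)   # hypothetical penultimate-layer embeddings
student_emb = torch.randn(8, 128)

# The student reproduces the teacher's relational structure, not its raw
# features, so differing embedding dimensions are not a problem.
relational_loss = F.smooth_l1_loss(pairwise_distances(student_emb),
                                   pairwise_distances(teacher_emb).detach())
print(relational_loss)
```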
Self-Knowledge Distillation
Technique where a model distills knowledge from itself, using its own predictions from earlier training stages or from auxiliary branches to improve its performance.
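One hedged sketch of the "earlier training stages" variant: a frozen snapshot of the same model provides the soft targets. The tiny classifier and the hyperparameters are placeholders:

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(32, 10)                 # stand-in for any classifier
snapshot = copy.deepcopy(model).eval()    # frozen copy from an earlier epoch

x = torch.randn(8, 32)
labels = torch.randint(0, 10, (8,))
T = 2.0

with torch.no_grad():
    past_soft = F.softmax(snapshot(x) / T, dim=-1)   # the model's own past predictions

logits = model(x)
loss = F.cross_entropy(logits, labels) + \
       F.kl_div(F.log_softmax(logits / T, dim=-1), past_soft,
                reduction="batchmean") * T * T
print(loss)
```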
Multi-Teacher Distillation
Strategy using multiple teacher models to transfer diversified knowledge to a single student, combining their respective expertise.
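A minimal sketch in which the teachers' softened predictions are simply averaged before the KL term; weighting schemes used in practice are often more elaborate:

```python
import torch
import torch.nn.functional as F

T = 4.0
student_logits = torch.randn(8, 10)
teacher_logits = [torch.randn(8, 10) for _ in range(3)]   # three hypothetical teachers

# Combine the teachers: here a plain average of their softened distributions.
ensemble_soft = torch.stack(
    [F.softmax(t / T, dim=-1) for t in teacher_logits]).mean(dim=0)

multi_teacher_loss = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                              ensemble_soft, reduction="batchmean") * T * T
print(multi_teacher_loss)
```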
Online Distillation
Method where teacher and student models are trained simultaneously, allowing dynamic and adaptive knowledge transfer during the learning process.
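A compact sketch of the mutual-learning flavor of online distillation, where two peer models are updated together and each uses the other's current predictions as soft targets; the architectures and hyperparameters are placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

net_a, net_b = nn.Linear(32, 10), nn.Linear(32, 10)   # two peers trained together
opt = torch.optim.SGD(list(net_a.parameters()) + list(net_b.parameters()), lr=0.1)

x = torch.randn(8, 32)
labels = torch.randint(0, 10, (8,))

logits_a, logits_b = net_a(x), net_b(x)

# Each peer fits the labels and, at the same time, the other's soft predictions.
def mutual_term(own, other):
    return F.kl_div(F.log_softmax(own, dim=-1),
                    F.softmax(other.detach(), dim=-1), reduction="batchmean")

loss = (F.cross_entropy(logits_a, labels) + mutual_term(logits_a, logits_b) +
        F.cross_entropy(logits_b, labels) + mutual_term(logits_b, logits_a))
opt.zero_grad()
loss.backward()
opt.step()
```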
Zero-Shot Knowledge Distillation
Approach that distills knowledge from a teacher without access to the original training data, typically by synthesizing surrogate inputs from the pre-trained model's weights alone.
Attention-Based Distillation
Specific technique where the student learns to reproduce the teacher's attention maps, thus transferring knowledge about the important parts of the input data.
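A sketch of activation-based attention transfer (in the spirit of Zagoruyko & Komodakis): the attention map is the channel-wise mean of squared activations, L2-normalized before matching. The tensor shapes are hypothetical:

```python
import torch
import torch.nn.functional as F

def attention_map(feature_map):
    # Collapse channels into a spatial attention map, then L2-normalize it.
    att = feature_map.pow(2).mean(dim=1).flatten(1)     # (batch, H*W)
    return F.normalize(att, dim=1)

# Hypothetical convolutional activations with matching spatial size.
teacher_act = torch.randn(8, 256, 14, 14)
student_act = torch.randn(8, 64, 14, 14)

attention_loss = F.mse_loss(attention_map(student_act),
                            attention_map(teacher_act).detach())
print(attention_loss)
```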
Structural Knowledge Distillation
Method that preserves the teacher's structure and architecture in the student, maintaining the relationships between layers and original information flows.
Progressive Knowledge Distillation
Multi-step strategy where an intermediate model serves as a teacher for the final student, allowing a smooth transition of knowledge.
Knowledge Purification
Process of filtering noisy or incorrect knowledge from the teacher before distillation, ensuring a higher-quality knowledge transfer to the student.
Heterogeneous Knowledge Distillation
Approach where teacher and student have different architectures (CNN to Transformer, for example), requiring specific adaptation techniques for knowledge transfer.