AI Glossary

The complete dictionary of Artificial Intelligence

162 categories · 2,032 subcategories · 23,060 terms

Teacher Model

Large and complex pre-trained neural model that serves as a knowledge source to train a more compact model through the distillation process.

Student Model

Smaller neural model that learns to imitate the behavior of the teacher model, benefiting from its generalizations while being more computationally efficient.

Soft Targets

The teacher model's full output probability distribution, rather than the single class an argmax would select, containing information about inter-class relationships that hard labels don't capture.

Temperature Scaling

Technique of dividing the logits by a temperature parameter before applying the softmax, which softens the probability distribution and reveals inter-class relationships during distillation.
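
A minimal sketch of the idea, assuming PyTorch (the logits and the temperature values are illustrative): dividing the logits by T before the softmax flattens the distribution.

```python
import torch
import torch.nn.functional as F

def soften(logits: torch.Tensor, T: float) -> torch.Tensor:
    """Temperature-scaled probabilities: softmax(logits / T)."""
    return F.softmax(logits / T, dim=-1)

logits = torch.tensor([4.0, 2.0, 0.5])  # illustrative teacher logits
print(soften(logits, T=1.0))  # sharp, roughly [0.86, 0.12, 0.03]
print(soften(logits, T=4.0))  # softer, roughly [0.49, 0.30, 0.21]: class similarities become visible
```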

Hard Targets

Traditional ground truth labels (one-hot encoded) used together with soft targets to maintain prediction accuracy during distillation.

Dark Knowledge

Subtle information contained in the teacher model's output probabilities that reveals similarities between classes and is not present in hard labels.

Distillation Loss

Combined loss function that measures both the divergence between soft predictions of the student and teacher, and accuracy with respect to hard labels.
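
A minimal sketch of the usual combined objective, assuming PyTorch (`alpha` and `T` are illustrative hyperparameters): a KL-divergence term between temperature-softened teacher and student distributions plus a cross-entropy term against the hard labels.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft term: KL divergence between softened teacher and student distributions.
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    # Hard term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```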

Feature Distillation

Variant of distillation where the student learns to reproduce the teacher's intermediate representations (features) rather than just the final predictions.
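
A minimal sketch, assuming PyTorch and that the intermediate activations have already been captured (for example with forward hooks); the linear projection bridges a dimension mismatch between student and teacher feature vectors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistiller(nn.Module):
    """Match student features of shape (batch, student_dim) to teacher features."""
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # Project student features into the teacher's feature space.
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_feat, teacher_feat):
        # The teacher is frozen; only the student (and the projection) get gradients.
        return F.mse_loss(self.proj(student_feat), teacher_feat.detach())
```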

Relational Knowledge Distillation

Approach where the student learns the structural relationships between training samples preserved by the teacher, beyond individual predictions.
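
One common instance is distance-wise relational distillation, where the student matches the teacher's normalized pairwise distances between samples in a batch; a minimal sketch assuming PyTorch embeddings of shape (batch, dim).

```python
import torch
import torch.nn.functional as F

def pairwise_distances(embeddings: torch.Tensor) -> torch.Tensor:
    # Pairwise Euclidean distances, normalized by their mean so that
    # teacher and student distances live on a comparable scale.
    d = torch.cdist(embeddings, embeddings, p=2)
    return d / (d.mean() + 1e-8)

def rkd_distance_loss(student_emb, teacher_emb):
    # The student reproduces the teacher's relational structure, not its outputs.
    return F.smooth_l1_loss(pairwise_distances(student_emb),
                            pairwise_distances(teacher_emb).detach())
```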

Self-Knowledge Distillation

Technique where a model distills knowledge from itself, using its own predictions from earlier training stages or from auxiliary branches to improve its performance.
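
A minimal sketch of one such variant, assuming PyTorch: an exponential-moving-average copy of the model acts as its own teacher and supplies the soft targets (the momentum and temperature values are illustrative).

```python
import copy
import torch
import torch.nn.functional as F

def make_ema_teacher(model):
    # A frozen exponential-moving-average copy of the model serves as its own teacher.
    ema = copy.deepcopy(model)
    for p in ema.parameters():
        p.requires_grad_(False)
    return ema

@torch.no_grad()
def update_ema(ema, model, momentum=0.99):
    for p_ema, p in zip(ema.parameters(), model.parameters()):
        p_ema.mul_(momentum).add_(p, alpha=1.0 - momentum)

def self_distill_loss(model, ema, x, T=2.0):
    with torch.no_grad():
        targets = F.softmax(ema(x) / T, dim=-1)
    return F.kl_div(F.log_softmax(model(x) / T, dim=-1), targets,
                    reduction="batchmean") * T ** 2
```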

Multi-Teacher Distillation

Strategy using multiple teacher models to transfer diversified knowledge to a single student, combining their respective expertise.
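
A minimal sketch, assuming PyTorch: the softened distributions of several teachers are averaged into an ensemble target (uniform weighting is a simplifying assumption) and the student is distilled toward it.

```python
import torch
import torch.nn.functional as F

def multi_teacher_loss(student_logits, teacher_logits_list, T=4.0):
    # Ensemble target: the mean of the teachers' temperature-softened distributions.
    with torch.no_grad():
        ensemble = torch.stack(
            [F.softmax(t / T, dim=-1) for t in teacher_logits_list]
        ).mean(dim=0)
    return F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    ensemble, reduction="batchmean") * T ** 2
```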

Online Distillation

Method where teacher and student models are trained simultaneously, allowing dynamic and adaptive knowledge transfer during the learning process.
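
A minimal sketch of one online scheme in the spirit of deep mutual learning, assuming PyTorch: two peer networks train simultaneously, each using the other's current (detached) predictions as soft targets alongside the hard labels.

```python
import torch
import torch.nn.functional as F

def mutual_losses(logits_a, logits_b, labels, T=2.0, alpha=0.5):
    # Each network mimics the other's current distribution while still
    # fitting the ground-truth labels; both are updated every step.
    def kl(p_logits, q_logits):
        return F.kl_div(F.log_softmax(p_logits / T, dim=-1),
                        F.softmax(q_logits.detach() / T, dim=-1),
                        reduction="batchmean") * T ** 2
    loss_a = alpha * kl(logits_a, logits_b) + (1 - alpha) * F.cross_entropy(logits_a, labels)
    loss_b = alpha * kl(logits_b, logits_a) + (1 - alpha) * F.cross_entropy(logits_b, labels)
    return loss_a, loss_b
```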

Zero-Shot Knowledge Distillation

Approach that distills knowledge from a teacher without access to the original training data, typically by synthesizing surrogate inputs from the pre-trained model's weights.

Attention-Based Distillation

Specific technique where the student learns to reproduce the teacher's attention maps, thus transferring knowledge about the important parts of the input data.
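
A minimal sketch of one form, attention transfer over convolutional feature maps, assuming PyTorch tensors of shape (batch, channels, height, width) with matching spatial sizes: the spatial attention map is the channel-wise sum of squared activations, L2-normalized, and the student matches the teacher's map.

```python
import torch
import torch.nn.functional as F

def attention_map(feat: torch.Tensor) -> torch.Tensor:
    # Collapse channels into a spatial attention map and L2-normalize it.
    a = feat.pow(2).sum(dim=1).flatten(1)  # (batch, H*W)
    return F.normalize(a, dim=1)

def attention_transfer_loss(student_feat, teacher_feat):
    return F.mse_loss(attention_map(student_feat),
                      attention_map(teacher_feat.detach()))
```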

Structural Knowledge Distillation

Method that preserves the teacher's structure and architecture in the student, maintaining the relationships between layers and original information flows.

Progressive Knowledge Distillation

Multi-step strategy where an intermediate model serves as a teacher for the final student, allowing a smooth transition of knowledge.

Knowledge Purification

Process of filtering noisy or incorrect knowledge from the teacher before distillation, ensuring a higher quality knowledge transfer to the student.

Heterogeneous Knowledge Distillation

Approach where teacher and student have different architectures (CNN to Transformer, for example), requiring specific adaptation techniques for knowledge transfer.
