AI Glossary

The complete dictionary of Artificial Intelligence

162 categories · 2,032 subcategories · 23,060 terms

Teacher Model

Large and complex pre-trained neural model that serves as a knowledge source to train a more compact model through the distillation process.

Student Model

Smaller neural model that learns to imitate the behavior of the teacher model, benefiting from its generalizations while being more computationally efficient.

Soft Targets

The teacher model's full output probability distribution, rather than the single class an argmax would select, containing information about inter-class relationships that hard labels don't capture.

Temperature Scaling

Technique of dividing the logits by a temperature parameter before applying the softmax, which softens the probability distribution and reveals inter-class relationships during distillation.
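
A minimal sketch of the idea, assuming PyTorch (the logits and the temperature values are illustrative): dividing the logits by T before the softmax flattens the distribution.

```python
import torch
import torch.nn.functional as F

def soften(logits: torch.Tensor, T: float) -> torch.Tensor:
    """Temperature-scaled probabilities: softmax(logits / T)."""
    return F.softmax(logits / T, dim=-1)

logits = torch.tensor([4.0, 2.0, 0.5])  # illustrative teacher logits
print(soften(logits, T=1.0))  # sharp, roughly [0.86, 0.12, 0.03]
print(soften(logits, T=4.0))  # softer, roughly [0.49, 0.30, 0.21]: class similarities become visible
```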

Hard Targets

Traditional ground truth labels (one-hot encoded) used together with soft targets to maintain prediction accuracy during distillation.

Dark Knowledge

Subtle information contained in the teacher model's output probabilities that reveals similarities between classes and is not present in hard labels.

Distillation Loss

Combined loss function that measures both the divergence between soft predictions of the student and teacher, and accuracy with respect to hard labels.
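
A minimal sketch of the usual combined objective, assuming PyTorch (`alpha` and `T` are illustrative hyperparameters): a KL-divergence term between temperature-softened teacher and student distributions plus a cross-entropy term against the hard labels.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft term: KL divergence between softened teacher and student distributions.
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    # Hard term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```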

Feature Distillation

Variant of distillation where the student learns to reproduce the teacher's intermediate representations (features) rather than just the final predictions.
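
A minimal sketch, assuming PyTorch and that the intermediate activations have already been captured (for example with forward hooks); the linear projection bridges a dimension mismatch between student and teacher feature vectors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistiller(nn.Module):
    """Match student features of shape (batch, student_dim) to teacher features."""
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # Project student features into the teacher's feature space.
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_feat, teacher_feat):
        # The teacher is frozen; only the student (and the projection) get gradients.
        return F.mse_loss(self.proj(student_feat), teacher_feat.detach())
```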

Relational Knowledge Distillation

Approach where the student learns the structural relationships between training samples preserved by the teacher, beyond individual predictions.
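
One common instance is distance-wise relational distillation, where the student matches the teacher's normalized pairwise distances between samples in a batch; a minimal sketch assuming PyTorch embeddings of shape (batch, dim).

```python
import torch
import torch.nn.functional as F

def pairwise_distances(embeddings: torch.Tensor) -> torch.Tensor:
    # Pairwise Euclidean distances, normalized by their mean so that
    # teacher and student distances live on a comparable scale.
    d = torch.cdist(embeddings, embeddings, p=2)
    return d / (d.mean() + 1e-8)

def rkd_distance_loss(student_emb, teacher_emb):
    # The student reproduces the teacher's relational structure, not its outputs.
    return F.smooth_l1_loss(pairwise_distances(student_emb),
                            pairwise_distances(teacher_emb).detach())
```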

Self-Knowledge Distillation

Technique where a model distills knowledge from itself, using its own predictions from earlier training stages or from auxiliary branches to improve its performance.
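
A minimal sketch of one such variant, assuming PyTorch: an exponential-moving-average copy of the model acts as its own teacher and supplies the soft targets (the momentum and temperature values are illustrative).

```python
import copy
import torch
import torch.nn.functional as F

def make_ema_teacher(model):
    # A frozen exponential-moving-average copy of the model serves as its own teacher.
    ema = copy.deepcopy(model)
    for p in ema.parameters():
        p.requires_grad_(False)
    return ema

@torch.no_grad()
def update_ema(ema, model, momentum=0.99):
    for p_ema, p in zip(ema.parameters(), model.parameters()):
        p_ema.mul_(momentum).add_(p, alpha=1.0 - momentum)

def self_distill_loss(model, ema, x, T=2.0):
    with torch.no_grad():
        targets = F.softmax(ema(x) / T, dim=-1)
    return F.kl_div(F.log_softmax(model(x) / T, dim=-1), targets,
                    reduction="batchmean") * T ** 2
```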

Multi-Teacher Distillation

Strategy using multiple teacher models to transfer diversified knowledge to a single student, combining their respective expertise.
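
A minimal sketch, assuming PyTorch: the softened distributions of several teachers are averaged into an ensemble target (uniform weighting is a simplifying assumption) and the student is distilled toward it.

```python
import torch
import torch.nn.functional as F

def multi_teacher_loss(student_logits, teacher_logits_list, T=4.0):
    # Ensemble target: the mean of the teachers' temperature-softened distributions.
    with torch.no_grad():
        ensemble = torch.stack(
            [F.softmax(t / T, dim=-1) for t in teacher_logits_list]
        ).mean(dim=0)
    return F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    ensemble, reduction="batchmean") * T ** 2
```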

Online Distillation

Method where teacher and student models are trained simultaneously, allowing dynamic and adaptive knowledge transfer during the learning process.
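
A minimal sketch of one online scheme in the spirit of deep mutual learning, assuming PyTorch: two peer networks train simultaneously, each using the other's current (detached) predictions as soft targets alongside the hard labels.

```python
import torch
import torch.nn.functional as F

def mutual_losses(logits_a, logits_b, labels, T=2.0, alpha=0.5):
    # Each network mimics the other's current distribution while still
    # fitting the ground-truth labels; both are updated every step.
    def kl(p_logits, q_logits):
        return F.kl_div(F.log_softmax(p_logits / T, dim=-1),
                        F.softmax(q_logits.detach() / T, dim=-1),
                        reduction="batchmean") * T ** 2
    loss_a = alpha * kl(logits_a, logits_b) + (1 - alpha) * F.cross_entropy(logits_a, labels)
    loss_b = alpha * kl(logits_b, logits_a) + (1 - alpha) * F.cross_entropy(logits_b, labels)
    return loss_a, loss_b
```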

Zero-Shot Knowledge Distillation

Approach that distills knowledge from a teacher without access to the original training data, typically by synthesizing surrogate inputs from the pre-trained model's weights.

Attention-Based Distillation

Specific technique where the student learns to reproduce the teacher's attention maps, thus transferring knowledge about the important parts of the input data.
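
A minimal sketch of one form, attention transfer over convolutional feature maps, assuming PyTorch tensors of shape (batch, channels, height, width) with matching spatial sizes: the spatial attention map is the channel-wise sum of squared activations, L2-normalized, and the student matches the teacher's map.

```python
import torch
import torch.nn.functional as F

def attention_map(feat: torch.Tensor) -> torch.Tensor:
    # Collapse channels into a spatial attention map and L2-normalize it.
    a = feat.pow(2).sum(dim=1).flatten(1)  # (batch, H*W)
    return F.normalize(a, dim=1)

def attention_transfer_loss(student_feat, teacher_feat):
    return F.mse_loss(attention_map(student_feat),
                      attention_map(teacher_feat.detach()))
```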

Structural Knowledge Distillation

Method that preserves the teacher's structure and architecture in the student, maintaining the relationships between layers and original information flows.

Progressive Knowledge Distillation

Multi-step strategy where an intermediate model serves as a teacher for the final student, allowing a smooth transition of knowledge.

Knowledge Purification

Process of filtering noisy or incorrect knowledge from the teacher before distillation, ensuring a higher quality knowledge transfer to the student.

Heterogeneous Knowledge Distillation

Approach where teacher and student have different architectures (CNN to Transformer, for example), requiring specific adaptation techniques for knowledge transfer.
