Mixed Quantization

Optimization technique that applies different bit precisions to different neural network layers to balance accuracy against model size. Critical layers keep high precision while the remaining layers use fewer bits, which reduces the overall memory footprint.
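
As a rough illustration of the memory side of this trade-off, the sketch below computes weight storage for a per-layer bit assignment. The layer names, parameter counts, and bit widths are hypothetical values chosen for the example, not taken from any real model.

```python
# Hypothetical layer table: (name, parameter count, assigned bit width).
LAYERS = [
    ("embedding", 1_000_000, 8),
    ("attention", 4_000_000, 4),
    ("output_head", 500_000, 16),   # assumed to be the "critical" layer
]

def model_size_bytes(layers):
    """Total weight storage in bytes for the given per-layer bit widths."""
    total_bits = sum(n_params * bits for _, n_params, bits in layers)
    return total_bits / 8

mixed_size = model_size_bytes(LAYERS)
fp32_size = model_size_bytes([(name, n, 32) for name, n, _ in LAYERS])
compression_ratio = fp32_size / mixed_size

print(f"mixed: {mixed_size:.0f} bytes, fp32: {fp32_size:.0f} bytes, "
      f"ratio: {compression_ratio:.1f}x")
```

The calculation only covers weights; scales, zero points, and activations add overhead that this sketch ignores.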

Quantization-Aware Training

Methodology that inserts simulated (pseudo) quantization operations into training to reproduce the effect of low-precision quantization. The model learns to compensate for rounding errors before the final conversion.
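
A minimal sketch of the idea, assuming symmetric 4-bit weights and a toy linear regression problem: the forward pass uses quantized-then-dequantized weights, while the gradient updates a full-precision copy (the straight-through estimator commonly used for this purpose). The function name `fake_quantize`, the data, and the hyperparameters are all illustrative assumptions.

```python
import numpy as np

def fake_quantize(w, num_bits=4):
    """Quantize then dequantize: the result is float but lies on the integer grid."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 3))
true_w = np.array([0.7, -1.3, 0.2])      # hypothetical target weights
y = x @ true_w

w = np.zeros(3)                          # full-precision weights kept during training
initial_loss = float(np.mean((x @ fake_quantize(w) - y) ** 2))
for _ in range(200):
    w_q = fake_quantize(w)               # forward pass sees quantized weights
    grad = 2 * x.T @ (x @ w_q - y) / len(x)
    w -= 0.05 * grad                     # straight-through: update the float weights
final_loss = float(np.mean((x @ fake_quantize(w) - y) ** 2))

print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

Real frameworks implement this with differentiable fake-quantization modules; the NumPy loop only shows the mechanism.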

Layer Sensitivity

Measure of the impact of quantization on the performance of each individual layer of the neural network. Sensitive layers require higher precision to maintain overall model quality.
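
One simple way to estimate this, sketched below under stated assumptions, is to quantize a single layer at a time and measure how far the network output moves from the full-precision output. The tiny random MLP, the mean-squared-error metric, and the helper names are assumptions made for the example.

```python
import numpy as np

def quantize_dequantize(w, num_bits):
    """Symmetric per-tensor quantization followed by dequantization."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def forward(weights, x):
    """Tiny MLP: linear layers with ReLU between them."""
    h = x
    for i, w in enumerate(weights):
        h = h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h

def layer_sensitivity(weights, x, num_bits):
    """Output MSE caused by quantizing one layer at a time."""
    reference = forward(weights, x)
    scores = []
    for i in range(len(weights)):
        perturbed = list(weights)
        perturbed[i] = quantize_dequantize(weights[i], num_bits)
        scores.append(float(np.mean((forward(perturbed, x) - reference) ** 2)))
    return scores

rng = np.random.default_rng(0)
weights = [rng.normal(size=(16, 32)), rng.normal(size=(32, 32)), rng.normal(size=(32, 4))]
x = rng.normal(size=(128, 16))           # synthetic calibration inputs
scores_4bit = layer_sensitivity(weights, x, num_bits=4)
scores_8bit = layer_sensitivity(weights, x, num_bits=8)
print("4-bit sensitivity per layer:", scores_4bit)
```

Layers with the largest scores would be the first candidates for a higher bit width.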

Heterogeneous Quantization

Quantization approach dynamically assigning different bit widths according to computational characteristics and importance of each layer. This strategy optimizes the trade-off between hardware acceleration and precision degradation.

Model Profiling

Comprehensive analysis of a trained model's characteristics to identify which layers are candidates for which quantization strategy. Profiling evaluates each layer's statistical distributions and dynamic ranges, and the effect that quantizing it has on accuracy.
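
The sketch below shows what a very small profiling pass over layer weights might collect. The choice of statistics, in particular the `outlier_ratio` metric, and the synthetic layers are assumptions for illustration only.

```python
import numpy as np

def profile_layers(layers):
    """Collect simple per-layer weight statistics (illustrative choice of metrics)."""
    report = {}
    for name, w in layers.items():
        abs_w = np.abs(w)
        report[name] = {
            "min": float(w.min()),
            "max": float(w.max()),
            "std": float(w.std()),
            # Largest magnitude relative to the 99th-percentile magnitude:
            # a large value hints at outliers that stretch the quantization range.
            "outlier_ratio": float(abs_w.max() / np.percentile(abs_w, 99)),
        }
    return report

rng = np.random.default_rng(0)
layers = {
    "dense": rng.normal(size=(64, 64)),
    "with_outlier": rng.normal(size=(64, 64)),
}
layers["with_outlier"][0, 0] = 50.0      # inject a single extreme weight
report = profile_layers(layers)
for name, stats in report.items():
    print(name, stats)
```

A layer with a high outlier ratio wastes most of its integer grid on a few extreme values, so it may call for more bits or a finer-grained scheme.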

Per-Tensor Quantization

Method that applies a single set of quantization parameters to an entire tensor, so that all values share the same scale. This approach simplifies hardware implementation but can reduce precision when the tensor's values span a wide range.

Per-Channel Quantization

Quantization technique that uses distinct parameters for each channel, or group of channels, of a convolutional layer. Adapting the scale to the value range of each filter preserves precision better than a single shared scale.
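
To make the difference concrete, the sketch below quantizes the same convolution weights once with a single shared scale and once with one scale per output channel, then compares the error per channel. The weight shape, the deliberately mismatched channel magnitudes, and the symmetric 8-bit scheme are assumptions for the example.

```python
import numpy as np

def quantize_dequantize(w, scale, num_bits=8):
    """Symmetric quantization with a scalar or broadcastable array of scales."""
    qmax = 2 ** (num_bits - 1) - 1
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
# Hypothetical conv weights (out_channels, in_channels, kH, kW) with
# very different magnitudes per output channel.
channel_ranges = np.array([0.01, 0.1, 1.0, 10.0]).reshape(4, 1, 1, 1)
weights = rng.uniform(-1, 1, size=(4, 3, 3, 3)) * channel_ranges

qmax = 127
per_tensor_scale = np.max(np.abs(weights)) / qmax                      # one scalar
per_channel_scale = np.max(np.abs(weights), axis=(1, 2, 3), keepdims=True) / qmax

err_tensor = np.mean(
    (weights - quantize_dequantize(weights, per_tensor_scale)) ** 2, axis=(1, 2, 3))
err_channel = np.mean(
    (weights - quantize_dequantize(weights, per_channel_scale)) ** 2, axis=(1, 2, 3))

print("per-tensor MSE by channel: ", err_tensor)
print("per-channel MSE by channel:", err_channel)
```

With a shared scale, the small-magnitude channels collapse to zero because the step size is set by the largest channel; per-channel scales avoid that at the cost of storing one scale per filter.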

Quantization Scale

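Positive real-valued parameter S that converts floating-point values into quantized integers according to the formula Q = round(R/S + Z), where R is the real value and Z the zero point; the real value is recovered approximately as R ≈ S × (Q − Z). For a fixed bit width, a smaller scale gives finer resolution but a narrower representable range.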
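
The sketch below applies the formula to an unsigned 8-bit grid, deriving the scale and zero point from an assumed value range. The function names and the min/max calibration rule are illustrative assumptions; other calibration rules exist.

```python
import numpy as np

def affine_params(r_min, r_max, num_bits=8):
    """Scale and zero point for an unsigned integer grid [0, 2^bits - 1]."""
    q_min, q_max = 0, 2 ** num_bits - 1
    r_min, r_max = min(r_min, 0.0), max(r_max, 0.0)   # range must contain 0
    scale = (r_max - r_min) / (q_max - q_min)         # assumes r_max > r_min
    zero_point = int(round(q_min - r_min / scale))
    return scale, zero_point

def quantize(r, scale, zero_point, num_bits=8):
    """Q = round(R/S + Z), clipped to the integer grid."""
    q = np.round(r / scale + zero_point)
    return np.clip(q, 0, 2 ** num_bits - 1).astype(np.int32)

def dequantize(q, scale, zero_point):
    """R is approximately S * (Q - Z)."""
    return scale * (q.astype(np.float64) - zero_point)

scale, zero_point = affine_params(-1.0, 3.0)          # hypothetical value range
r = np.linspace(-1.0, 3.0, 101)
q = quantize(r, scale, zero_point)
r_hat = dequantize(q, scale, zero_point)
max_error = float(np.max(np.abs(r - r_hat)))
print(f"scale={scale:.6f}, zero_point={zero_point}, max error={max_error:.6f}")
```

Inside the calibrated range the round-trip error stays within half a step, which is why the scale directly sets the resolution.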

Quantization Zero Point

Integer value that represents floating-point zero in the quantized domain, essential for preserving the structural zeros of neural networks. This parameter keeps the quantized and real domains precisely aligned.
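
The following sketch, assuming a signed 8-bit grid and a hypothetical activation range of [-0.5, 1.5], shows that real zero maps exactly onto the integer zero point and back, while other values pick up rounding error.

```python
import numpy as np

def affine_params(r_min, r_max, q_min=-128, q_max=127):
    """Scale and integer zero point for a signed 8-bit grid by default."""
    scale = (r_max - r_min) / (q_max - q_min)
    zero_point = int(round(q_min - r_min / scale))
    return scale, zero_point

scale, zero_point = affine_params(-0.5, 1.5)      # hypothetical activation range

padded = np.array([0.0, 0.0, 0.3, -0.21, 0.0])    # zeros stand in for padding
q = np.clip(np.round(padded / scale + zero_point), -128, 127).astype(np.int32)
restored = scale * (q - zero_point)

print("zero point:", zero_point)
print("quantized: ", q)
print("restored:  ", restored)
```

Because the zero point is an integer on the grid, padded or masked zeros survive the round trip with no error.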

Quantization Noise

Error introduced when high-precision numbers are converted to a reduced-bit representation, which manifests as a loss of information. Analysis of quantization noise guides the selection of layers to keep in high precision.
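
A common way to quantify this error is the signal-to-quantization-noise ratio (SQNR). The sketch below measures it on synthetic uniform data and compares the measured noise power with the textbook estimate of step²/12 for rounding noise; the data and the symmetric quantizer are assumptions for the example.

```python
import numpy as np

def quantization_noise(x, num_bits):
    """Return (x - dequantized x) and the step size for symmetric quantization."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    x_hat = np.clip(np.round(x / scale), -qmax, qmax) * scale
    return x - x_hat, scale

def sqnr_db(x, noise):
    """Signal-to-quantization-noise ratio in decibels."""
    return float(10 * np.log10(np.mean(x ** 2) / np.mean(noise ** 2)))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100_000)     # synthetic stand-in for a weight tensor
results = {}
for bits in (4, 8):
    noise, scale = quantization_noise(x, bits)
    results[bits] = {
        "sqnr_db": sqnr_db(x, noise),
        "noise_power": float(np.mean(noise ** 2)),
        "uniform_model": float(scale ** 2 / 12),   # textbook rounding-noise estimate
    }
    print(bits, "bits:", results[bits])
```

Each extra bit roughly halves the step size, so the SQNR rises by about 6 dB per bit; layers whose SQNR is low at the target bit width are candidates for higher precision.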

Requantization

Process of converting between different quantization precisions within a single model, required when operations span layers with different bit widths. Requantization maintains numerical consistency while optimizing resource usage.
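
As a sketch of the conversion, the function below maps integers from one (scale, zero point) grid onto another, here from signed 8-bit to signed 4-bit. It goes through floating point for clarity, which is an assumption of this example; integer-only runtimes typically use a fixed-point multiplier instead. All parameter values are hypothetical.

```python
import numpy as np

def requantize(q_in, scale_in, zp_in, scale_out, zp_out, q_min, q_max):
    """Map integers from one (scale, zero point) grid onto another."""
    real = scale_in * (q_in.astype(np.float64) - zp_in)   # back to real values
    q_out = np.round(real / scale_out) + zp_out           # onto the target grid
    return np.clip(q_out, q_min, q_max).astype(np.int32)  # saturate at the limits

# Hypothetical int8 activations feeding a layer that expects signed 4-bit inputs.
q8 = np.array([-128, -64, 0, 64, 127], dtype=np.int32)
q4 = requantize(q8, scale_in=0.02, zp_in=0,
                scale_out=0.32, zp_out=0, q_min=-8, q_max=7)
print(q4)
```

The largest input saturates at the 4-bit maximum, which shows why the output scale has to be chosen with the target range in mind.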

Variable Bit-Width Strategy

Algorithmic approach that determines the optimal allocation of bit widths across the network to minimize model size under an accuracy constraint. This strategy solves a complex combinatorial optimization problem.
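
For a handful of layers the problem can be solved by exhaustive search, as sketched below. The parameter counts and accuracy-drop figures are invented for the example, and the sketch assumes that per-layer accuracy drops simply add up, which is a simplification.

```python
from itertools import product

# Hypothetical inputs: parameter counts and a measured accuracy drop
# (in percentage points) for each (layer, bit width) pair.
PARAMS = {"conv1": 1_000, "conv2": 50_000, "fc": 200_000}
ACC_DROP = {
    "conv1": {2: 5.0, 4: 1.0, 8: 0.1},
    "conv2": {2: 2.0, 4: 0.4, 8: 0.05},
    "fc":    {2: 0.8, 4: 0.2, 8: 0.02},
}

def best_allocation(params, acc_drop, max_total_drop):
    """Smallest model (in bits) whose summed accuracy drop fits the budget."""
    names = list(params)
    best = None
    for bits in product(*(sorted(acc_drop[n]) for n in names)):
        drop = sum(acc_drop[n][b] for n, b in zip(names, bits))  # additive assumption
        if drop > max_total_drop:
            continue
        size = sum(params[n] * b for n, b in zip(names, bits))
        if best is None or size < best[0]:
            best = (size, dict(zip(names, bits)))
    return best

best = best_allocation(PARAMS, ACC_DROP, max_total_drop=1.5)
print(best)
```

The search visits every combination, so its cost grows exponentially with the number of layers; this is the combinatorial difficulty the definition refers to, and larger networks need heuristic or learned search instead.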

Hierarchical Quantization

Method that organizes layers into hierarchies based on their importance and their sensitivity to quantization. Hierarchical quantization applies different bit-width policies according to the hierarchical level of each group of layers.
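
One possible reading of this definition, sketched below, is a tiered policy: layers are sorted into levels by a sensitivity score and every level gets its own bit width. The scores, thresholds, and bit widths are hypothetical.

```python
# Hypothetical sensitivity scores (higher means more sensitive to quantization).
SENSITIVITY = {"stem": 0.90, "block1": 0.35, "block2": 0.30, "block3": 0.10, "head": 0.75}

# Tier policy: (minimum score, bit width), checked from the top tier down.
TIERS = [
    (0.7, 16),   # most sensitive group keeps high precision
    (0.3, 8),
    (0.0, 4),    # least sensitive group is quantized most aggressively
]

def assign_bits(sensitivity, tiers):
    """Give each layer the bit width of the first tier whose threshold it meets."""
    policy = {}
    for layer, score in sensitivity.items():
        for threshold, bits in tiers:
            if score >= threshold:
                policy[layer] = bits
                break
    return policy

policy = assign_bits(SENSITIVITY, TIERS)
print(policy)
```

Working with a few tiers rather than per-layer decisions keeps the policy small and easy to audit, at the cost of a coarser allocation.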

AI Glossary