AI Glossary
The complete dictionary of Artificial Intelligence
Quantization
Process of reducing the numerical precision of a model's weights and activations to speed up inference and shrink the memory footprint.
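For concreteness, here is a minimal quantize/dequantize round trip, a sketch in NumPy with illustrative helper names (asymmetric int8, per tensor):

```python
import numpy as np

def quantize(x, num_bits=8):
    """Map a float array onto the signed integer grid (affine scheme).

    Returns the integer codes plus the (scale, zero_point) pair needed
    to approximately reconstruct the original values.
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - float(x.min()) / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Approximate reconstruction of the original float values."""
    return scale * (q.astype(np.float32) - zero_point)

w = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize(w)
print(np.abs(w - dequantize(q, scale, zp)).max())  # residual quantization error
```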
8-bit Quantization
Compression technique storing model weights as 8-bit integers instead of 32- or 16-bit floats, generally a strong trade-off between efficiency and accuracy for LLMs.
4-bit Quantization
Extreme compression method reducing weights to 4 bits, allowing significant memory gains but with potential quality loss.
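The memory saving comes from packing two 4-bit codes into each stored byte. A minimal packing sketch in NumPy (illustrative helper names):

```python
import numpy as np

def pack_int4(codes):
    """Pack an even-length array of 4-bit codes (0..15) into bytes:
    low nibble holds codes[0], high nibble holds codes[1], and so on."""
    codes = np.asarray(codes, dtype=np.uint8)
    return (codes[0::2] & 0x0F) | ((codes[1::2] & 0x0F) << 4)

def unpack_int4(packed):
    """Inverse of pack_int4: recover the original 4-bit codes."""
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2] = packed & 0x0F
    out[1::2] = packed >> 4
    return out

codes = np.random.randint(0, 16, size=8)
assert np.array_equal(unpack_int4(pack_int4(codes)), codes)
```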
Post-Training Quantization (PTQ)
Technique applied after model training, converting weights to reduced precision without requiring full retraining.
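A toy PTQ pass over an already-trained weight matrix might look like the sketch below (NumPy, symmetric per-output-channel int8; all names illustrative). Note that no gradients are involved:

```python
import numpy as np

def ptq_per_channel(weight, num_bits=8):
    """Quantize a trained weight matrix post hoc, with one scale per
    output channel (row), derived from each row's max absolute value."""
    qmax = 2 ** (num_bits - 1) - 1
    scales = np.abs(weight).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(weight / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

trained_w = np.random.randn(16, 64).astype(np.float32)  # stand-in for a trained layer
q, scales = ptq_per_channel(trained_w)
reconstructed = q.astype(np.float32) * scales
print("max weight error:", np.abs(trained_w - reconstructed).max())
```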
Quantization-Aware Training (QAT)
Training approach simulating quantization effects during the learning process to minimize accuracy loss.
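The core mechanism is "fake quantization" with a straight-through estimator: the forward pass sees quantized values while gradients flow through unchanged. A minimal sketch in PyTorch (illustrative, not a full QAT pipeline):

```python
import torch

def fake_quantize(x, num_bits=8):
    """Simulate symmetric integer quantization in the forward pass while
    passing gradients straight through in the backward pass."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.detach().abs().max() / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    # Forward uses q; backward sees the identity, since (q - x) is detached.
    return x + (q - x).detach()

x = torch.randn(4, requires_grad=True)
fake_quantize(x).sum().backward()
print(x.grad)  # all ones: the gradient ignored the rounding
```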
Dynamic Quantization
Method applied during inference where activations are quantized on-the-fly, offering flexibility but with computational overhead.
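PyTorch exposes this directly for CPU inference via `torch.quantization.quantize_dynamic`; a minimal usage example (model and shapes are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Linear weights are converted to int8 ahead of time; activations are
# quantized on the fly at inference using ranges computed per batch.
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

with torch.no_grad():
    out = qmodel(torch.randn(1, 128))
print(out.shape)  # torch.Size([1, 10])
```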
Static Quantization
Approach precomputing quantization parameters before inference, optimizing speed at the expense of flexibility.
Quantization Calibration
Process of determining optimal quantization parameters (scale, zero-point) from a sample of representative data.
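A minimal min/max calibrator, sketched in NumPy (illustrative; real toolchains also offer percentile- or entropy-based calibrators that are more robust to outliers):

```python
import numpy as np

def calibrate_minmax(batches, num_bits=8):
    """Derive (scale, zero_point) for unsigned quantization from the
    min/max observed across representative calibration batches."""
    lo = min(float(b.min()) for b in batches)
    hi = max(float(b.max()) for b in batches)
    scale = (hi - lo) / (2 ** num_bits - 1)
    zero_point = int(round(-lo / scale))
    return scale, zero_point

calib_batches = [np.random.randn(32, 16) for _ in range(10)]  # representative sample
scale, zero_point = calibrate_minmax(calib_batches)
print(scale, zero_point)
```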
GPTQ
Post-training quantization technique for generative pre-trained transformers that quantizes weights layer by layer, using approximate second-order (Hessian) information to minimize reconstruction error.
AWQ
Activation-aware Weight Quantization, a method that identifies the most salient weights from the magnitude of the corresponding activations and protects them via per-channel scaling.
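A heavily simplified sketch of the idea in NumPy (the fixed `alpha` and helper names are illustrative; the published method searches for the best scaling instead):

```python
import numpy as np

def awq_style_scales(activations, alpha=0.5):
    """Per-input-channel scales grown from activation magnitude: salient
    channels get larger weights (and smaller activations), shrinking
    their relative quantization error."""
    importance = np.abs(activations).mean(axis=0)
    return importance ** alpha + 1e-8  # avoid zero scales

X = np.abs(np.random.randn(256, 64))  # stand-in for collected activations
W = np.random.randn(32, 64)           # weights: out_features x in_features
s = awq_style_scales(X)
W_scaled, X_scaled = W * s, X / s     # quantize W_scaled instead of W
assert np.allclose(X_scaled @ W_scaled.T, X @ W.T)  # output is unchanged
```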
Zero-shot Quantization
Technique requiring no calibration data, using heuristics based on the weight distribution to quantize the model.
Mixed-Precision Quantization
Strategy applying different quantization precisions to different model layers to optimize the performance/accuracy trade-off.
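A toy illustration in NumPy (the layer names and the per-layer bit-width plan are hypothetical):

```python
import numpy as np

def quantize_symmetric(w, num_bits):
    """Symmetric quantize-dequantize, returned in float for error inspection."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale

# Hypothetical plan: sensitive layers keep 8 bits, the bulk drops to 4.
precision_plan = {"embed": 8, "attn": 4, "mlp": 4, "lm_head": 8}
layers = {name: np.random.randn(64, 64) for name in precision_plan}

for name, w in layers.items():
    bits = precision_plan[name]
    err = np.abs(w - quantize_symmetric(w, bits)).mean()
    print(f"{name}: {bits}-bit, mean error {err:.4f}")
```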
Symmetric Quantization
Quantization scheme where the value range is centered around zero, simplifying calculations but potentially underutilizing the dynamic range.
Asymmetric Quantization
Approach allowing value ranges not centered on zero, optimizing the use of the quantized range for asymmetric distributions.
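The difference is easiest to see on a skewed, all-positive distribution, where a symmetric scheme wastes half of the int8 grid (NumPy sketch, illustrative):

```python
import numpy as np

x = np.random.exponential(size=1000).astype(np.float32)  # all positive

# Symmetric: range centered on zero, so codes below zero go unused here.
s_sym = float(np.abs(x).max()) / 127
q_sym = np.clip(np.round(x / s_sym), -128, 127)

# Asymmetric: scale plus zero_point shift the grid onto the actual range.
s_asym = float(x.max() - x.min()) / 255
zp = int(round(-128 - float(x.min()) / s_asym))
q_asym = np.clip(np.round(x / s_asym) + zp, -128, 127)

err_sym = np.abs(x - q_sym * s_sym).mean()
err_asym = np.abs(x - s_asym * (q_asym - zp)).mean()
print(err_sym, err_asym)  # asymmetric error is roughly half here
```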
Scale Factor
Multiplicative parameter used to map continuous values into the quantized range, crucial for quantization accuracy.
Zero Point
Offset added during asymmetric quantization to align the floating-point zero value with the quantized representation.
Quantization Noise
Error introduced by precision reduction, manifesting as model performance degradation due to weight approximation.
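Quantization noise is often reported as a signal-to-quantization-noise ratio (SQNR); a small measurement sketch in NumPy (illustrative):

```python
import numpy as np

def sqnr_db(original, dequantized):
    """Signal-to-quantization-noise ratio in decibels: higher is better."""
    noise = original - dequantized
    return 10 * np.log10(np.sum(original ** 2) / np.sum(noise ** 2))

w = np.random.randn(1024).astype(np.float32)
for bits in (8, 4):
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    w_hat = np.round(w / scale) * scale  # symmetric quantize-dequantize
    print(f"{bits}-bit SQNR: {sqnr_db(w, w_hat):.1f} dB")
```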
Quantization-aware Fine-tuning
Post-quantization fine-tuning process aimed at recovering accuracy lost during model compression.
SmoothQuant
Quantization technique that migrates quantization difficulty from activations to weights through an offline per-channel scaling transformation, making activation outliers easier to handle.
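A simplified sketch of the transformation in NumPy (the `alpha` balance follows the paper's idea; the data and shapes are illustrative):

```python
import numpy as np

def smooth_scales(X, W, alpha=0.5):
    """Per-input-channel smoothing factors in the SmoothQuant spirit:
    s_j = max|X_j|^alpha / max|W_j|^(1 - alpha). Dividing activations by s
    and multiplying weights by s leaves the layer output mathematically
    unchanged while flattening activation outliers."""
    act_max = np.abs(X).max(axis=0)
    w_max = np.abs(W).max(axis=0)
    return (act_max ** alpha) / (w_max ** (1 - alpha) + 1e-8)

X = np.random.randn(128, 64)
X[:, 0] *= 50                       # channel 0 carries an activation outlier
W = np.random.randn(32, 64)
s = smooth_scales(X, W)
X_s, W_s = X / s, W * s
assert np.allclose(X_s @ W_s.T, X @ W.T)           # output preserved
print(np.abs(X).max(), "->", np.abs(X_s).max())    # outlier flattened
```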
LLM.int8()
8-bit quantization method designed for large language models, combining vector-wise int8 quantization with a mixed-precision decomposition that keeps outlier features in 16-bit precision.
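A simplified sketch of the decomposition in NumPy (the 6.0 outlier threshold follows the paper's default; the per-tensor int8 path and all names are illustrative):

```python
import numpy as np

def mixed_precision_matmul(X, W, threshold=6.0):
    """Route outlier input dimensions through a float matmul and the
    remaining dimensions through a symmetric int8 matmul."""
    outlier = np.abs(X).max(axis=0) > threshold
    out_fp = X[:, outlier] @ W[:, outlier].T           # float path (few dims)
    Xr, Wr = X[:, ~outlier], W[:, ~outlier]            # int8 path (the rest)
    sx = float(np.abs(Xr).max()) / 127 + 1e-12
    sw = float(np.abs(Wr).max()) / 127 + 1e-12
    qx = np.round(Xr / sx).astype(np.int32)
    qw = np.round(Wr / sw).astype(np.int32)
    return out_fp + (qx @ qw.T) * (sx * sw)

X = np.random.randn(16, 64)
X[:, 3] *= 20                                          # inject an outlier feature
W = np.random.randn(32, 64)
print(np.abs(mixed_precision_matmul(X, W) - X @ W.T).max())
```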