Quantization and Compression
Post-Training Quantization (PTQ)
A precision-reduction technique applied to an already trained model without requiring full retraining. It converts high-precision weights and activations (e.g., FP32) to lower-precision representations (e.g., INT8). This shrinks the model, reduces memory traffic, and speeds up inference on hardware with fast low-precision arithmetic.
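The FP32-to-INT8 conversion can be illustrated with a minimal sketch in plain Python. It does not reproduce any particular framework's API; the function names are hypothetical. It assumes per-tensor scales, symmetric quantization for weights, and affine (scale plus zero point) quantization for activations, with the activation range taken from the min/max of a small calibration sample. Production toolchains typically add refinements such as per-channel scales and more robust calibration.

```python
from typing import List, Tuple

INT8_MIN, INT8_MAX = -128, 127


def quantize_symmetric(weights: List[float]) -> Tuple[List[int], float]:
    """Hypothetical helper: symmetric per-tensor INT8 quantization (zero point fixed at 0)."""
    max_abs = max(abs(w) for w in weights)
    # Map [-max_abs, max_abs] onto the restricted range [-127, 127];
    # fall back to scale 1.0 for an all-zero tensor to avoid dividing by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Python's round() rounds half to even and returns an int here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_symmetric(q: List[int], scale: float) -> List[float]:
    """Map INT8 codes back to approximate FP values."""
    return [v * scale for v in q]


def calibrate_affine(samples: List[float]) -> Tuple[float, int]:
    """Hypothetical helper: derive an affine INT8 scale and zero point from calibration data."""
    # Extend the observed range to include 0.0 so that zero is exactly representable.
    lo = min(min(samples), 0.0)
    hi = max(max(samples), 0.0)
    scale = (hi - lo) / (INT8_MAX - INT8_MIN) if hi > lo else 1.0
    # The zero point is the INT8 code that represents the real value 0.0.
    zero_point = round(INT8_MIN - lo / scale)
    zero_point = max(INT8_MIN, min(INT8_MAX, zero_point))
    return scale, zero_point


def quantize_affine(x: List[float], scale: float, zero_point: int) -> List[int]:
    """Quantize values with a precomputed scale and zero point, clamping to INT8."""
    return [max(INT8_MIN, min(INT8_MAX, round(v / scale) + zero_point)) for v in x]


def dequantize_affine(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Invert the affine mapping (up to rounding error)."""
    return [(v - zero_point) * scale for v in q]


if __name__ == "__main__":
    # Weights of an already trained layer (illustrative values only).
    w = [0.42, -1.27, 0.05, 0.9]
    qw, sw = quantize_symmetric(w)
    print("weights  ->", qw, "scale", sw)
    print("restored ->", dequantize_symmetric(qw, sw))

    # A small calibration sample of non-negative (e.g., post-ReLU) activations.
    acts = [0.0, 0.5, 2.3, 5.1, 3.3]
    sa, zp = calibrate_affine(acts)
    qa = quantize_affine(acts, sa, zp)
    print("acts     ->", qa, "scale", sa, "zero point", zp)
    print("restored ->", dequantize_affine(qa, sa, zp))
```

Keeping the weight zero point at 0 simplifies the integer matrix multiply. The affine scheme lets one-sided activation ranges, such as ReLU outputs, use all 256 INT8 codes instead of wasting the negative half. In both cases the round-trip error per value stays within about half a quantization step (scale / 2), and no weights are retrained.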