AI Glossary
The complete dictionary of Artificial Intelligence
TF32 (TensorFloat-32)
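NVIDIA's proprietary 19-bit format (1 sign bit, 8 exponent bits, 10 mantissa bits) that pairs the 8-bit exponent of FP32 with the 10-bit mantissa of FP16, giving FP32's dynamic range at roughly FP16 precision. It is optimized for matrix operations on the Tensor Cores of Ampere and Hopper GPUs.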
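As a rough illustration of the bit layout, the sketch below emulates TF32 in software by rounding an FP32 value so that only the top 10 of its 23 mantissa bits survive. The helper name `to_tf32` is hypothetical, the round-to-nearest-even mode is an assumption (the rounding actually used by the hardware may differ), and only finite inputs are handled.

```python
import struct

def to_tf32(x: float) -> float:
    """Emulate TF32 (hypothetical helper): round x to FP32, then keep
    only the top 10 mantissa bits using round-to-nearest-even.
    Assumes a finite input; NaN and infinity are not handled."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 13) & 1                      # lowest mantissa bit that is kept
    bits = (bits + 0x0FFF + lsb) & 0xFFFFE000   # round, then clear the low 13 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Precision is FP16-like: steps of 2**-10 just above 1.0
print(to_tf32(1 + 2.0**-10))   # representable, returned unchanged
print(to_tf32(1 + 2.0**-12))   # too fine for 10 mantissa bits, rounds to 1.0

# Range is FP32-like: a value far beyond the FP16 maximum stays finite
print(to_tf32(1e38))
```

The exponent field is left untouched, which is why the range matches FP32 while the step size matches FP16.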
Dynamic Range
Span of representable magnitudes, from the smallest normalized number to the largest finite floating-point number. It is critical in precision selection because FP16 has a limited dynamic range (largest value 65504) compared to FP32 (largest value about 3.4×10³⁸).
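The limits quoted above follow directly from the number of exponent and mantissa bits. The sketch below derives them for an IEEE-style binary format; the function name `fp_limits` is hypothetical, and BF16 (8 exponent bits, 7 mantissa bits) is included only for comparison.

```python
import struct

def fp_limits(exp_bits: int, man_bits: int) -> tuple[float, float]:
    """Return (smallest normal, largest finite) for an IEEE-style format."""
    bias = 2 ** (exp_bits - 1) - 1
    smallest_normal = 2.0 ** (1 - bias)
    largest = (2.0 - 2.0 ** -man_bits) * 2.0 ** bias
    return smallest_normal, largest

for name, e, m in [("FP16", 5, 10), ("BF16", 8, 7), ("FP32", 8, 23)]:
    lo, hi = fp_limits(e, m)
    print(f"{name}: smallest normal = {lo:.3e}, largest = {hi:.3e}")

# The FP16 ceiling in practice: struct's "e" format is IEEE half precision
struct.pack("<e", 65504.0)       # the largest FP16 value fits
try:
    struct.pack("<e", 70000.0)   # beyond the FP16 maximum
    overflowed = False
except OverflowError:
    overflowed = True
print("70000 overflows FP16:", overflowed)
```

Python's `struct` module raises `OverflowError` here; GPU hardware would typically produce infinity instead.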
Post-Training Quantization (PTQ)
Process of converting a pre-trained full-precision model to reduced precision (FP16, INT8, INT4) without retraining, using calibration on representative data to determine the quantization parameters (scale factors and zero-points).
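The calibration step can be sketched with simple min/max calibration, one of several possible strategies. The function names and the sample values below are hypothetical; real toolchains usually calibrate per tensor or per channel over many batches.

```python
def calibrate(samples, qmin=-128, qmax=127):
    """Min/max calibration: map the observed range onto [qmin, qmax]."""
    lo = min(min(samples), 0.0)   # widen the range so 0.0 is representable
    hi = max(max(samples), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:              # all samples were zero
        scale = 1.0
    zero_point = round(qmin - lo / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a real value to an integer code, clamping to the integer range."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))

def dequantize(q, scale, zero_point):
    """Map an integer code back to an approximate real value."""
    return (q - zero_point) * scale

# Hypothetical activations observed while running calibration data
calibration_data = [-0.5, 0.1, 0.9, 1.5]
scale, zero_point = calibrate(calibration_data)
for x in calibration_data:
    q = quantize(x, scale, zero_point)
    print(x, "->", q, "->", dequantize(q, scale, zero_point))
```

No weights are updated at any point, which is what distinguishes this from quantization-aware training.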
Fused Multiply-Add (FMA)
Hardware operation combining multiplication and addition into a single instruction (a×b+c) with single rounding, fundamental for accelerating matrix calculations in mixed precision and reducing cumulative rounding errors.
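The effect of the single rounding can be seen in a small software emulation. The sketch below is not how hardware implements FMA: it computes a×b+c exactly with rational arithmetic and rounds once to float64, then compares that with the ordinary two-step evaluation. The function name `fma` and the input values are chosen for illustration, and only finite inputs are supported.

```python
from fractions import Fraction

def fma(a: float, b: float, c: float) -> float:
    """Emulated FMA: evaluate a*b+c exactly, then round once to float64."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1 + 2.0**-30
b = 1 - 2.0**-30
c = -1.0

separate = a * b + c   # product is rounded to 1.0 first, then the sum is rounded
fused = fma(a, b, c)   # exact product 1 - 2**-60 is kept until the final rounding

print("two roundings:", separate)
print("one rounding: ", fused)
```

With two roundings the tiny term 2⁻⁶⁰ is lost when the product is rounded to 1.0, so the result is 0; with one rounding it survives.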
Numerical Stability
Ability of an algorithm to keep its results accurate despite rounding errors and overflow/underflow. It is particularly critical in mixed precision, where reduced dynamic range and precision can destabilize certain calculations.
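A common illustration is softmax, used here only as an example of an overflow-prone calculation. In FP16, exp(x) exceeds 65504 once x is above roughly 11.09, so the naive formula fails on modest inputs, while subtracting the maximum first keeps every exponential at or below 1. The sketch emulates FP16 by rounding through Python's `struct` half-precision format; the helper names are hypothetical, and `struct` raises `OverflowError` where hardware would usually produce infinity.

```python
import math
import struct

def fp16(x: float) -> float:
    """Round to IEEE half precision (raises OverflowError above the FP16 range)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def softmax_fp16(xs, stable: bool):
    """Softmax with every intermediate result rounded to FP16."""
    shift = max(xs) if stable else 0.0
    exps = [fp16(math.exp(x - shift)) for x in xs]
    total = fp16(sum(exps))
    return [fp16(e / total) for e in exps]

logits = [10.0, 11.0, 12.0]

try:
    softmax_fp16(logits, stable=False)
    naive_failed = False
except OverflowError:
    naive_failed = True   # exp(12) does not fit in FP16
print("naive version overflowed:", naive_failed)

probs = softmax_fp16(logits, stable=True)
print("stable version:", probs)
```

Both versions are mathematically identical; only the order of operations differs.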
INT8 Quantization
Technique for compressing neural network weights and activations to 8-bit signed integers (-128 to 127) using scale factors and zero-points, offering up to 4x memory reduction relative to FP32 and significant acceleration on compatible hardware.
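The 4x figure comes from storing one byte per value instead of four. The sketch below shows this with symmetric per-tensor quantization, a simplified variant in which the zero-point is fixed at 0. The function name and the weight values are hypothetical.

```python
from array import array

def quantize_symmetric(weights):
    """Symmetric per-tensor INT8 quantization (zero-point fixed at 0)."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127
    codes = array("b", (max(-128, min(127, round(w / scale))) for w in weights))
    return codes, scale

weights = [0.42, -1.27, 0.03, 0.88, -0.56]   # hypothetical FP32 weights
fp32 = array("f", weights)                   # 4 bytes per value
codes, scale = quantize_symmetric(weights)   # 1 byte per value plus one shared scale
restored = [q * scale for q in codes]

print("FP32 bytes:", len(fp32) * fp32.itemsize)
print("INT8 bytes:", len(codes) * codes.itemsize)
print("codes:", list(codes))
print("max round-trip error:", max(abs(w - r) for w, r in zip(weights, restored)))
```

The round-trip error per value is bounded by about half the scale, which is the price paid for the smaller storage.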
Mixed-Precision Matrix Operations
Set of linear operations (GEMM, convolution) where different parts of the calculation use different precisions: typically multiplication in FP16/BF16 with accumulation in FP32, to optimize throughput on modern GPUs while limiting rounding error.
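Why the accumulator precision matters can be seen in a single dot product, the inner loop of a GEMM. The sketch below emulates FP16 and FP32 rounding with Python's `struct` module; the helper names are hypothetical and the loop is a software model, not a description of how Tensor Cores are implemented.

```python
import struct

def fp16(x: float) -> float:
    """Round to IEEE half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def fp32(x: float) -> float:
    """Round to IEEE single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dot(a, b, round_acc):
    """Dot product with FP16 inputs and an accumulator rounded by round_acc."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_acc(acc + fp16(x) * fp16(y))
    return acc

n = 4096
a = [1.0] * n
b = [1.0] * n

print("FP16 accumulator:", dot(a, b, fp16))   # stalls at 2048.0
print("FP32 accumulator:", dot(a, b, fp32))   # reaches 4096.0
```

With an FP16 accumulator the sum stops growing at 2048, because FP16 values in that range are spaced 2 apart and adding 1 rounds back to the same number. The FP32 accumulator reaches the exact result.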