Mixed Precision Computing

📖

terms

Mixed Precision Computing

Computing technique that simultaneously uses multiple numerical formats of variable precision (FP64, FP32, FP16, INT8) to optimize the balance between memory performance, computational throughput, and result accuracy in AI applications.

📖

terms

FP16 (Half-Precision Floating Point)

16-bit numerical representation format consisting of 1 sign bit, 5 exponent bits, and 10 mantissa bits, used to accelerate computations and reduce memory footprint at the cost of reduced precision.

📖

terms

FP32 (Single-Precision Floating Point)

Standard 32-bit numerical representation format with 1 sign bit, 8 exponent bits, and 23 mantissa bits, constituting the precision reference for training most AI models.

📖

terms

INT8 (8-bit Integer)

8-bit quantization format representing signed integers, primarily used for inference to maximize computational throughput and minimize energy consumption of hardware accelerators.

📖

terms

Tensor Cores

Specialized computing units integrated into modern GPUs (NVIDIA) designed to execute matrix multiplication-accumulation operations in mixed precision (FP16/FP32) in a highly parallel manner.

📖

terms

Dynamic Loss Scaling

Adaptive variant of loss scaling where the scale factor is dynamically adjusted during training, increasing in case of stability and decreasing in case of overflow to optimize convergence.

📖

terms

Master Weights

Copy of model weights maintained in FP32 (or FP64) during mixed precision training, serving as a precision reference for weight updates while forward/backward computations are performed in FP16.

📖

terms

Automatic Mixed Precision (AMP)

Feature of AI frameworks (PyTorch, TensorFlow) that automatically selects operations to execute in FP16 or FP32, manages type conversion, and applies loss scaling transparently.

📖

terms

Vector Processing Units (VPU)

Specialized hardware accelerators optimized for integer (INT8) and low-precision calculations, designed for efficient neural network inference on edge devices.

📖

terms

Sparsity Acceleration

Technique combined with mixed precision that exploits zeros in tensors to skip unnecessary calculations, reducing memory bandwidth and increasing the effective throughput of matrix operations.

📖

terms

Numerical Stability Analysis

Systematic evaluation of the impact of precision reduction on model convergence and final accuracy, identifying sensitive layers that require maintaining FP32 in a mixed precision strategy.

📖

terms

FP8 (8-bit Floating Point)

Emerging 8-bit representation format with different variants (E4M3, E5M2) optimized for training and inference, offering an extreme trade-off between throughput and precision for very large models.

📖

terms

Gradient Accumulation in Mixed Precision

Technique where gradients calculated in FP16 are accumulated in an FP32 buffer before weight update, preventing precision loss during aggregation over multiple mini-batches.

📖

terms

Precision-Aware Pruning

Network pruning method that considers each layer's sensitivity to precision reduction, applying more aggressive pruning on layers robust in low precision to maximize acceleration.

AI Glossary