Mixed Precision Computing

Computing technique that combines several numerical formats of differing precision (FP64, FP32, FP16, INT8) within one workload to balance memory footprint, computational throughput, and result accuracy in AI applications.

FP16 (Half-Precision Floating Point)

16-bit numerical representation format consisting of 1 sign bit, 5 exponent bits, and 10 mantissa bits, used to accelerate computations and reduce memory footprint at the cost of reduced precision.
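
The bit layout above can be inspected directly. The following is a minimal sketch using only the Python standard library, whose `struct` module supports IEEE 754 half precision through the `e` format code; the helper names `fp16_fields` and `to_fp16` are illustrative, not part of any framework.

```python
import struct

def fp16_fields(value: float) -> tuple[int, int, int]:
    """Return the (sign, exponent, mantissa) bit fields of value rounded to FP16."""
    # Pack as half precision, then reinterpret the same 2 bytes as an unsigned integer.
    (bits,) = struct.unpack("<H", struct.pack("<e", value))
    sign = bits >> 15                # 1 sign bit
    exponent = (bits >> 10) & 0x1F   # 5 exponent bits, stored with a bias of 15
    mantissa = bits & 0x3FF          # 10 mantissa bits
    return sign, exponent, mantissa

def to_fp16(value: float) -> float:
    """Round a Python float to the nearest representable FP16 value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

print(fp16_fields(1.0))    # (0, 15, 0)
print(fp16_fields(-2.5))   # (1, 16, 256)
print(to_fp16(0.1))        # 0.0999755859375
```

The last line illustrates the precision cost: 0.1 is stored as the nearest value that 10 mantissa bits can express.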

FP32 (Single-Precision Floating Point)

Standard 32-bit numerical representation format with 1 sign bit, 8 exponent bits, and 23 mantissa bits, serving as the baseline precision for training most AI models.

INT8 (8-bit Integer)

8-bit quantization format representing signed integers, primarily used for inference to maximize computational throughput and minimize energy consumption of hardware accelerators.
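
As an illustration of how floating-point values can be mapped to this format, here is a minimal sketch of symmetric per-tensor quantization. The scheme, the helper names, and the sample weights are assumptions made for the example; real toolchains offer several other schemes (asymmetric, per-channel, and so on).

```python
def quantize_int8(values, scale):
    """Map floats to signed 8-bit integers: q = round(v / scale), clamped to [-128, 127]."""
    return [max(-128, min(127, round(v / scale))) for v in values]

def dequantize_int8(q_values, scale):
    """Recover approximate floats from the quantized integers."""
    return [q * scale for q in q_values]

weights = [0.5, -1.0, 0.25, 0.9]               # hypothetical layer weights
scale = max(abs(w) for w in weights) / 127     # assumed symmetric per-tensor scale
q = quantize_int8(weights, scale)
restored = dequantize_int8(q, scale)

# Rounding to the nearest integer bounds the error by about half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, max_error <= scale / 2 + 1e-12)
```

Each weight now occupies one byte instead of four, at the price of an error of at most about `scale / 2` per value.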

Tensor Cores

Specialized computing units integrated into modern NVIDIA GPUs, designed to execute matrix multiply-accumulate operations in mixed precision (for example FP16 inputs with FP32 accumulation) in a highly parallel manner.

Dynamic Loss Scaling

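Adaptive variant of loss scaling, in which the loss is multiplied by a scale factor before backpropagation so that small gradients stay representable in FP16. The factor is adjusted during training: raised after a run of overflow-free steps and lowered whenever gradients overflow to Inf or NaN, in which case the step is typically skipped.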
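
The adjustment rule can be sketched in a few lines. The class below is a hypothetical helper written for illustration, not a framework API; its default values (initial scale 2^16, growth by 2, backoff by 0.5, growth after 2000 clean steps) are assumptions chosen to resemble common framework defaults.

```python
import math

class DynamicLossScaler:
    """Illustrative dynamic loss scaler (hypothetical, not a library class)."""

    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0   # consecutive steps without overflow

    def unscale(self, scaled_grads):
        """Divide scaled gradients by the current scale before the weight update."""
        return [g / self.scale for g in scaled_grads]

    def update(self, scaled_grads) -> bool:
        """Adjust the scale; return True if the optimizer step should be applied."""
        overflow = any(math.isinf(g) or math.isnan(g) for g in scaled_grads)
        if overflow:
            # Overflow: shrink the scale and skip this step.
            self.scale *= self.backoff_factor
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps == self.growth_interval:
            # Stable for long enough: try a larger scale.
            self.scale *= self.growth_factor
            self._good_steps = 0
        return True

scaler = DynamicLossScaler(growth_interval=2)   # short interval for the demo
print(scaler.update([1.0, 2.0]), scaler.scale)            # True 65536.0
print(scaler.update([0.5, 3.0]), scaler.scale)            # True 131072.0
print(scaler.update([float("inf"), 1.0]), scaler.scale)   # False 65536.0
```

In a training loop, `unscale` would be called on the gradients before `update` changes the scale, and the weight update would be applied only when `update` returns `True`.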

Master Weights

Copy of model weights maintained in FP32 (or FP64) during mixed precision training, serving as a precision reference for weight updates while forward/backward computations are performed in FP16.
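
To see why the FP32 copy matters, consider a weight of 1.0 receiving many small updates. The sketch below emulates FP16 and FP32 rounding with the standard `struct` module; the update size `1e-4` and the step count are arbitrary values chosen for the demonstration.

```python
import struct

def to_fp16(x: float) -> float:
    """Round to the nearest FP16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

update = 1e-4   # hypothetical learning_rate * gradient
steps = 100

w_fp16_only = to_fp16(1.0)    # weight stored and updated in FP16
master_fp32 = to_fp32(1.0)    # FP32 master copy of the same weight

for _ in range(steps):
    # Near 1.0, neighboring FP16 values are about 0.0005 apart, so the update is rounded away.
    w_fp16_only = to_fp16(w_fp16_only - update)
    # The FP32 master weight has enough mantissa bits to absorb each update.
    master_fp32 = to_fp32(master_fp32 - update)

# The FP16 working copy used for forward/backward passes is refreshed from the master.
w_fp16_copy = to_fp16(master_fp32)

print(w_fp16_only)    # 1.0, the weight never moved
print(master_fp32)    # close to 0.99
print(w_fp16_copy)
```

The FP16-only weight is stuck at its initial value, while the master weight reflects all 100 updates.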

Automatic Mixed Precision (AMP)

Feature of AI frameworks (PyTorch, TensorFlow) that automatically selects operations to execute in FP16 or FP32, manages type conversion, and applies loss scaling transparently.

Vector Processing Units (VPU)

Specialized hardware accelerators optimized for integer (INT8) and low-precision calculations, designed for efficient neural network inference on edge devices.

Sparsity Acceleration

Technique combined with mixed precision that exploits zeros in tensors to skip unnecessary calculations, reducing memory bandwidth and increasing the effective throughput of matrix operations.

Numerical Stability Analysis

Systematic evaluation of the impact of precision reduction on model convergence and final accuracy, identifying sensitive layers that require maintaining FP32 in a mixed precision strategy.

FP8 (8-bit Floating Point)

Emerging 8-bit floating-point format with two common variants, E4M3 (4 exponent bits, 3 mantissa bits) and E5M2 (5 exponent bits, 2 mantissa bits), used for training and inference; it trades precision aggressively for throughput and memory savings on very large models.
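
The difference between the two variants shows up in their dynamic range. The sketch below computes the largest finite value of each, under the assumption that E5M2 follows IEEE 754 conventions (the top exponent code is reserved for Inf/NaN) while E4M3 reserves only the single all-ones bit pattern for NaN, as in the commonly used FP8 definition. The function name `max_finite` is illustrative.

```python
def max_finite(exp_bits: int, man_bits: int, ieee_like: bool) -> float:
    """Largest finite value of a small floating-point format (illustrative helper)."""
    bias = 2 ** (exp_bits - 1) - 1
    if ieee_like:
        # Top exponent code is reserved for Inf/NaN; the mantissa may be all ones.
        max_exp_field = 2 ** exp_bits - 2
        max_man_field = 2 ** man_bits - 1
    else:
        # Assumed E4M3 convention: only exponent and mantissa all ones encodes NaN.
        max_exp_field = 2 ** exp_bits - 1
        max_man_field = 2 ** man_bits - 2
    return (1 + max_man_field / 2 ** man_bits) * 2.0 ** (max_exp_field - bias)

print(max_finite(5, 2, ieee_like=True))    # E5M2: 57344.0
print(max_finite(4, 3, ieee_like=False))   # E4M3: 448.0
print(max_finite(5, 10, ieee_like=True))   # FP16, for comparison: 65504.0
```

E5M2 keeps nearly the range of FP16 with far fewer mantissa bits, while E4M3 gives up range for one extra bit of precision.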

Gradient Accumulation in Mixed Precision

Technique where gradients calculated in FP16 are accumulated in an FP32 buffer before weight update, preventing precision loss during aggregation over multiple mini-batches.
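
The precision loss this technique avoids can be reproduced with a small experiment. The sketch below emulates FP16 and FP32 rounding with the standard `struct` module and accumulates the same FP16 gradient value many times in each format; the gradient value and the number of micro-batches are arbitrary choices for the demonstration.

```python
import struct

def to_fp16(x: float) -> float:
    """Round to the nearest FP16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

grad = to_fp16(1e-3)      # hypothetical per-micro-batch gradient, computed in FP16
num_micro_batches = 10_000

acc_fp16 = 0.0   # accumulator kept in FP16
acc_fp32 = 0.0   # accumulator kept in FP32

for _ in range(num_micro_batches):
    acc_fp16 = to_fp16(acc_fp16 + grad)   # rounded to FP16 after every addition
    acc_fp32 = to_fp32(acc_fp32 + grad)   # rounded to FP32 after every addition

print(acc_fp16)   # stalls well short of the true total
print(acc_fp32)   # close to 10
```

Once the FP16 accumulator grows large enough that the gap between neighboring representable values exceeds twice the gradient, every further addition is rounded away; the FP32 buffer keeps accumulating.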

Precision-Aware Pruning

Network pruning method that takes into account each layer's sensitivity to precision reduction, pruning more aggressively the layers that remain robust at low precision to maximize acceleration.
