Mixed Precision Computing

📖

용어

Computing technique that simultaneously uses multiple numerical formats of variable precision (FP64, FP32, FP16, INT8) to optimize the balance between memory performance, computational throughput, and result accuracy in AI applications.

📖

용어

FP16 (Half-Precision Floating Point)

16-bit numerical representation format consisting of 1 sign bit, 5 exponent bits, and 10 mantissa bits, used to accelerate computations and reduce memory footprint at the cost of reduced precision.

📖

용어

FP32 (Single-Precision Floating Point)

Standard 32-bit numerical representation format with 1 sign bit, 8 exponent bits, and 23 mantissa bits, constituting the precision reference for training most AI models.

📖

용어

INT8 (8-bit Integer)

8-bit quantization format representing signed integers, primarily used for inference to maximize computational throughput and minimize energy consumption of hardware accelerators.

📖

용어

Tensor Cores

Specialized computing units integrated into modern GPUs (NVIDIA) designed to execute matrix multiplication-accumulation operations in mixed precision (FP16/FP32) in a highly parallel manner.

📖

용어

Dynamic Loss Scaling

Adaptive variant of loss scaling where the scale factor is dynamically adjusted during training, increasing in case of stability and decreasing in case of overflow to optimize convergence.

📖

용어

Master Weights

Copy of model weights maintained in FP32 (or FP64) during mixed precision training, serving as a precision reference for weight updates while forward/backward computations are performed in FP16.

📖

용어

Automatic Mixed Precision (AMP)

Feature of AI frameworks (PyTorch, TensorFlow) that automatically selects operations to execute in FP16 or FP32, manages type conversion, and applies loss scaling transparently.

📖

용어

Vector Processing Units (VPU)

Specialized hardware accelerators optimized for integer (INT8) and low-precision calculations, designed for efficient neural network inference on edge devices.

📖

용어

Sparsity Acceleration

Technique combined with mixed precision that exploits zeros in tensors to skip unnecessary calculations, reducing memory bandwidth and increasing the effective throughput of matrix operations.

📖

용어

Numerical Stability Analysis

Systematic evaluation of the impact of precision reduction on model convergence and final accuracy, identifying sensitive layers that require maintaining FP32 in a mixed precision strategy.

📖

용어

FP8 (8-bit Floating Point)

Emerging 8-bit representation format with different variants (E4M3, E5M2) optimized for training and inference, offering an extreme trade-off between throughput and precision for very large models.

📖

용어

Gradient Accumulation in Mixed Precision

Technique where gradients calculated in FP16 are accumulated in an FP32 buffer before weight update, preventing precision loss during aggregation over multiple mini-batches.

📖

용어

Precision-Aware Pruning

Network pruning method that considers each layer's sensitivity to precision reduction, applying more aggressive pruning on layers robust in low precision to maximize acceleration.

AI 용어집