Mixed Precision Computing

📖

Begriffe

Mixed Precision Computing

Computing technique that simultaneously uses multiple numerical formats of variable precision (FP64, FP32, FP16, INT8) to optimize the balance between memory performance, computational throughput, and result accuracy in AI applications.

📖

Begriffe

FP16 (Half-Precision Floating Point)

16-bit numerical representation format consisting of 1 sign bit, 5 exponent bits, and 10 mantissa bits, used to accelerate computations and reduce memory footprint at the cost of reduced precision.

📖

Begriffe

FP32 (Single-Precision Floating Point)

Standard 32-bit numerical representation format with 1 sign bit, 8 exponent bits, and 23 mantissa bits, constituting the precision reference for training most AI models.

📖

Begriffe

INT8 (8-bit Integer)

8-bit quantization format representing signed integers, primarily used for inference to maximize computational throughput and minimize energy consumption of hardware accelerators.

📖

Begriffe

Tensor Cores

Specialized computing units integrated into modern GPUs (NVIDIA) designed to execute matrix multiplication-accumulation operations in mixed precision (FP16/FP32) in a highly parallel manner.

📖

Begriffe

Dynamic Loss Scaling

Adaptive variant of loss scaling where the scale factor is dynamically adjusted during training, increasing in case of stability and decreasing in case of overflow to optimize convergence.

📖

Begriffe

Master Weights

Copy of model weights maintained in FP32 (or FP64) during mixed precision training, serving as a precision reference for weight updates while forward/backward computations are performed in FP16.

📖

Begriffe

Automatic Mixed Precision (AMP)

Feature of AI frameworks (PyTorch, TensorFlow) that automatically selects operations to execute in FP16 or FP32, manages type conversion, and applies loss scaling transparently.

📖

Begriffe

Vector Processing Units (VPU)

Specialized hardware accelerators optimized for integer (INT8) and low-precision calculations, designed for efficient neural network inference on edge devices.

📖

Begriffe

Sparsity Acceleration

Technique combined with mixed precision that exploits zeros in tensors to skip unnecessary calculations, reducing memory bandwidth and increasing the effective throughput of matrix operations.

📖

Begriffe

Numerical Stability Analysis

Systematic evaluation of the impact of precision reduction on model convergence and final accuracy, identifying sensitive layers that require maintaining FP32 in a mixed precision strategy.

📖

Begriffe

FP8 (8-bit Floating Point)

Emerging 8-bit representation format with different variants (E4M3, E5M2) optimized for training and inference, offering an extreme trade-off between throughput and precision for very large models.

📖

Begriffe

Gradient Accumulation in Mixed Precision

Technique where gradients calculated in FP16 are accumulated in an FP32 buffer before weight update, preventing precision loss during aggregation over multiple mini-batches.

📖

Begriffe

Precision-Aware Pruning

Network pruning method that considers each layer's sensitivity to precision reduction, applying more aggressive pruning on layers robust in low precision to maximize acceleration.

KI-Glossar