Mixed Precision Computing
Gradient Accumulation in Mixed Precision
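A technique in which gradients computed in FP16 are accumulated in an FP32 buffer before the weight update. This prevents the precision loss that occurs when many small FP16 values are summed over multiple mini-batches, where an addition can be rounded away once the running total grows large relative to each gradient.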
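The effect can be illustrated with a minimal NumPy sketch. It is not taken from any particular framework: the helper `accumulate`, the constant gradient value of 1e-4, and the count of 4096 mini-batches are all hypothetical choices made only to make the rounding visible.

```python
import numpy as np


def accumulate(grads, acc_dtype):
    """Sum per-mini-batch gradients into an accumulator of the given dtype."""
    acc = np.zeros((), dtype=acc_dtype)
    for g in grads:
        # Cast each FP16 gradient to the accumulator dtype, add, and keep
        # the running total in the accumulator dtype.
        acc = (acc + g.astype(acc_dtype)).astype(acc_dtype)
    return acc


# Hypothetical setup: 4096 mini-batches, each contributing an FP16 gradient
# of about 1e-4 for a single weight. The exact sum would be about 0.4096.
grads = np.full(4096, 1e-4, dtype=np.float16)

fp16_total = accumulate(grads, np.float16)  # accumulator kept in FP16
fp32_total = accumulate(grads, np.float32)  # accumulator kept in FP32

print("FP16 accumulator:", float(fp16_total))
print("FP32 accumulator:", float(fp32_total))
```

Under these assumptions the FP16 accumulator stops growing once the spacing between representable FP16 values exceeds twice the gradient, so it ends well below the true sum, while the FP32 accumulator stays close to it. In a real training loop the FP32 total would then typically be averaged and applied to the weights in a single update.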