Transformer Optimization
Gradient Accumulation
A technique that simulates a larger batch size by accumulating gradients over several forward and backward passes on smaller micro-batches, then updating the model weights once.
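As a minimal sketch of the idea, the pure-Python example below fits a one-variable linear model and compares the gradient of a full batch with the gradient accumulated over micro-batches. The model, the data, and the names `grad_mse` and `accumulated_grad` are illustrative assumptions, not part of any library. It also assumes equal-sized micro-batches and a mean-reduced loss, which is why each micro-batch gradient is scaled by one over the number of accumulation steps.

```python
def grad_mse(w, b, xs, ys):
    """Gradient of the mean squared error of y ~ w*x + b over one batch."""
    n = len(xs)
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return gw, gb


def accumulated_grad(w, b, xs, ys, micro_batch_size):
    """Accumulate gradients over equal-sized micro-batches (hypothetical helper).

    Each micro-batch gradient is divided by the number of accumulation
    steps so the sum matches the mean-loss gradient of the full batch.
    """
    assert len(xs) % micro_batch_size == 0, "sketch assumes equal-sized micro-batches"
    steps = len(xs) // micro_batch_size
    acc_w, acc_b = 0.0, 0.0
    for i in range(steps):
        lo = i * micro_batch_size
        hi = lo + micro_batch_size
        gw, gb = grad_mse(w, b, xs[lo:hi], ys[lo:hi])  # one forward/backward pass
        acc_w += gw / steps
        acc_b += gb / steps
    return acc_w, acc_b


# Toy data drawn from y = 2x + 1 (assumed for illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0]

w, b, lr = 0.0, 0.0, 0.01
full = grad_mse(w, b, xs, ys)                                # one batch of 8
accum = accumulated_grad(w, b, xs, ys, micro_batch_size=2)   # 4 micro-batches of 2

# A single weight update, applied only after all micro-batches are processed.
w -= lr * accum[0]
b -= lr * accum[1]
```

Under these assumptions the two gradients agree up to floating-point rounding, so the update is the same as for the full batch while only one micro-batch has to be held in memory at a time. In deep learning frameworks the same pattern usually takes the form of scaling the loss, running the backward pass several times, and calling the optimizer step once at the end.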