Low-Resource Models
Gradient Checkpointing
Computation-memory trade-off technique that omits saving intermediate activations during the forward pass, to recalculate them during the backward pass, thus drastically reducing GPU memory usage at the cost of increased computation time.
← Geri