Attention Scaling
Gradient Stabilization
Process of keeping gradients within a stable numerical range during backpropagation, essential for preventing exploding or vanishing gradients in deep networks.
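One common stabilization technique is gradient norm clipping: rescaling the gradient vector whenever its global L2 norm exceeds a threshold. A minimal pure-Python sketch (the function name and flat-list representation are illustrative, not from any particular library):

```python
import math

def clip_gradient_norm(grads, max_norm):
    """Rescale a flat list of gradient values so their global L2 norm
    does not exceed max_norm; gradients under the threshold pass through."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# A large gradient is scaled down; a small one is left intact.
print(clip_gradient_norm([3.0, 4.0], 2.5))  # → [1.5, 2.0] (norm 5.0 clipped to 2.5)
print(clip_gradient_norm([0.3, 0.4], 2.5))  # → [0.3, 0.4] (norm 0.5, unchanged)
```

Deep-learning frameworks provide equivalents that operate on parameter tensors, but the principle is the same: cap the norm, preserve the direction.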