Momentum-based Optimization
AdamW
Variant of Adam that decouples weight decay from the adaptive update, applying decay directly to the weights rather than to the gradients.
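To make the distinction concrete, here is a minimal pure-Python sketch of a single AdamW step. The function name `adamw_step`, the list-of-floats representation, and the default hyperparameters are illustrative assumptions, not taken from any particular library. Scaling the decay term by the learning rate follows a common convention and is also an assumption here.

```python
import math

def adamw_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """Apply one AdamW update (hypothetical helper, for illustration only).

    w, g, m, v are equal-length lists of floats: weights, gradients,
    first-moment and second-moment estimates. t is the 1-based step count.
    Returns new (w, m, v) lists.
    """
    new_w, new_m, new_v = [], [], []
    for wi, gi, mi, vi in zip(w, g, m, v):
        # Moment estimates use the raw gradient; no decay term is mixed in.
        mi = beta1 * mi + (1 - beta1) * gi
        vi = beta2 * vi + (1 - beta2) * gi * gi
        # Bias correction for the zero-initialised moments.
        m_hat = mi / (1 - beta1 ** t)
        v_hat = vi / (1 - beta2 ** t)
        # Decoupled weight decay: shrink the weight directly.
        wi = wi - lr * weight_decay * wi
        # Adaptive Adam step, unaffected by the decay term.
        wi = wi - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_w.append(wi)
        new_m.append(mi)
        new_v.append(vi)
    return new_w, new_m, new_v
```

In Adam with a plain L2 penalty, `weight_decay * wi` would instead be added to `gi` before the moment updates, so the penalty would be rescaled by the adaptive denominator. Here the decay bypasses that rescaling, which is what "decoupled" refers to.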