Attention Mechanism
Attention Layer Normalization
Layer normalization applied before (pre-norm) or after (post-norm) the attention sublayer to stabilize training. Modern architectures typically use pre-norm, which keeps gradients better behaved in deep stacks.
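The placement difference can be sketched as follows. This is a minimal NumPy illustration with a toy single-head attention (identity projections, no learned parameters), meant only to show where the normalization sits relative to the residual connection; real implementations use learned scale/shift in the norm and projection matrices in the attention.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token's feature vector to zero mean, unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def self_attention(x):
    # Toy single-head self-attention with identity Q/K/V projections,
    # used only to illustrate normalization placement.
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def pre_norm_block(x):
    # Pre-norm (modern default): normalize the sublayer input,
    # then add the residual.
    return x + self_attention(layer_norm(x))

def post_norm_block(x):
    # Post-norm (original Transformer): normalize after the residual add.
    return layer_norm(x + self_attention(x))

x = np.random.default_rng(0).normal(size=(4, 8))  # 4 tokens, 8 features
print(pre_norm_block(x).shape)   # (4, 8)
print(post_norm_block(x).shape)  # (4, 8)
```

Note the consequence of the ordering: post-norm output is always normalized per token, while pre-norm output is a residual sum and need not be.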