Layer Normalization
Pre-Layer Normalization
Variant of layer normalization applied to the input of each attention and feed-forward sublayer, rather than after the residual addition, which improves training stability in deep Transformers.
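As a rough illustration of where the normalization sits, the sketch below contrasts the pre-LN ordering with the post-LN ordering of the original Transformer. It is a minimal pure-Python sketch under stated assumptions: the names `layer_norm`, `pre_ln_block`, `post_ln_block`, and `toy_sublayer` are hypothetical, the learned gain and bias are omitted, and `toy_sublayer` is only a stand-in for attention or a feed-forward network.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def pre_ln_block(x, sublayer):
    """Pre-LN: normalize the sublayer input; the residual path is left untouched."""
    return [xi + yi for xi, yi in zip(x, sublayer(layer_norm(x)))]

def post_ln_block(x, sublayer):
    """Post-LN ordering, shown for contrast: normalize after the residual addition."""
    return layer_norm([xi + yi for xi, yi in zip(x, sublayer(x))])

def toy_sublayer(h):
    # Hypothetical stand-in for attention or a feed-forward network.
    return [0.5 * v for v in h]

x = [1.0, 2.0, 3.0, 4.0]
pre_out = pre_ln_block(x, toy_sublayer)
post_out = post_ln_block(x, toy_sublayer)
```

In the pre-LN form the input `x` reaches the output through an unnormalized identity path, which is the property usually credited for the improved stability in deep stacks; in the post-LN form every block output is renormalized.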