Layer Normalization
Pre-Layer Normalization
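Placement of layer normalization in which each attention and feed-forward sublayer normalizes its input inside the residual branch, rather than normalizing the output after the residual addition (Post-LN). Leaving the residual path unnormalized improves training stability in deep Transformers.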
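As a rough illustration of where the normalization sits, the sketch below contrasts a Pre-LN residual block with a Post-LN one on a single feature vector. It is a minimal sketch, not a reference implementation: the names `layer_norm`, `pre_ln_block`, `post_ln_block`, and `toy_sublayer` are hypothetical, the learned scale and shift parameters are omitted, and the sublayer is a stand-in for attention or a feed-forward network.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and unit variance.

    Assumption: the learned scale and shift are omitted for brevity.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def pre_ln_block(x, sublayer):
    """Pre-LN: normalize the sublayer input, then add the unnormalized residual."""
    return [xi + yi for xi, yi in zip(x, sublayer(layer_norm(x)))]

def post_ln_block(x, sublayer):
    """Post-LN: add the residual first, then normalize the sum."""
    return layer_norm([xi + yi for xi, yi in zip(x, sublayer(x))])

def toy_sublayer(h):
    # Hypothetical stand-in for attention or a feed-forward network.
    return [0.5 * v for v in h]

x = [1.0, 2.0, 4.0]
pre_out = pre_ln_block(x, toy_sublayer)    # residual path carries x unchanged
post_out = post_ln_block(x, toy_sublayer)  # the whole sum is renormalized
```

In the Pre-LN form the input `x` reaches the output through an identity path, so if the sublayer contributes nothing the block returns `x` unchanged. In the Post-LN form every block renormalizes the running sum.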