Feed-Forward Networks
Two-layer MLP
The standard architecture of the feed-forward network (FFN) in Transformers: two linear transformations with a nonlinear activation function between them.
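The definition can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `two_layer_mlp`, the choice of ReLU as the activation, the bias terms, and the layer sizes are all assumptions made for the example. The entry itself only specifies two linear transformations with a nonlinearity between them.

```python
import numpy as np

def relu(x):
    # One possible nonlinearity; the entry does not say which one is used.
    return np.maximum(x, 0.0)

def two_layer_mlp(x, w1, b1, w2, b2, activation=relu):
    """Apply linear -> activation -> linear along the last axis of x."""
    hidden = activation(x @ w1 + b1)  # first linear transformation, then nonlinearity
    return hidden @ w2 + b2           # second linear transformation

rng = np.random.default_rng(0)
d_model, d_hidden = 8, 32             # illustrative sizes, not taken from the entry
w1 = rng.normal(scale=0.02, size=(d_model, d_hidden))
b1 = np.zeros(d_hidden)
w2 = rng.normal(scale=0.02, size=(d_hidden, d_model))
b2 = np.zeros(d_model)

x = rng.normal(size=(2, 5, d_model))  # (batch, sequence, d_model)
y = two_layer_mlp(x, w1, b1, w2, b2)
print(y.shape)  # (2, 5, 8)
```

Because both matrix products act only on the last axis, each position in the sequence is transformed independently with the same weights, and the output has the same shape as the input.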