Transformer Architecture
Position-wise Feed-Forward
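A two-layer fully connected network applied identically and independently to each position in the sequence. It transforms each position's representation after the attention sublayer and supplies the non-linearity that attention alone lacks. In the original Transformer it is defined as FFN(x) = max(0, xW1 + b1)W2 + b2: the same weights are shared across all positions within a layer but differ from layer to layer, and the inner dimension (2048) is four times the model dimension (512).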
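The definition above can be sketched in plain Python. This is a minimal illustration, assuming a ReLU activation as in the original Transformer (many later models use GELU or gated variants instead). The function names and the tiny weight values are hypothetical, chosen only so the result is easy to check by hand. The key property is that each position is processed by the same function with the same weights, and no position sees any other.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (in_dim, out_dim)


def linear(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Compute x @ w + b for a single position vector x."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def relu(v: Vector) -> Vector:
    """Element-wise max(0, a): the source of non-linearity."""
    return [max(0.0, a) for a in v]


def position_wise_ffn(seq: List[Vector], w1: Matrix, b1: Vector,
                      w2: Matrix, b2: Vector) -> List[Vector]:
    """Apply FFN(x) = max(0, x W1 + b1) W2 + b2 to every position.

    The same (w1, b1, w2, b2) are reused at each position, and each
    output depends only on its own input vector.
    """
    return [linear(relu(linear(x, w1, b1)), w2, b2) for x in seq]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Hypothetical small random weights for the demo."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hand-checkable toy case: d_model = 1, d_ff = 2.
# W1 = [[1, -1]], W2 = [[1], [1]], zero biases gives relu(x) + relu(-x) = |x|.
toy = position_wise_ffn([[2.0], [-3.0]], [[1.0, -1.0]], [0.0, 0.0],
                        [[1.0], [1.0]], [0.0])
print(toy)  # [[2.0], [3.0]]
```

The toy weights show how a linear map, a ReLU, and a second linear map can express a function (here |x|) that no single linear layer can. With random weights the same code also demonstrates position independence: running the network on one position alone yields the same vector as that position's row in the full-sequence output.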