Self-Attention
Add & Norm Layer
A residual connection followed by layer normalization, applied after the attention mechanism: the attention output is added to the original input (residual connection) and the sum is then normalized, i.e. LayerNorm(x + Attention(x)).
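A minimal sketch of this post-norm Add & Norm step in PyTorch, assuming the standard LayerNorm(x + Sublayer(x)) form; the module name `AddNorm` and the `d_model` parameter are illustrative, not a fixed API:

```python
import torch
import torch.nn as nn

class AddNorm(nn.Module):
    """Residual connection followed by layer normalization (post-norm)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, sublayer_out: torch.Tensor) -> torch.Tensor:
        # Add the sublayer output (e.g. self-attention) to the original input,
        # then normalize the sum: LayerNorm(x + Attention(x)).
        return self.norm(x + sublayer_out)

# Usage example: combine an attention output with its input.
d_model = 512
x = torch.randn(2, 10, d_model)           # (batch, sequence, features)
attn_out = torch.randn(2, 10, d_model)    # stand-in for the attention output
y = AddNorm(d_model)(x, attn_out)         # same shape as x
```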