Self-Attention
Softmax Normalization
The activation function that transforms raw attention scores into a probability distribution, ensuring that the attention weights for each query position sum to 1.
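A minimal NumPy sketch (illustrative, not taken from any particular implementation) showing softmax applied row-wise to a matrix of attention scores, so each query position's weights sum to 1:

```python
import numpy as np

def softmax(scores, axis=-1):
    # Subtract the max along the axis for numerical stability,
    # then exponentiate and normalize so the weights sum to 1.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

# Hypothetical raw attention scores: 2 query positions x 3 key positions
scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 0.5]])

weights = softmax(scores)
# Every row of `weights` sums to 1; the second (uniform) row
# yields equal weights of 1/3 for each key position.
```

Higher raw scores receive exponentially larger weights, while the normalization keeps each row a valid probability distribution.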