AI Glossary
The complete AI glossary
Concatenation and Linear Projection
The final step of multi-head attention, in which the outputs of all heads are concatenated and then linearly projected back to the model dimension, merging information from the different subspaces.
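A minimal sketch of this step in PyTorch; the shapes and names (batch, seq_len, num_heads, d_k) are illustrative assumptions, not from the source.

```python
import torch
import torch.nn as nn

# Illustrative dimensions: 8 heads of size 64 give d_model = 512.
batch, seq_len, num_heads, d_k = 2, 10, 8, 64
d_model = num_heads * d_k

# Per-head attention outputs: (batch, num_heads, seq_len, d_k).
head_outputs = torch.randn(batch, num_heads, seq_len, d_k)

# Concatenate the heads along the feature axis: (batch, seq_len, d_model).
concat = head_outputs.transpose(1, 2).reshape(batch, seq_len, d_model)

# The output projection (often written W^O) mixes information across
# heads and restores the model dimension.
w_o = nn.Linear(d_model, d_model)
output = w_o(concat)
print(output.shape)  # torch.Size([2, 10, 512])
```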
Causal Attention (Masked Self-Attention)
A form of self-attention used in decoders in which a mask prevents each token from attending to future tokens, preserving the model's auto-regressive property.
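A minimal sketch of the masking in PyTorch; the dimensions are illustrative. Score entries above the diagonal are set to negative infinity before the softmax, so future positions receive exactly zero attention weight.

```python
import torch
import torch.nn.functional as F

seq_len, d_k = 5, 64
q = torch.randn(seq_len, d_k)
k = torch.randn(seq_len, d_k)
v = torch.randn(seq_len, d_k)

# Scaled dot-product scores: (seq_len, seq_len).
scores = q @ k.T / d_k ** 0.5

# Boolean upper-triangular mask: True marks future positions.
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))

# After softmax, each token only attends to itself and earlier tokens.
weights = F.softmax(scores, dim=-1)
output = weights @ v
```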
Head Dimension (d_k)
Dimension of the query and key vectors in each attention head (the value dimension is typically set equal to it), usually computed by dividing the model dimension by the number of heads; it determines the representational capacity of each head.
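As a worked example, using the configuration of the original Transformer:

$$d_k = \frac{d_{\text{model}}}{h} = \frac{512}{8} = 64$$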
Linearized Attention
Family of attention mechanisms that rewrite the attention computation, typically via kernel feature maps and the associativity of matrix products, so that the full attention matrix is never materialized, reducing the cost to linear in the sequence length.
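A minimal sketch of non-causal linearized attention in PyTorch, assuming the feature map φ(x) = elu(x) + 1 proposed by Katharopoulos et al. (2020); the dimensions are illustrative. The key point is the associativity trick: φ(K)ᵀV is a small d_k × d_k matrix, so the n × n attention matrix never appears.

```python
import torch
import torch.nn.functional as F

seq_len, d_k = 1000, 64
q = torch.randn(seq_len, d_k)
k = torch.randn(seq_len, d_k)
v = torch.randn(seq_len, d_k)

def phi(x):
    # Positive feature map standing in for the softmax kernel.
    return F.elu(x) + 1

q_p, k_p = phi(q), phi(k)

# Compute phi(K)^T V first: a (d_k, d_k) matrix, independent of seq_len.
kv = k_p.T @ v

# Per-query normalizer: phi(Q) @ sum_j phi(K_j), shape (seq_len, 1).
z = q_p @ k_p.sum(dim=0, keepdim=True).T

output = (q_p @ kv) / z  # (seq_len, d_k), computed in O(n * d_k^2)
```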