AI Glossary
The complete dictionary of Artificial Intelligence
QKV Representation
Projection of input embeddings into three distinct vector spaces: Query, Key, and Value. Queries are matched against Keys to compute attention scores, which are then used to weight the Values and generate the output.
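A minimal NumPy sketch of the three projections (dimensions and weight initializations are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 3, 8, 4            # hypothetical sizes

X = rng.normal(size=(seq_len, d_model))    # input embeddings
W_q = rng.normal(size=(d_model, d_k))      # learned projection matrices
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

# Three distinct projections of the same input
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Queries matched against Keys yield the attention scores
scores = Q @ K.T / np.sqrt(d_k)            # shape: (seq_len, seq_len)
```

The scaling by the square root of the key dimension keeps score magnitudes stable as `d_k` grows.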
Attention Mask
Binary or continuous matrix applied to attention scores to control which tokens can attend to which others, crucial in decoder models to prevent attending to future tokens.
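A small sketch of a causal (lower-triangular) mask applied to a score matrix; the sizes are hypothetical:

```python
import numpy as np

seq_len = 4
# Boolean mask: position i may attend only to positions j <= i
mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

scores = np.zeros((seq_len, seq_len))      # placeholder attention scores
# Masked-out (future) positions are set to -inf so that a subsequent
# softmax assigns them zero weight
masked = np.where(mask, scores, -np.inf)
```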
Attention Softmax
Application of the softmax function to attention scores to normalize weights into a probability distribution, ensuring that the sum of weights for each query position equals 1.
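A numerically stable softmax sketch over a row of attention scores (the score values are made up for illustration):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtracting the row maximum avoids overflow in exp
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

scores = np.array([[1.0, 2.0, 3.0]])       # one query position's scores
weights = softmax(scores)                  # each row sums to 1
```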
Causal Bias
Constraint imposed in autoregressive models where each position can attend only to current and past positions, with future positions masked out.
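The constraint is often implemented as an additive bias matrix: 0 where attention is allowed, negative infinity on future positions. A sketch with hypothetical sizes:

```python
import numpy as np

seq_len = 4
# Strictly upper-triangular -inf bias: future positions are blocked
bias = np.triu(np.full((seq_len, seq_len), -np.inf), k=1)

scores = np.ones((seq_len, seq_len))       # placeholder attention scores
biased = scores + bias                     # adding -inf kills future scores
```

After a softmax, the `-inf` entries become exactly zero weight, enforcing causality.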
Output Projection
Final linear transformation applied to the attention output to map the concatenated dimension of attention heads to the expected dimension for subsequent layers.
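A sketch of the output projection after multi-head attention; the head count, dimensions, and random head outputs are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, n_heads, d_head, d_model = 3, 2, 4, 8   # hypothetical sizes

# Stand-ins for the per-head attention outputs
head_outputs = [rng.normal(size=(seq_len, d_head)) for _ in range(n_heads)]

# Concatenate heads along the feature dimension: (seq_len, n_heads * d_head)
concat = np.concatenate(head_outputs, axis=-1)

# Learned output projection maps back to the model dimension
W_o = rng.normal(size=(n_heads * d_head, d_model))
out = concat @ W_o                               # shape: (seq_len, d_model)
```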