AI Glossary
A complete glossary of artificial intelligence terms
Linear Projection
Linear transformation applied to the input embeddings to produce the Query, Key, and Value representations within each head of a multi-head attention layer.
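A minimal NumPy sketch, assuming a single head with illustrative dimensions; the weight matrices `W_q`, `W_k`, and `W_v` would be learned in practice but are randomly initialized here:

```python
import numpy as np

d_model, seq_len = 8, 4
rng = np.random.default_rng(0)
x = rng.normal(size=(seq_len, d_model))   # input embeddings

# Learned projection matrices (random here, purely for illustration)
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

# Linear projections into the Query, Key, and Value spaces
Q, K, V = x @ W_q, x @ W_k, x @ W_v
print(Q.shape, K.shape, V.shape)   # (4, 8) (4, 8) (4, 8)
```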
Attention Masking
Technique that sets selected positions in the attention score matrix to negative infinity before the softmax, preventing unwanted interactions between sequence elements (for example, attending to padding tokens or to future positions in a decoder).
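A minimal sketch of a causal (look-ahead) mask in NumPy; the score matrix is random here, standing in for `Q @ K.T / sqrt(d)`:

```python
import numpy as np

seq_len = 4
rng = np.random.default_rng(0)
scores = rng.normal(size=(seq_len, seq_len))   # stand-in for Q @ K.T / sqrt(d)

# Causal mask: position i must not attend to any position j > i
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
scores = np.where(mask, -np.inf, scores)

# Softmax maps -inf scores to exactly zero attention weight
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(np.round(weights, 2))   # upper triangle is all zeros
```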
Multi-Head Concatenation
Operation that combines the outputs of all attention heads by concatenating their representations, followed by a final linear projection that produces the layer output.
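A minimal NumPy sketch, assuming two heads whose outputs have already been computed; `W_o` stands in for the learned output projection:

```python
import numpy as np

num_heads, seq_len, d_head = 2, 4, 4
d_model = num_heads * d_head
rng = np.random.default_rng(0)

# Per-head attention outputs: (num_heads, seq_len, d_head)
head_outputs = rng.normal(size=(num_heads, seq_len, d_head))

# Concatenate along the feature axis: (seq_len, num_heads * d_head)
concat = np.concatenate(list(head_outputs), axis=-1)

# Final linear projection mixes information across heads
W_o = rng.normal(size=(d_model, d_model))
output = concat @ W_o
print(output.shape)   # (4, 8)
```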
Contextual Embedding
Enriched vector representation generated by the attention mechanism that incorporates contextual information from the entire sequence for each element.
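A minimal sketch of the idea: each contextual embedding is an attention-weighted mixture of all value vectors, so every output position carries information from the whole sequence. The weights are random here, standing in for softmaxed attention scores:

```python
import numpy as np

seq_len, d_model = 4, 8
rng = np.random.default_rng(0)
V = rng.normal(size=(seq_len, d_model))   # value vectors, one per token

# Stand-in attention weights: each row is a distribution over the sequence
weights = rng.dirichlet(np.ones(seq_len), size=seq_len)

# Each output row mixes all value vectors -> contextual embeddings
contextual = weights @ V
print(contextual.shape)   # (4, 8)
```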
Attention Head Dimension
Reduced dimensionality of each attention subspace in Multi-Head Attention, typically calculated as model_dimension / number_of_heads.
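For example, the original Transformer uses a model dimension of 512 split across 8 heads, giving each head a 64-dimensional subspace:

```python
d_model = 512      # model dimension (original Transformer)
num_heads = 8      # number of attention heads

d_head = d_model // num_heads
assert d_head * num_heads == d_model, "d_model must divide evenly across heads"
print(d_head)      # 64
```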
Parallel Attention Computation
Process where multiple attention heads are computed simultaneously in parallel, allowing efficient capture of different aspects of sequential relationships.
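A minimal NumPy sketch: with Q, K, and V already split per head, one batched matrix multiplication computes every head's attention simultaneously:

```python
import numpy as np

num_heads, seq_len, d_head = 2, 4, 4
rng = np.random.default_rng(0)

# Q, K, V already split per head: (num_heads, seq_len, d_head)
Q = rng.normal(size=(num_heads, seq_len, d_head))
K = rng.normal(size=(num_heads, seq_len, d_head))
V = rng.normal(size=(num_heads, seq_len, d_head))

# One batched matmul computes the scores for every head at once
scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)   # (num_heads, seq_len, seq_len)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

outputs = weights @ V   # (num_heads, seq_len, d_head), all heads in parallel
print(outputs.shape)
```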
Residual Attention Connection
Residual connection adding the original input to the output of the attention layer, facilitating training of deep networks by preserving information flow.
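A minimal sketch, with a trivial stand-in for the attention sublayer; the point is only the `x + ...` skip connection:

```python
import numpy as np

def with_residual(x, sublayer):
    """Apply a sublayer and add the original input back (skip connection)."""
    return x + sublayer(x)

# Illustrative stand-in for a real attention layer
fake_attention = lambda x: 0.1 * x

x = np.ones((4, 8))
y = with_residual(x, fake_attention)
print(y[0, 0])   # 1.1; the original input is preserved in the output
```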
Attention Distribution
Probability distribution over sequence elements generated by softmax, indicating where the model 'looks' when processing a specific element.
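A minimal sketch: the softmax turns one query position's raw scores into a probability distribution over the sequence:

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

scores = np.array([[2.0, 1.0, 0.1]])   # raw scores for one query position
dist = softmax(scores)

print(np.round(dist, 3))   # [[0.659 0.242 0.099]]
print(dist.sum())          # 1.0, a valid probability distribution
```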