
AI Glossary

The complete dictionary of Artificial Intelligence

162 Categories · 2,032 Subcategories · 23,060 Terms
📖
Terms

Additive Attention

Proposed by Bahdanau, this method combines the decoder's hidden state and encoder outputs through a feed-forward network to calculate attention weights.
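This Bahdanau-style scoring can be sketched in NumPy. The parameter names `W1`, `W2`, `v` and the dimensions are illustrative stand-ins for learned weights, not a reference implementation:

```python
import numpy as np

def additive_scores(dec_state, enc_outputs, W1, W2, v):
    # score_i = v^T tanh(W1 @ s + W2 @ h_i)  (Bahdanau-style additive scoring)
    proj = np.tanh(enc_outputs @ W2.T + dec_state @ W1.T)  # (T, d_att)
    scores = proj @ v                                      # (T,)
    e = np.exp(scores - scores.max())                      # stable softmax
    return e / e.sum()                                     # weights sum to 1

rng = np.random.default_rng(0)
T, d_h, d_att = 5, 8, 16
w = additive_scores(rng.normal(size=d_h),
                    rng.normal(size=(T, d_h)),
                    rng.normal(size=(d_att, d_h)),
                    rng.normal(size=(d_att, d_h)),
                    rng.normal(size=d_att))
```

The feed-forward network here is a single `tanh` layer followed by a projection onto `v`, which is the form used in the original encoder-decoder attention.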


Multiplicative Attention

Introduced by Luong, this method calculates attention scores as the dot product between the decoder state and the encoder outputs, offering a more efficient implementation than additive attention.
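A minimal NumPy sketch of the Luong "dot" variant (shapes are illustrative; the general "multiplicative" form inserts a learned matrix between the two operands):

```python
import numpy as np

def dot_product_scores(dec_state, enc_outputs):
    # Luong "dot" scoring: score_i = s . h_i, then softmax over positions.
    scores = enc_outputs @ dec_state          # (T,)
    e = np.exp(scores - scores.max())         # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(1)
weights = dot_product_scores(rng.normal(size=8), rng.normal(size=(5, 8)))
```

The efficiency gain over the additive variant comes from replacing a feed-forward network with a single matrix-vector product.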


Multi-Head Attention

Extension of self-attention using multiple attention heads in parallel to capture different types of relationships in the data.
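The head-splitting idea can be sketched as follows. For brevity this uses identity projections (real multi-head attention applies learned Q/K/V and output projections per head), so it only illustrates how heads partition the feature dimension and run in parallel:

```python
import numpy as np

def multi_head_self_attention(X, n_heads):
    # Illustrative sketch: each head attends over its own slice of features.
    T, d = X.shape
    d_head = d // n_heads
    heads = []
    for h in range(n_heads):
        Qh = Kh = Vh = X[:, h * d_head:(h + 1) * d_head]  # identity projections
        s = Qh @ Kh.T / np.sqrt(d_head)
        s -= s.max(axis=-1, keepdims=True)
        w = np.exp(s)
        w /= w.sum(axis=-1, keepdims=True)
        heads.append(w @ Vh)
    return np.concatenate(heads, axis=-1)                 # (T, d)

rng = np.random.default_rng(3)
Y = multi_head_self_attention(rng.normal(size=(5, 8)), n_heads=2)
```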


Context Vector

Weighted representation of encoder outputs, calculated using attention weights and provided to the decoder as contextual information.
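Concretely, the context vector is just the attention-weighted sum of the encoder states, as in this small hand-checkable example (values are arbitrary):

```python
import numpy as np

enc_outputs = np.arange(12.0).reshape(4, 3)   # T=4 encoder states, d=3
weights = np.array([0.1, 0.2, 0.3, 0.4])      # attention weights, sum to 1
context = weights @ enc_outputs               # (3,) -> [6.0, 7.0, 8.0]
```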


Scaled Dot-Product Attention

Attention variant used in Transformers where the dot product is divided by the square root of the dimension to stabilize training.
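The formula softmax(QKᵀ/√d_k)V translates directly into NumPy; shapes below are illustrative:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (T_q, T_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V                                  # (T_q, d_v)

rng = np.random.default_rng(2)
out = scaled_dot_product_attention(rng.normal(size=(3, 4)),
                                   rng.normal(size=(6, 4)),
                                   rng.normal(size=(6, 4)))
```

Dividing by √d_k keeps the dot products from growing with the dimension, which would otherwise push the softmax into regions with vanishing gradients.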


Global Attention

Attention mechanism considering all positions of the source sequence to calculate the context vector at each decoding step.


Local Attention

Attention variant considering only a subset of predicted positions around a central position, reducing computational complexity.


Hierarchical Attention

Multi-level architecture applying attention at different granularities, first at the word level then at the sentence or document level.


Query, Key, Value

Triple of fundamental vectors in attention: the Query represents the current request, the Keys index the available items to match against, and the Values hold the content to retrieve.


Temporal Attention

Mechanism specialized in capturing temporal dependencies in time series by weighting relevant time steps.


Spatial Attention

Application of attention to spatial data (images, videos) to focus on the most informative regions in space.


Adaptive Attention

Approach where the attention mechanism dynamically adjusts during training to optimize its parameters according to the task.


Sparse Attention

Attention variant that computes weights only for a subset of positions, enabling efficient processing of longer sequences.


Attention Mask

Technique that masks certain positions to prevent attention on irrelevant tokens such as padding or future tokens.
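A common instance is the causal mask used in autoregressive decoders: scores for future positions are set to −∞ before the softmax so their weights become exactly zero. A small NumPy sketch (uniform scores, for clarity):

```python
import numpy as np

T = 4
scores = np.zeros((T, T))                          # placeholder raw scores
mask = np.triu(np.ones((T, T), dtype=bool), k=1)   # True above the diagonal
scores[mask] = -np.inf                             # forbid future positions
w = np.exp(scores - scores.max(axis=-1, keepdims=True))
w /= w.sum(axis=-1, keepdims=True)
# Row i now attends uniformly over positions 0..i and puts 0 on the rest.
```

The same mechanism, with a different boolean mask, zeroes out attention on padding tokens.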


Linear Attention

Approximation of standard attention with linear complexity rather than quadratic, enabling processing of much longer sequences.
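One family of linear-attention methods replaces the softmax with a feature map φ and reorders the computation so no T×T matrix is ever formed. A hedged NumPy sketch, assuming the commonly used φ(x) = elu(x) + 1 feature map:

```python
import numpy as np

def linear_attention(Q, K, V):
    # phi(Q) @ (phi(K)^T V), computed right-to-left:
    # O(T * d * d_v) instead of the O(T^2 * d) of standard attention.
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1, positive
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                        # (d, d_v) -- no T x T matrix formed
    z = Qp @ Kp.sum(axis=0)              # (T_q,) per-row normalizer
    return (Qp @ kv) / z[:, None]

rng = np.random.default_rng(4)
out = linear_attention(rng.normal(size=(6, 4)),
                       rng.normal(size=(6, 4)),
                       rng.normal(size=(6, 3)))
```

Because φ is positive, each output row is still a convex combination of the value rows, mimicking softmax attention while scaling linearly in sequence length.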


Performer Attention

Variant using feature mapping kernels to approximate attention with efficient linear complexity in memory and computation.
