AI Glossary
A complete dictionary of Artificial Intelligence
Additive Attention
Proposed by Bahdanau, this method combines the decoder's hidden state and encoder outputs through a feed-forward network to calculate attention weights.
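A minimal NumPy sketch of the additive score and the resulting context vector; the projection matrices `W_dec`, `W_enc` and the scoring vector `v` are illustrative placeholders rather than the paper's exact parameterization:

```python
import numpy as np

def additive_attention(dec_state, enc_outputs, W_dec, W_enc, v):
    """Bahdanau-style additive attention scores and context vector.

    dec_state:   (d_dec,)        current decoder hidden state
    enc_outputs: (T, d_enc)      encoder outputs for T source positions
    W_dec:       (d_attn, d_dec) projection of the decoder state
    W_enc:       (d_attn, d_enc) projection of the encoder outputs
    v:           (d_attn,)       scoring vector
    """
    # score_t = v . tanh(W_dec h_dec + W_enc h_enc_t)
    energy = np.tanh(enc_outputs @ W_enc.T + dec_state @ W_dec.T)  # (T, d_attn)
    scores = energy @ v                                            # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                       # softmax over source positions
    context = weights @ enc_outputs                                # (d_enc,)
    return weights, context
```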
Multiplicative Attention
Introduced by Luong et al., this method calculates attention scores as the dot product between the decoder state and the encoder outputs, offering a more efficient implementation.
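A comparable sketch of the dot-product score, assuming the decoder and encoder states share the same dimensionality:

```python
import numpy as np

def dot_product_attention(dec_state, enc_outputs):
    """Luong-style dot attention: score_t = h_dec . h_enc_t.

    Assumes dec_state (d,) and enc_outputs (T, d) have the same dimension d.
    """
    scores = enc_outputs @ dec_state            # (T,) one score per source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax
    context = weights @ enc_outputs             # (d,)
    return weights, context
```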
Multi-Head Attention
Extension of self-attention using multiple attention heads in parallel to capture different types of relationships in the data.
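A hedged NumPy sketch of the head-splitting logic; the projection matrices are random stand-ins for learned parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Self-attention with num_heads parallel heads.

    X:              (T, d_model) input sequence
    Wq, Wk, Wv, Wo: (d_model, d_model) projection matrices
    d_model must be divisible by num_heads.
    """
    T, d_model = X.shape
    d_head = d_model // num_heads

    # Project, then split the last dimension into heads: (num_heads, T, d_head)
    def split(M):
        return M.reshape(T, num_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(X @ Wq), split(X @ Wk), split(X @ Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, T, T)
    out = softmax(scores) @ V                              # (heads, T, d_head)
    # Concatenate heads and apply the output projection
    out = out.transpose(1, 0, 2).reshape(T, d_model)
    return out @ Wo
```

Each head sees a lower-dimensional slice of the projections, so different heads can specialize in different relationships at the same overall cost as a single full-width head.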
Context Vector
Weighted representation of encoder outputs, calculated using attention weights and provided to the decoder as contextual information.
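A tiny illustration of how the context vector is formed once attention weights are available (values here are arbitrary examples):

```python
import numpy as np

enc_outputs = np.random.randn(5, 8)             # 5 source positions, dimension 8
weights = np.array([0.1, 0.4, 0.3, 0.1, 0.1])   # attention weights, sum to 1
context = weights @ enc_outputs                  # (8,) vector passed to the decoder
```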
Scaled Dot-Product Attention
Attention variant used in Transformers where the dot product is divided by the square root of the dimension to stabilize training.
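A minimal sketch of the formula Attention(Q, K, V) = softmax(Q Kᵀ / sqrt(d_k)) V:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (T_q, d_k), K: (T_k, d_k), V: (T_k, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # scaling keeps logits in a stable range
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # (T_q, d_v)
```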
Global Attention
Attention mechanism considering all positions of the source sequence to calculate the context vector at each decoding step.
Local Attention
Attention variant that attends only to a window of source positions around a predicted center position, reducing computational complexity.
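A sketch of the windowing idea; in the original formulation the center position is predicted from the decoder state, which is skipped here for brevity:

```python
import numpy as np

def local_attention(dec_state, enc_outputs, center, window=2):
    """Score only the positions within `window` of `center`.

    `center` would normally be predicted by the model; it is passed in
    directly to keep the sketch short.
    """
    T = enc_outputs.shape[0]
    lo, hi = max(0, center - window), min(T, center + window + 1)
    scores = enc_outputs[lo:hi] @ dec_state        # dot-product scores inside the window
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    context = weights @ enc_outputs[lo:hi]
    return context
```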
Hierarchical Attention
Multi-level architecture applying attention at different granularities, first at the word level then at the sentence or document level.
Query, Key, Value
Triple of fundamental vectors in attention: the Query represents what the current position is looking for, Keys identify the entries it can be matched against, and Values carry the content that is actually retrieved.
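A small illustration of how the three roles are typically produced from the same input through learned projections; the random matrices below stand in for learned weights:

```python
import numpy as np

T, d_model = 4, 8
X = np.random.randn(T, d_model)                  # input sequence
Wq, Wk, Wv = (np.random.randn(d_model, d_model) for _ in range(3))

Q = X @ Wq   # queries: what each position is looking for
K = X @ Wk   # keys:    how each position advertises what it contains
V = X @ Wv   # values:  the content that is actually mixed into the output
```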
Temporal Attention
Mechanism specialized in capturing temporal dependencies in time series by weighting relevant time steps.
Spatial Attention
Application of attention to spatial data (images, videos) to focus on the most informative regions in space.
Adaptive Attention
Approach in which the attention mechanism adjusts its behavior dynamically to each input or task rather than applying a fixed attention pattern.
Sparse Attention
Attention variant that computes weights only for a subset of positions, enabling efficient processing of longer sequences.
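A toy block-sparse pattern illustrating the idea; real sparse-attention schemes (strided, sliding-window, global tokens) use more elaborate patterns:

```python
import numpy as np

def block_sparse_attention(Q, K, V, block=4):
    """Each query attends only to keys in its own block of size `block`.

    Q, K: (T, d), V: (T, d_v). Only T/block score blocks are computed
    instead of the full T x T matrix.
    """
    T, d = Q.shape
    out = np.zeros_like(V, dtype=float)
    for start in range(0, T, block):
        end = min(start + block, T)
        scores = Q[start:end] @ K[start:end].T / np.sqrt(d)
        scores -= scores.max(axis=-1, keepdims=True)
        w = np.exp(scores)
        w /= w.sum(axis=-1, keepdims=True)
        out[start:end] = w @ V[start:end]
    return out
```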
Attention Mask
Technique that masks certain positions to prevent attention on irrelevant tokens such as padding or future tokens.
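A sketch showing how a mask is applied before the softmax, together with the standard causal (lower-triangular) mask used in decoders:

```python
import numpy as np

def masked_attention(Q, K, V, mask):
    """mask: boolean (T_q, T_k); True means the key position is visible."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores = np.where(mask, scores, -1e9)          # masked positions get ~zero weight
    scores -= scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

# Causal mask: position i may only attend to positions <= i.
T = 5
causal_mask = np.tril(np.ones((T, T), dtype=bool))
```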
Linear Attention
Approximation of standard attention with linear complexity rather than quadratic, enabling processing of much longer sequences.
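A sketch following the kernel trick used by linear attention, with phi(x) = elu(x) + 1 as one common feature map; the key point is that the (d x d_v) product Kᵀ V is computed once instead of a T x T score matrix:

```python
import numpy as np

def linear_attention(Q, K, V):
    """Kernelized attention with O(T) rather than O(T^2) cost.

    The softmax kernel is replaced by phi(q) . phi(k) with a strictly
    positive feature map, so the sums over keys can be precomputed.
    """
    def phi(x):
        return np.where(x > 0, x + 1.0, np.exp(x))   # elu(x) + 1, strictly positive

    Qp, Kp = phi(Q), phi(K)                          # (T, d)
    KV = Kp.T @ V                                    # (d, d_v), computed once
    Z = Qp @ Kp.sum(axis=0)                          # (T,) normalization terms
    return (Qp @ KV) / Z[:, None]
```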
Performer Attention
Variant using kernel feature maps (random features) to approximate softmax attention with linear complexity in memory and computation.
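A rough sketch of the random-feature idea behind this approach; the exact FAVOR+ construction (orthogonal random features, query/key rescaling) is omitted, so this only illustrates the principle:

```python
import numpy as np

def random_feature_attention(Q, K, V, num_features=64, seed=0):
    """Approximate exp(q . k) with positive random features.

    phi(x) = exp(w_i . x - ||x||^2 / 2) / sqrt(m), with w_i ~ N(0, I),
    gives an unbiased estimate of exp(q . k), so attention can be computed
    in linear time as in the kernelized form above.
    """
    rng = np.random.default_rng(seed)
    d = Q.shape[-1]
    W = rng.standard_normal((num_features, d))       # random projection directions

    def phi(X):
        proj = X @ W.T                               # (T, m)
        sq_norm = 0.5 * (X ** 2).sum(-1, keepdims=True)
        return np.exp(proj - sq_norm) / np.sqrt(num_features)

    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                                    # (m, d_v)
    Z = Qp @ Kp.sum(axis=0)                          # (T,)
    return (Qp @ KV) / Z[:, None]
```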