AI Glossary
A complete dictionary of Artificial Intelligence
Binary Mask
Matrix containing only 0 and 1 values where 1 indicates positions to keep and 0 those to mask, generally applied through element-wise multiplication before or after the attention softmax.
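A minimal sketch of this idea, assuming PyTorch (the entry names no particular library); the shapes and values are illustrative only:

```python
import torch

# Toy post-softmax attention weights for 4 query and 4 key positions.
attn = torch.softmax(torch.randn(4, 4), dim=-1)

# Binary mask: 1 = keep the interaction, 0 = mask it (here the last key is dropped).
mask = torch.tensor([[1, 1, 1, 0]] * 4, dtype=attn.dtype)

# Element-wise multiplication after the softmax zeroes out the masked positions.
masked_attn = attn * mask
```

Note that multiplying after the softmax leaves rows that no longer sum to 1; applying the mask before the softmax (see Softmax Mask below) avoids this.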
Triangular Causal Mask
Triangular matrix structure where elements above the diagonal are masked, so each position can attend only to itself and earlier positions, enforcing strict temporal (causal) dependency in transformer models for sequential tasks.
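A short PyTorch sketch (an assumed library choice) of building such a mask from an upper-triangular matrix:

```python
import torch

seq_len = 5
scores = torch.randn(seq_len, seq_len)  # raw query-key attention scores

# True strictly above the diagonal, i.e. at future (disallowed) key positions.
causal_mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()

# Future positions are pushed to -inf so the softmax gives them zero weight.
attn = torch.softmax(scores.masked_fill(causal_mask, float("-inf")), dim=-1)
```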
Variable Length Mask
Dynamic mask that adapts to variable sequence lengths in a batch, optimizing computation by ignoring irrelevant positions while preserving batch alignment.
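A hedged sketch of deriving such a mask from per-sequence lengths in a padded batch (PyTorch assumed; the lengths are made up):

```python
import torch

lengths = torch.tensor([6, 3, 4])        # true lengths of 3 padded sequences
max_len = int(lengths.max())

# Compare each position index to the sequence length: True = real token, False = padding.
positions = torch.arange(max_len).unsqueeze(0)   # shape (1, max_len)
valid = positions < lengths.unsqueeze(1)         # shape (batch, max_len)

# valid can now be broadcast into the attention computation, so padded positions
# are ignored while every sequence keeps the same max_len alignment in the batch.
```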
Key Padding Mask
Specific mask applied to keys in the attention mechanism to prevent padding tokens from influencing attention scores, typically added before the softmax operation.
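A minimal sketch, assuming PyTorch, of masking padded key columns additively before the softmax (torch.nn.MultiheadAttention exposes the same idea through its key_padding_mask argument):

```python
import torch

batch, q_len, k_len = 2, 4, 4
scores = torch.randn(batch, q_len, k_len)        # raw attention scores

# True where the key position is a padding token (second sequence has one padded key).
key_padding = torch.tensor([[False, False, False, False],
                            [False, False, False, True]])

# Broadcast over the query dimension; padded keys get -inf before the softmax,
# so no query can put attention mass on them.
scores = scores.masked_fill(key_padding.unsqueeze(1), float("-inf"))
attn = torch.softmax(scores, dim=-1)
```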
Query Mask
Mask applied to queries to restrict which positions can perform attention queries, used in specialized architectures requiring granular control of interactions.
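One possible (assumed, not canonical) way to realize this in PyTorch is to zero the output rows of masked query positions:

```python
import torch

q_len, k_len, d = 4, 4, 8
attn = torch.softmax(torch.randn(q_len, k_len), dim=-1)
values = torch.randn(k_len, d)
out = attn @ values                     # attended representation per query

# Query mask: 1 = this position may issue queries, 0 = it may not.
query_mask = torch.tensor([1.0, 1.0, 0.0, 1.0])

# Masked queries contribute no attended output.
out = out * query_mask.unsqueeze(-1)
```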
Value Mask
Mask applied to values after attention computation to filter out undesirable contributions, enabling fine-grained post-attention control of output representations.
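An illustrative sketch (PyTorch assumed) in which the mask removes selected value vectors from the attention-weighted sum:

```python
import torch

q_len, k_len, d = 4, 4, 8
attn = torch.softmax(torch.randn(q_len, k_len), dim=-1)   # attention weights
values = torch.randn(k_len, d)

# Value mask: 0 filters out a value vector's contribution, 1 keeps it.
value_mask = torch.tensor([1.0, 1.0, 1.0, 0.0])

# Zero the masked value vectors, then take the usual weighted sum.
out = attn @ (values * value_mask.unsqueeze(-1))
```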
Attention Weight Masking
Technique of applying a mask directly to the attention weights after the softmax to force certain contributions to zero, offering explicit control over information pathways.
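A small sketch of this technique in PyTorch (assumed), including an optional renormalization step:

```python
import torch

attn = torch.softmax(torch.randn(4, 4), dim=-1)   # post-softmax attention weights

# Zero selected query-key interactions directly in the weights.
weight_mask = torch.ones(4, 4)
weight_mask[:, 2] = 0.0                            # e.g. block all attention to key position 2
attn = attn * weight_mask

# Optional: renormalize so each query's weights sum to 1 again.
attn = attn / attn.sum(dim=-1, keepdim=True)
```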
Softmax Mask
Mask applied by adding a large negative value (typically -inf) to the attention scores before the softmax, ensuring that masked positions receive zero or near-zero probability after normalization.
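A minimal additive-mask sketch (PyTorch assumed) matching this description:

```python
import torch

scores = torch.randn(4, 4)                 # raw attention scores

# Additive mask: 0 for valid positions, -inf (or a large negative constant) for masked ones.
additive_mask = torch.zeros(4, 4)
additive_mask[:, 2] = float("-inf")        # mask key position 2 for every query

attn = torch.softmax(scores + additive_mask, dim=-1)
# attn[:, 2] is now (numerically) zero for every query.
```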
Logit Mask
Mask applied at the logit level (raw attention scores) to exclude certain interactions before softmax normalization, preserving the mathematical distribution over the valid scores.
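A brief sketch (PyTorch assumed) showing that masking at the logit level keeps a proper probability distribution over the remaining positions:

```python
import torch

logits = torch.randn(4, 4)                               # raw attention logits
valid = torch.tensor([[True, False, True, True]] * 4)    # which interactions are allowed

# Exclude invalid interactions before normalization.
attn = torch.softmax(logits.masked_fill(~valid, float("-inf")), dim=-1)

# Each row still sums to 1, distributed only over the valid positions.
print(attn.sum(dim=-1))
```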