AI Glossary

The complete dictionary of Artificial Intelligence

Encoder Stack

Stack of structurally identical layers that transforms the input sequence into contextual representations; each layer contains a self-attention sub-layer and a position-wise feed-forward sub-layer.

Decoder Stack

Stack of layers that generates the output sequence one token at a time, using masked self-attention so that a position cannot see later positions, and cross-attention to read the encoder's output.

Encoder-Decoder Attention

Attention mechanism, also called cross-attention, in which the queries come from the decoder and the keys and values come from the encoder output, so that each output token can draw on the relevant parts of the input.

Layer Normalization

Training stabilization technique that normalizes the activations at each position across the feature dimension, applied either before each sub-layer (pre-norm) or after it (post-norm) in the transformer architecture.
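
As a rough illustration, the sketch below normalizes one position's feature vector and shows the two common placements around a sub-layer. It is a minimal sketch in plain Python: the learned gain and bias are left out, and the names `layer_norm`, `post_norm`, `pre_norm`, and `sublayer` are chosen here for illustration rather than taken from any library.

```python
import math

def layer_norm(vector, eps=1e-5):
    """Normalize one position's feature vector to zero mean and unit variance."""
    n = len(vector)
    mean = sum(vector) / n
    variance = sum((v - mean) ** 2 for v in vector) / n
    return [(v - mean) / math.sqrt(variance + eps) for v in vector]

def post_norm(x, sublayer):
    """Post-norm placement: add the residual first, then normalize."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

def pre_norm(x, sublayer):
    """Pre-norm placement: normalize the sub-layer input, then add the residual."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

# Hypothetical sub-layer standing in for attention or a feed-forward block.
double = lambda v: [2.0 * t for t in v]
print(layer_norm([1.0, 2.0, 3.0, 4.0]))
print(post_norm([1.0, 2.0, 3.0, 4.0], double))
print(pre_norm([1.0, 2.0, 3.0, 4.0], double))
```

Statistics are computed per position over the features, so the result does not depend on the other positions in the sequence or on the batch.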

Masked Self-Attention

Variant of self-attention used in decoders in which future positions are masked, so that each position attends only to itself and earlier positions and cannot use information that is not available during generation.
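
The masking can be sketched by setting the scores of future positions to negative infinity before the softmax, which gives them a weight of exactly zero. This is a simplified illustration: it assumes the input vectors serve directly as queries, keys, and values (no learned projections), and `causal_self_attention` is a name made up for this example.

```python
import math

def causal_self_attention(x):
    """Self-attention over x where position i may only attend to positions <= i.

    Assumption: x is used directly as queries, keys, and values.
    """
    d_k = len(x[0])
    outputs = []
    for i, query in enumerate(x):
        scores = []
        for j, key in enumerate(x):
            if j > i:
                scores.append(float("-inf"))  # mask out future positions
            else:
                dot = sum(a * b for a, b in zip(query, key))
                scores.append(dot / math.sqrt(d_k))
        # Numerically stable softmax; masked scores become exactly 0.0.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, x)) for c in range(d_k)]
        )
    return outputs

sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(causal_self_attention(sequence))
```

The first position can only attend to itself, so its output equals its own input vector, and changing later tokens leaves the outputs of earlier positions unchanged.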

Scaled Dot-Product Attention

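Attention calculation, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, that divides the query-key dot products by the square root of the key dimension d_k before the softmax; this keeps the softmax out of its saturated regions and stabilizes gradients during training.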
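
For readers who prefer code, here is a minimal plain-Python sketch of the computation softmax(QK^T / sqrt(d_k)) V. It works on lists of vectors and is meant only to show the order of operations; the function names are illustrative, and a real implementation would use batched matrix operations.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Return one output vector per query: softmax(q . k / sqrt(d_k)) applied to values."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Dot product of the query with every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values))
             for c in range(len(values[0]))]
        )
    return outputs

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
print(scaled_dot_product_attention([[1.0, 0.0]], keys, values))
```

In this toy input the query matches the first key more closely, so the first value vector receives the larger weight.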

Attention Heads

Parallel attention operations in multi-head attention, each working on its own learned projection of the queries, keys, and values, so that different heads can focus on different types of relationships and patterns in the data.
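
The idea of running several heads side by side and concatenating their results can be sketched as follows. This is a deliberate simplification: real multi-head attention gives each head learned projection matrices plus a final output projection, whereas this sketch assumes each head simply takes a contiguous slice of the feature dimension. The names `attention` and `multi_head_self_attention` are illustrative.

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        outputs.append(
            [sum((e / total) * v[c] for e, v in zip(exps, values))
             for c in range(len(values[0]))]
        )
    return outputs

def multi_head_self_attention(x, num_heads):
    """Split features into num_heads slices, attend within each, then concatenate.

    Assumption: slices stand in for the learned per-head projections.
    """
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    head_outputs = []
    for h in range(num_heads):
        part = [row[h * d_head:(h + 1) * d_head] for row in x]
        head_outputs.append(attention(part, part, part))
    # Concatenate the heads' outputs position by position.
    return [
        [value for head in head_outputs for value in head[i]]
        for i in range(len(x))
    ]

x = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, -0.5], [1.0, 1.0, 0.0, 0.0]]
print(multi_head_self_attention(x, num_heads=2))
```

Each head computes its own attention weights, so two heads can attend to different positions for the same token.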

Token Embedding

Dense, continuous vector representation of each input token, looked up from a learned table; it is the starting point of the transformer architecture, to which positional information is then added.
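
A small sketch of the lookup, followed by the addition of positional information, might look like this. The table is filled with random values in place of learned ones, the sinusoidal position formula is one common choice (the one used in the original Transformer paper) and not the only option, and all function names and the vocabulary are hypothetical.

```python
import math
import random

def make_embedding_table(vocab_size, d_model, seed=0):
    """Random stand-in for a learned embedding table (one row per token id)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
            for _ in range(vocab_size)]

def sinusoidal_position(pos, d_model):
    """Sinusoidal position vector: sine on even indices, cosine on odd indices."""
    return [
        math.sin(pos / 10000 ** (i / d_model)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / d_model))
        for i in range(d_model)
    ]

def embed(token_ids, table):
    """Look up each token's vector and add the vector for its position."""
    d_model = len(table[0])
    return [
        [e + p for e, p in zip(table[tok], sinusoidal_position(pos, d_model))]
        for pos, tok in enumerate(token_ids)
    ]

vocab = {"the": 0, "cat": 1, "sat": 2}  # hypothetical vocabulary
table = make_embedding_table(len(vocab), d_model=4)
ids = [vocab[w] for w in ["the", "cat", "the"]]
print(embed(ids, table))
```

The two occurrences of "the" share the same row of the table but end up with different vectors once position is added.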
