AI Glossary
The Complete Dictionary of Artificial Intelligence
KV Cache
Inference optimization that caches the keys and values of previously processed tokens so that attention over the prefix does not have to be recomputed at each new generation step.
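A minimal NumPy sketch of the idea: during autoregressive decoding, the key and value vectors of earlier tokens are appended to a growing cache, and each new query attends over the cache instead of recomputing K and V for the entire prefix. The projections and dimensions here are illustrative stand-ins, not any particular model's layout.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # head dimension (illustrative)

def attend(q, K, V):
    # Scaled dot-product attention for a single query over cached keys/values.
    scores = q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

# Simulated decoding loop: grow the cache one token at a time.
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
outputs = []
for step in range(4):
    x = rng.normal(size=d)   # hidden state of the new token (stand-in)
    k, v, q = x, x, x        # real models use learned projections W_k, W_v, W_q
    K_cache = np.vstack([K_cache, k])
    V_cache = np.vstack([V_cache, v])
    outputs.append(attend(q, K_cache, V_cache))

print(K_cache.shape)  # cache now holds keys for all 4 generated tokens
```

The trade-off is memory for compute: the cache grows linearly with sequence length, but each step's attention cost drops from quadratic in the prefix to linear.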
RLHF (Reinforcement Learning from Human Feedback)
Alignment paradigm in which a model is fine-tuned with reinforcement learning, using rewards from a reward model trained on human preference comparisons to steer its behavior.
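At the core of the reward-model stage is a pairwise preference loss (Bradley-Terry style): for two completions of the same prompt, the one the human chose should receive a higher reward score. A hedged sketch with made-up scores:

```python
import numpy as np

def preference_loss(r_chosen, r_rejected):
    # -log(sigmoid(r_chosen - r_rejected)): small when the chosen
    # completion already outscores the rejected one, large otherwise.
    return -np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected))))

# Hypothetical reward scores for two completions of one prompt.
loss_ok  = preference_loss(2.0, 0.5)   # ranking already correct -> small loss
loss_bad = preference_loss(0.5, 2.0)   # ranking violated -> large loss
print(loss_ok, loss_bad)
```

The fitted reward model then supplies the scalar reward that the reinforcement-learning step (commonly PPO) optimizes the language model against.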
Multi-head Attention Mechanism
Extension of attention where multiple heads compute attention representations in parallel on different projected subspaces, allowing the model to focus on various aspects of the sequence.
Decoder-Only
Transformer architecture consisting exclusively of decoder blocks with causal masking, optimized for autoregressive language modeling and generation tasks.
Probability Density Modeling
Fundamental objective of language models: learning to estimate the conditional distribution P(token_t | tokens_{<t}) at each position in a sequence (for discrete tokens, strictly a probability mass function over the vocabulary).
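By the chain rule, these per-position conditionals multiply into the probability of the whole sequence, and their summed negative log gives the standard training loss. A toy sketch with a hypothetical 4-word vocabulary and made-up distributions:

```python
import numpy as np

# Toy conditionals P(token_t | tokens_<t) for a 3-token sequence
# over a hypothetical 4-word vocabulary (values are illustrative).
stepwise = [
    np.array([0.7, 0.1, 0.1, 0.1]),   # P(w_1)
    np.array([0.2, 0.6, 0.1, 0.1]),   # P(w_2 | w_1)
    np.array([0.1, 0.1, 0.7, 0.1]),   # P(w_3 | w_1, w_2)
]
tokens = [0, 1, 2]                     # the observed sequence

# Chain rule: sequence probability = product of the conditionals;
# the training loss is the summed negative log-likelihood.
seq_prob = np.prod([p[t] for p, t in zip(stepwise, tokens)])
nll = -sum(np.log(p[t]) for p, t in zip(stepwise, tokens))
print(seq_prob)   # 0.7 * 0.6 * 0.7
```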