Encoder-Decoder Architecture
Decoder Stack
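A stack of decoder layers that generates the output sequence one token at a time. Each layer combines masked self-attention, which lets a position attend only to the output tokens before it, with cross-attention over the encoder's output. Together they model the sequential dependencies within the output and the relationships between input and output.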
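To make the two attention steps concrete, here is a minimal NumPy sketch of a decoder stack. It is an illustration under simplifying assumptions, not a reference implementation: attention is single-head, layer normalization and dropout are omitted, weights are random, and the names `attention`, `decoder_layer`, `decoder_stack`, `init_params`, and the parameter keys are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, mask=None):
    # Scaled dot-product attention; mask entries set to True are allowed.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ v

def init_params(rng, d_model, d_ff):
    # Hypothetical parameter layout: one weight matrix per projection.
    shapes = {
        "wq1": (d_model, d_model), "wk1": (d_model, d_model), "wv1": (d_model, d_model),
        "wq2": (d_model, d_model), "wk2": (d_model, d_model), "wv2": (d_model, d_model),
        "w1": (d_model, d_ff), "w2": (d_ff, d_model),
    }
    return {name: rng.normal(scale=shape[0] ** -0.5, size=shape)
            for name, shape in shapes.items()}

def decoder_layer(y, memory, p):
    # y: (target_len, d_model) output-side embeddings so far.
    # memory: (source_len, d_model) encoder output.
    t = y.shape[0]
    causal = np.tril(np.ones((t, t), dtype=bool))  # position i sees positions <= i
    # 1) Masked self-attention over the output sequence (residual connection).
    y = y + attention(y @ p["wq1"], y @ p["wk1"], y @ p["wv1"], causal)
    # 2) Cross-attention: queries from the decoder, keys/values from the encoder.
    y = y + attention(y @ p["wq2"], memory @ p["wk2"], memory @ p["wv2"])
    # 3) Position-wise feed-forward network with ReLU.
    y = y + np.maximum(0.0, y @ p["w1"]) @ p["w2"]
    return y

def decoder_stack(y, memory, layers):
    # Apply the layers in order; every layer reads the same encoder output.
    for p in layers:
        y = decoder_layer(y, memory, p)
    return y

rng = np.random.default_rng(0)
layers = [init_params(rng, d_model=8, d_ff=16) for _ in range(2)]
memory = rng.normal(size=(5, 8))   # stand-in for encoder output
targets = rng.normal(size=(4, 8))  # stand-in for embedded output tokens
hidden = decoder_stack(targets, memory, layers)
print(hidden.shape)
```

Because of the causal mask, the representation at a given position does not depend on later output tokens, which is what allows generation to proceed token by token.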