Encoder-Decoder Architecture
Causal Masking
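A technique used in the decoder's self-attention that masks all future positions, so that the prediction for position i depends only on positions 1 to i. This respects the autoregressive nature of generation, in which each token is produced from the tokens before it. In practice, the attention scores for positions after i are set to −∞ (or a very large negative value) before the softmax, so those positions receive zero attention weight.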
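A minimal sketch in plain Python (no framework assumed) of single-head scaled dot-product self-attention with a causal mask. The function names `causal_mask`, `masked_softmax` and `causal_self_attention` are hypothetical, chosen only for illustration. Positions are 0-indexed here, so position i may attend to positions 0 through i. Replacing future scores with −∞ before the softmax is equivalent to adding a mask matrix M where M[i][j] = 0 for j ≤ i and −∞ for j > i.

```python
import math


def causal_mask(n):
    """n x n boolean mask: mask[i][j] is True when position i may attend to j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask):
    """Row-wise softmax that gives exactly zero weight to masked-out (future) positions."""
    weights = []
    for row, allowed in zip(scores, mask):
        # Future positions get -inf, so exp() maps them to 0.0.
        masked = [s if ok else -math.inf for s, ok in zip(row, allowed)]
        m = max(masked)  # finite, because position i always attends to itself
        exps = [math.exp(s - m) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: lists of n vectors (lists of floats); q and k share dimension d.
    Returns (outputs, weights); output i is a weighted sum of v[0..i] only.
    """
    n, d = len(q), len(q[0])
    # Raw scores: scaled dot product of every query with every key.
    scores = [
        [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) for j in range(n)]
        for i in range(n)
    ]
    weights = masked_softmax(scores, causal_mask(n))
    outputs = [
        [sum(weights[i][j] * v[j][c] for j in range(n)) for c in range(len(v[0]))]
        for i in range(n)
    ]
    return outputs, weights
```

For example, with `x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]`, `causal_self_attention(x, x, x)` gives `[1.0, 0.0, 0.0]` as the first row of weights, so the first output is just `x[0]`. Because row i of the weights is zero beyond column i, changing the input at a later position leaves every earlier output unchanged.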