Transformer Architecture
Masked Attention
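Attention mechanism in which certain positions are masked so that each token can attend only to itself and earlier positions, never to future ones. Essential in decoders: during training the whole target sequence is processed at once, and the causal mask keeps each prediction from depending on later tokens, which matches the token-by-token (autoregressive) generation used at inference.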
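As a rough illustration of the definition above, here is a minimal NumPy sketch of single-head scaled dot-product attention with a causal mask. The function name `causal_attention`, the shapes, and the random inputs are assumptions made for this example, not part of any particular library; real implementations usually add batching, multiple heads, and learned projections.

```python
import numpy as np

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask (sketch).

    q, k, v: arrays of shape (seq_len, d). Returns (output, weights).
    """
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)  # (seq_len, seq_len) similarity scores

    # True above the diagonal marks future positions (column j > row i).
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    # Setting masked scores to -inf gives them zero weight after softmax.
    scores = np.where(future, -np.inf, scores)

    # Numerically stable softmax over each row; the diagonal is never
    # masked, so every row has at least one finite score.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Hypothetical toy input: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out, w = causal_attention(x, x, x)
```

In this sketch `w` is lower-triangular: row i holds the weights token i assigns to tokens 0..i, and every entry for a later token is exactly zero. The first token can attend only to itself, so its output equals its own value vector.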