Transformer Architecture
Attention Weight
Softmax-normalized scores that determine how much each element contributes when computing attention. Because the softmax makes them non-negative and sum to 1, they act as the coefficients of a weighted sum (a linear combination) of the value vectors.
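A minimal sketch of how these weights might be computed and applied for a single query, using only the Python standard library. It assumes scaled dot-product scoring (the query-key dot product divided by the square root of the key dimension), which the entry above does not specify. The function names `softmax`, `attention_weights`, and `attend` are illustrative, not taken from any particular library.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the normalized result.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    # Assumed scoring: scaled dot product q . k_i / sqrt(d).
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    # Softmax turns raw scores into non-negative weights that sum to 1.
    return softmax(scores)

def attend(query, keys, values):
    weights = attention_weights(query, keys)
    # The output is the weighted sum (linear combination) of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return output, weights

if __name__ == "__main__":
    out, w = attend(
        query=[1.0, 0.0],
        keys=[[1.0, 0.0], [0.0, 1.0]],
        values=[[10.0, 0.0], [0.0, 10.0]],
    )
    print("weights:", w)
    print("output:", out)
```

Here the query aligns with the first key, so the first value vector receives the larger weight and dominates the output. Real implementations batch this over many queries with matrix operations, but the per-query arithmetic is the same.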