Encoder-Decoder Architecture
Output Projection
Final linear layer that maps each decoder hidden state into vocabulary space, producing one logit per vocabulary token. A softmax then converts these logits into a probability distribution over possible tokens at each output position.
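The sketch below is a minimal illustration in plain Python. It assumes a weight matrix of shape (d_model, vocab_size) and a bias of length vocab_size. The names `output_projection`, `hidden`, `weight` and `bias` are hypothetical and do not come from any particular library. It shows the linear map to logits followed by a per-position softmax, plus a greedy pick of the most probable token.

```python
import math
from typing import List

def output_projection(hidden: List[List[float]],
                      weight: List[List[float]],
                      bias: List[float]) -> List[List[float]]:
    """Hypothetical helper: map each decoder hidden state (length d_model)
    to vocabulary logits (length vocab_size), then softmax per position.
    Assumes weight has shape (d_model, vocab_size)."""
    probs = []
    for h in hidden:  # one output position at a time
        # Linear layer: logits[v] = sum_i h[i] * weight[i][v] + bias[v]
        logits = [sum(h[i] * weight[i][v] for i in range(len(h))) + bias[v]
                  for v in range(len(bias))]
        # Numerically stable softmax: subtract the max logit before exp
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs

# Toy example: 2 output positions, d_model = 2, vocab_size = 3 (made-up values)
hidden = [[1.0, 0.0], [0.0, 1.0]]
weight = [[2.0, 0.0, 0.0],
          [0.0, 0.0, 3.0]]
bias = [0.0, 0.0, 0.0]

probs = output_projection(hidden, weight, bias)
# Greedy decoding: pick the highest-probability token id at each position
predicted = [max(range(len(p)), key=p.__getitem__) for p in probs]
print(predicted)  # [0, 2]
```

Each row of `probs` sums to 1. Real models usually implement the projection with a framework's linear layer operating on whole batches rather than explicit loops.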