Multi-Head Attention
Attention Head Dimension
The reduced dimensionality of each attention head's subspace in Multi-Head Attention, typically calculated as model_dimension / number_of_heads, so that concatenating the outputs of all heads restores the full model dimension.
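A minimal sketch of the calculation in Python. The helper names `head_dimension` and `split_into_heads` are hypothetical, chosen for illustration. The sketch assumes the common convention that model_dimension must divide evenly by number_of_heads. It splits one token's projected vector into contiguous per-head chunks, which is how a reshape from (d_model) to (heads, d_head) lays out the values. Some models instead choose the head dimension independently, which is why the definition says "typically".

```python
def head_dimension(model_dimension: int, number_of_heads: int) -> int:
    """Return the per-head dimension d_head = d_model / h.

    Assumption: the model dimension divides evenly by the number of heads.
    """
    if number_of_heads <= 0:
        raise ValueError("number_of_heads must be positive")
    if model_dimension % number_of_heads != 0:
        raise ValueError(
            f"model_dimension={model_dimension} is not divisible by "
            f"number_of_heads={number_of_heads}"
        )
    return model_dimension // number_of_heads


def split_into_heads(vector: list, number_of_heads: int) -> list:
    """Split one token's d_model-length projection into h contiguous chunks of length d_head."""
    d_head = head_dimension(len(vector), number_of_heads)
    return [vector[i * d_head:(i + 1) * d_head] for i in range(number_of_heads)]


# The original Transformer base model uses d_model = 512 and h = 8.
print(head_dimension(512, 8))               # 64
print(split_into_heads(list(range(8)), 4))  # [[0, 1], [2, 3], [4, 5], [6, 7]]
```

Because each of the h heads has width d_model / h, concatenating them yields a vector of width d_model again. This keeps the total attention cost roughly comparable to a single head operating on the full dimension.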