Multi-Head Attention
Attention Head Dimension
The reduced dimensionality of each head's attention subspace in Multi-Head Attention, typically calculated as model_dimension / number_of_heads (for example, 512 / 8 = 64).
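The calculation can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `head_dimension` is hypothetical, and the divisibility check assumes the common convention that the model dimension is split evenly across heads.

```python
def head_dimension(model_dimension: int, number_of_heads: int) -> int:
    """Return the per-head dimension, assuming an even split across heads."""
    if number_of_heads <= 0:
        raise ValueError("number_of_heads must be positive")
    if model_dimension % number_of_heads != 0:
        # Assumption: every head receives an equal share of the model dimension.
        raise ValueError("model_dimension must be divisible by number_of_heads")
    return model_dimension // number_of_heads


# Example: a 512-dimensional model with 8 heads gives 64 dimensions per head.
print(head_dimension(512, 8))  # 64
```

Integer division is used so that the result can serve directly as a tensor shape; rejecting uneven splits up front avoids silently dropping dimensions.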