Multi-Head Attention
Parallel Attention Computation
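A process in which multiple attention heads are computed in parallel, each with its own learned projections of the queries, keys, and values, so that different heads can capture different aspects of the relationships within a sequence. The outputs of the heads are concatenated and projected back to the model dimension.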
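The parallel computation can be illustrated with a minimal NumPy sketch. It assumes scaled dot-product self-attention, random weights standing in for learned ones, and no masking or batching. The names `multi_head_attention`, `split_heads`, `w_q`, `w_k`, `w_v`, and `w_o` are chosen here for illustration and do not come from any particular library.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Illustrative self-attention over one sequence x of shape (seq_len, d_model)."""
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(x @ w_q)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # One batched matmul scores all heads at once: (num_heads, seq_len, seq_len).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)

    # Weighted sum of values per head: (num_heads, seq_len, d_head).
    heads = weights @ v

    # Concatenate the heads back to (seq_len, d_model), then apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o, weights

# Hypothetical sizes and random weights, for demonstration only.
rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 4, 8, 2
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))

out, weights = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads)
```

Here the heads are not looped over one by one. They share a leading array axis, so a single batched matrix product handles all of them, which is what makes the computation parallel in practice.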