BERT and Its Variants
Funnel Transformers
A hierarchical architecture that progressively shortens the sequence from one block of layers to the next while preserving the important information. Because later layers process fewer positions, it substantially reduces compute and memory costs for long sequences.
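To make the length reduction concrete, here is a minimal sketch of the idea, assuming strided mean pooling between blocks with a stride of 2. The helper names `mean_pool` and `funnel_lengths` are hypothetical and written for illustration only; this is not the published implementation, which differs in its details.

```python
from typing import List

Vector = List[float]


def mean_pool(hidden: List[Vector], stride: int = 2) -> List[Vector]:
    """Average each window of `stride` consecutive token vectors into one vector."""
    pooled = []
    for start in range(0, len(hidden), stride):
        window = hidden[start:start + stride]  # last window may be shorter
        dim = len(window[0])
        pooled.append([sum(vec[d] for vec in window) / len(window) for d in range(dim)])
    return pooled


def funnel_lengths(seq_len: int, num_blocks: int, stride: int = 2) -> List[int]:
    """Sequence length seen by each block, assuming one pooling step between blocks."""
    lengths = [seq_len]
    for _ in range(num_blocks - 1):
        lengths.append(-(-lengths[-1] // stride))  # ceiling division
    return lengths


# Illustrative input: four token vectors of dimension 2 (made-up values).
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(mean_pool(tokens))
print(funnel_lengths(512, 3))
```

Under these assumptions, a 512-token input shrinks to 256 and then 128 positions over three blocks. Since self-attention cost grows quadratically with sequence length, each halving cuts the attention cost of the following block to roughly a quarter.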