BERT and Its Variants
ELECTRA
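ELECTRA is an efficient pre-training method that replaces masked language modeling with replaced token detection. A small generator network substitutes plausible alternatives for some of the input tokens, and a discriminator is trained to classify every token as original or replaced. Because the loss covers all input positions rather than only the masked subset, each training example provides more learning signal, which makes pre-training faster and more compute-efficient.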
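The labeling step behind the discriminator's objective can be sketched in a few lines of Python. This is a toy illustration only, not ELECTRA's actual implementation: the names `corrupt_and_label`, `toy_generator`, and `VOCAB` are hypothetical, and the uniform random sampler stands in for the small masked language model that a real setup would use as the generator.

```python
import random

# Hypothetical toy vocabulary, used only for this illustration.
VOCAB = ["the", "chef", "cooked", "ate", "meal", "a"]


def toy_generator(tokens, position, rng):
    """Stand-in for the generator: samples a token uniformly from VOCAB.

    Assumption: a real generator would be a small masked language model
    that predicts a plausible token for the given position.
    """
    return rng.choice(VOCAB)


def corrupt_and_label(tokens, sample_replacement, mask_prob=0.15, rng=None):
    """Build the discriminator's input and targets.

    Each position is selected with probability mask_prob and filled with a
    generator sample. labels[i] is 1 if the resulting token differs from the
    original and 0 otherwise, so a sample that happens to reproduce the
    original token still counts as "original".
    """
    if rng is None:
        rng = random.Random(0)
    corrupted, labels = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            new_tok = sample_replacement(tokens, i, rng)
        else:
            new_tok = tok
        corrupted.append(new_tok)
        labels.append(int(new_tok != tok))
    return corrupted, labels


tokens = ["the", "chef", "cooked", "the", "meal"]
corrupted, labels = corrupt_and_label(tokens, toy_generator, mask_prob=0.3)
print(corrupted)
print(labels)
```

The discriminator receives `corrupted` as input and is trained with a binary loss against `labels` at every position, which is where the denser training signal comes from.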