Tokenization and Encoding
Subword Tokenization
A tokenization strategy that splits words into smaller units (subwords). It keeps the vocabulary finite while still being able to represent an open-ended set of words, including neologisms and typos, because a word missing from the vocabulary can be composed from known subword pieces.
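As a rough illustration, the sketch below segments a single word by greedy longest-match-first lookup, in the style of WordPiece. It is a minimal example under stated assumptions, not a production tokenizer: the function name `tokenize_word`, the hand-picked `TOY_VOCAB`, the `##` continuation prefix, and the `[UNK]` fallback are all choices made for this example. Real vocabularies are learned from a corpus rather than written by hand.

```python
import string

# Hypothetical toy vocabulary, hand-picked for illustration only.
# Pieces starting with "##" may only continue a word; others may only start one.
TOY_VOCAB = {"token", "##ization", "##izer", "un", "##believ", "##able"}
# Single letters act as a fallback so any lowercase ASCII word can be spelled out.
TOY_VOCAB |= set(string.ascii_lowercase)
TOY_VOCAB |= {"##" + c for c in string.ascii_lowercase}


def tokenize_word(word, vocab, unk="[UNK]"):
    """Split one word into subwords by greedy longest-match-first lookup."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark as a continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece covers this position, so the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


print(tokenize_word("tokenization", TOY_VOCAB))  # ['token', '##ization']
print(tokenize_word("unbelievable", TOY_VOCAB))  # ['un', '##believ', '##able']
print(tokenize_word("tokenizr", TOY_VOCAB))      # ['token', '##i', '##z', '##r']
```

The last call shows the point of the definition: the misspelled "tokenizr" is not in the vocabulary, yet it is still represented by falling back to shorter known pieces instead of being discarded.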