AI Glossary
A Complete Dictionary of Artificial Intelligence
Masked token
Token in a textual sequence replaced by a special [MASK] symbol during masked language modeling (MLM) pre-training, forcing the model to learn to predict the original token.
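A minimal sketch in Python, assuming a toy whitespace tokenizer and the literal string "[MASK]" as the mask symbol:

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_symbol="[MASK]"):
    """Replace a random subset of tokens with the mask symbol and record
    the original tokens the model must learn to predict."""
    masked, targets = list(tokens), {}
    for i, token in enumerate(tokens):
        if random.random() < mask_prob:
            masked[i] = mask_symbol
            targets[i] = token  # training target at this position
    return masked, targets

masked, targets = mask_tokens("the cat sat on the mat".split())
print(masked)   # e.g. ['the', 'cat', '[MASK]', 'on', 'the', 'mat']
print(targets)  # e.g. {2: 'sat'}
```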
BERT
Bidirectional Encoder Representations from Transformers: an encoder-only Transformer architecture pre-trained with MLM (and Next Sentence Prediction) to capture the bidirectional context of natural language.
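As a usage sketch, a published BERT checkpoint (bert-base-uncased) can fill a masked position via the Hugging Face transformers fill-mask pipeline:

```python
from transformers import pipeline

# Load a pre-trained BERT checkpoint behind a fill-mask pipeline.
fill = pipeline("fill-mask", model="bert-base-uncased")

# BERT uses context on both sides of [MASK] to rank candidate tokens.
for pred in fill("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```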
RoBERTa
Robustly optimized variant of BERT that removes the Next Sentence Prediction objective, uses dynamic masking, and trains with larger batches, more data, and tuned hyperparameters.
Bidirectional attention
Mechanism allowing each token to attend to both preceding and following tokens in the sequence, unlike unidirectional models.
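The difference can be made concrete with attention masks; a sketch in PyTorch, assuming 1 marks an allowed attention edge:

```python
import torch

seq_len = 5

# Bidirectional attention (BERT-style): every position may attend to
# every other position, so the mask is all ones.
bidirectional = torch.ones(seq_len, seq_len)

# Unidirectional, causal attention (GPT-style): position i may attend
# only to positions j <= i, a lower-triangular mask.
causal = torch.tril(torch.ones(seq_len, seq_len))

print(bidirectional)
print(causal)
```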
Token embeddings
Dense vector representations of input tokens that capture their semantic and syntactic characteristics.
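A minimal sketch using a learned lookup table in PyTorch; the vocabulary size and dimension below mirror BERT-base but are purely illustrative:

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 30522, 768  # BERT-base-like sizes, for illustration

# A learned lookup table: one dense vector per vocabulary entry.
embedding = nn.Embedding(vocab_size, embed_dim)

token_ids = torch.tensor([[101, 7592, 2088, 102]])  # hypothetical token ids
vectors = embedding(token_ids)
print(vectors.shape)  # torch.Size([1, 4, 768])
```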
Dynamic masking
Masking strategy in which the mask pattern is regenerated each time a sequence is presented to the model, rather than fixed once during preprocessing, improving robustness as in RoBERTa.
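A sketch of the contrast with static masking, using a toy helper (assumptions: whitespace tokens, a fixed 15% mask rate):

```python
import random

def mask_tokens(tokens, mask_prob=0.15):
    return ["[MASK]" if random.random() < mask_prob else t for t in tokens]

corpus = ["the cat sat on the mat".split()]

# Static masking: the mask pattern is fixed once during preprocessing.
static = [mask_tokens(seq) for seq in corpus]

# Dynamic masking: a fresh pattern is drawn every time the sequence is
# fed to the model, so prediction targets vary across epochs.
for epoch in range(3):
    dynamic = [mask_tokens(seq) for seq in corpus]
    print(epoch, dynamic[0])
```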
Whole Word Masking (WWM)
Technique masking every subword token of a word together, rather than individual subword tokens at random, so the model must predict complete words.
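A sketch assuming WordPiece-style tokens, where a "##" prefix marks a continuation piece of the preceding word:

```python
import random

def whole_word_mask(tokens, mask_prob=0.15):
    """Mask all WordPiece pieces of a selected word together."""
    # Group subword indices into whole words.
    words, current = [], []
    for i, tok in enumerate(tokens):
        if tok.startswith("##"):
            current.append(i)
        else:
            if current:
                words.append(current)
            current = [i]
    if current:
        words.append(current)

    masked = list(tokens)
    for word in words:
        if random.random() < mask_prob:
            for i in word:  # mask every piece of the word at once
                masked[i] = "[MASK]"
    return masked

print(whole_word_mask(["un", "##believ", "##able", "story"], mask_prob=0.5))
```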
Span masking
Strategy masking contiguous spans of tokens of variable length, as in SpanBERT, better reflecting natural linguistic units such as phrases and entities.
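A sketch in the spirit of SpanBERT, drawing span lengths from a truncated geometric distribution; the parameters below are assumptions:

```python
import random

def sample_span_length(p=0.2, max_len=10):
    """Draw a span length from a truncated geometric distribution."""
    length = 1
    while random.random() > p and length < max_len:
        length += 1
    return length

def span_mask(tokens, mask_budget=0.15):
    """Mask contiguous spans until ~mask_budget of the tokens are masked."""
    masked = list(tokens)
    target = max(1, int(len(tokens) * mask_budget))
    n_masked = 0
    while n_masked < target:
        length = min(sample_span_length(), len(tokens))
        start = random.randrange(len(tokens) - length + 1)
        for i in range(start, start + length):
            if masked[i] != "[MASK]":
                masked[i] = "[MASK]"
                n_masked += 1
    return masked

print(span_mask("the quick brown fox jumps over the lazy dog".split()))
```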
Masking strategy
Set of rules determining which tokens to mask, with what probability, and how to replace them during MLM training.
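BERT's original recipe is the canonical example: 15% of tokens are selected for prediction; of those, 80% are replaced by [MASK], 10% by a random token, and 10% are left unchanged. A minimal sketch (the toy vocabulary is an assumption):

```python
import random

TOY_VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran"]  # assumption

def bert_style_masking(tokens, select_prob=0.15):
    """Apply BERT's 80/10/10 replacement rule to selected tokens."""
    corrupted, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if random.random() >= select_prob:
            continue
        labels[i] = tok  # loss is computed only at selected positions
        roll = random.random()
        if roll < 0.8:
            corrupted[i] = "[MASK]"                  # 80%: mask symbol
        elif roll < 0.9:
            corrupted[i] = random.choice(TOY_VOCAB)  # 10%: random token
        # else: 10% keep the original token unchanged
    return corrupted, labels

print(bert_style_masking("the cat sat on the mat".split(), select_prob=0.5))
```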