AI Glossary
The complete dictionary of artificial intelligence
DistilBERT (Distilled BERT)
Lightweight version of BERT created through knowledge distillation, retaining about 97% of BERT-base's performance with 40% fewer parameters and roughly 60% faster inference.
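A minimal sketch of using DistilBERT as a drop-in encoder, assuming the Hugging Face transformers library and the public "distilbert-base-uncased" checkpoint (both are assumptions, not something this glossary prescribes):

```python
# Sketch: load DistilBERT for feature extraction with Hugging Face transformers
# (assumes the "distilbert-base-uncased" checkpoint and an installed transformers/torch stack).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

inputs = tokenizer("Knowledge distillation keeps most of BERT's accuracy.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextual embeddings: one vector per token, 768 dimensions for this checkpoint.
print(outputs.last_hidden_state.shape)
```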
Positional Embeddings
Vectors added to token embeddings in BERT to encode sequential position, essential since attention alone does not capture token order.
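A short illustrative sketch, assuming the transformers library and the "bert-base-uncased" checkpoint, of inspecting the learned positional embedding table (the attribute path reflects the library's current BERT implementation):

```python
# Sketch: inspect BERT's learned positional embeddings
# (assumes the "bert-base-uncased" checkpoint from Hugging Face transformers).
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")

pos_emb = model.embeddings.position_embeddings.weight  # shape: (max_position, hidden_size)
print(pos_emb.shape)  # torch.Size([512, 768]): one learned vector per position up to 512

# Inside the model, each token's input representation is roughly:
#   token_embedding + position_embedding + segment_embedding, followed by LayerNorm and dropout.
```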
BERT-base vs BERT-large
Two main BERT configurations: base (12 layers, 768 hidden dimensions, 110M parameters) and large (24 layers, 1024 hidden dimensions, 340M parameters), offering different performance/resource trade-offs.
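A quick way to compare the two configurations, assuming the transformers library and the published "bert-base-uncased" and "bert-large-uncased" checkpoints:

```python
# Sketch: compare BERT-base and BERT-large configurations
# (assumes the "bert-base-uncased" and "bert-large-uncased" checkpoints are available).
from transformers import AutoConfig

for name in ["bert-base-uncased", "bert-large-uncased"]:
    cfg = AutoConfig.from_pretrained(name)
    print(name, "->",
          cfg.num_hidden_layers, "layers,",
          cfg.hidden_size, "hidden size,",
          cfg.num_attention_heads, "attention heads")
# Expected: 12 / 768 / 12 for base, 24 / 1024 / 16 for large.
```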
Fine-tuning BERT
Process of adapting pre-trained BERT weights to a specific task by adding a task-specific output layer (such as a classification head) and training on labeled data for the target task.
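A minimal fine-tuning sketch, assuming the transformers and torch libraries, the "bert-base-uncased" checkpoint, and placeholder labeled data (a real setup would use a proper dataset, batching, and multiple epochs):

```python
# Minimal fine-tuning sketch (dataset and labels below are placeholders, not real training data).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# A fresh classification head is added on top of the pre-trained encoder.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["great movie", "terrible plot"]          # placeholder labeled examples
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)           # cross-entropy loss is computed internally
outputs.loss.backward()
optimizer.step()
```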
[SEP] Token
Special token used in BERT to separate different text segments (like sentence pairs in QA or NSP tasks), marking boundaries between segments.
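A small sketch, assuming the transformers library and the "bert-base-uncased" checkpoint, showing where the tokenizer places [CLS] and [SEP] for a sentence pair:

```python
# Sketch: segment boundaries marked with [SEP] for a sentence pair
# (assumes the "bert-base-uncased" checkpoint).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer("The cat sat on the mat.", "It was sleeping.")

# The sequence starts with [CLS]; each segment ends with a [SEP] token.
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# token_type_ids are 0 for the first segment and 1 for the second.
print(encoded["token_type_ids"])
```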
Pre-training Objectives
Self-supervised tasks (masked language modeling, MLM, and next sentence prediction, NSP) used to pre-train BERT on large unlabeled corpora, enabling it to learn general linguistic representations.
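A sketch of the MLM objective in action, assuming the transformers fill-mask pipeline and the "bert-base-uncased" checkpoint:

```python
# Sketch: masked language modeling via a fill-mask pipeline
# (assumes the "bert-base-uncased" checkpoint; [MASK] is BERT's mask token).
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))

# During pre-training, roughly 15% of tokens are masked and the model learns to recover them;
# NSP additionally asks whether two segments appeared next to each other in the corpus.
```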
Transformer Encoder Stack
BERT's fundamental architecture composed of multiple Transformer encoder layers, each with multi-head attention mechanisms and feed-forward networks.
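A sketch of inspecting that stack, assuming the transformers library and the "bert-base-uncased" checkpoint (the attribute names below are implementation details of that library's BERT model):

```python
# Sketch: walk the encoder stack of a loaded BERT model
# (assumes the "bert-base-uncased" checkpoint).
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
print(len(model.encoder.layer))            # 12 encoder layers in BERT-base

first = model.encoder.layer[0]
print(type(first.attention).__name__)      # multi-head self-attention block
print(type(first.intermediate).__name__)   # feed-forward expansion (768 -> 3072)
print(type(first.output).__name__)         # feed-forward projection back to 768
```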
Domain-specific BERT
BERT variants pre-trained on specialized corpora (BioBERT for biomedical, SciBERT for scientific, FinBERT for financial) for better performance in these domains.
Multilingual BERT (mBERT)
Version of BERT pre-trained on 104 languages with a shared vocabulary, capable of understanding and processing text in multiple languages with a single model.
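A brief sketch, assuming the transformers library and the public "bert-base-multilingual-cased" checkpoint, of the same model encoding text in two different languages:

```python
# Sketch: one multilingual BERT model handling several languages
# (assumes the "bert-base-multilingual-cased" checkpoint).
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")

for text in ["Artificial intelligence is changing the world.",
             "Yapay zeka dünyayı değiştiriyor."]:
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs)
    print(text, "->", outputs.last_hidden_state.shape)

# The same shared WordPiece vocabulary and weights are used for every language.
```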
BERTology
Research field dedicated to the analysis, interpretation and improvement of BERT-type models, studying their internal behaviors and linguistic capabilities.