AI Glossary
The Complete Dictionary of Artificial Intelligence
FastText
Extension of Word2Vec developed by Facebook that represents each word as the sum of its character n-gram vectors, which lets it handle out-of-vocabulary words and morphologically rich languages.
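A minimal pure-Python sketch of the idea, not the real FastText library: the n-gram extraction follows the paper's `<word>` boundary-marker convention, while the per-n-gram vectors here are deterministic hash-derived stand-ins for learned embeddings (the dimension and hashing scheme are assumptions for illustration).

```python
import hashlib

DIM = 8  # toy embedding dimension (assumption)

def char_ngrams(word, n_min=3, n_max=5):
    """Extract character n-grams of '<word>' with boundary markers."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def ngram_vector(ngram):
    """Deterministic pseudo-random vector per n-gram, standing in
    for a learned n-gram embedding."""
    h = hashlib.md5(ngram.encode()).digest()
    return [b / 255 - 0.5 for b in h[:DIM]]

def word_vector(word):
    """Word vector = sum of its character n-gram vectors, so even an
    out-of-vocabulary word gets a representation."""
    vec = [0.0] * DIM
    for g in char_ngrams(word):
        vec = [a + b for a, b in zip(vec, ngram_vector(g))]
    return vec

print(char_ngrams("cat"))            # n-grams of "<cat>"
print(len(word_vector("unseenword")))  # an OOV word still gets DIM values
```

Because "unseenword" shares n-grams such as "see" and "word" with vocabulary words, its vector lands near related words even though it was never seen during training.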
Contextual Embeddings
Dynamic vector representations whose values change according to the usage context of the word, unlike static embeddings that assign a single fixed vector per word.
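A toy illustration of the contrast (all vectors and the blending rule are hypothetical): the context-dependent function below blends a word's base vector with the mean of its neighbors, a crude stand-in for what LSTM or Transformer encoders learn, so the same word gets different vectors in different sentences.

```python
# Hypothetical 3-d static vectors for a tiny vocabulary.
STATIC = {
    "bank":  [0.5, 0.5, 0.0],
    "river": [0.0, 1.0, 0.0],
    "money": [1.0, 0.0, 0.0],
    "the":   [0.1, 0.1, 0.1],
}

def contextual_vector(word, sentence, alpha=0.5):
    """Blend the word's static vector with the mean of its context
    vectors -- a toy proxy for a learned contextual encoder."""
    context = [w for w in sentence if w != word and w in STATIC]
    mean = [sum(STATIC[w][i] for w in context) / len(context)
            for i in range(3)]
    return [(1 - alpha) * b + alpha * m
            for b, m in zip(STATIC[word], mean)]

v1 = contextual_vector("bank", ["the", "river", "bank"])
v2 = contextual_vector("bank", ["the", "money", "bank"])
print(v1 != v2)  # True: same word, different vector per context
```

A static lookup would return `STATIC["bank"]` in both sentences; the contextual version separates the riverbank and financial senses.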
Static Embeddings
Fixed vector representations where each word has a single vector representation independent of its context, as in classic Word2Vec or GloVe.
Skip-gram
Training architecture that predicts context words from a central word; it works well for rare words and small corpora and captures fine-grained semantic relationships.
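The training pairs Skip-gram learns from can be extracted with a few lines; this sketch (a hypothetical helper, not a full trainer) shows each center word paired with every word in its window.

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) training pairs: the center word is used
    to predict each neighbor within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "cat", "sat", "on", "mat"], window=1))
```

In a real trainer each pair becomes one prediction step, typically optimized with negative sampling or hierarchical softmax.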
CBOW
Continuous Bag of Words, model that predicts a central word from the average of the vectors of its context words, efficient for training on large corpora.
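A sketch of the CBOW input side (toy vectors, hypothetical names): each training example pairs a set of context words with the center word they must predict, and the model consumes the averaged context vectors.

```python
def cbow_examples(tokens, window=2):
    """Yield (context_words, center_word) training examples."""
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        if context:
            yield context, center

def average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

EMB = {"the": [1.0, 0.0], "cat": [0.0, 1.0], "sat": [0.5, 0.5]}
for context, center in cbow_examples(["the", "cat", "sat"], window=1):
    print(center, "<-", average([EMB[w] for w in context]))
```

Averaging discards word order inside the window, which is what makes CBOW cheap and the reason it is called a "bag" of words.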
Subword Embeddings
Vector representation technique that decomposes words into smaller units (characters, morphemes) to handle open vocabulary and capture morphological information.
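One simple way to decompose a word into subword units is greedy longest-match against a unit vocabulary, in the spirit of WordPiece tokenization; the vocabulary below is hypothetical, and each resulting unit would have its own embedding.

```python
# Hypothetical subword vocabulary; real systems learn it from a corpus.
VOCAB = {"un", "break", "able", "ing", "walk"}

def subword_split(word):
    """Greedy longest-match decomposition into known subword units,
    falling back to single characters for uncovered spans."""
    units, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            if word[i:j] in VOCAB:
                units.append(word[i:j])
                i = j
                break
        else:
            units.append(word[i])  # raw character fallback
            i += 1
    return units

print(subword_split("unbreakable"))  # → ['un', 'break', 'able']
print(subword_split("walking"))      # → ['walk', 'ing']
```

The character fallback is what keeps the vocabulary open: any string can be tokenized, so no word is ever truly out of vocabulary.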
ELMo
Embeddings from Language Models, approach that generates contextual embeddings by combining hidden states of bidirectional LSTM networks pretrained on vast corpora.
Sentence Embeddings
Vector representations that encode entire sentences into single fixed-length vectors, capturing global meaning and semantic structure at the sentence level.
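The simplest sentence embedding is mean pooling over word vectors; trained models such as Sentence-BERT learn much better encoders, but this baseline (with hypothetical 2-d toy vectors) already places paraphrases close together.

```python
WORD_VECS = {"good": [1.0, 0.0], "movie": [0.0, 1.0],
             "great": [0.9, 0.1], "film": [0.1, 0.9]}

def sentence_embedding(tokens):
    """Mean-pool the word vectors of the known tokens."""
    vecs = [WORD_VECS[t] for t in tokens if t in WORD_VECS]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(2)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

s1 = sentence_embedding(["good", "movie"])
s2 = sentence_embedding(["great", "film"])
print(cosine(s1, s2))  # near 1.0: the paraphrases land close together
```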
Doc2Vec
Extension of Word2Vec that generates embeddings for entire documents by introducing a document identifier as additional context during training.
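A sketch of how the document identifier enters training in the PV-DM variant of Doc2Vec (helper names are hypothetical): the document's ID token joins every context window, so its learned vector is trained on all of the document's windows and becomes the document embedding.

```python
def pv_dm_examples(doc_id, tokens, window=1):
    """Yield CBOW-style (context, center) examples where the document
    ID is prepended to every context window."""
    examples = []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        # The doc ID participates in every prediction for this document,
        # which is what turns its vector into a document representation.
        examples.append(([doc_id] + context, center))
    return examples

print(pv_dm_examples("DOC_42", ["cats", "chase", "mice"]))
```

At inference time, a new document gets an embedding by freezing the word vectors and fitting only a fresh ID vector against the same objective.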
Universal Sentence Encoder
Google model that transforms texts into high-dimensional embeddings, optimized for semantic similarity and text classification tasks.
RoBERTa
Robustly Optimized BERT Pretraining Approach, improved version of BERT trained longer on more data with larger batches and dynamic masking, and without the next-sentence-prediction objective.
Embedding Layer
First layer of NLP neural networks that transforms token indices into dense vectors, learning these representations during training.
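Stripped of the training machinery, an embedding layer is just a lookup table from token index to a trainable dense row; this minimal sketch (initialization values are hypothetical) mirrors the behavior of layers like PyTorch's `nn.Embedding`.

```python
import random

class EmbeddingLayer:
    """Token index -> dense vector lookup; the rows are the parameters
    a trainer would update by gradient descent."""
    def __init__(self, vocab_size, dim, seed=0):
        rng = random.Random(seed)
        # one trainable row per vocabulary index (toy random init)
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                       for _ in range(vocab_size)]

    def __call__(self, indices):
        return [self.weight[i] for i in indices]  # pure table lookup

emb = EmbeddingLayer(vocab_size=100, dim=4)
out = emb([3, 17, 3])
print(len(out), len(out[0]))  # 3 4
print(out[0] == out[2])       # True: same index, same vector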
Vector Space Model
Algebraic representation where words are points in a multidimensional space, allowing mathematical operations to measure semantic similarities.
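The standard mathematical operation on such a space is cosine similarity, which measures the angle between two word points; the vectors below are toy values chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two vectors: 1 for the same
    direction, 0 for orthogonal, -1 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

king, queen, apple = [0.9, 0.8, 0.1], [0.85, 0.82, 0.12], [0.1, 0.2, 0.95]
print(cosine(king, queen) > cosine(king, apple))  # True: royalty terms are closer
```

Cosine is preferred over Euclidean distance here because it ignores vector length, comparing only direction, which is where the semantic information lives.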