AI Glossary
The Complete Dictionary of Artificial Intelligence
Cross-Lingual Transfer
Ability of an NER model trained on a source language to apply its knowledge to recognize entities in a target language, without requiring annotated data for the latter.
Unified Multilingual Model
NER architecture where a single model is trained simultaneously on data from multiple languages, sharing vector representations to capture universal entity recognition patterns.
Vector Space Alignment
Technique aiming to project the semantic spaces of different languages into a common vector space, thus enabling a model to process and compare words or entities from distinct languages.
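As a toy illustration of alignment, the sketch below learns a single rotation that maps 2-D "source-language" vectors onto their "target-language" counterparts from a small seed dictionary of paired vectors. The vectors, the function names fit_rotation and apply_rotation, and the 2-D setting are assumptions made for illustration only; real embeddings have hundreds of dimensions, where the analogous orthogonal map is usually obtained from an SVD.

```python
import math

def fit_rotation(src, tgt):
    """Angle (radians) of the rotation that best maps src onto tgt
    in the least-squares sense. src and tgt are paired lists of (x, y)."""
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(src, tgt))
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(src, tgt))
    return math.atan2(cross, dot)

def apply_rotation(vec, angle):
    """Rotate a 2-D vector counter-clockwise by angle."""
    x, y = vec
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

# Hypothetical seed dictionary: the target space is the source space
# rotated by 30 degrees (in practice the relation is only approximate).
src_vectors = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
true_angle = math.radians(30)
tgt_vectors = [apply_rotation(v, true_angle) for v in src_vectors]

angle = fit_rotation(src_vectors, tgt_vectors)
aligned = [apply_rotation(v, angle) for v in src_vectors]
```

Once the map is learned from the seed pairs, it can be applied to every other source-language vector so that both languages live in one space.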
Multilingual Fine-Tuning
Process of adapting a language model pre-trained on vast multilingual corpora to the NER task, using an annotated dataset that covers multiple languages.
Code-Switching NER
Challenge in multilingual NER: recognizing entities in text where speakers alternate between multiple languages, often within the same sentence.
Translingual Entities
Named entities that maintain an identical form or reference across multiple languages, such as brand names (Google), organizations (UN), or people (Barack Obama).
Multilingual Domain Adaptation
Technique for adjusting a multilingual NER model to a specific domain (medical, legal) using unannotated or weakly annotated data in multiple languages.
Multilingual Character Embeddings
Vector representations at the character level, shared between languages, that allow the model to capture similar morphologies (e.g., Latin roots) and generalize to new words.
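To make the idea of shared character-level units concrete, the sketch below extracts character trigrams from words and lists the ones two words have in common. No embeddings are learned here; the point is only that a model which assigns one vector per character n-gram would reuse the same parameters for these shared units. The function names and the choice of trigrams with < and > boundary markers are illustrative assumptions.

```python
def char_ngrams(word, n=3):
    """Set of character n-grams of a word, with boundary markers."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def shared_units(a, b, n=3):
    """Character n-grams that two words have in common."""
    return char_ngrams(a, n) & char_ngrams(b, n)

# English "nation" and Italian "nazione" share a Latin root.
cognate_overlap = shared_units("nation", "nazione")
# German "Volk" has no trigram in common with "nation".
unrelated_overlap = shared_units("nation", "volk")
```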
Projected Annotation
Method for creating NER training data in a target language by machine-translating annotated source-language text and projecting its entity labels onto the translation, typically through word alignment.
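One common way to carry out the projection is through word alignments between a source sentence and its translation. The sketch below assumes the alignment is already available as (source index, target index) pairs and that labels use the BIO scheme; the function name project_labels and the example sentence pair are hypothetical.

```python
def project_labels(src_tags, alignment, tgt_len):
    """Project BIO tags from source tokens to target tokens.

    alignment is a list of (src_index, tgt_index) pairs. Unaligned
    target tokens get "O". B-/I- prefixes are recomputed from
    contiguity on the target side, so two adjacent entities of the
    same type would be merged (a known limitation of this sketch).
    """
    types = [None] * tgt_len
    for s, t in alignment:
        tag = src_tags[s]
        if tag != "O":
            types[t] = tag.split("-", 1)[1]

    tgt_tags, prev = [], None
    for typ in types:
        if typ is None:
            tgt_tags.append("O")
        elif typ == prev:
            tgt_tags.append("I-" + typ)
        else:
            tgt_tags.append("B-" + typ)
        prev = typ
    return tgt_tags

# English: The United Nations met in Geneva
src_tags = ["O", "B-ORG", "I-ORG", "O", "O", "B-LOC"]
# Spanish: Las Naciones Unidas se reunieron en Ginebra
tgt_tokens = ["Las", "Naciones", "Unidas", "se", "reunieron", "en", "Ginebra"]
# Word order differs inside the entity; "se" has no aligned source token.
alignment = [(0, 0), (1, 2), (2, 1), (3, 4), (4, 5), (5, 6)]

tgt_tags = project_labels(src_tags, alignment, len(tgt_tokens))
```

Recomputing the prefixes on the target side keeps the tags well-formed even when the translation reorders the words inside an entity, as it does here.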
Low-Resource NER Models
NER systems designed to work with very limited amounts of annotated data in one or more target languages, often through transfer learning from high-resource languages.
Multilingual Entity Normalization
Task of grouping different linguistic or orthographic variants of the same entity (e.g., 'New York', 'Nueva York', 'New York City') under a single canonical identifier.
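A minimal sketch of normalization is a lookup from surface variants to a canonical identifier, after folding case, whitespace, and diacritics. The alias table, the identifiers such as "loc:new-york-city", and the function names are made up for illustration; production systems typically link against a knowledge base instead of a hand-written table.

```python
import unicodedata

def normalize_key(surface):
    """Fold case, diacritics, and repeated whitespace."""
    decomposed = unicodedata.normalize("NFKD", surface)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.casefold().split())

# Hypothetical alias table: variant -> canonical identifier.
RAW_ALIASES = {
    "New York": "loc:new-york-city",
    "Nueva York": "loc:new-york-city",
    "New York City": "loc:new-york-city",
    "Zürich": "loc:zurich",
}
ALIASES = {normalize_key(k): v for k, v in RAW_ALIASES.items()}

def canonical_id(surface):
    """Return the canonical identifier, or None if the variant is unknown."""
    return ALIASES.get(normalize_key(surface))
```

With this table, canonical_id("Nueva York") and canonical_id("NEW YORK CITY") resolve to the same identifier, and "Zurich" matches the entry stored as "Zürich".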
Multilingual Evaluation
Process of measuring the performance of an NER system on a diverse set of languages, often using standard metrics (precision, recall, F1-score) calculated per language and in aggregate.
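The per-language and aggregate calculation can be sketched as follows, under the assumption that gold and predicted entities are represented as sets of (sentence id, start, end, type) tuples and that only exact matches count. The function names and the tiny English/Swahili example are illustrative, not real benchmark data.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def evaluate(gold_by_lang, pred_by_lang):
    """Per-language scores, micro-averaged scores, and macro-averaged F1."""
    per_lang, totals = {}, [0, 0, 0]
    for lang, gold in gold_by_lang.items():
        pred = pred_by_lang.get(lang, set())
        counts = (len(gold & pred), len(pred - gold), len(gold - pred))
        per_lang[lang] = prf(*counts)
        totals = [a + b for a, b in zip(totals, counts)]
    micro = prf(*totals)  # pools counts, so large languages dominate
    macro_f1 = sum(f for _, _, f in per_lang.values()) / len(per_lang)
    return per_lang, micro, macro_f1

# Entities are (sentence_id, start, end, type); toy data only.
gold = {
    "en": {(0, 0, 2, "PER"), (0, 4, 5, "LOC"), (1, 1, 2, "ORG"), (1, 5, 6, "LOC")},
    "sw": {(0, 0, 1, "PER"), (0, 3, 4, "LOC")},
}
pred = {
    "en": {(0, 0, 2, "PER"), (0, 4, 5, "LOC"), (1, 1, 2, "ORG"), (1, 4, 6, "LOC")},
    "sw": {(0, 0, 1, "PER")},
}
per_lang, micro, macro_f1 = evaluate(gold, pred)
```

Reporting both aggregates is useful because micro-averaging can hide weak performance on languages with few test entities, whereas macro-averaging weights every language equally.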
Large-Scale Multilingual Language Models (mLLM)
Foundation models like mBERT or XLM-R, pre-trained on roughly 100 languages, which serve as a basis for building high-performing multilingual NER systems through fine-tuning.
Language Detection for NER
Preliminary step in multilingual NER pipelines that rely on language-specific models, consisting of identifying the language of the input text so that the appropriate entity recognition model can be activated.
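The routing logic can be sketched with a deliberately naive detector that counts stopword hits per language. The stopword lists, the model names such as "ner-en", and the fallback to a multilingual model are all assumptions for illustration; a real pipeline would use a trained language identifier.

```python
import re

# Tiny, hand-picked stopword lists (illustrative only).
STOPWORDS = {
    "en": {"the", "of", "and", "in", "is", "to"},
    "es": {"el", "la", "de", "y", "en", "es", "los"},
    "de": {"der", "die", "das", "und", "ist", "in", "von"},
}

# Hypothetical model identifiers.
MODEL_BY_LANG = {"en": "ner-en", "es": "ner-es", "de": "ner-de"}
FALLBACK_MODEL = "ner-multilingual"

def detect_language(text):
    """Return the language with the most stopword hits, or None if no hits.
    Ties are resolved by dictionary order, which is arbitrary."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(tok in words for tok in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def select_model(text):
    """Pick the language-specific model, or the fallback when unsure."""
    return MODEL_BY_LANG.get(detect_language(text), FALLBACK_MODEL)
```

Falling back to a multilingual model when detection is inconclusive avoids sending short or mixed-language inputs to the wrong monolingual model.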
Script-Independent NER
Ability of an NER model to recognize entities independently of the writing system (Latin alphabet, Cyrillic, Arabic, etc.), relying on abstract language representations.
Back-Translation for NER
Data augmentation technique where an annotated text in a source language is translated to a target language, then back-translated to the source language, to create robust new training examples.
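A hedged sketch of the round trip is shown below. The two translation callables are stand-ins (here simple lookup tables) for whatever machine translation system is available, and the function names are hypothetical. Because translation can alter or drop entity mentions, this sketch re-tags the back-translated sentence by exact string match and discards the example if any entity is lost.

```python
def find_span(tokens, span):
    """Index of the first occurrence of span in tokens, or None."""
    for i in range(len(tokens) - len(span) + 1):
        if tokens[i:i + len(span)] == span:
            return i
    return None

def back_translate_example(tokens, entities, to_target, to_source):
    """Round-trip a sentence and rebuild BIO tags.

    entities is a list of (surface string, type). Returns
    (new_tokens, new_tags), or None if an entity did not survive.
    """
    round_trip = to_source(to_target(" ".join(tokens)))
    new_tokens = round_trip.split()
    tags = ["O"] * len(new_tokens)
    for surface, typ in entities:
        span = surface.split()
        start = find_span(new_tokens, span)
        if start is None:
            return None  # entity was altered; skip this example
        tags[start] = "B-" + typ
        for i in range(start + 1, start + len(span)):
            tags[i] = "I-" + typ
    return new_tokens, tags

# Stand-in translators; a real system would call an MT model here.
EN_DE = {"Angela Merkel visited Paris": "Angela Merkel besuchte Paris"}
DE_EN = {"Angela Merkel besuchte Paris": "Angela Merkel travelled to Paris"}

augmented = back_translate_example(
    ["Angela", "Merkel", "visited", "Paris"],
    [("Angela Merkel", "PER"), ("Paris", "LOC")],
    EN_DE.__getitem__,
    DE_EN.__getitem__,
)
```

Filtering out examples whose entities change form is a conservative choice: it lowers the yield of augmented data but avoids training on mislabelled sentences.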