AI Glossary
The complete dictionary of Artificial Intelligence
Text classification
NLP task of automatically assigning a text document to one or more predefined categories based on its semantic content.
Binary classification
Type of classification where the model must choose between two mutually exclusive classes, usually represented as positive/negative or 0/1.
Multi-class classification
Classification problem where each instance must be assigned to exactly one class among three or more, with mutually exclusive classes.
Multi-label classification
Variant of classification where a document can be simultaneously associated with multiple non-exclusive labels or categories.
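The difference between multi-class and multi-label targets shows up in how labels are encoded. Here is a minimal sketch, assuming a fixed, ordered list of class names: a multi-class label becomes a one-hot vector with exactly one 1, while a multi-label target becomes a multi-hot vector with any number of 1s. The helper names one_hot and multi_hot are hypothetical.

```python
def one_hot(label, classes):
    # Multi-class: exactly one position is set to 1
    return [1 if c == label else 0 for c in classes]

def multi_hot(labels, classes):
    # Multi-label: every position whose class appears in `labels` is set to 1
    label_set = set(labels)
    return [1 if c in label_set else 0 for c in classes]

classes = ["sports", "politics", "tech"]
print(one_hot("tech", classes))              # [0, 0, 1]
print(multi_hot(["sports", "tech"], classes))  # [1, 0, 1]
```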
Naive Bayes
Probabilistic classification algorithm based on Bayes' theorem, with the "naive" assumption that features are conditionally independent of one another given the class.
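A minimal sketch of the multinomial variant of Naive Bayes for tokenized text, assuming Laplace (add-one) smoothing, which is one common choice. The independence assumption is what allows a document's log-likelihood to be computed as a simple sum of per-word log-probabilities. The function names train_nb and predict_nb are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Train multinomial Naive Bayes on tokenized documents (lists of words)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = len(docs)
    # Log prior P(c) estimated from class frequencies
    priors = {c: math.log(n / n_docs) for c, n in class_counts.items()}
    likelihoods = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        denom = total + alpha * len(vocab)
        # Smoothed log P(w | c) for every vocabulary word
        likelihoods[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return priors, likelihoods

def predict_nb(model, tokens):
    priors, likelihoods = model
    scores = {}
    for c in priors:
        # Conditional independence: sum log-probabilities of known words
        scores[c] = priors[c] + sum(likelihoods[c][w] for w in tokens if w in likelihoods[c])
    return max(scores, key=scores.get)

docs = [["great", "movie"], ["great", "fun"], ["boring", "movie"], ["boring", "plot"]]
labels = ["pos", "pos", "neg", "neg"]
model = train_nb(docs, labels)
print(predict_nb(model, ["great", "fun"]))  # pos
```

Words never seen during training are simply skipped at prediction time, which is one of several reasonable conventions.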
SVM (Support Vector Machine)
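Supervised learning algorithm that finds the hyperplane separating the classes with the maximum margin; with kernel functions, this hyperplane can be sought in a high-dimensional feature space, yielding non-linear decision boundaries in the original space.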
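A minimal sketch of a linear SVM, assuming labels in {-1, +1}. Rather than the quadratic-programming or kernel solvers that SVM libraries typically use, it runs stochastic subgradient descent on the regularized hinge loss lam/2 · ||w||² + mean(max(0, 1 − y(w·x + b))). Points that fall inside the margin push the hyperplane away, which is how the margin gets maximized. The names train_linear_svm and predict_svm and the hyperparameter values are illustrative assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Stochastic subgradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Hinge loss active: point is misclassified or inside the margin
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Only the regularizer contributes: shrink w
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict_svm(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

X = [[2, 2], [3, 3], [-2, -2], [-3, -3]]
y = [1, 1, -1, -1]
model = train_linear_svm(X, y)
print(predict_svm(model, [2.5, 2.5]))  # 1
```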
Bag-of-Words
Text representation that describes a document by the number of occurrences of each word, ignoring word order and grammatical context.
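A minimal Bag-of-Words sketch, assuming lowercase whitespace tokenization and a vocabulary sorted alphabetically. Each document becomes a vector of word counts, so two sentences with the same words in a different order get identical vectors. The function name bag_of_words is hypothetical.

```python
from collections import Counter

def bag_of_words(docs):
    """Return the sorted vocabulary and one count vector per document."""
    vocab = sorted({w for doc in docs for w in doc.lower().split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors

vocab, vectors = bag_of_words(["the cat sat", "the cat saw the dog"])
print(vocab)    # ['cat', 'dog', 'sat', 'saw', 'the']
print(vectors)  # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```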
TF-IDF
Statistical measure of how important a word is to a document relative to a corpus, obtained by multiplying the term frequency (TF) by the inverse document frequency (IDF).
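A minimal TF-IDF sketch using one common variant: tf(t, d) = (count of t in d) / (length of d) and idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. Libraries often apply smoothing or normalization, so their exact weights can differ. The function name tf_idf is hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        length = len(tokens)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

w = tf_idf([["the", "cat"], ["the", "dog"]])
print(w[0])  # "the" appears in every document, so its weight is 0.0
```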
Word Embeddings
Dense vector representations of words in a continuous space, where semantically similar words are mapped to nearby vectors.
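Closeness between embeddings is usually measured with cosine similarity. The sketch below uses hypothetical, hand-picked 3-dimensional vectors purely for illustration; real embeddings are learned from data and typically have tens to hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings, chosen by hand for this example
embeddings = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # close to 1
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # much lower
```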
Transformers
Neural network architecture based on attention mechanisms, which let the model capture long-range dependencies in sequences.
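The core operation of a Transformer is scaled dot-product attention, softmax(QKᵀ / √d_k) · V: each query is compared with every key, and the resulting weights mix the value vectors, so any position can draw on any other regardless of distance. This is a single-head sketch on plain Python lists, assuming no masking and no learned projection matrices; the function names are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Q: n x d_k, K: m x d_k, V: m x d_v (lists of lists). Returns n x d_v."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted average of the value vectors
        output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return output

# Identical keys get equal weight, so the output is the mean of the values
print(scaled_dot_product_attention([[1, 0]], [[1, 0], [1, 0]], [[2, 0], [0, 4]]))  # [[1.0, 2.0]]
```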
Confusion Matrix
Table that summarizes classifier performance by cross-tabulating the model's predicted labels against the true labels for each class.
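A minimal sketch of building a confusion matrix, assuming the common convention (also used by scikit-learn) that rows are true classes and columns are predicted classes, so correct predictions lie on the diagonal. The function name confusion_matrix is used here only for illustration.

```python
def confusion_matrix(y_true, y_pred, labels):
    """matrix[i][j] = number of samples of true class labels[i] predicted as labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1  # row: true class, column: predicted class
    return matrix

y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]
print(confusion_matrix(y_true, y_pred, ["cat", "dog"]))  # [[1, 1], [1, 2]]
```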
Cross-validation
Robust evaluation technique that splits the data into subsets (folds) and trains and tests the model multiple times, each time holding out a different fold for testing and training on the rest.
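A minimal sketch of k-fold splitting: the indices are divided into k folds of nearly equal size, and each fold serves as the test set exactly once. It uses contiguous folds for clarity; a real pipeline would usually shuffle (or stratify) the indices first. The function name k_fold_splits is hypothetical.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # The first n_samples % k folds get one extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

for train, test in k_fold_splits(10, 3):
    print(test)  # [0, 1, 2, 3], then [4, 5, 6], then [7, 8, 9]
```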
Precision
Metric measuring the proportion of correct positive predictions among all positive predictions made by the model.
Recall
Metric measuring the proportion of actual positive instances in the dataset that the model correctly identifies as positive.
F1 Score
Harmonic mean of precision and recall, providing a single balanced measure of classification performance.
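Precision, recall, and F1 can all be computed from the counts of true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). This sketch assumes a binary task with a designated positive label, and returns 0.0 when a denominator is zero, which is one common convention. The function name is hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# TP = 2, FP = 1, FN = 2 -> precision 2/3, recall 1/2, F1 4/7
print(precision_recall_f1([1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 1, 0]))
```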
Overfitting
Phenomenon where the model fits the training data too closely, including its noise, and therefore generalizes poorly to new, unseen data.
Tokenization
Process of segmenting text into elementary units (tokens) such as words, subwords, or characters for analysis.
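A minimal sketch of word-level and character-level tokenization, assuming a simple regular expression that treats runs of word characters as tokens and each punctuation mark as its own token. Subword tokenizers (such as BPE) are learned from data and are not shown here. The function names are hypothetical.

```python
import re

def word_tokenize(text):
    # Runs of word characters, or single non-space punctuation marks
    return re.findall(r"\w+|[^\w\s]", text)

def char_tokenize(text):
    # Every character, including spaces, becomes a token
    return list(text)

print(word_tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']
print(word_tokenize("don't"))          # ['don', "'", 't']
print(char_tokenize("abc"))            # ['a', 'b', 'c']
```

As the second call shows, this naive pattern splits contractions into three tokens; production tokenizers handle such cases with dedicated rules.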
Stemming
Text normalization technique that reduces words to a stem by heuristically stripping affixes, mostly suffixes; the resulting stem is not necessarily a valid word.
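A toy suffix-stripping stemmer, assuming a small hypothetical suffix list checked longest first and a minimum stem length. Real stemmers such as the Porter stemmer apply ordered rule sets with extra conditions; this sketch only illustrates the idea. The output "runn" shows that a stem need not be a real word.

```python
# Hypothetical suffix list, ordered longest first
SUFFIXES = ["ingly", "ness", "ing", "ly", "ed", "es", "s"]

def simple_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

print([simple_stem(w) for w in ["running", "cats", "jumped", "happiness", "is"]])
# ['runn', 'cat', 'jump', 'happi', 'is']
```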
Lemmatization
Linguistic process that reduces words to their canonical form (lemma) using morphological analysis and a dictionary.
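Unlike stemming, lemmatization returns a dictionary form and can depend on the part of speech. This sketch assumes a tiny hypothetical lexicon keyed by (word, part of speech); real lemmatizers use full morphological dictionaries and analyzers. Unknown words fall back to their lowercased form.

```python
# Hypothetical mini-lexicon: (surface form, part of speech) -> lemma
LEXICON = {
    ("better", "ADJ"): "good",
    ("ran", "VERB"): "run",
    ("running", "VERB"): "run",
    ("mice", "NOUN"): "mouse",
    ("meeting", "NOUN"): "meeting",
    ("meeting", "VERB"): "meet",
}

def lemmatize(word, pos):
    # Look up the lemma; fall back to the lowercased word if there is no entry
    return LEXICON.get((word.lower(), pos), word.lower())

print(lemmatize("mice", "NOUN"))     # mouse
print(lemmatize("meeting", "VERB"))  # meet
print(lemmatize("meeting", "NOUN"))  # meeting
```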