AI Glossary
The complete dictionary of Artificial Intelligence
Ordinal Encoding
A variant of label encoding that preserves the natural order between categories by assigning integers according to their hierarchical rank, ideal for variables with an intrinsic order relationship.
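A minimal sketch in plain Python; the category names and their rank order below are illustrative assumptions:

```python
# Ordinal encoding: map each category to its rank in an assumed order.
sizes = ["small", "medium", "large", "medium", "small"]
order = {"small": 0, "medium": 1, "large": 2}  # assumed intrinsic order
encoded = [order[s] for s in sizes]
print(encoded)  # [0, 1, 2, 1, 0]
```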
Binary Encoding
Technique that first converts categories to integers via label encoding, then to binary representation, significantly reducing the number of columns compared to one-hot encoding.
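A sketch of the two steps with an illustrative vocabulary; with k categories, ceil(log2(k)) bit columns suffice:

```python
# Step 1: label-encode; Step 2: write each label in fixed-width binary.
cats = ["red", "green", "blue", "green"]
labels = {c: i for i, c in enumerate(dict.fromkeys(cats))}  # red=0, green=1, blue=2
n_bits = max(1, (len(labels) - 1).bit_length())             # 3 categories -> 2 bits
encoded = [[(labels[c] >> b) & 1 for b in reversed(range(n_bits))] for c in cats]
print(encoded)  # [[0, 0], [0, 1], [1, 0], [0, 1]]
```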
Frequency Encoding
Method replacing each category with its frequency of occurrence in the dataset, capturing the relative importance of each category without creating new dimensions.
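A minimal sketch using the standard library; the data is illustrative:

```python
from collections import Counter

cats = ["a", "b", "a", "c", "a", "b"]
freq = Counter(cats)
n = len(cats)
encoded = [freq[c] / n for c in cats]  # each value replaced by its relative frequency
print(encoded)  # [0.5, 0.33, 0.5, 0.17, 0.5, 0.33] (rounded)
```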
Hashing Encoding
Approach using a hash function to map categories to a fixed number of dimensions, allowing efficient handling of high cardinalities with constant memory.
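A minimal sketch of the hashing trick; the choice of md5 and the bucket count are assumptions for illustration:

```python
import hashlib

def hash_bucket(category: str, n_dims: int = 8) -> int:
    # md5 is stable across runs (Python's built-in hash() is salted per process)
    digest = hashlib.md5(category.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_dims

# Unseen or rare categories still map into the same fixed set of buckets.
for c in ["user_123", "user_456", "never_seen_before"]:
    print(c, "->", hash_bucket(c))
```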
Base-N Encoding
Extension of binary encoding using different numerical bases (base-3, base-4, etc.) to represent categories, offering a compromise between dimensionality and representation capacity.
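A sketch of the digit expansion, assuming the categories have already been label-encoded:

```python
def to_base_n(value: int, base: int, n_digits: int) -> list[int]:
    """Write a label-encoded integer as fixed-width base-N digits."""
    digits = []
    for _ in range(n_digits):
        digits.append(value % base)
        value //= base
    return digits[::-1]

# 9 categories fit in 2 base-3 digits, versus 4 bits in binary encoding.
print(to_base_n(7, base=3, n_digits=2))  # [2, 1] because 7 = 2*3 + 1
```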
Leave-One-Out Encoding
Target encoding variant calculating the target mean for each observation by excluding that specific observation, reducing the risk of overfitting and information leakage.
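A minimal sketch on toy data; falling back to the global mean for singleton categories is an assumed convention:

```python
from collections import defaultdict

cats = ["a", "a", "a", "b", "b"]
y    = [1,   0,   1,   1,   0]

sums, counts = defaultdict(float), defaultdict(int)
for c, t in zip(cats, y):
    sums[c] += t
    counts[c] += 1

global_mean = sum(y) / len(y)
# Each row's encoding is the category's target mean computed WITHOUT that row.
encoded = [
    (sums[c] - t) / (counts[c] - 1) if counts[c] > 1 else global_mean
    for c, t in zip(cats, y)
]
print(encoded)  # [0.5, 1.0, 0.5, 0.0, 1.0]
```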
Weight of Evidence (WoE) Encoding
Technique from credit scoring that encodes each category as the logarithm of the ratio between the category's share of good payers and its share of bad payers, particularly effective for linear models.
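A minimal sketch on toy data; the +0.5 additive smoothing is an assumption used to avoid log(0) on pure categories:

```python
import math

cats = ["a", "a", "b", "b", "b", "a"]
y    = [1,   1,   0,   1,   0,   0]   # 1 = good payer, 0 = bad payer

goods = sum(y)
bads = len(y) - goods
woe = {}
for c in set(cats):
    g = sum(1 for ci, yi in zip(cats, y) if ci == c and yi == 1)
    b = sum(1 for ci, yi in zip(cats, y) if ci == c and yi == 0)
    # WoE: log of (category's share of goods / category's share of bads)
    woe[c] = math.log(((g + 0.5) / goods) / ((b + 0.5) / bads))

encoded = [woe[c] for c in cats]
```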
CatBoost Encoding
Ordered encoding method using target statistics calculated sequentially with smoothing to avoid overfitting, natively implemented in the CatBoost algorithm.
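A simplified single-pass sketch of the ordered target statistic; real CatBoost averages over several random permutations of the rows, and the smoothing strength here is an assumed value:

```python
from collections import defaultdict

cats = ["a", "a", "b", "a", "b"]
y    = [1,   0,   1,   1,   0]

prior = sum(y) / len(y)   # global target mean used as smoothing prior
a = 1.0                   # assumed smoothing strength

sums, counts = defaultdict(float), defaultdict(int)
encoded = []
for c, t in zip(cats, y):
    # Only rows seen BEFORE the current one contribute (ordered statistic),
    # so a row never leaks its own target into its encoding.
    encoded.append((sums[c] + a * prior) / (counts[c] + a))
    sums[c] += t
    counts[c] += 1
print(encoded)  # [0.6, 0.8, 0.6, 0.53, 0.8] (rounded)
```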
Count Encoding
Simple technique replacing each category with the number of occurrences in the dataset, similar to frequency encoding but using raw counts rather than proportions.
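A minimal sketch; compare with the frequency-encoding example above, which divides these counts by the dataset size:

```python
from collections import Counter

cats = ["a", "b", "a", "c", "a", "b"]
counts = Counter(cats)
encoded = [counts[c] for c in cats]   # raw occurrence counts, not proportions
print(encoded)  # [3, 2, 3, 1, 3, 2]
```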
Helmert Encoding
Contrast encoding method comparing each level of a categorical variable to the mean of subsequent levels, useful for linear models with ordinal variables.
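A sketch that builds the contrast matrix directly; note that libraries differ in direction and scaling (R's contr.helmert, for instance, compares each level to the mean of the preceding ones):

```python
import numpy as np

def helmert_matrix(k: int) -> np.ndarray:
    """Column j compares level j to the mean of the subsequent levels."""
    m = np.zeros((k, k - 1))
    for j in range(k - 1):
        m[j, j] = (k - 1 - j) / (k - j)
        m[j + 1:, j] = -1.0 / (k - j)
    return m

# Rows are levels, columns are contrasts; for k=4 the first column is
# [0.75, -0.25, -0.25, -0.25]: level 1 versus the mean of levels 2-4.
print(helmert_matrix(4))
```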
Sum Encoding
Variant of contrast encoding (also called deviation coding) where each category's coefficient measures its deviation from the grand mean; the reference level carries no coefficient of its own, since the category effects are constrained to sum to zero.
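A minimal sketch of the coding matrix, using the last level as the reference by convention:

```python
import numpy as np

def sum_contrast_matrix(k: int) -> np.ndarray:
    """Deviation coding: the reference (last) level is coded -1 in every
    column, so effects sum to zero and the intercept is the grand mean."""
    return np.vstack([np.eye(k - 1), -np.ones((1, k - 1))])

print(sum_contrast_matrix(3))
# [[ 1.  0.]
#  [ 0.  1.]
#  [-1. -1.]]
```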
Backward Difference Encoding
Contrast encoding technique comparing each level of a categorical variable to the previous level, particularly suited for variables with natural progression.
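A sketch of the coding matrix, following the common convention where each coefficient estimates the difference between adjacent levels:

```python
import numpy as np

def backward_difference_matrix(k: int) -> np.ndarray:
    """Column j's regression coefficient estimates mean(level j+1) - mean(level j)."""
    m = np.zeros((k, k - 1))
    for j in range(1, k):
        m[:j, j - 1] = -(k - j) / k
        m[j:, j - 1] = j / k
    return m

print(backward_difference_matrix(4))
# [[-0.75 -0.5  -0.25]
#  [ 0.25 -0.5  -0.25]
#  [ 0.25  0.5  -0.25]
#  [ 0.25  0.5   0.75]]
```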
M-Estimate Encoding
Regularized version of target encoding using an m parameter to weight between the global mean and conditional mean, controlling the bias-variance tradeoff.
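A minimal sketch on toy data; the value of m is an assumption (larger m pulls harder toward the global mean):

```python
from collections import defaultdict

cats = ["a", "a", "a", "b", "b", "c"]
y    = [1,   0,   1,   1,   0,   1]

m = 5.0                      # assumed smoothing parameter
prior = sum(y) / len(y)      # global target mean

sums, counts = defaultdict(float), defaultdict(int)
for c, t in zip(cats, y):
    sums[c] += t
    counts[c] += 1

# Rare categories shrink toward the prior; frequent ones keep their own mean.
encoding = {c: (sums[c] + m * prior) / (counts[c] + m) for c in counts}
encoded = [encoding[c] for c in cats]
```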
James-Stein Encoding
Shrinkage encoding method applying the James-Stein principle to combine category means with the global mean, optimizing mean squared error.
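Several James-Stein formulations exist; the sketch below is one common variance-based variant and should be read as illustrative, not as the canonical estimator:

```python
import numpy as np

cats = np.array(["a", "a", "a", "b", "b", "c", "c"])
y    = np.array([1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])

global_mean = y.mean()
levels = np.unique(cats)
group_means = {c: y[cats == c].mean() for c in levels}
group_sizes = {c: int((cats == c).sum()) for c in levels}

tau2 = np.var([group_means[c] for c in levels])          # between-group variance
sigma2 = np.mean([y[cats == c].var() for c in levels])   # pooled within-group variance

encoding = {}
for c in levels:
    s2 = sigma2 / group_sizes[c]   # sampling variance of this group's mean
    # Shrinkage factor B in [0, 1]: noisier group means shrink harder.
    B = s2 / (s2 + tau2) if (s2 + tau2) > 0 else 1.0
    encoding[c] = (1 - B) * group_means[c] + B * global_mean
```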
Embedding Encoding
Modern approach using neural networks to learn dense vector representations of categories, automatically capturing semantic relationships between them.
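A minimal PyTorch sketch; the vocabulary size and embedding dimension are illustrative, and in practice the table is trained end to end inside a larger network:

```python
import torch
import torch.nn as nn

# 10 categories mapped to 4-dimensional dense vectors; training nudges
# categories that behave similarly toward nearby points in this space.
num_categories, embedding_dim = 10, 4
embedding = nn.Embedding(num_categories, embedding_dim)

category_ids = torch.tensor([0, 3, 3, 7])   # label-encoded input
vectors = embedding(category_ids)
print(vectors.shape)                        # torch.Size([4, 4])
```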
Polynomial Encoding
Contrast encoding method generating orthogonal polynomial terms to represent non-linear effects of categorical variables in regression models.
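A sketch of one standard construction: QR-decompose a Vandermonde matrix of the level indices to obtain orthogonal linear, quadratic, and higher-order contrasts (column signs may differ between implementations):

```python
import numpy as np

def orthogonal_poly_matrix(k: int) -> np.ndarray:
    """Orthogonal polynomial contrasts for k ordered levels."""
    x = np.arange(1, k + 1, dtype=float)
    vander = np.vander(x, k, increasing=True)   # columns: 1, x, x^2, ...
    q, _ = np.linalg.qr(vander)
    return q[:, 1:]                             # drop the constant column

print(orthogonal_poly_matrix(4).round(3))
```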