AI Glossary
The complete dictionary of Artificial Intelligence
Ordinal Encoding
A variant of label encoding that preserves the natural order between categories by assigning integers according to their hierarchical rank, ideal for variables with an intrinsic order relationship.
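A minimal sketch in plain Python; the category names and their rank order below are illustrative assumptions:

```python
# Ordinal encoding: map each category to its rank in an assumed order.
sizes = ["small", "medium", "large", "medium", "small"]
order = {"small": 0, "medium": 1, "large": 2}  # assumed intrinsic order
encoded = [order[s] for s in sizes]
print(encoded)  # [0, 1, 2, 1, 0]
```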
Binary Encoding
Technique that first converts categories to integers via label encoding, then to binary representation, significantly reducing the number of columns compared to one-hot encoding.
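A sketch of the two steps with an illustrative vocabulary; with k categories, ceil(log2(k)) bit columns suffice:

```python
# Step 1: label-encode; Step 2: write each label in fixed-width binary.
cats = ["red", "green", "blue", "green"]
labels = {c: i for i, c in enumerate(dict.fromkeys(cats))}  # red=0, green=1, blue=2
n_bits = max(1, (len(labels) - 1).bit_length())             # 3 categories -> 2 bits
encoded = [[(labels[c] >> b) & 1 for b in reversed(range(n_bits))] for c in cats]
print(encoded)  # [[0, 0], [0, 1], [1, 0], [0, 1]]
```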
Frequency Encoding
Method replacing each category with its frequency of occurrence in the dataset, capturing the relative importance of each category without creating new dimensions.
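A minimal sketch using the standard library; the data is illustrative:

```python
from collections import Counter

cats = ["a", "b", "a", "c", "a", "b"]
freq = Counter(cats)
n = len(cats)
encoded = [freq[c] / n for c in cats]  # each value replaced by its relative frequency
print(encoded)  # [0.5, 0.33, 0.5, 0.17, 0.5, 0.33] (rounded)
```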
Hashing Encoding
Approach using a hash function to map categories to a fixed number of dimensions, allowing efficient handling of high cardinalities with constant memory.
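A minimal sketch of the hashing trick; the choice of md5 and the bucket count are assumptions for illustration:

```python
import hashlib

def hash_bucket(category: str, n_dims: int = 8) -> int:
    # md5 is stable across runs (Python's built-in hash() is salted per process)
    digest = hashlib.md5(category.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_dims

# Unseen or rare categories still map into the same fixed set of buckets.
for c in ["user_123", "user_456", "never_seen_before"]:
    print(c, "->", hash_bucket(c))
```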
Base-N Encoding
Extension of binary encoding using different numerical bases (base-3, base-4, etc.) to represent categories, offering a compromise between dimensionality and representation capacity.
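A sketch of the digit expansion, assuming the categories have already been label-encoded:

```python
def to_base_n(value: int, base: int, n_digits: int) -> list[int]:
    """Write a label-encoded integer as fixed-width base-N digits."""
    digits = []
    for _ in range(n_digits):
        digits.append(value % base)
        value //= base
    return digits[::-1]

# 9 categories fit in 2 base-3 digits, versus 4 bits in binary encoding.
print(to_base_n(7, base=3, n_digits=2))  # [2, 1] because 7 = 2*3 + 1
```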
Leave-One-Out Encoding
Target encoding variant calculating the target mean for each observation by excluding that specific observation, reducing the risk of overfitting and information leakage.
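A minimal sketch on toy data; falling back to the global mean for singleton categories is an assumed convention:

```python
from collections import defaultdict

cats = ["a", "a", "a", "b", "b"]
y    = [1,   0,   1,   1,   0]

sums, counts = defaultdict(float), defaultdict(int)
for c, t in zip(cats, y):
    sums[c] += t
    counts[c] += 1

global_mean = sum(y) / len(y)
# Each row's encoding is the category's target mean computed WITHOUT that row.
encoded = [
    (sums[c] - t) / (counts[c] - 1) if counts[c] > 1 else global_mean
    for c, t in zip(cats, y)
]
print(encoded)  # [0.5, 1.0, 0.5, 0.0, 1.0]
```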
Weight of Evidence (WoE) Encoding
Technique from credit scoring that encodes each category as the logarithm of the ratio between the category's share of good payers and its share of bad payers, particularly effective for linear models.
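A minimal sketch on toy data; the +0.5 additive smoothing is an assumption used to avoid log(0) on pure categories:

```python
import math

cats = ["a", "a", "b", "b", "b", "a"]
y    = [1,   1,   0,   1,   0,   0]   # 1 = good payer, 0 = bad payer

goods = sum(y)
bads = len(y) - goods
woe = {}
for c in set(cats):
    g = sum(1 for ci, yi in zip(cats, y) if ci == c and yi == 1)
    b = sum(1 for ci, yi in zip(cats, y) if ci == c and yi == 0)
    # WoE: log of (category's share of goods / category's share of bads)
    woe[c] = math.log(((g + 0.5) / goods) / ((b + 0.5) / bads))

encoded = [woe[c] for c in cats]
```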
CatBoost Encoding
Ordered encoding method using target statistics calculated sequentially with smoothing to avoid overfitting, natively implemented in the CatBoost algorithm.
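A simplified single-pass sketch of the ordered target statistic; real CatBoost averages over several random permutations of the rows, and the smoothing strength here is an assumed value:

```python
from collections import defaultdict

cats = ["a", "a", "b", "a", "b"]
y    = [1,   0,   1,   1,   0]

prior = sum(y) / len(y)   # global target mean used as smoothing prior
a = 1.0                   # assumed smoothing strength

sums, counts = defaultdict(float), defaultdict(int)
encoded = []
for c, t in zip(cats, y):
    # Only rows seen BEFORE the current one contribute (ordered statistic),
    # so a row never leaks its own target into its encoding.
    encoded.append((sums[c] + a * prior) / (counts[c] + a))
    sums[c] += t
    counts[c] += 1
print(encoded)  # [0.6, 0.8, 0.6, 0.53, 0.8] (rounded)
```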
Count Encoding
Simple technique replacing each category with the number of occurrences in the dataset, similar to frequency encoding but using raw counts rather than proportions.
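A minimal sketch; compare with the frequency-encoding example above, which divides these counts by the dataset size:

```python
from collections import Counter

cats = ["a", "b", "a", "c", "a", "b"]
counts = Counter(cats)
encoded = [counts[c] for c in cats]   # raw occurrence counts, not proportions
print(encoded)  # [3, 2, 3, 1, 3, 2]
```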
Helmert Encoding
Contrast encoding method comparing each level of a categorical variable to the mean of subsequent levels, useful for linear models with ordinal variables.
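A sketch that builds the contrast matrix directly; note that libraries differ in direction and scaling (R's contr.helmert, for instance, compares each level to the mean of the preceding ones):

```python
import numpy as np

def helmert_matrix(k: int) -> np.ndarray:
    """Column j compares level j to the mean of the subsequent levels."""
    m = np.zeros((k, k - 1))
    for j in range(k - 1):
        m[j, j] = (k - 1 - j) / (k - j)
        m[j + 1:, j] = -1.0 / (k - j)
    return m

# Rows are levels, columns are contrasts; for k=4 the first column is
# [0.75, -0.25, -0.25, -0.25]: level 1 versus the mean of levels 2-4.
print(helmert_matrix(4))
```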
Sum Encoding
Variant of contrast encoding (also called deviation coding) where each category's coefficient measures its deviation from the grand mean; the reference level carries no coefficient of its own, since the category effects are constrained to sum to zero.
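A minimal sketch of the coding matrix, using the last level as the reference by convention:

```python
import numpy as np

def sum_contrast_matrix(k: int) -> np.ndarray:
    """Deviation coding: the reference (last) level is coded -1 in every
    column, so effects sum to zero and the intercept is the grand mean."""
    return np.vstack([np.eye(k - 1), -np.ones((1, k - 1))])

print(sum_contrast_matrix(3))
# [[ 1.  0.]
#  [ 0.  1.]
#  [-1. -1.]]
```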
Backward Difference Encoding
Contrast encoding technique comparing each level of a categorical variable to the previous level, particularly suited for variables with natural progression.
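A sketch of the coding matrix, following the common convention where each coefficient estimates the difference between adjacent levels:

```python
import numpy as np

def backward_difference_matrix(k: int) -> np.ndarray:
    """Column j's regression coefficient estimates mean(level j+1) - mean(level j)."""
    m = np.zeros((k, k - 1))
    for j in range(1, k):
        m[:j, j - 1] = -(k - j) / k
        m[j:, j - 1] = j / k
    return m

print(backward_difference_matrix(4))
# [[-0.75 -0.5  -0.25]
#  [ 0.25 -0.5  -0.25]
#  [ 0.25  0.5  -0.25]
#  [ 0.25  0.5   0.75]]
```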
M-Estimate Encoding
Regularized version of target encoding using an m parameter to weight between the global mean and conditional mean, controlling the bias-variance tradeoff.
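A minimal sketch on toy data; the value of m is an assumption (larger m pulls harder toward the global mean):

```python
from collections import defaultdict

cats = ["a", "a", "a", "b", "b", "c"]
y    = [1,   0,   1,   1,   0,   1]

m = 5.0                      # assumed smoothing parameter
prior = sum(y) / len(y)      # global target mean

sums, counts = defaultdict(float), defaultdict(int)
for c, t in zip(cats, y):
    sums[c] += t
    counts[c] += 1

# Rare categories shrink toward the prior; frequent ones keep their own mean.
encoding = {c: (sums[c] + m * prior) / (counts[c] + m) for c in counts}
encoded = [encoding[c] for c in cats]
```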
James-Stein Encoding
Shrinkage encoding method applying the James-Stein principle to combine category means with the global mean, optimizing mean squared error.
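Several James-Stein formulations exist; the sketch below is one common variance-based variant and should be read as illustrative, not as the canonical estimator:

```python
import numpy as np

cats = np.array(["a", "a", "a", "b", "b", "c", "c"])
y    = np.array([1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])

global_mean = y.mean()
levels = np.unique(cats)
group_means = {c: y[cats == c].mean() for c in levels}
group_sizes = {c: int((cats == c).sum()) for c in levels}

tau2 = np.var([group_means[c] for c in levels])          # between-group variance
sigma2 = np.mean([y[cats == c].var() for c in levels])   # pooled within-group variance

encoding = {}
for c in levels:
    s2 = sigma2 / group_sizes[c]   # sampling variance of this group's mean
    # Shrinkage factor B in [0, 1]: noisier group means shrink harder.
    B = s2 / (s2 + tau2) if (s2 + tau2) > 0 else 1.0
    encoding[c] = (1 - B) * group_means[c] + B * global_mean
```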
Embedding Encoding
Modern approach using neural networks to learn dense vector representations of categories, automatically capturing semantic relationships between them.
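A minimal PyTorch sketch; the vocabulary size and embedding dimension are illustrative, and in practice the table is trained end to end inside a larger network:

```python
import torch
import torch.nn as nn

# 10 categories mapped to 4-dimensional dense vectors; training nudges
# categories that behave similarly toward nearby points in this space.
num_categories, embedding_dim = 10, 4
embedding = nn.Embedding(num_categories, embedding_dim)

category_ids = torch.tensor([0, 3, 3, 7])   # label-encoded input
vectors = embedding(category_ids)
print(vectors.shape)                        # torch.Size([4, 4])
```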
Polynomial Encoding
Contrast encoding method generating orthogonal polynomial terms to represent non-linear effects of categorical variables in regression models.
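A sketch of one standard construction: QR-decompose a Vandermonde matrix of the level indices to obtain orthogonal linear, quadratic, and higher-order contrasts (column signs may differ between implementations):

```python
import numpy as np

def orthogonal_poly_matrix(k: int) -> np.ndarray:
    """Orthogonal polynomial contrasts for k ordered levels."""
    x = np.arange(1, k + 1, dtype=float)
    vander = np.vander(x, k, increasing=True)   # columns: 1, x, x^2, ...
    q, _ = np.linalg.qr(vander)
    return q[:, 1:]                             # drop the constant column

print(orthogonal_poly_matrix(4).round(3))
```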