
AI Glossary

The complete dictionary of Artificial Intelligence

162 categories · 2,032 subcategories · 23,060 terms

Shannon Entropy

Mathematical measure of uncertainty or disorder in a dataset, calculated as the sum of probabilities multiplied by their negative logarithm. Used as a splitting criterion to quantify the impurity of a node in decision trees.
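A minimal sketch of the calculation, H = −Σ pᵢ log₂ pᵢ over the class frequencies of a node (the function name and label list are illustrative):

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Entropy in bits: -sum(p_i * log2(p_i)) over class frequencies."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A 50/50 node is maximally impure (1 bit); a pure node has entropy 0.
print(shannon_entropy(["a", "a", "b", "b"]))  # 1.0
print(shannon_entropy(["a", "a", "a", "a"]))  # 0.0
```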


Splitting Criterion

Mathematical rule used to determine the best attribute and split threshold at each node of a decision tree, based on maximizing information gain or minimizing impurity. Determines the structure and predictive efficiency of the final tree.
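The two most common impurity measures used as splitting criteria can be sketched side by side (a toy comparison, assuming class labels in a Python list; both are 0 for a pure node):

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum(p_i * log2(p_i))."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

node = ["yes", "yes", "yes", "no"]
print(gini(node))     # 0.375
print(entropy(node))  # ≈ 0.811
```

Both rank candidate splits similarly in practice; entropy weighs rare classes slightly more heavily.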


Conditional Entropy

Measure of the remaining uncertainty about a random variable Y when the value of another variable X is known, essential for calculating information gain. Represents the average entropy of the conditional distributions of Y given each value of X.
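A small sketch showing how H(Y|X) feeds into information gain IG(Y, X) = H(Y) − H(Y|X) (function names and the toy weather data are illustrative):

```python
import math
from collections import Counter

def entropy(ys):
    n = len(ys)
    return -sum((c / n) * math.log2(c / n) for c in Counter(ys).values())

def conditional_entropy(xs, ys):
    """H(Y|X): entropy of Y within each X-group, weighted by group size."""
    n = len(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())

xs = ["sunny", "sunny", "rain", "rain"]
ys = ["no", "no", "yes", "yes"]
# X fully determines Y, so H(Y|X) = 0 and the gain equals H(Y) = 1 bit.
print(entropy(ys) - conditional_entropy(xs, ys))  # 1.0
```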


Information Ratio

Normalized variant of information gain that divides it by the intrinsic entropy of the splitting attribute to avoid bias toward attributes with many values. Compensates for the natural tendency of information gain to favor highly granular attributes.


MDL Principle

Minimum Description Length principle using information theory to balance model complexity and goodness of fit, penalizing splits that do not provide enough information relative to their descriptive cost. A regularized alternative to pure splitting criteria.


Entropy-Based Pruning

Post-pruning technique using entropy-based criteria to evaluate whether removing a branch improves the model's bias-variance tradeoff. Compares the potential information gain to the added complexity cost.


Joint Entropy

Measure of the total uncertainty of a system composed of several random variables simultaneously, fundamental for understanding relationships between attributes in decision tree construction. Used in the calculation of mutual information.
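A minimal sketch of H(X, Y) over the empirical joint distribution of value pairs; mutual information then follows as I(X; Y) = H(X) + H(Y) − H(X, Y) (names are illustrative):

```python
import math
from collections import Counter

def joint_entropy(xs, ys):
    """H(X, Y): entropy over the joint distribution of pairs (x, y)."""
    n = len(xs)
    counts = Counter(zip(xs, ys))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Two independent uniform bits: H(X, Y) = H(X) + H(Y) = 2 bits.
print(joint_entropy([0, 0, 1, 1], [0, 1, 0, 1]))  # 2.0
```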


Gain Ratio

Modification of information gain normalized by the split entropy to correct the bias towards high-cardinality attributes, introduced in the C4.5 algorithm. Maintains the advantages of information gain while reducing its sensitivity to the number of values.
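A sketch of the C4.5-style normalization: information gain divided by the split information, i.e. the entropy of the attribute's own value distribution (helper names and the unique-ID example are illustrative):

```python
import math
from collections import Counter

def entropy(ys):
    n = len(ys)
    return -sum((c / n) * math.log2(c / n) for c in Counter(ys).values())

def gain_ratio(xs, ys):
    """C4.5 gain ratio: information gain / split entropy of the attribute."""
    n = len(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(ys) - cond
    split_info = entropy(xs)  # intrinsic entropy of the attribute itself
    return gain / split_info if split_info > 0 else 0.0

# A unique-ID attribute achieves maximal raw gain (1 bit) but its split
# information is 2 bits, so the ratio penalizes it to 0.5.
ids = [1, 2, 3, 4]
ys = ["no", "no", "yes", "yes"]
print(gain_ratio(ids, ys))  # 0.5
```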


Relative Information Gain

Normalized version of information gain expressed as a proportion of the initial entropy, allowing comparison across different datasets or problems. Eases interpretation and benchmarking of split performance.


Binary Splitting

Splitting strategy that creates exactly two child nodes at each step, simplifying the computation of information gain and reducing the structural complexity of the tree. Optimizes computational efficiency while preserving the expressive power of the model.
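For a numeric feature, a binary split is typically found by scanning candidate thresholds at midpoints between consecutive sorted values and keeping the one with the largest information gain; a minimal sketch (names and data are illustrative):

```python
import math
from collections import Counter

def entropy(ys):
    n = len(ys)
    return -sum((c / n) * math.log2(c / n) for c in Counter(ys).values())

def best_binary_split(values, ys):
    """Scan midpoint thresholds; return (threshold, information gain)."""
    base = entropy(ys)
    n = len(ys)
    best = (None, 0.0)
    pairs = sorted(zip(values, ys))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for v, y in pairs[:i]]
        right = [y for v, y in pairs[i:]]
        gain = (base
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if gain > best[1]:
            best = (t, gain)
    return best

# The classes separate cleanly at 5.0, giving the full 1 bit of gain.
print(best_binary_split([1.0, 2.0, 8.0, 9.0], ["no", "no", "yes", "yes"]))
```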


Multi-way Splitting

Splitting approach that creates as many child nodes as there are distinct values of the selected attribute, potentially maximizing raw information gain. Often requires regularization techniques such as the gain ratio to avoid overfitting.
