Information Gain and Entropy - AI Glossary

Shannon Entropy

Mathematical measure of the uncertainty or disorder in a dataset, computed as the negative sum, over all classes, of each class probability multiplied by its logarithm: H = -Σ p_i log2(p_i). In decision trees it quantifies the impurity of a node and underlies entropy-based splitting criteria.
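
As a rough illustration of this definition, the sketch below computes entropy in bits from a list of class labels. The function name `shannon_entropy` and the choice of base-2 logarithms are assumptions made for this example.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Entropy, in bits, of the empirical class distribution of `labels`."""
    n = len(labels)
    # Sum p * log2(1/p) over classes, which equals -sum(p * log2(p)).
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


balanced = shannon_entropy(["yes", "yes", "no", "no"])  # two equally likely classes
pure = shannon_entropy(["yes", "yes", "yes", "yes"])    # a single class
```

A balanced two-class node yields 1 bit of entropy, while a pure node yields 0 bits.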

Splitting Criterion

Mathematical rule used to determine the best attribute and split threshold at each node of a decision tree, based on maximizing information gain or minimizing impurity. Determines the structure and predictive efficiency of the final tree.
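
A minimal sketch of one such criterion, assuming categorical attributes and information gain as the score. The helper names (`entropy`, `information_gain`, `best_attribute`) and the toy rows are hypothetical and only serve the illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy in bits of a list of labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    children = defaultdict(list)
    for row in rows:
        children[row[attribute]].append(row[target])
    parent = [row[target] for row in rows]
    remainder = sum(len(c) / len(rows) * entropy(c) for c in children.values())
    return entropy(parent) - remainder


def best_attribute(rows, target):
    """Pick the attribute whose split maximizes information gain."""
    candidates = [a for a in rows[0] if a != target]
    return max(candidates, key=lambda a: information_gain(rows, a, target))


# Toy data, invented for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]
chosen = best_attribute(rows, "play")
```

In this toy data `outlook` separates the classes perfectly while `windy` carries no information, so the criterion selects `outlook`.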

Conditional Entropy

Measure of the remaining uncertainty about a random variable Y when the value of another variable X is known, essential for calculating information gain. Represents the average entropy of the conditional distributions of Y given each value of X.
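
The averaging described here can be sketched as follows: group the values of Y by the value of X, compute the entropy of each group, and weight by group size. Function names are assumptions for this example; information gain is then taken as H(Y) - H(Y|X).

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy in bits of a list of labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def conditional_entropy(xs, ys):
    """H(Y | X): entropy of ys within each group of equal x, weighted by group size."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    n = len(ys)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(xs, ys):
    """Reduction in uncertainty about Y obtained by knowing X."""
    return entropy(ys) - conditional_entropy(xs, ys)


ys = ["no", "no", "yes", "yes"]
informative = ["sunny", "sunny", "rainy", "rainy"]  # determines ys exactly
uninformative = ["a", "b", "a", "b"]                # independent of ys
```

With these toy lists, `informative` leaves no remaining uncertainty (H(Y|X) = 0), whereas `uninformative` leaves all of it (H(Y|X) = H(Y) = 1 bit).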

Information Ratio

Normalized variant of information gain, obtained by dividing the gain by the intrinsic entropy (split information) of the splitting attribute in order to avoid bias toward attributes with many values. Compensates for the natural tendency of information gain to favor highly granular attributes.

MDL Principle

Minimum Description Length principle using information theory to balance model complexity and goodness of fit, penalizing splits that do not provide enough information relative to their descriptive cost. A regularized alternative to pure splitting criteria.

Entropy-Based Pruning

Post-pruning technique using entropy-based criteria to evaluate whether removing a branch improves the model's bias-variance tradeoff. Compares the potential information gain to the added complexity cost.

Joint Entropy

Measure of the total uncertainty of a system composed of several random variables simultaneously, fundamental for understanding relationships between attributes in decision tree construction. Used in the calculation of mutual information.
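
One way to make the link to mutual information concrete, assuming discrete variables given as equal-length lists: joint entropy is the entropy of the paired values, and mutual information follows from the identity I(X;Y) = H(X) + H(Y) - H(X,Y). The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of a list of hashable values."""
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in Counter(values).values())


def joint_entropy(xs, ys):
    """H(X, Y): entropy of the distribution over (x, y) pairs."""
    return entropy(list(zip(xs, ys)))


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - joint_entropy(xs, ys)


xs = [0, 0, 1, 1]
independent = [0, 1, 0, 1]  # every (x, y) pair occurs once
identical = [0, 0, 1, 1]    # y is a copy of x
```

For the independent pair the joint entropy is 2 bits and the mutual information is 0; for the identical pair the joint entropy is 1 bit and the mutual information is 1 bit.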

Gain Ratio

Modification of information gain normalized by the split entropy to correct the bias towards high-cardinality attributes, introduced in the C4.5 algorithm. Maintains the advantages of information gain while reducing its sensitivity to the number of values.
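
A small sketch of the normalization, assuming the split entropy is the Shannon entropy of the attribute's own value distribution. Function names and the toy data are invented for illustration, and returning 0.0 for a single-valued attribute is a convention chosen here, not something the glossary specifies.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy in bits of a list of hashable values."""
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in Counter(values).values())


def information_gain(xs, ys):
    """H(Y) minus the size-weighted entropy of ys within each value of xs."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return entropy(ys) - sum(len(g) / len(ys) * entropy(g) for g in groups.values())


def gain_ratio(xs, ys):
    """Information gain divided by the split entropy of the attribute."""
    split_entropy = entropy(xs)
    if split_entropy == 0:  # attribute has a single value: no usable split
        return 0.0
    return information_gain(xs, ys) / split_entropy


ys = ["no", "no", "yes", "yes"]
outlook = ["sunny", "sunny", "rainy", "rainy"]  # 2 distinct values
row_id = [1, 2, 3, 4]                           # unique per row
```

Both attributes reach the same information gain (1 bit), but the gain ratio is 1.0 for `outlook` and 0.5 for `row_id`, so the high-cardinality identifier is no longer preferred.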

Relative Information Gain

Normalized version of information gain expressed as a proportion of the initial entropy, enabling comparison across different datasets or problems. Makes split performance easier to interpret and benchmark.
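
The normalization can be sketched as the ratio IG / H(Y), which lies between 0 and 1. The function names are assumptions for this example, and returning 0.0 when the initial entropy is zero is a convention chosen here.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy in bits of a list of hashable values."""
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in Counter(values).values())


def relative_information_gain(xs, ys):
    """Fraction of the initial entropy H(Y) removed by splitting on X."""
    initial = entropy(ys)
    if initial == 0:  # already pure: nothing left to gain
        return 0.0
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    remaining = sum(len(g) / len(ys) * entropy(g) for g in groups.values())
    return (initial - remaining) / initial


perfect = relative_information_gain(["a", "a", "b", "b"], ["no", "no", "yes", "yes"])
partial = relative_information_gain(["a", "a", "b", "b"], ["no", "no", "yes", "no"])
```

A split that removes all uncertainty scores 1.0 regardless of how large the initial entropy was, which is what makes the measure comparable across datasets; the partial split above removes roughly 38% of the initial entropy.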

Binary Splitting

Splitting strategy that creates exactly two child nodes at each step, simplifying the information gain calculation and reducing the structural complexity of the tree. Improves computational efficiency while preserving the model's expressive power.
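
For a numeric attribute, a binary split is often found by scanning candidate thresholds. The sketch below assumes midpoints between consecutive distinct values as candidates and information gain as the score; both choices, and the name `best_threshold`, are assumptions for this illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the best split `value <= threshold`."""
    base = entropy(labels)
    n = len(labels)
    distinct = sorted(set(values))
    best = (None, 0.0)
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        gain = base - (len(left) / n * entropy(left) + len(right) / n * entropy(right))
        if gain > best[1]:
            best = (t, gain)
    return best


values = [1.0, 2.0, 3.0, 4.0]
labels = ["no", "no", "yes", "yes"]
threshold, gain = best_threshold(values, labels)
```

On this toy data the scan settles on the threshold 2.5, which separates the two classes completely.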

Multi-way Splitting

Splitting approach that creates as many child nodes as there are distinct values of the selected attribute, potentially maximizing raw information gain. Often requires regularization techniques such as the gain ratio to avoid overfitting.
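
A minimal sketch of a multi-way partition, with one child per distinct attribute value. The names `multiway_split` and `split_gain` and the toy data are hypothetical; the example is only meant to show why raw information gain can be misleading for attributes with many values.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy in bits of a list of labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def multiway_split(values, labels):
    """Group labels into one child node per distinct attribute value."""
    children = defaultdict(list)
    for v, y in zip(values, labels):
        children[v].append(y)
    return dict(children)


def split_gain(values, labels):
    """Raw information gain of the multi-way split on `values`."""
    children = multiway_split(values, labels)
    n = len(labels)
    return entropy(labels) - sum(len(c) / n * entropy(c) for c in children.values())


labels = ["no", "no", "yes", "yes", "yes", "no"]
color = ["red", "green", "blue", "red", "green", "blue"]  # 3 children
row_id = [1, 2, 3, 4, 5, 6]                               # 6 children, one row each
```

Splitting on `row_id` produces six single-row children and the maximum possible raw gain, even though such a split cannot generalize to unseen rows; `color` produces three mixed children and no gain.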
