AI Glossary
The Complete Dictionary of Artificial Intelligence
Shannon Entropy
Mathematical measure of uncertainty or disorder in a dataset, defined as H = −Σᵢ pᵢ log₂ pᵢ, i.e. the expected value of the negative logarithm of the class probabilities. Used as a splitting criterion to quantify the impurity of a node in decision trees.
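As a minimal sketch, the definition above can be computed directly from class frequencies (the function name `shannon_entropy` is illustrative, not a library API):

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """H = -sum(p * log2(p)) over the empirical class frequencies."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# A pure node has entropy 0; an evenly split binary node has entropy 1 bit.
pure = shannon_entropy(["yes", "yes", "yes", "yes"])      # 0.0
mixed = shannon_entropy(["yes", "yes", "no", "no"])       # 1.0
```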
Splitting Criterion
Mathematical rule used to determine the best attribute and split threshold at each node of a decision tree, based on maximizing information gain or minimizing impurity. Determines the structure and predictive performance of the final tree.
Conditional Entropy
Measure of the remaining uncertainty about a random variable Y when the value of another variable X is known, essential for calculating information gain. Represents the average entropy of the conditional distributions of Y given each value of X.
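The weighted-average formulation above translates directly into code; information gain then falls out as H(Y) − H(Y|X). A minimal sketch with illustrative helper names:

```python
import math
from collections import Counter

def entropy(vals):
    n = len(vals)
    return -sum((c / n) * math.log2(c / n) for c in Counter(vals).values())

def conditional_entropy(xs, ys):
    """H(Y|X): average entropy of Y within each group defined by a value of X."""
    n = len(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())

# When X fully determines Y, H(Y|X) = 0 and the gain equals H(Y).
xs = ["sunny", "sunny", "rain", "rain"]
ys = ["no", "no", "yes", "yes"]
gain = entropy(ys) - conditional_entropy(xs, ys)  # information gain
```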
Information Ratio
Normalized variant of information gain that divides it by the intrinsic entropy of the splitting attribute to avoid bias toward attributes with many values. Compensates for the natural tendency of information gain to favor highly granular attributes.
MDL Principle
Minimum Description Length principle using information theory to balance model complexity and goodness of fit, penalizing splits that do not provide enough information relative to their descriptive cost. A regularized alternative to pure splitting criteria.
Entropy-Based Pruning
Post-pruning technique using entropy-based criteria to evaluate whether removing a branch improves the model's bias-variance tradeoff. Compares the potential information gain to the added complexity cost.
Joint Entropy
Measure of the total uncertainty of a system composed of several random variables simultaneously, fundamental for understanding relationships between attributes in decision tree construction. Used in the calculation of mutual information.
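As a sketch of the connection mentioned above, joint entropy can be computed as the entropy of the paired values, and mutual information follows from the identity I(X;Y) = H(X) + H(Y) − H(X,Y) (function names are illustrative):

```python
import math
from collections import Counter

def entropy(vals):
    n = len(vals)
    return -sum((c / n) * math.log2(c / n) for c in Counter(vals).values())

def joint_entropy(xs, ys):
    """H(X,Y): entropy of the joint distribution of (x, y) value pairs."""
    return entropy(list(zip(xs, ys)))

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - joint_entropy(xs, ys)

xs = ["a", "a", "b", "b"]
mi_indep = mutual_information(xs, ["0", "1", "0", "1"])  # independent: ~0 bits
mi_dep = mutual_information(xs, ["0", "0", "1", "1"])    # fully dependent: ~1 bit
```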
Gain Ratio
Modification of information gain normalized by the split entropy to correct the bias towards high-cardinality attributes, introduced in the C4.5 algorithm. Maintains the advantages of information gain while reducing its sensitivity to the number of values.
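A minimal sketch of the C4.5-style normalization described above, where the split information is simply the entropy of the attribute's own value distribution (helper names are illustrative):

```python
import math
from collections import Counter

def entropy(vals):
    n = len(vals)
    return -sum((c / n) * math.log2(c / n) for c in Counter(vals).values())

def gain_ratio(xs, ys):
    """Information gain on ys from splitting on xs, divided by split info H(X)."""
    n = len(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    gain = entropy(ys) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(xs)  # intrinsic entropy of the splitting attribute
    return gain / split_info if split_info > 0 else 0.0

ys = ["no", "no", "yes", "yes"]
coarse = gain_ratio(["a", "a", "b", "b"], ys)   # gain 1.0 / split info 1.0 = 1.0
id_like = gain_ratio(["1", "2", "3", "4"], ys)  # gain 1.0 / split info 2.0 = 0.5
```

The second call shows the intended effect: an ID-like attribute achieves the same raw gain but is penalized by its larger split information.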
Relative Information Gain
Normalized version of information gain expressed as a proportion of the initial entropy, enabling comparison across different datasets or problems. Facilitates interpretation and benchmarking of split performance.
Binary Splitting
Splitting strategy that creates exactly two child nodes at each step, simplifying the information gain computation and reducing the structural complexity of the tree. Optimizes computational efficiency while preserving the model's expressive power.
Multi-way Splitting
Splitting approach that creates as many child nodes as there are distinct values of the selected attribute, potentially maximizing raw information gain. Often requires regularization techniques such as the gain ratio to avoid overfitting.