AI Glossary
The Complete Dictionary of Artificial Intelligence
Information Gain
Quantitative metric measuring the reduction in entropy obtained by partitioning a dataset according to a specific attribute. ID3 computes it for every candidate attribute at each node of the tree and greedily splits on the attribute with the highest value.
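Written out, the definition above takes the following form. The symbols are assumptions chosen for illustration: S is the dataset at the current node, A is the candidate attribute, S_v is the subset of S in which A takes the value v, and H is the Shannon entropy of the class labels.

```latex
\mathrm{IG}(S, A) \;=\; H(S) \;-\; \sum_{v \,\in\, \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
```

The second term is the weighted average entropy of the subsets after the split, so the gain is large when the subsets are much purer than the original set.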
Shannon Entropy
Mathematical measure of uncertainty or disorder in the class labels of a dataset, calculated as the negative sum, over all classes, of each class probability multiplied by its base-2 logarithm. It is measured in bits and serves as the basis for calculating information gain in ID3.
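A minimal sketch of this calculation in Python, assuming the class probabilities are estimated from label frequencies. The function name `entropy` and the sample labels are illustrative, not part of any particular library.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Classes with a zero count never appear in the Counter,
    # so log2(0) cannot occur here.
    return sum(-(n / total) * log2(n / total) for n in counts.values())

# 9 positive and 5 negative examples: roughly 0.940 bits.
print(round(entropy(["yes"] * 9 + ["no"] * 5), 3))
# Two equally likely classes give the maximum of 1 bit.
print(entropy(["yes", "no"]))
```

A set containing a single class has entropy 0, which is the "perfectly pure" case.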
Splitting Attribute
Variable selected at a given node to partition the dataset into more homogeneous subsets, chosen by ID3 based on the maximum information gain among all available attributes.
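The selection step can be sketched as follows. This is an illustration under stated assumptions, not a full ID3 implementation: the dataset is a list of dictionaries, all attributes are categorical, and the names `information_gain`, `best_splitting_attribute`, and the toy rows are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target before the split minus the weighted entropy after it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    before = entropy([row[target] for row in rows])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return before - after

def best_splitting_attribute(rows, attributes, target):
    """Return the attribute with the maximum information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical toy data: "outlook" separates the classes far better than "windy".
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]

print(best_splitting_attribute(rows, ["outlook", "windy"], "play"))
```

On this data, splitting on "outlook" leaves two pure subsets and one mixed subset, while splitting on "windy" leaves both subsets mixed, so "outlook" is selected.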
Leaf Node
Terminal node of the ID3 decision tree with no further splits. It assigns a class label: the single class of its samples when the node is pure, or the majority class of its samples when no attributes remain to split on.
Information Gain Ratio
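Variant of information gain normalized by the split information of the attribute, that is, the entropy of the attribute's own value distribution. It was introduced in C4.5, the successor to ID3, to correct the bias of plain information gain towards attributes with many possible values.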
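The effect of the normalization can be seen in a small sketch. The assumptions here are a list-of-dictionaries dataset, categorical attributes, and a denominator equal to the entropy of the attribute's value distribution (often called split information). The function name `gain_ratio` and the toy rows are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(values):
    total = len(values)
    return sum(-(n / total) * log2(n / total) for n in Counter(values).values())

def gain_ratio(rows, attribute, target):
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    # Information gain: entropy before the split minus weighted entropy after it.
    gain = entropy([r[target] for r in rows]) - sum(
        len(g) / len(rows) * entropy(g) for g in groups.values()
    )
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy([r[attribute] for r in rows])
    # An attribute with a single value produces no split; return 0 instead of dividing by zero.
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical data: "id" is unique per row, so it has a perfect but meaningless gain.
rows = [
    {"id": 1, "outlook": "sunny", "play": "no"},
    {"id": 2, "outlook": "sunny", "play": "no"},
    {"id": 3, "outlook": "rainy", "play": "yes"},
    {"id": 4, "outlook": "rainy", "play": "yes"},
]

print(gain_ratio(rows, "id", "play"))
print(gain_ratio(rows, "outlook", "play"))
```

Both attributes have the same information gain on this data, but "id" splits the four rows into four subsets and is penalized by its larger split information, so "outlook" ends up with the higher ratio.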
Training Set
Subset of data used by ID3 to build the decision tree, containing labeled examples that allow the algorithm to learn the relationships between attributes and target classes.
Class Prediction
Classification process in ID3 where a new sample traverses the tree from the root to a leaf, with the predicted class being the one associated with the reached leaf node according to the successive attribute tests.
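The traversal can be sketched as below. The tree representation is an assumption made for illustration: an internal node is a dictionary with an "attribute" key and a "branches" mapping from attribute values to child nodes, and a leaf is simply a class label string. The tree itself is hand-written, not learned.

```python
def predict(tree, sample):
    """Follow the attribute tests from the root until a leaf (a class label) is reached."""
    node = tree
    while isinstance(node, dict):
        value = sample[node["attribute"]]
        # Raises KeyError if the value was never seen on this branch;
        # a fuller implementation would need a fallback such as the majority class.
        node = node["branches"][value]
    return node

# Hypothetical hand-built tree.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": "no",
        "overcast": "yes",
        "rainy": {
            "attribute": "windy",
            "branches": {"yes": "no", "no": "yes"},
        },
    },
}

print(predict(tree, {"outlook": "rainy", "windy": "no"}))
```

The sample above first follows the "rainy" branch at the root, then the "no" branch of the "windy" test, and ends at the leaf labeled "yes".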
Tree Depth
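Number of branches on the longest path from the root to a leaf in the ID3 tree. It directly influences the model's complexity: deeper trees can capture finer patterns in the data but are more prone to overfitting. Because ID3 uses each categorical attribute at most once along a path, the depth cannot exceed the number of attributes.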
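A short sketch of how depth could be computed. It assumes a hypothetical nested-dictionary tree in which an internal node has an "attribute" key and a non-empty "branches" mapping to child nodes, and a leaf is a plain class label.

```python
def tree_depth(node):
    """Number of branches on the longest root-to-leaf path; a lone leaf has depth 0."""
    if not isinstance(node, dict):
        return 0
    return 1 + max(tree_depth(child) for child in node["branches"].values())

# Hypothetical hand-built tree: the longest path goes through "rainy" and then "windy".
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": "no",
        "overcast": "yes",
        "rainy": {
            "attribute": "windy",
            "branches": {"yes": "no", "no": "yes"},
        },
    },
}

print(tree_depth(tree))
```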
Purity Criterion
Measure of class homogeneity in a node, where a perfectly pure node contains samples from a single class, serving as the basis for evaluating partition quality in ID3.
Explanatory Variables
Set of attributes used by ID3 to build the decision tree, each being evaluated for its splitting potential based on its ability to reduce uncertainty about the target variable.