AI Glossary
The Complete Dictionary of Artificial Intelligence
Decision Tree
Supervised predictive model that uses a tree-like structure to model decisions and their possible consequences through a series of tests on data features.
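A minimal sketch of fitting and evaluating such a model, assuming scikit-learn and its bundled Iris dataset (names and parameter values here are illustrative only):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Fit a small tree on the Iris data and score it on held-out samples
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier(max_depth=3, random_state=0)  # depth capped for readability
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on unseen data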
Root Node
Starting point of a decision tree that represents the complete set of training data and contains the first split based on the most discriminative feature.
Internal Node
Intermediate node in a decision tree that represents a test on a specific feature and splits the data into more homogeneous subsets.
Leaf
Terminal node of a decision tree that represents a final decision or class prediction, with no further possible splitting.
Splitting Criterion
Quantitative method used to evaluate the quality of a split in a decision tree, aiming to maximize the homogeneity of the resulting subsets.
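In scikit-learn, for example, the splitting criterion is chosen through a constructor argument; a sketch with the two standard classification criteria:

from sklearn.tree import DecisionTreeClassifier

# Same model family, two different measures of split quality
tree_gini = DecisionTreeClassifier(criterion="gini")
tree_entropy = DecisionTreeClassifier(criterion="entropy")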
Entropy
Mathematical measure of disorder or uncertainty in a dataset, used to quantify the impurity of a node in decision trees.
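For a node whose classes occur with proportions p_1, ..., p_k, the entropy is H = -(p_1 log2 p_1 + ... + p_k log2 p_k); a small sketch with NumPy:

import numpy as np

def entropy(labels):
    # Shannon entropy H = -sum(p_i * log2(p_i)) over the class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy(np.array([0, 0, 1, 1])))  # 1.0 bit: maximally impure two-class node
print(entropy(np.array([0, 0, 0, 0])))  # a pure node has zero entropy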
Information Gain
Metric that measures the entropy reduction obtained by splitting a node according to a specific feature, used to select the best split.
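A sketch of the computation, repeating the entropy helper from the previous entry so it runs standalone; a perfect split of a balanced binary node gains the full 1 bit:

import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent_labels, child_label_groups):
    # IG = H(parent) - weighted average of H(child) over the split's subsets
    n = len(parent_labels)
    weighted = sum(len(c) / n * entropy(c) for c in child_label_groups)
    return entropy(parent_labels) - weighted

parent = np.array([0, 0, 1, 1])
children = [np.array([0, 0]), np.array([1, 1])]  # a perfect two-way split
print(information_gain(parent, children))        # 1.0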
Gini Index
Impurity measure ranging from 0 (pure node) up to 1 - 1/k for k classes, giving the probability that a randomly chosen element would be misclassified if labeled according to the node's class distribution; an alternative to entropy in decision trees.
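A corresponding sketch for the Gini impurity, G = 1 - (p_1^2 + ... + p_k^2):

import numpy as np

def gini(labels):
    # Gini impurity G = 1 - sum(p_i ** 2) over the class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini(np.array([0, 0, 1, 1])))  # 0.5: maximum impurity for two classes
print(gini(np.array([0, 0, 0, 0])))  # 0.0: pure node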
Pruning
Technique for reducing the complexity of a decision tree by removing branches that provide little predictive power to prevent overfitting.
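One widely used form is cost-complexity pruning; in scikit-learn the candidate penalty values can be read off the training data, and a stronger penalty yields a smaller tree (the alpha value below is arbitrary, chosen only for illustration):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Alphas at which successive subtrees would be pruned away
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
print(path.ccp_alphas)

# Refit with a non-zero penalty: larger ccp_alpha -> fewer leaves
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)
print(pruned.get_n_leaves())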
Overfitting
Phenomenon in which a model fits the details and noise of the training data so closely that its ability to generalize to new data is degraded.
Tree Depth
Maximum number of splits along the path from the root node to a leaf; a key hyperparameter controlling model complexity and the bias-variance trade-off: shallow trees risk underfitting, very deep trees tend to overfit.
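The limit is typically exposed as a hyperparameter (max_depth in scikit-learn, for instance); the sketch below contrasts an unconstrained tree with a shallow one on a held-out split to show how depth trades training fit against generalization:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (None, 3):  # None lets the tree grow until its leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(depth, tree.get_depth(),
          tree.score(X_train, y_train),  # training accuracy
          tree.score(X_test, y_test))    # held-out accuracy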
CART
Classification and Regression Trees algorithm that builds binary trees, using the Gini index as the splitting criterion for classification and variance reduction for regression.
ID3
Pioneering decision tree algorithm that uses information gain as the splitting criterion; it is limited to categorical features and produces multiway splits, with one branch per attribute value.
C4.5
Improvement on the ID3 algorithm that uses the information gain ratio to avoid bias towards features with many values, and adds support for continuous attributes and missing values.
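The gain ratio divides the information gain by the "split information", the entropy of the subset sizes themselves, which penalizes splits that scatter the data over many small branches; a standalone sketch:

import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(parent_labels, child_label_groups):
    # Gain ratio = information gain / split information
    n = len(parent_labels)
    weights = np.array([len(c) / n for c in child_label_groups])
    gain = entropy(parent_labels) - sum(
        w * entropy(c) for w, c in zip(weights, child_label_groups)
    )
    split_info = -np.sum(weights * np.log2(weights))  # entropy of the subset sizes
    return gain / split_info

parent = np.array([0, 0, 1, 1])
children = [np.array([0, 0]), np.array([1, 1])]
print(gain_ratio(parent, children))  # 1.0 for this balanced, perfect split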
Target Variable
Variable to be predicted in a supervised learning problem; in a decision tree, its predicted values are stored in the terminal leaves.
Decision Rule
Logical set of IF-THEN conditions extracted from a path in the decision tree, allowing for the interpretation and explanation of the model's predictions.
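With scikit-learn, for example, the rule view of a fitted tree can be printed directly; a sketch:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Each root-to-leaf path reads as a chain of IF-THEN conditions
print(export_text(tree, feature_names=list(iris.feature_names)))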
Variable Importance
Quantitative measure of each predictive feature's contribution to improving the purity of splits throughout the tree.
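In scikit-learn these impurity-based importances are exposed as an attribute of the fitted tree and sum to 1; a sketch:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
tree = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Fraction of the total impurity reduction attributed to each feature
for name, importance in zip(iris.feature_names, tree.feature_importances_):
    print(f"{name}: {importance:.3f}")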
Complexity Cost
Pruning parameter that penalizes tree size, balancing data fit and model simplicity to optimize generalization.
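In cost-complexity pruning this trade-off is usually written R_alpha(T) = R(T) + alpha * |leaves(T)|: the tree's error plus a penalty proportional to its number of leaves. A toy sketch (the error and leaf counts are made up for illustration) of how a larger alpha can favor the smaller tree:

def cost_complexity(error, n_leaves, alpha):
    # Penalized objective R_alpha(T) = R(T) + alpha * |leaves(T)|
    return error + alpha * n_leaves

# The big tree fits better, but the leaf penalty makes the small tree preferable
print(round(cost_complexity(error=0.10, n_leaves=20, alpha=0.01), 2))  # 0.3
print(round(cost_complexity(error=0.15, n_leaves=5, alpha=0.01), 2))   # 0.2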