AI Glossary
A complete dictionary of Artificial Intelligence
Feature Importance
Metric quantifying the influence of each predictor variable on the performance of a Random Forest model, computed either as the mean impurity reduction the variable provides or as the drop in performance when its values are randomly permuted.
Gini Importance
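Method for evaluating variable importance as the total decrease in Gini impurity, weighted by the share of samples reaching each node, accumulated over all nodes where the variable is used to split and typically averaged across the trees of the forest.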
Mean Decrease Impurity
Technique measuring the importance of a variable by the average impurity reduction (Gini or entropy) it provides when used as a splitting criterion in trees.
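A minimal sketch of how such an importance score could be accumulated, assuming each tree is given as a list of split nodes with the root first. The dictionary keys (`feature`, `n`, `impurity`, and the `_left`/`_right` variants) and the two hand-written trees are hypothetical and chosen only for illustration; real libraries store trees differently.

```python
def mean_decrease_impurity(trees, n_features):
    """Average the weighted impurity decrease per feature over all trees,
    then normalize so the importances sum to 1."""
    totals = [0.0] * n_features
    for tree in trees:
        n_root = tree[0]["n"]  # assumption: the root node comes first
        for node in tree:
            children = (
                node["n_left"] * node["impurity_left"]
                + node["n_right"] * node["impurity_right"]
            ) / node["n"]
            decrease = node["impurity"] - children
            # Weight by the share of samples that reach this node.
            totals[node["feature"]] += (node["n"] / n_root) * decrease
    averaged = [t / len(trees) for t in totals]
    total = sum(averaged)
    return [a / total for a in averaged] if total > 0 else averaged


# Two hypothetical trees described only by their split nodes.
tree_a = [
    {"feature": 0, "n": 100, "impurity": 0.5,
     "n_left": 50, "impurity_left": 0.0, "n_right": 50, "impurity_right": 0.0},
]
tree_b = [
    {"feature": 1, "n": 100, "impurity": 0.5,
     "n_left": 60, "impurity_left": 0.4, "n_right": 40, "impurity_right": 0.3},
    {"feature": 0, "n": 60, "impurity": 0.4,
     "n_left": 30, "impurity_left": 0.0, "n_right": 30, "impurity_right": 0.0},
]

importances = mean_decrease_impurity([tree_a, tree_b], n_features=2)
```

In this toy case feature 0 receives most of the importance because its splits produce pure children, including at a root node that all samples reach.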
Permutation Importance
Model-agnostic method evaluating the importance of a variable by measuring the degradation in model performance when the values of this variable are randomly permuted.
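The idea can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than a reference implementation: `toy_predict` is a hypothetical model that looks only at the first feature, the data is synthetic, and accuracy stands in for whatever performance metric is of interest.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, column, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one column, over n_repeats shuffles."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)
        # Rebuild each row with the shuffled value in place of the original one.
        X_perm = [row[:column] + [v] + row[column + 1:] for row, v in zip(X, shuffled)]
        drops.append(baseline - accuracy(predict, X_perm, y))
    return sum(drops) / n_repeats


def toy_predict(row):
    """Hypothetical model: uses feature 0 only and ignores feature 1."""
    return int(row[0] > 0.5)


data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_predict(row) for row in X]

imp_used = permutation_importance(toy_predict, X, y, column=0)
imp_ignored = permutation_importance(toy_predict, X, y, column=1)
```

Shuffling the ignored feature leaves every prediction unchanged, so its importance is zero, while shuffling the feature the model relies on produces a clear drop in accuracy.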
Mean Decrease Accuracy
Indicator of a variable's importance based on the average decrease in model accuracy when this variable is permuted in the out-of-bag data.
Impurity Measure
Mathematical function quantifying the degree of class heterogeneity in a node, used to optimize splits in decision trees.
Information Gain
Splitting criterion measuring the reduction in entropy obtained by partitioning a node according to a specific feature, favoring splits that maximize the resulting homogeneity.
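As a worked illustration, information gain can be computed as the parent node's entropy minus the size-weighted entropy of its children. The function names and the toy labels below are assumptions made for this sketch.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the class distribution in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


parent = [0, 0, 1, 1]
gain_perfect = information_gain(parent, [0, 0], [1, 1])  # children are pure
gain_useless = information_gain(parent, [0, 1], [0, 1])  # children mirror the parent
```

A split that separates the classes completely recovers the full entropy of the parent (1 bit here), whereas a split that leaves the class mix unchanged gains nothing.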
Gini Index
Impurity measure giving the probability that an observation drawn at random from a node would be misclassified if it were labeled at random according to the node's class distribution; it is 0 for a pure node and grows with class heterogeneity.
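For class proportions p_k in a node, the Gini index is commonly written as 1 minus the sum of the squared p_k. A minimal sketch, with the function name and toy labels chosen for illustration:

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_k ** 2) over the class proportions p_k in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


pure = gini_impurity(["a", "a", "a", "a"])   # single class
mixed = gini_impurity(["a", "a", "b", "b"])  # two balanced classes
```

The pure node scores 0, and the balanced two-class node scores 0.5, which is the maximum for two classes.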
Out-of-Bag Error
Approximately unbiased error estimate computed by predicting each observation using only the trees whose bootstrap sample did not contain it, serving as a built-in alternative to cross-validation in Random Forest.
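The out-of-bag set arises from bootstrap sampling. The sketch below shows only that sampling step, assuming a hypothetical helper `bootstrap_split`; an actual out-of-bag error would additionally fit one tree per bootstrap sample and score each observation with the trees that never saw it.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples row indices with replacement; the rest are out-of-bag."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)
in_bag, oob = bootstrap_split(1000, rng)
oob_fraction = len(oob) / 1000
```

With sampling with replacement, roughly a third of the rows (about 1/e, or 37%) are expected to be left out of any given tree's training sample, which is what makes this estimate possible without a separate hold-out set.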
Feature Selection
Process of identifying and keeping the most relevant variables based on their importance scores, eliminating redundant or non-informative features.
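One simple way to apply this, sketched under the assumption that importance scores are already available as a name-to-score mapping. The feature names and scores are hypothetical, and keeping the top k is only one of several possible selection rules.

```python
def select_top_k(importances, k):
    """Return the names of the k features with the highest importance scores."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical importance scores for four features.
scores = {"age": 0.41, "income": 0.33, "zip_code": 0.02, "tenure": 0.24}
kept = select_top_k(scores, k=2)
```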
Variable Importance Plot
Visualization ordering predictive variables by their decreasing importance score, facilitating the interpretation of the model's most influential factors.
Partial Dependence Plot
Graphical representation showing the marginal effect of one or two variables on the model's prediction, obtained by averaging predictions over the observed values of all other variables.
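The averaging behind such a plot can be sketched as follows: for each grid value, the chosen feature is overwritten in every row and the predictions are averaged. `toy_model` is a hypothetical model and the three rows are made-up data, used only to make the arithmetic easy to follow.

```python
def partial_dependence(predict, X, column, grid):
    """Average prediction when `column` is forced to each value in `grid`."""
    curve = []
    for value in grid:
        preds = [predict(row[:column] + [value] + row[column + 1:]) for row in X]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_model(row):
    """Hypothetical model: prediction = 2 * x0 + x1."""
    return 2.0 * row[0] + row[1]


X = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
pd_curve = partial_dependence(toy_model, X, column=0, grid=[0.0, 1.0, 2.0])
```

Because the second feature averages to 3 over these rows, the curve is 2 times the grid value plus 3, that is 3, 5, and 7.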
Node Impurity
Degree of heterogeneity of observations in a tree node, serving as the basis for calculating feature importance through their contribution to reducing this impurity.
Split Criterion
Rule determining the optimal division of a node based on a feature and a threshold, directly impacting the distribution of importance among variables.
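A minimal sketch of such a rule for a single numeric feature, assuming Gini impurity as the criterion and candidate thresholds placed midway between consecutive distinct values. The function names and toy data are illustrative assumptions, not a particular library's algorithm.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, impurity decrease) of the best split on one feature."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    parent = gini(labels)
    best = (None, 0.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        decrease = parent - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)
        if decrease > best[1]:
            best = (threshold, decrease)
    return best


threshold, decrease = best_threshold([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"])
```

Here the threshold 2.5 separates the two classes completely, so the decrease equals the parent's full impurity of 0.5; that decrease is the quantity credited to the feature when importance is derived from impurity.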