AI Glossary
A complete glossary of artificial intelligence
Random Decision Tree
Randomly generated tree structure in which each node splits the feature space with a random cut, creating partitions that progressively isolate observations.
Anomaly Score
Quantitative metric derived from an observation's path length averaged over the trees of the forest, indicating its degree of abnormality: a high score corresponds to a high probability of being an anomaly.
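As a minimal sketch, the score used in the original Isolation Forest formulation, s = 2^(-E[h(x)] / c(n)), can be written as follows. The function and argument names are illustrative assumptions; `c_n` stands for the average path length used for normalization.

```python
def anomaly_score(mean_path_length: float, c_n: float) -> float:
    """Isolation Forest score s = 2 ** (-E[h(x)] / c(n)).

    mean_path_length: path length of the observation averaged over all trees.
    c_n: normalizing average path length for the subsample size (must be > 0).
    """
    if c_n <= 0:
        raise ValueError("c_n must be positive")
    return 2.0 ** (-mean_path_length / c_n)


# A very short average path gives a score near 1 (likely anomaly);
# a path equal to the reference length gives exactly 0.5.
short_path_score = anomaly_score(2.0, 10.0)
reference_score = anomaly_score(10.0, 10.0)
```

Under this formulation, scores well below 0.5 suggest a normal observation, and scores close to 1 suggest an anomaly.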
Isolation Path
Number of splits needed from the root to the leaf containing an observation; anomalies tend to have significantly shorter paths than normal points.
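The effect can be illustrated with a one-dimensional toy example. This is only a sketch under simplifying assumptions (a single feature, synthetic Gaussian data, and a hypothetical helper named `isolation_path_length`), not a full forest.

```python
import random


def isolation_path_length(point, data, rng):
    """Count the random cuts needed to isolate `point` within 1-D `data`."""
    path = 0
    current = list(data)
    # Keep cutting until the point is alone or all remaining values are equal.
    while len(current) > 1 and min(current) < max(current):
        cut = rng.uniform(min(current), max(current))
        # Keep only the values on the same side of the cut as the point.
        current = [x for x in current if (x < cut) == (point < cut)]
        path += 1
    return path


rng = random.Random(0)
cluster = [rng.gauss(0.0, 1.0) for _ in range(100)]
outlier = 8.0
inlier = sorted(cluster)[50]  # a value from the middle of the cluster
data = cluster + [outlier]

trials = 200
mean_outlier = sum(isolation_path_length(outlier, data, rng) for _ in range(trials)) / trials
mean_inlier = sum(isolation_path_length(inlier, data, rng) for _ in range(trials)) / trials
```

Averaged over many random trees, the isolated value at 8.0 needs far fewer cuts than a value in the middle of the cluster.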
Contamination Factor
Parameter estimating the expected proportion of anomalies in the dataset, generally set between 0.01 and 0.1; it determines the classification threshold.
Average Path Length
Theoretical expected path length for a sample of n points, taken from the average cost of an unsuccessful search in a binary search tree; it serves as the reference for normalizing anomaly scores in the final calculation.
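In the original Isolation Forest formulation this reference value is c(n) = 2H(n-1) - 2(n-1)/n, where H is the harmonic number. The sketch below computes H exactly by summation; the original paper instead approximates H(i) by ln(i) plus Euler's constant. The function name is an illustrative assumption.

```python
def average_path_length(n: int) -> float:
    """c(n) = 2 * H(n - 1) - 2 * (n - 1) / n, with H the harmonic number."""
    if n <= 1:
        return 0.0
    harmonic = sum(1.0 / k for k in range(1, n))  # H(n - 1), computed exactly
    return 2.0 * harmonic - 2.0 * (n - 1) / n


c_256 = average_path_length(256)  # reference length for a subsample of 256 points
```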
Random Feature Split
Random selection of a feature and a split value at each node, avoiding biases related to feature distributions and favoring the isolation of anomalies.
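A single random split might look like the sketch below. It assumes rows are tuples of numeric features and draws the cut uniformly between the observed minimum and maximum of the chosen feature; the helper name `random_split` is hypothetical.

```python
import random


def random_split(rows, rng):
    """Pick a random feature and a uniform random cut within its observed range."""
    feature = rng.randrange(len(rows[0]))
    values = [row[feature] for row in rows]
    cut = rng.uniform(min(values), max(values))
    left = [row for row in rows if row[feature] < cut]
    right = [row for row in rows if row[feature] >= cut]
    return feature, cut, left, right


rows = [(0.1, 5.0), (0.2, 4.8), (0.15, 5.1), (3.0, 9.0)]
feature, cut, left, right = random_split(rows, random.Random(42))
```

Because the cut is uniform over the feature's range, a wide gap between an outlier and the rest of the data is more likely to receive the cut than the narrow gaps inside a dense cluster.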
Normality Score
Transformation of the anomaly score onto a normalized scale, often between 0 and 1, facilitating interpretation and comparison between different models or datasets.
Point Anomaly
Individual observation that deviates significantly from the expected behavior of the data, typically identifiable in the algorithm by its short isolation path.
Recursive Partitioning
Recursive process of dividing the data space into progressively smaller sub-regions, creating a hierarchical structure that effectively isolates outlier observations.
Detection Threshold
Cut-off value, determined by the contamination factor, that separates normal observations from anomalies; it is calculated from the distribution of anomaly scores on the dataset.
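One simple way to derive such a cut-off is to rank the scores and keep the top fraction given by the contamination factor. The sketch below assumes higher scores mean more anomalous and that contamination lies in (0, 0.5]; both the function name and that range are assumptions made for illustration, and tied scores may flag slightly more points than requested.

```python
def detection_threshold(scores, contamination):
    """Return the score cut-off that flags roughly `contamination` of the points."""
    if not 0.0 < contamination <= 0.5:
        raise ValueError("contamination is assumed to lie in (0, 0.5]")
    ranked = sorted(scores, reverse=True)
    # Number of points to flag, with at least one flagged observation.
    n_flagged = max(1, round(contamination * len(ranked)))
    return ranked[n_flagged - 1]


scores = [0.40, 0.42, 0.45, 0.41, 0.43, 0.44, 0.46, 0.39, 0.80, 0.75]
threshold = detection_threshold(scores, contamination=0.2)
anomalies = [s for s in scores if s >= threshold]
```

With ten scores and a contamination factor of 0.2, the two highest scores are flagged.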
Bounding Box
Axis-aligned hyper-rectangle produced by each tree split, defining the partition's boundaries and allowing for efficient calculation of isolation paths.
Local Outlier Factor
Alternative anomaly detection metric based on local density, often compared to Isolation Forest to evaluate performance on different types of data distributions.
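For comparison, a brute-force sketch of the Local Outlier Factor is given below. It is a simplification that uses exactly k neighbors per point (ties at the k-th distance are not expanded) and assumes no duplicate points and more than k points in total; the function name is illustrative.

```python
import math


def local_outlier_factors(points, k):
    """Return the LOF of every point (brute force, O(n^2) distances)."""
    n = len(points)
    neighbors = []  # k nearest (distance, index) pairs for each point
    k_dist = []     # distance from each point to its k-th nearest neighbor
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        neighbors.append(dists[:k])
        k_dist.append(dists[k - 1][0])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average ratio of the neighbors' density to the point's own density.
    return [sum(lrd[j] for _, j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


points = [(x / 10, 0.0) for x in range(10)] + [(5.0, 0.0)]
lof = local_outlier_factors(points, k=3)
```

Values close to 1 indicate a density similar to that of the neighbors, while values well above 1 indicate a point lying in a much sparser region, as is the case for the last point here.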
Tree Pruning
Technique for limiting tree growth by stopping splits when a node contains a single sample or the maximum depth is reached, reducing computation time.
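The two stopping rules can be sketched as follows. The default depth limit of ceil(log2(n)) follows the original Isolation Forest paper; everything else (the function name, the tuple-based rows, and returning only leaf depths and sizes) is an assumption made to keep the example short.

```python
import math
import random


def build_tree(rows, rng, depth=0, max_depth=None):
    """Grow a random tree; return the (depth, size) of every leaf."""
    if max_depth is None:
        # Height limit from the original Isolation Forest paper: ceil(log2(n)).
        max_depth = math.ceil(math.log2(max(len(rows), 2)))
    # Stopping rules: at most one sample left, or the depth limit is reached.
    if len(rows) <= 1 or depth >= max_depth:
        return [(depth, len(rows))]
    feature = rng.randrange(len(rows[0]))
    values = [row[feature] for row in rows]
    lo, hi = min(values), max(values)
    if lo == hi:
        # Simplification: stop if the chosen feature is constant in this node.
        return [(depth, len(rows))]
    cut = rng.uniform(lo, hi)
    left = [row for row in rows if row[feature] < cut]
    right = [row for row in rows if row[feature] >= cut]
    return (build_tree(left, rng, depth + 1, max_depth)
            + build_tree(right, rng, depth + 1, max_depth))


rng = random.Random(7)
rows = [(rng.random(), rng.random()) for _ in range(64)]
leaves = build_tree(rows, rng)
```

Leaves that stop at the depth limit may still hold several samples, so deep regions of dense data are never fully expanded.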