AI Glossary
The Complete Dictionary of Artificial Intelligence
k-Nearest Neighbors (k-NN)
Non-parametric supervised learning algorithm that classifies a new observation based on the majority class of its k nearest neighbors in the feature space.
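The voting scheme described above can be sketched in a few lines of pure Python (the helper name `knn_classify` and the toy data are illustrative, not from any particular library):

```python
import math
from collections import Counter

def knn_classify(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training point, sorted ascending.
    dists = sorted(
        (math.dist(x, x_new), label) for x, label in zip(X_train, y_train)
    )
    # Majority vote over the labels of the k closest points.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8)]
y = ["A", "A", "B", "B"]
print(knn_classify(X, y, (1.1, 0.9), k=3))  # → A
```

Note that there is no training phase: the "model" is simply the stored data, which is what "non-parametric" refers to here.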
Euclidean Distance
Standard distance measure in Euclidean space, calculated as the square root of the sum of squared differences between the coordinates of two points.
Manhattan Distance
Distance measure calculated as the sum of the absolute differences between the coordinates of two points, also called L1 distance or taxicab distance.
Minkowski Distance
Generalized distance metric that includes Euclidean distance (p=2) and Manhattan distance (p=1) as special cases, defined by the p-th root of the sum of absolute differences raised to the power p.
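The three distance entries above can be checked against one another: Minkowski with p=1 reproduces the Manhattan distance and p=2 the Euclidean distance. A small sketch (the function name is illustrative):

```python
def minkowski(u, v, p=2):
    """p-th root of the sum of absolute coordinate differences raised to p."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)

print(minkowski((0, 0), (3, 4), p=1))  # Manhattan: 7.0
print(minkowski((0, 0), (3, 4), p=2))  # Euclidean: 5.0
```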
Distance Weighting
Variant of majority voting where closer neighbors have more influence on the final classification, typically using the inverse of the distance as weight.
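A minimal sketch of inverse-distance voting, assuming Euclidean distance and a small epsilon to guard against division by zero (names and data are illustrative). In the example, a very close "A" point outweighs two more distant "B" points, reversing the plain majority vote:

```python
import math
from collections import defaultdict

def weighted_knn(X_train, y_train, x_new, k=3, eps=1e-9):
    """Classify x_new; each of the k nearest neighbors votes with weight 1/distance."""
    dists = sorted(
        (math.dist(x, x_new), label) for x, label in zip(X_train, y_train)
    )
    scores = defaultdict(float)
    for d, label in dists[:k]:
        scores[label] += 1.0 / (d + eps)  # eps avoids division by zero
    return max(scores, key=scores.get)

X = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
y = ["A", "B", "B"]
# Plain majority with k=3 would pick "B"; distance weighting picks "A".
print(weighted_knn(X, y, (0.1, 0.0), k=3))  # → A
```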
KD-Tree
Data structure that partitions k-dimensional space to accelerate nearest neighbor search, reducing complexity from O(n) to O(log n) on average.
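A compact sketch of the two KD-tree operations implied above: building by splitting on one axis per level, and searching depth-first while pruning branches that cannot contain a closer point. This is the textbook formulation, not a production implementation:

```python
import math

def build_kdtree(points, depth=0):
    """Recursively split the point set on the median along one axis per level."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the axes
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, target, depth=0, best=None):
    """Depth-first search, skipping subtrees farther than the current best."""
    if node is None:
        return best
    if best is None or math.dist(node["point"], target) < math.dist(best, target):
        best = node["point"]
    axis = depth % len(target)
    diff = target[axis] - node["point"][axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, depth + 1, best)
    # Only cross the splitting plane if it is closer than the best point so far.
    if abs(diff) < math.dist(best, target):
        best = nearest(far, target, depth + 1, best)
    return best

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(pts)
print(nearest(tree, (9, 2)))  # → (8, 1)
```

The O(log n) average holds in low dimensions; as the next entries note, performance degrades as dimensionality grows.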
Ball Tree
Hierarchical data structure that organizes points in nested spheres, efficient for nearest neighbor searches in high dimensions where KD-Trees become inefficient.
Curse of Dimensionality
Phenomenon where the performance of distance-based algorithms degrades in high dimensions because all distances tend to become equivalent, making the notion of 'closest' less meaningful.
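The effect can be observed numerically: the relative spread (max − min)/min of distances from a query point shrinks as the dimension grows. A rough demonstration with uniform random points (the function name and sample sizes are illustrative):

```python
import math
import random

def distance_contrast(dim, n=1000, seed=0):
    """Relative spread of distances from the origin to n random points in [0,1]^dim."""
    rng = random.Random(seed)
    dists = [
        math.sqrt(sum(rng.random() ** 2 for _ in range(dim)))
        for _ in range(n)
    ]
    return (max(dists) - min(dists)) / min(dists)

# In 2 dimensions the nearest and farthest points differ enormously;
# in 500 dimensions all distances concentrate around the same value.
print(distance_contrast(2))
print(distance_contrast(500))
```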
Hyperparameter k
Number of neighbors to consider in the k-NN algorithm, crucial for the balance between bias and variance: a small k yields a complex, high-variance model, while a large k yields a smoother, higher-bias model.
Data Standardization
Essential preprocessing for k-NN where features are brought to the same scale to prevent variables with large value ranges from dominating the distance calculation.
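A minimal z-score sketch of the idea: after standardization, a feature measured in tens of thousands (e.g. income) and one measured in tens (e.g. age) contribute on the same scale (data and names are illustrative):

```python
def standardize(column):
    """Z-score a feature: subtract the mean, divide by the standard deviation."""
    mean = sum(column) / len(column)
    std = (sum((v - mean) ** 2 for v in column) / len(column)) ** 0.5
    return [(v - mean) / std for v in column]

incomes = [30_000, 45_000, 60_000]  # large scale would dominate raw distances
ages = [25, 40, 55]                 # small scale would barely register
print(standardize(incomes))
print(standardize(ages))  # same z-scores: both features now weigh equally
```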
k-NN for Regression
Variant of k-NN where the prediction is the average (or weighted average) of the values of the k nearest neighbors rather than a majority class vote.
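Swapping the vote for a mean turns the classifier into a regressor; a minimal sketch (names and data are illustrative):

```python
import math

def knn_regress(X_train, y_train, x_new, k=3):
    """Predict the mean target value of the k nearest neighbors."""
    dists = sorted((math.dist(x, x_new), y) for x, y in zip(X_train, y_train))
    neighbors = [y for _, y in dists[:k]]
    return sum(neighbors) / k

X = [(1.0,), (2.0,), (3.0,), (10.0,)]
y = [1.0, 2.0, 3.0, 10.0]
# Nearest three to 2.5 have targets 2.0, 3.0, 1.0 → mean 2.0
print(knn_regress(X, y, (2.5,), k=3))  # → 2.0
```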
Hamming Distance
Distance measure for binary categorical data, calculated as the number of positions where two vectors differ, used when features are binary or categorical.
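The count-of-differing-positions definition is a one-liner (the function name is illustrative):

```python
def hamming(u, v):
    """Count the positions where two equal-length sequences differ."""
    if len(u) != len(v):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(u, v))

print(hamming("10110", "11100"))  # → 2 (positions 1 and 3 differ)
```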
Elbow Method
Technique for selecting the optimal k by plotting the error rate against k and choosing the point where further increases in k stop reducing the error significantly (the 'elbow').
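In practice the elbow is usually read off a plot, but the rule can be mechanized: given validation error rates per k, stop at the first k whose marginal improvement falls below a threshold. A sketch with illustrative numbers and an illustrative threshold:

```python
def find_elbow(errors, threshold=0.01):
    """Return the first k whose improvement over the previous k drops below threshold.

    errors: dict mapping candidate k -> validation error rate.
    """
    ks = sorted(errors)
    for prev, cur in zip(ks, ks[1:]):
        if errors[prev] - errors[cur] < threshold:
            return prev          # improvement has leveled off at prev
    return ks[-1]

errors = {1: 0.20, 3: 0.12, 5: 0.08, 7: 0.075, 9: 0.074}
print(find_elbow(errors))  # → 5 (going from 5 to 7 barely helps)
```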
K-Fold Cross-Validation
Robust evaluation method for k-NN where the data is divided into k folds (this k is unrelated to the number of neighbors), each serving once as a validation set, allowing reliable performance estimation and helping to choose the optimal number of neighbors.
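The fold construction can be sketched as an index split; this simplified version assumes the sample count is divisible by the number of folds and omits shuffling (the function name is illustrative):

```python
def kfold_indices(n, folds=5):
    """Split indices 0..n-1 into (train, test) pairs, one per fold."""
    idx = list(range(n))
    size = n // folds                     # assumes n is divisible by folds
    splits = []
    for f in range(folds):
        test = idx[f * size:(f + 1) * size]
        train = idx[:f * size] + idx[(f + 1) * size:]
        splits.append((train, test))
    return splits

for train, test in kfold_indices(10, folds=5):
    print(test)  # each index appears in exactly one test fold
```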
Exhaustive Search
Naive approach to find the k nearest neighbors by calculating distance to all points in the dataset, with O(n) complexity per query.
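The naive O(n)-per-query baseline is a single sort over all distances (names and data are illustrative):

```python
import math

def brute_force_knn(points, target, k=1):
    """Scan every point: one distance computation per stored point."""
    return sorted(points, key=lambda p: math.dist(p, target))[:k]

pts = [(0, 0), (3, 4), (1, 1), (6, 8)]
print(brute_force_knn(pts, (2, 2), k=2))  # → [(1, 1), (3, 4)]
```

Despite its cost, this baseline is exact, which is why structures like KD-trees and Ball trees, and the approximate methods below, are benchmarked against it.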
Approximate Nearest Neighbor (ANN)
Family of algorithms that find approximately nearest neighbors with a trade-off between accuracy and speed, essential for large datasets.