AI Glossary
The Complete Dictionary of Artificial Intelligence
k-Nearest Neighbors (k-NN)
Non-parametric supervised learning algorithm that classifies a new observation based on the majority class of its k nearest neighbors in the feature space.
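The voting scheme described above can be sketched in a few lines of pure Python (the helper name `knn_classify` and the toy data are illustrative, not from any particular library):

```python
import math
from collections import Counter

def knn_classify(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training point, sorted ascending.
    dists = sorted(
        (math.dist(x, x_new), label) for x, label in zip(X_train, y_train)
    )
    # Majority vote over the labels of the k closest points.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8)]
y = ["A", "A", "B", "B"]
print(knn_classify(X, y, (1.1, 0.9), k=3))  # → A
```

Note that there is no training phase: the "model" is simply the stored data, which is what "non-parametric" refers to here.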
Euclidean Distance
Standard distance measure in Euclidean space, calculated as the square root of the sum of squared differences between the coordinates of two points.
Manhattan Distance
Distance measure calculated as the sum of the absolute differences between the coordinates of two points, also called L1 distance or taxicab distance.
Minkowski Distance
Generalized distance metric that includes Euclidean distance (p=2) and Manhattan distance (p=1) as special cases, defined by the p-th root of the sum of absolute differences raised to the power p.
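The three distance entries above can be checked against one another: Minkowski with p=1 reproduces the Manhattan distance and p=2 the Euclidean distance. A small sketch (the function name is illustrative):

```python
def minkowski(u, v, p=2):
    """p-th root of the sum of absolute coordinate differences raised to p."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)

print(minkowski((0, 0), (3, 4), p=1))  # Manhattan: 7.0
print(minkowski((0, 0), (3, 4), p=2))  # Euclidean: 5.0
```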
Distance Weighting
Variant of majority voting where closer neighbors have more influence on the final classification, typically using the inverse of the distance as weight.
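A minimal sketch of inverse-distance voting, assuming Euclidean distance and a small epsilon to guard against division by zero (names and data are illustrative). In the example, a very close "A" point outweighs two more distant "B" points, reversing the plain majority vote:

```python
import math
from collections import defaultdict

def weighted_knn(X_train, y_train, x_new, k=3, eps=1e-9):
    """Classify x_new; each of the k nearest neighbors votes with weight 1/distance."""
    dists = sorted(
        (math.dist(x, x_new), label) for x, label in zip(X_train, y_train)
    )
    scores = defaultdict(float)
    for d, label in dists[:k]:
        scores[label] += 1.0 / (d + eps)  # eps avoids division by zero
    return max(scores, key=scores.get)

X = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
y = ["A", "B", "B"]
# Plain majority with k=3 would pick "B"; distance weighting picks "A".
print(weighted_knn(X, y, (0.1, 0.0), k=3))  # → A
```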
KD-Tree
Data structure that partitions k-dimensional space to accelerate nearest neighbor search, reducing complexity from O(n) to O(log n) on average.
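A compact sketch of the two KD-tree operations implied above: building by splitting on one axis per level, and searching depth-first while pruning branches that cannot contain a closer point. This is the textbook formulation, not a production implementation:

```python
import math

def build_kdtree(points, depth=0):
    """Recursively split the point set on the median along one axis per level."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the axes
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, target, depth=0, best=None):
    """Depth-first search, skipping subtrees farther than the current best."""
    if node is None:
        return best
    if best is None or math.dist(node["point"], target) < math.dist(best, target):
        best = node["point"]
    axis = depth % len(target)
    diff = target[axis] - node["point"][axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, depth + 1, best)
    # Only cross the splitting plane if it is closer than the best point so far.
    if abs(diff) < math.dist(best, target):
        best = nearest(far, target, depth + 1, best)
    return best

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(pts)
print(nearest(tree, (9, 2)))  # → (8, 1)
```

The O(log n) average holds in low dimensions; as the next entries note, performance degrades as dimensionality grows.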
Ball Tree
Hierarchical data structure that organizes points in nested spheres, efficient for nearest neighbor searches in high dimensions where KD-Trees become inefficient.
Curse of Dimensionality
Phenomenon where the performance of distance-based algorithms degrades in high dimensions because all distances tend to become equivalent, making the notion of 'closest' less meaningful.
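The effect can be observed numerically: the relative spread (max − min)/min of distances from a query point shrinks as the dimension grows. A rough demonstration with uniform random points (the function name and sample sizes are illustrative):

```python
import math
import random

def distance_contrast(dim, n=1000, seed=0):
    """Relative spread of distances from the origin to n random points in [0,1]^dim."""
    rng = random.Random(seed)
    dists = [
        math.sqrt(sum(rng.random() ** 2 for _ in range(dim)))
        for _ in range(n)
    ]
    return (max(dists) - min(dists)) / min(dists)

# In 2 dimensions the nearest and farthest points differ enormously;
# in 500 dimensions all distances concentrate around the same value.
print(distance_contrast(2))
print(distance_contrast(500))
```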
Hyperparameter k
Number of neighbors to consider in the k-NN algorithm, crucial for the balance between bias and variance: a small k yields a complex, high-variance model, while a large k yields a smoother, higher-bias model.
Data Standardization
Essential preprocessing for k-NN where features are brought to the same scale to prevent variables with large value ranges from dominating the distance calculation.
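A minimal z-score sketch of the idea: after standardization, a feature measured in tens of thousands (e.g. income) and one measured in tens (e.g. age) contribute on the same scale (data and names are illustrative):

```python
def standardize(column):
    """Z-score a feature: subtract the mean, divide by the standard deviation."""
    mean = sum(column) / len(column)
    std = (sum((v - mean) ** 2 for v in column) / len(column)) ** 0.5
    return [(v - mean) / std for v in column]

incomes = [30_000, 45_000, 60_000]  # large scale would dominate raw distances
ages = [25, 40, 55]                 # small scale would barely register
print(standardize(incomes))
print(standardize(ages))  # same z-scores: both features now weigh equally
```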
k-NN for Regression
Variant of k-NN where the prediction is the average (or weighted average) of the values of the k nearest neighbors rather than a majority class vote.
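Swapping the vote for a mean turns the classifier into a regressor; a minimal sketch (names and data are illustrative):

```python
import math

def knn_regress(X_train, y_train, x_new, k=3):
    """Predict the mean target value of the k nearest neighbors."""
    dists = sorted((math.dist(x, x_new), y) for x, y in zip(X_train, y_train))
    neighbors = [y for _, y in dists[:k]]
    return sum(neighbors) / k

X = [(1.0,), (2.0,), (3.0,), (10.0,)]
y = [1.0, 2.0, 3.0, 10.0]
# Nearest three to 2.5 have targets 2.0, 3.0, 1.0 → mean 2.0
print(knn_regress(X, y, (2.5,), k=3))  # → 2.0
```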
Hamming Distance
Distance measure for binary categorical data, calculated as the number of positions where two vectors differ, used when features are binary or categorical.
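The count-of-differing-positions definition is a one-liner (the function name is illustrative):

```python
def hamming(u, v):
    """Count the positions where two equal-length sequences differ."""
    if len(u) != len(v):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(u, v))

print(hamming("10110", "11100"))  # → 2 (positions 1 and 3 differ)
```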
Elbow Method
Technique for selecting the optimal k by plotting the error rate against k and choosing the point where further increases in k stop reducing the error significantly (the 'elbow').
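In practice the elbow is usually read off a plot, but the rule can be mechanized: given validation error rates per k, stop at the first k whose marginal improvement falls below a threshold. A sketch with illustrative numbers and an illustrative threshold:

```python
def find_elbow(errors, threshold=0.01):
    """Return the first k whose improvement over the previous k drops below threshold.

    errors: dict mapping candidate k -> validation error rate.
    """
    ks = sorted(errors)
    for prev, cur in zip(ks, ks[1:]):
        if errors[prev] - errors[cur] < threshold:
            return prev          # improvement has leveled off at prev
    return ks[-1]

errors = {1: 0.20, 3: 0.12, 5: 0.08, 7: 0.075, 9: 0.074}
print(find_elbow(errors))  # → 5 (going from 5 to 7 barely helps)
```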
K-Fold Cross-Validation
Robust evaluation method for k-NN where the data is divided into k folds (this k is unrelated to the number of neighbors), each serving once as a validation set, allowing reliable performance estimation and helping to choose the optimal number of neighbors.
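The fold construction can be sketched as an index split; this simplified version assumes the sample count is divisible by the number of folds and omits shuffling (the function name is illustrative):

```python
def kfold_indices(n, folds=5):
    """Split indices 0..n-1 into (train, test) pairs, one per fold."""
    idx = list(range(n))
    size = n // folds                     # assumes n is divisible by folds
    splits = []
    for f in range(folds):
        test = idx[f * size:(f + 1) * size]
        train = idx[:f * size] + idx[(f + 1) * size:]
        splits.append((train, test))
    return splits

for train, test in kfold_indices(10, folds=5):
    print(test)  # each index appears in exactly one test fold
```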
Exhaustive Search
Naive approach to find the k nearest neighbors by calculating distance to all points in the dataset, with O(n) complexity per query.
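The naive O(n)-per-query baseline is a single sort over all distances (names and data are illustrative):

```python
import math

def brute_force_knn(points, target, k=1):
    """Scan every point: one distance computation per stored point."""
    return sorted(points, key=lambda p: math.dist(p, target))[:k]

pts = [(0, 0), (3, 4), (1, 1), (6, 8)]
print(brute_force_knn(pts, (2, 2), k=2))  # → [(1, 1), (3, 4)]
```

Despite its cost, this baseline is exact, which is why structures like KD-trees and Ball trees, and the approximate methods below, are benchmarked against it.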
Approximate Nearest Neighbor (ANN)
Family of algorithms that find approximately nearest neighbors with a trade-off between accuracy and speed, essential for large datasets.