
AI Glossary

The complete dictionary of artificial intelligence

162 categories · 2,032 subcategories · 23,060 terms
📖 Terms

K-Fold Cross-Validation

Model evaluation technique that divides the dataset into K equal partitions, where each partition serves in turn as the test set while the remaining K-1 partitions form the training set. This method provides a more robust estimate of model performance by reducing evaluation variance.
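A minimal pure-Python sketch of the partitioning idea (in practice, scikit-learn's `KFold` is the standard implementation):

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for K-Fold cross-validation."""
    indices = list(range(n_samples))
    # Distribute any remainder across the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

Each of the K folds is used exactly once as the test set, so every sample contributes to exactly one test evaluation.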


Stratified K-Fold Cross-Validation

Variant of K-Fold that maintains the class distribution in each partition, essential for imbalanced datasets. This approach ensures that each fold faithfully represents the overall class distribution of the original dataset.
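One simple way to preserve class proportions is to distribute each class's samples round-robin across the folds, as in this sketch (scikit-learn's `StratifiedKFold` handles this in practice):

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Assign sample indices to k folds so each fold mirrors the class distribution."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    # Deal each class's samples round-robin so every fold gets its share.
    for members in by_class.values():
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds
```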


Holdout Method

Simple evaluation method dividing the dataset into two distinct sets: training and test, typically with ratios of 70/30 or 80/20. Although quick to implement, this method can produce biased performance estimates depending on how the data is partitioned.
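A minimal sketch of an 80/20 holdout split with shuffling (scikit-learn's `train_test_split` is the usual tool):

```python
import random

def holdout_split(data, test_ratio=0.2, seed=42):
    """Shuffle and split a dataset into (train, test) with the given test ratio."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]
```

The seed matters: the bias mentioned above comes precisely from the fact that a single random split may be unrepresentative.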


Repeated Cross-Validation

Technique repeating the K-Fold process multiple times with different random partitions to reduce performance estimation variance. This approach combines the advantages of K-Fold with greater statistical robustness at the cost of increased computational expense.
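The repetition amounts to reshuffling before each K-Fold round; a sketch (scikit-learn provides `RepeatedKFold`), assuming for brevity that the sample count is divisible by K:

```python
import random

def repeated_k_fold(n_samples, k, n_repeats, seed=0):
    """Yield train/test splits for n_repeats independently shuffled K-Fold rounds."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition each repeat
        fold_size = n_samples // k  # assumes n_samples divisible by k
        for i in range(k):
            test = set(indices[i * fold_size:(i + 1) * fold_size])
            train = [j for j in indices if j not in test]
            yield train, sorted(test)
```

Averaging over all K × n_repeats scores gives the lower-variance estimate described above.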


Bootstrap Validation

Evaluation method using sampling with replacement to create multiple training and test sets from the original data. Bootstrap allows estimating the variance of model performance and is particularly useful with small datasets.
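A sketch of one bootstrap round: sample indices with replacement for training, and use the out-of-bag samples (those never drawn) as the test set (scikit-learn's `resample` utility supports this):

```python
import random

def bootstrap_split(n_samples, seed=0):
    """One bootstrap round: a with-replacement sample plus its out-of-bag complement."""
    rng = random.Random(seed)
    train = [rng.randrange(n_samples) for _ in range(n_samples)]  # with replacement
    drawn = set(train)
    oob = [i for i in range(n_samples) if i not in drawn]  # out-of-bag test set
    return train, oob
```

On average about 36.8% of samples end up out-of-bag; repeating this many times yields the performance-variance estimate mentioned above.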


Grid Search with Cross-Validation

Systematic optimization technique exhaustively testing all specified hyperparameter combinations using cross-validation to evaluate each configuration. This method ensures finding the best combination within the defined grid but can be very computationally expensive.
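The exhaustive sweep is a Cartesian product over the grid; a sketch in which `score_fn` stands in for a cross-validated scoring routine (scikit-learn's `GridSearchCV` bundles both steps):

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Exhaustively evaluate every hyperparameter combination; return the best."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # e.g. mean cross-validated accuracy
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The cost is the product of the grid sizes times the number of CV folds, which is why the method becomes expensive quickly.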


Randomized Search with Cross-Validation

Alternative to Grid Search that randomly samples a fixed number of hyperparameter combinations rather than exhaustively exploring all possibilities. This approach is often more efficient for finding good hyperparameters with fewer evaluations than Grid Search.
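The same search loop, but drawing a fixed number of random combinations instead of enumerating the grid (scikit-learn's `RandomizedSearchCV` is the practical counterpart); `score_fn` again stands in for cross-validated scoring:

```python
import random

def randomized_search(param_grid, score_fn, n_iter=10, seed=0):
    """Evaluate n_iter randomly sampled hyperparameter combinations."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in param_grid.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```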


Learning Curve

Graph showing the evolution of model performance as a function of training set size, used to diagnose overfitting or underfitting. Learning curves help determine whether additional data could improve model performance.
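The data behind such a graph can be computed by refitting at growing training sizes; a toy sketch using a trivial model (predict the training mean) so the mechanics stay visible (scikit-learn's `learning_curve` does this for real estimators):

```python
def learning_curve(y_train, y_val, sizes):
    """For each training size, fit a trivial mean-predictor and record
    (size, train MSE, validation MSE)."""
    records = []
    for n in sizes:
        mean = sum(y_train[:n]) / n  # "model" fitted on the first n samples
        train_err = sum((y - mean) ** 2 for y in y_train[:n]) / n
        val_err = sum((y - mean) ** 2 for y in y_val) / len(y_val)
        records.append((n, train_err, val_err))
    return records
```

A persistent gap between the two error curves as size grows suggests overfitting; two high, converged curves suggest underfitting.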


Validation Curve

Diagnostic tool visualizing the impact of a single hyperparameter on training and validation performance. Validation curves help identify optimal hyperparameter values and detect bias-variance issues.
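Structurally this is just a sweep over one hyperparameter recording both scores; a sketch where `evaluate` is an assumed callable that trains a model at the given value and returns (train score, validation score) (scikit-learn provides `validation_curve`):

```python
def validation_curve(param_values, evaluate):
    """Record (value, train_score, val_score) for each hyperparameter value."""
    return [(v, *evaluate(v)) for v in param_values]
```

The typical diagnostic pattern: the training score keeps improving with model complexity while the validation score peaks and then degrades (overfitting).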


Cross-Entropy

Loss function measuring the divergence between two probability distributions, widely used in classification problems. Cross-entropy penalizes incorrect predictions more heavily when they are confident, making it an excellent training metric.
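For the binary case this is the average negative log-likelihood of the true labels; a sketch (scikit-learn exposes it as `log_loss`):

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy; y_pred holds predicted positive-class probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)
```

A confident wrong prediction (p = 0.1 for a true positive) costs -log(0.1) ≈ 2.30, far more than a hesitant one.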


Mean Squared Error

Evaluation metric calculating the average of squared differences between predicted and actual values, particularly sensitive to large errors. MSE is commonly used for regression problems and penalizes significant errors more than MAE.
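The formula in one line (scikit-learn's `mean_squared_error` is the standard implementation):

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared prediction errors; squaring makes large errors dominate."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```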


Mean Absolute Error

Regression metric measuring the average of the absolute errors between predictions and actual values, offering direct interpretation in the units of the target variable. Unlike MSE, MAE is less sensitive to outliers, since errors are not squared.
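The corresponding one-liner (scikit-learn's `mean_absolute_error`):

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute error, expressed in the units of the target variable."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Contrast with MSE on the same data: an error of 2 contributes 2 to MAE but 4 to MSE.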


R² Score

Coefficient of determination measuring the proportion of target variable variance explained by the model, ranging from -∞ to 1. An R² of 1 indicates perfect prediction, while negative values suggest the model performs worse than a simple average.
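The definition as code, R² = 1 - SS_res/SS_tot (scikit-learn's `r2_score`):

```python
def r2_score(y_true, y_pred):
    """Proportion of target variance explained: 1 - residual SS / total SS."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

Predicting the mean everywhere gives exactly 0; doing worse than the mean drives the score negative.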


F1-Score

Classification metric calculating the harmonic mean of precision and recall, particularly useful for imbalanced datasets. The F1-Score balances the model's ability to avoid false positives and false negatives in a single measure.
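From the confusion-matrix counts, the harmonic mean works out as follows (scikit-learn's `f1_score`):

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: both precision and recall are 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Being a harmonic mean, F1 is dragged down by whichever of precision or recall is worse, which is why it suits imbalanced problems.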


Precision-Recall Curve

Graph illustrating the trade-off between precision and recall for different classification thresholds, essential for evaluating models on imbalanced data. The area under this curve (AUC-PR) provides an aggregated performance measure independent of threshold.
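The points on such a curve come from recomputing precision and recall at each threshold; a sketch over an explicit threshold list (scikit-learn's `precision_recall_curve` derives the thresholds from the scores):

```python
def precision_recall_points(y_true, scores, thresholds):
    """Return (threshold, precision, recall) for each decision threshold."""
    n_pos = sum(y_true)
    points = []
    for th in thresholds:
        preds = [1 if s >= th else 0 for s in scores]
        tp = sum(1 for t, p in zip(y_true, preds) if t == p == 1)
        fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
        precision = tp / (tp + fp) if tp + fp else 1.0  # convention when nothing is predicted positive
        recall = tp / n_pos
        points.append((th, precision, recall))
    return points
```

Lowering the threshold raises recall but usually lets in more false positives, lowering precision; that is the trade-off the curve traces.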


ROC Curve

Curve representing the true positive rate against the false positive rate at various decision thresholds, visualizing the model's discrimination capability. The ROC curve and its area (AUC-ROC) are standards for evaluating overall binary classifier performance.
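The same threshold sweep, but tracking true-positive rate against false-positive rate (scikit-learn's `roc_curve` derives the thresholds automatically):

```python
def roc_points(y_true, scores, thresholds):
    """Return (FPR, TPR) points at each threshold for binary 0/1 labels."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = []
    for th in thresholds:
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= th)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= th)
        points.append((fp / n_neg, tp / n_pos))
    return points
```

A random classifier traces the diagonal from (0, 0) to (1, 1); a perfect one hugs the top-left corner.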


AUC Score

Area under the ROC curve measuring the probability that a classifier gives a higher score to a random positive instance than to a negative one. AUC provides a threshold-independent performance measure, particularly useful for comparing different models.
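The probabilistic reading of AUC can be computed directly, without the curve, by counting correctly ranked positive/negative pairs (scikit-learn's `roc_auc_score` integrates the curve instead; both agree):

```python
def auc_score(y_true, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher
    (ties count half) -- equal to the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

0.5 means ranking no better than chance; 1.0 means perfect separation of the classes.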


Group K-Fold Cross-Validation

A variant of K-Fold that ensures the same groups never appear in different training and test sets simultaneously. This approach is crucial when data has a group structure (patients, users) where observations from the same group are correlated.
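A naive sketch that assigns whole groups to folds so no group is ever split between training and test (scikit-learn's `GroupKFold` additionally balances fold sizes):

```python
def group_k_fold(groups, k):
    """Split sample indices into k folds without splitting any group across folds."""
    unique = list(dict.fromkeys(groups))  # unique groups, first-seen order
    folds = [[] for _ in range(k)]
    for pos, g in enumerate(unique):
        # Naive round-robin assignment of whole groups to folds.
        folds[pos % k].extend(i for i, gi in enumerate(groups) if gi == g)
    return folds
```

Without this constraint, correlated observations from the same patient or user leak between train and test, inflating the performance estimate.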
