AI Glossary
The complete dictionary of artificial intelligence
Nested Cross-Validation
Model evaluation technique using two nested cross-validation loops to prevent overfitting during hyperparameter optimization. The inner loop selects the best hyperparameters while the outer loop evaluates the performance of the selected model in an unbiased manner.
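As a hedged sketch (assuming scikit-learn is available; the dataset, model, and hyperparameter grid are purely illustrative), the two loops can be composed by nesting a grid search inside an outer cross-validation:

```python
# Sketch, not a definitive implementation: the inner loop (GridSearchCV)
# selects hyperparameters, while the outer loop (cross_val_score) evaluates
# the whole selection procedure on data it never saw.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

inner_cv = KFold(n_splits=3, shuffle=True, random_state=1)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=2)

# Inner loop: exhaustive search over C on the inner folds
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=inner_cv)

# Outer loop: unbiased performance estimate of the tuned model
scores = cross_val_score(search, X, y, cv=outer_cv)
print(scores.mean())
```

Passing the search object itself as the estimator is what makes the nesting work: each outer training fold is re-searched from scratch.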
Inner Loop
First level of cross-validation in nested cross-validation, responsible for selecting and optimizing model hyperparameters. This loop uses a separate validation set to identify the optimal configuration before final evaluation.
Outer Loop
Second level of cross-validation in nested cross-validation, providing an unbiased estimate of model performance after hyperparameter selection. The test data from this loop is never used during hyperparameter optimization.
Hyperparameter Overfitting
Phenomenon where hyperparameters are optimized to perform specifically on the validation set, compromising generalization to new data. This problem occurs when the same cross-validation is used for both hyperparameter selection and final evaluation.
Selection Bias
Systematic error introduced during model or hyperparameter selection when the test set is implicitly used in the optimization process. This bias leads to an optimistic and unrealistic estimate of model performance in production.
Nested Grid Search
Method combining nested cross-validation with exhaustive hyperparameter search on a predefined grid. Each grid configuration is evaluated by the inner loop before the best one is tested by the outer loop.
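The exhaustive grid itself is just the Cartesian product of the candidate values; a minimal pure-Python sketch (parameter names and values are hypothetical):

```python
# Enumerating a hyperparameter grid as the Cartesian product of its axes.
from itertools import product

grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))  # 6 configurations: every C x gamma combination
```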
Estimated Generalization Error
Performance measure obtained by the outer loop of nested cross-validation, representing an approximation of model error on unseen data. This estimate is considered more reliable than that obtained by simple cross-validation.
Sequential Optimization
Process where hyperparameter selection and model evaluation are performed sequentially but on separate datasets to avoid contamination. This approach is at the core of nested cross-validation.
Doubly Nested Cross-Validation
Extension of nested cross-validation adding a third level for selection between different model families. Each level uses disjoint data to ensure a completely unbiased evaluation of the entire pipeline.
Temporal Information Leakage
Problem specific to time-series (serial) data, where nested cross-validation must preserve the chronological order between training, validation, and test sets. This constraint prevents future information from leaking into the optimization.
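One way to enforce this, assuming scikit-learn is available, is TimeSeriesSplit, whose folds keep every test index after every training index:

```python
# Sketch: chronological folds for time-series data, so the validation or
# test portion of a split never precedes its training portion.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)  # 20 time-ordered samples
tscv = TimeSeriesSplit(n_splits=4)
for train_idx, test_idx in tscv.split(X):
    # Every test index lies strictly after every training index,
    # so no future information can leak into optimization.
    assert train_idx.max() < test_idx.min()
```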
Selection Stability
Ability of nested cross-validation to identify robust hyperparameters that perform consistently across different outer validation folds. Low stability indicates strong dependence on specific training data.
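A simple stability check is to count how often each hyperparameter value wins across the outer folds; the selections below are placeholder values, not real results:

```python
# Sketch of a selection-stability check: the fraction of outer folds
# in which the most frequently chosen hyperparameter value wins.
from collections import Counter

# One winning C value per outer fold, e.g. collected from the inner search
selected_C = [1.0, 1.0, 10.0, 1.0, 1.0]
counts = Counter(selected_C)
stability = counts.most_common(1)[0][1] / len(selected_C)
print(stability)  # 0.8: the same C wins in 4 of 5 outer folds
```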
Quadratic Computational Cost
Algorithmic complexity of nested cross-validation, requiring roughly k_outer × k_inner model trainings per hyperparameter configuration, i.e. O(k²) when both loops use k folds. This high cost is the necessary trade-off for an unbiased evaluation of model performance.
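The fit count can be sketched with simple arithmetic (all numbers are illustrative, and this assumes the winning configuration is refit once per outer fold):

```python
# Back-of-envelope fit count for nested cross-validation:
# k_outer outer folds, k_inner inner folds, n_configs grid points,
# plus one refit of the winner per outer fold.
k_outer, k_inner, n_configs = 5, 3, 6

nested_fits = k_outer * (k_inner * n_configs + 1)
plain_fits = k_inner * n_configs  # a single non-nested grid search

print(nested_fits, plain_fits)  # 95 versus 18
```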
Nested Monte Carlo Cross-Validation
Variant of nested cross-validation using repeated random train/test splits (random subsampling) for both the inner and outer loops. This approach reduces the correlation between estimates while preserving the unbiasedness of the evaluation.
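Assuming scikit-learn is available, ShuffleSplit implements such repeated random subsampling; each iteration draws a fresh train/test partition:

```python
# Sketch: repeated random (Monte Carlo) splits. Indices are sampled
# without replacement within each split, so train and test are disjoint.
import numpy as np
from sklearn.model_selection import ShuffleSplit

X = np.arange(10)
ss = ShuffleSplit(n_splits=3, test_size=0.3, random_state=0)
for train_idx, test_idx in ss.split(X):
    assert set(train_idx).isdisjoint(test_idx)
```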
Evaluation Pipelining
Software architecture where nested cross-validation is implemented as a complete pipeline integrating preprocessing, feature selection, hyperparameter optimization, and final evaluation. Because every step is fitted only on training folds, this structure promotes reproducibility and prevents data leakage.
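A hedged sketch of such a pipeline with scikit-learn (the model choice and grid are illustrative): wrapping preprocessing in a Pipeline means the scaler is refit inside every cross-validation fold rather than on the full dataset:

```python
# Sketch: a Pipeline keeps preprocessing inside the cross-validation
# loops, so the scaler cannot leak test-fold statistics into training.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=60, random_state=0)

pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression())])

# Hyperparameters of pipeline steps use the "step__param" naming
search = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]}, cv=3)
search.fit(X, y)
print(search.best_params_)
```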
Nested Confidence Intervals
Statistical method using the results of the outer loop to calculate confidence intervals on model performance. These intervals reflect uncertainty due to both data variability and the hyperparameter selection process.
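A minimal sketch of a normal-approximation interval over outer-loop scores (the scores below are made-up placeholders, not real results):

```python
# Approximate 95% confidence interval on mean outer-fold performance.
import numpy as np

outer_scores = np.array([0.81, 0.78, 0.84, 0.80, 0.79])
mean = outer_scores.mean()
# Standard error of the mean across the k = 5 outer folds
sem = outer_scores.std(ddof=1) / np.sqrt(len(outer_scores))
low, high = mean - 1.96 * sem, mean + 1.96 * sem
print(round(mean, 3), round(low, 3), round(high, 3))
```

With so few folds, a t-distribution critical value would be more defensible than 1.96; the normal approximation is used here only for brevity.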