Fast and Informative Model Selection Using Learning Curve Cross-Validation


Abstract:

Common cross-validation (CV) methods such as k-fold cross-validation or Monte Carlo cross-validation estimate the predictive performance of a learner by repeatedly training it on a large portion of the given data and testing it on the remaining data. These techniques have two major drawbacks. First, they can be unnecessarily slow on large datasets. Second, beyond an estimate of the final performance, they give almost no insight into the learning process of the validated algorithm. In this article, we present a new approach for validation based on learning curves (LCCV). Instead of creating train-test splits with a large portion of training data, LCCV iteratively increases the number of instances used for training. In the context of model selection, it discards models that are unlikely to become competitive. In a series of experiments on 75 datasets, we show that in over 90% of cases LCCV leads to the same performance as 5/10-fold CV while substantially reducing the runtime (median runtime reductions of over 50%); the performance of LCCV never deviated from that of CV by more than 2.5%. We also compare it to a racing-based method and to successive halving, a multi-armed bandit method. Additionally, LCCV provides important insights, for example allowing an assessment of the benefits of acquiring more data.
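
To make the procedure described in the abstract concrete, the following Python sketch illustrates learning-curve-based model selection in the spirit of LCCV. It is a minimal illustration, not the authors' implementation: the anchor schedule, the fixed-margin pruning rule (the paper instead prunes based on extrapolating each candidate's learning curve), the candidate models, and the single held-out split (LCCV uses repeated splits per anchor) are all simplifying assumptions made here.

# Illustrative sketch of learning-curve-based model selection in the spirit
# of LCCV. Not the authors' implementation: the anchor schedule, pruning
# rule, candidates, and single held-out split are simplifying assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
}

# Geometric schedule of training-set sizes ("anchors"), ending at the full
# training set so surviving candidates are validated as in ordinary CV.
anchors = [64, 128, 256, 512, 1024, 2048, len(X_train)]
scores = {name: [] for name in candidates}
active = set(candidates)

for n in anchors:
    for name in list(active):
        model = candidates[name]
        model.fit(X_train[:n], y_train[:n])
        scores[name].append(accuracy_score(y_test, model.predict(X_test)))
    # Pruning heuristic (an assumption; the paper extrapolates the learning
    # curve instead): after a few anchors, discard any candidate whose best
    # observed score trails the current leader by a fixed margin.
    best = max(max(scores[name]) for name in active)
    for name in list(active):
        if len(scores[name]) >= 3 and max(scores[name]) < best - 0.05:
            active.discard(name)

winner = max(active, key=lambda name: scores[name][-1])
print(f"selected model: {winner}")

A geometric anchor schedule keeps the cost of the early evaluations small relative to the largest anchors, so discarding weak candidates before they ever reach the full training-set size is what produces the kind of runtime savings reported in the abstract.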
Page(s): 9669-9680
Date of Publication: 08 March 2023

PubMed ID: 37028368
