In this paper we explore automatic model selection in the supervised learning framework using prediction intervals. First we compare two families of non-parametric approaches to constructing prediction intervals for arbitrary regression models. The first family is based on the idea of decomposing the total prediction error into the model's error and the error caused by noise inherent to the data; the two components are estimated independently. The second family assumes local similarity of the data and estimates the prediction intervals from each sample's nearest neighbors. The comparison shows that the first family strives to produce valid prediction intervals, whereas the second family strives for optimal ones. We propose a model selection statistic based on the discrepancy between valid and optimal prediction intervals. Experiments performed on a set of artificial datasets strongly support the hypothesis that for the correct model this discrepancy is smallest among the competing models.
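The contrast between the two families, and the discrepancy statistic built on top of them, can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual estimators: the toy data, the global-quantile interval standing in for the validity-oriented family, the k-nearest-neighbor interval standing in for the locality-oriented family, and the width-based discrepancy are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = 2x + Gaussian noise (illustrative only).
X = rng.uniform(0, 1, size=200)
y = 2 * X + rng.normal(0, 0.1, size=200)

# A simple fitted model: least-squares line.
a, b = np.polyfit(X, y, 1)
resid = y - (a * X + b)

# Validity-oriented construction (stand-in for the first family):
# one global half-width from the 90% quantile of absolute residuals.
global_half_width = np.quantile(np.abs(resid), 0.9)

def interval_global(x):
    p = a * x + b
    return p - global_half_width, p + global_half_width

# Locality-oriented construction (stand-in for the second family):
# half-width from the residuals of the k nearest neighbors of x.
def interval_knn(x, k=20):
    idx = np.argsort(np.abs(X - x))[:k]
    half = np.quantile(np.abs(resid[idx]), 0.9)
    p = a * x + b
    return p - half, p + half

# A discrepancy statistic in the spirit of the abstract: the average
# absolute difference in width between the two interval constructions.
def discrepancy(points):
    diffs = []
    for x in points:
        lo1, hi1 = interval_global(x)
        lo2, hi2 = interval_knn(x)
        diffs.append(abs((hi1 - lo1) - (hi2 - lo2)))
    return float(np.mean(diffs))

print(discrepancy(np.linspace(0.1, 0.9, 9)))
```

Under the paper's hypothesis, fitting several candidate models and selecting the one with the smallest such discrepancy would serve as the model selection rule; here only one model is fitted, to keep the sketch short.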