Skip to Main Content
Feature selection is commonly used in bioinformatics applications, such as gene selection from DNA micro array data. Recently, wrapper methods have been proposed as an improvement over traditionally used filter based feature selection methods. In wrapper methods, the goodness of a feature set is often measured using the cross-validation performance of a machine learning method trained with the features. This can lead to over fitting, meaning that the cross-validation performance on the final selected feature set may be high even in cases when the selected features in fact are not informative. Evaluating the statistical significance of gained results is therefore of major concern. Non-parametric permutation tests have been previously used as a univariate filter for selecting individual features. In contrast, we propose using such tests to measure the statistical significance of the whole selection process, which is carried out by a wrapper method. We achieve computational efficiency by using a regularized least-squares based wrapper method, which combines a state-of-the-art classifier with matrix calculus based computational shortcuts for greedy forward feature selection. Permutation tests prove to be a practical tool for estimating the significance of gained results, as shown in simulations and experiments on two DNA micro array data sets.