Feature selection is an important data preprocessing step in pattern recognition. Recently, a wrapper-type semi-supervised feature selection method, known as FW-SemiFS, was proposed to overcome the small labeled sample problem of supervised feature selection. FW-SemiFS does not consider the confidence of predicted unlabeled data, but rather evaluates the relevance of features according to their frequency. These frequencies are obtained via iterative supervised sequential forward feature selection (SFFS). However, the substantial computational cost of iterative SFFS is detrimental to FW-SemiFS. Furthermore, this relevance evaluation method eliminates the primary advantage of wrapper-type feature selection: the ability to evaluate the discriminative power of a combination of features. In this paper, we propose a new wrapper-type semi-supervised feature selection framework that can select a more relevant feature subset using confident unlabeled data. The proposed framework, called ensemble-based semi-supervised feature selection (EN-SemiFS), employs an ensemble classifier that supports the estimation of the confidence of unlabeled data. We analyzed the relationship between wrapper-type feature selection and the confidence of unlabeled data and explored how this relationship can make the semi-supervised feature selection framework faster and more accurate. The experimental results revealed that the proposed method can select a more relevant feature subset when compared to existing methods.
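To make the idea concrete, the sketch below combines the two ingredients the abstract describes: a greedy wrapper loop (forward feature selection) and an ensemble whose vote agreement serves as a confidence score for pseudo-labeling unlabeled data. This is a minimal illustration only; the bagged nearest-centroid ensemble, the 0.8 confidence threshold, and all function names are assumptions for the sketch, not the actual EN-SemiFS implementation.

```python
import random
from collections import Counter

random.seed(0)

def train_centroid(X, y, feats):
    """Nearest-centroid model restricted to the given feature subset."""
    cents = {}
    for label in set(y):
        pts = [x for x, lab in zip(X, y) if lab == label]
        cents[label] = [sum(p[f] for p in pts) / len(pts) for f in feats]
    return cents

def predict(cents, x, feats):
    """Assign x to the class with the nearest centroid (squared distance)."""
    return min(cents, key=lambda lab: sum(
        (x[f] - c) ** 2 for f, c in zip(feats, cents[lab])))

def ensemble_confidence(X_lab, y_lab, X_unlab, feats, n_models=5):
    """Bootstrap an ensemble; confidence = fraction of models agreeing
    on the majority label for each unlabeled point."""
    votes = [[] for _ in X_unlab]
    n = len(X_lab)
    for _ in range(n_models):
        idx = [random.randrange(n) for _ in range(n)]
        model = train_centroid([X_lab[i] for i in idx],
                               [y_lab[i] for i in idx], feats)
        for j, x in enumerate(X_unlab):
            votes[j].append(predict(model, x, feats))
    out = []
    for v in votes:
        label, count = Counter(v).most_common(1)[0]
        out.append((label, count / n_models))
    return out

def semi_supervised_sffs(X_lab, y_lab, X_unlab, n_feats, k, conf_thresh=0.8):
    """Greedy forward selection; at each step the training set is augmented
    with confidently pseudo-labeled unlabeled points (illustrative sketch)."""
    selected, remaining = [], list(range(n_feats))
    while remaining and len(selected) < k:
        best_f, best_acc = None, -1.0
        for f in remaining:
            feats = selected + [f]
            conf = ensemble_confidence(X_lab, y_lab, X_unlab, feats)
            Xa = X_lab + [x for x, (_, c) in zip(X_unlab, conf) if c >= conf_thresh]
            ya = y_lab + [lab for lab, c in conf if c >= conf_thresh]
            model = train_centroid(Xa, ya, feats)
            acc = sum(predict(model, x, feats) == lab
                      for x, lab in zip(X_lab, y_lab)) / len(y_lab)
            if acc > best_acc:
                best_f, best_acc = f, acc
        selected.append(best_f)
        remaining.remove(best_f)
    return selected
```

Evaluating each candidate feature subset on an ensemble-augmented training set is what distinguishes this confidence-based scheme from frequency counting: the wrapper still scores combinations of features, while only unlabeled points with high vote agreement influence the model.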