We consider the problem of using a large amount of unlabeled data to improve the efficiency of feature selection in high dimensional datasets, when only a small set of labeled examples is available. We propose a new semi-supervised feature importance evaluation method (SSFI for short), that combines ideas from co-training and random forests with a new permutation-based out-of-bag feature importance measure. We provide empirical results on several benchmark datasets indicating that SSFI can lead to significant improvement over state-of-the-art semi-supervised and supervised algorithms.
Published in:
Data Mining (ICDM), 2011 IEEE 11th International Conference on
Date of Conference: 11-14 Dec. 2011