An important application of microarrays is to identify, among thousands of candidates, the genes that are relevant for phenotypic classification. The performance of a gene selection algorithm is usually assessed in terms of both predictive capacity and computational efficiency, but the predictive capacity of the selected features tends to receive more attention. In gene selection problems, however, computational efficiency is equally important because of the very high dimensionality of gene expression data. We propose the SVM-IRFS algorithm, which combines a Support Vector Machine (SVM) based criterion, the generalized ‖w‖² measure, with a new search procedure named Iterative Reduced Forward Selection (IRFS) to address the gene selection problem. IRFS uses an adaptive threshold to screen out irrelevant feature subsets, so that unnecessary computations are avoided. The advantage of the proposed SVM-IRFS algorithm is twofold. First, its selection procedure is computationally very efficient: it can select tens of genes from thousands in several seconds. Second, benefiting from the good classification performance of support vector machines, SVM-IRFS produces feature subsets with high predictive capacity.
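
For reference, the ‖w‖² quantity of a standard kernel SVM can be written in terms of the dual solution. The expression below is the textbook form, not necessarily the exact generalized measure used by SVM-IRFS. The symbols are assumptions taken from the usual SVM dual notation: α_i are the dual coefficients, y_i ∈ {−1, +1} the class labels, and K_S a kernel evaluated only on the features in a candidate subset S.

```latex
% Textbook SVM dual expression for the squared weight norm,
% restricted to a candidate feature subset S (notation assumed, not from the source).
\[
  \|w\|^2_S \;=\; \sum_{i=1}^{n} \sum_{j=1}^{n}
      \alpha_i \, \alpha_j \, y_i \, y_j \, K_S(x_i, x_j)
\]
% In the hard-margin case the geometric margin is 2 / \|w\|,
% so a smaller \|w\|^2_S corresponds to a wider separating margin.
```

Under this reading, a forward-selection search would score each candidate subset S by such a margin-based quantity and keep the candidates that score best. How IRFS sets its adaptive threshold is not specified in the text above.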