A typical microarray gene expression dataset is both extremely sparse and imbalanced. To select multiple highly informative gene subsets for cancer classification and diagnosis, this paper presents a new fuzzy granular support vector machine-recursive feature elimination algorithm (FGSVM-RFE). A hybrid of statistical learning, fuzzy clustering, and granular computing, FGSVM-RFE eliminates irrelevant, redundant, or noisy genes separately in different granules at different stages, and selects highly informative genes with potentially different biological functions in a balanced way. Empirical studies on three public datasets demonstrate that FGSVM-RFE outperforms state-of-the-art approaches. Moreover, FGSVM-RFE can extract multiple gene subsets, on each of which a classifier can be modeled with 100% accuracy. In particular, the independent testing accuracy on the prostate cancer dataset improves significantly: the previous best result was 86% with 16 genes, whereas our best result is 100% with only eight genes. Annotation of the identified genes with Onto-Express indicates that they are biologically meaningful.
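
To make the recursive feature elimination step concrete, the following is a minimal sketch of plain SVM-RFE, the baseline procedure that FGSVM-RFE builds on. It is not the FGSVM-RFE algorithm itself: the fuzzy clustering and granulation stages are omitted, and the linear SVM is trained with a simple batch subgradient method chosen only to keep the example dependency-free. The function names, hyperparameters, and the toy data (two informative "genes" and four noise "genes") are all assumptions introduced for illustration.

```python
import random


def train_linear_svm(X, y, features, epochs=200, lam=0.01, lr=0.1):
    """Fit a linear SVM (hinge loss + L2 penalty) by batch subgradient descent.

    Only the columns listed in `features` are used. Returns a dict that maps
    each feature index to its learned weight.
    """
    w = {j: 0.0 for j in features}
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = {j: lam * w[j] for j in features}  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(w[j] * xi[j] for j in features) + b)
            if margin < 1:  # the hinge loss is active for this sample
                for j in features:
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        for j in features:
            w[j] -= lr * grad_w[j]
        b -= lr * grad_b
    return w


def svm_rfe(X, y, n_keep):
    """Repeatedly retrain and drop the feature with the smallest squared weight."""
    features = list(range(len(X[0])))
    eliminated = []
    while len(features) > n_keep:
        w = train_linear_svm(X, y, features)
        worst = min(features, key=lambda j: w[j] ** 2)  # SVM-RFE ranking criterion
        features.remove(worst)
        eliminated.append(worst)
    return features, eliminated


def make_toy_data(n_samples=40, n_noise=4, seed=0):
    """Hypothetical data: columns 0 and 1 carry the label, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = 1 if i % 2 == 0 else -1
        row = [2.0 * label + rng.uniform(-0.2, 0.2),  # strongly informative
               1.0 * label + rng.uniform(-0.2, 0.2)]  # weakly informative
        row += [rng.uniform(-0.2, 0.2) for _ in range(n_noise)]  # uninformative
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
selected, eliminated = svm_rfe(X, y, n_keep=2)
print("selected:", selected)
print("eliminated (in order):", eliminated)
```

On this toy data the two informative columns survive and the four noise columns are removed one at a time. A granular variant along the lines described in the abstract would presumably run a similar elimination loop within each group of genes rather than over the full gene set, but the grouping itself is outside the scope of this sketch.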