Identifying tumor-specific genes from microarray data is difficult because the data are high-dimensional with few samples, strongly correlated, and noisy. To address this, a novel partial least squares (PLS) based gene-selection method is presented that integrates the relatedness among genes and is suitable for multicategory classification. Using the differences in how well the independent variables (genes) explain the dependent variable (class), we define three indicators for global gene selection, which take into account the combined effect of all the genes and the correlations among them. Integrated with a linear-kernel support vector classifier (SVC), the proposed method is tested on the MIT acute myeloid leukemia/acute lymphoblastic leukemia (AML/ALL) and small round blue cell tumors (SRBCT) data sets. It yields a small subset of specific genes with high discriminative power. The results indicate that the proposed PLS-based method for tumor-specific gene selection is highly efficient. Compared with the literature, the specific genes selected from both the two-category AML/ALL data set and the multicategory SRBCT data set are credible. Further investigation shows that the proposed gene-selection method is robust. Overall, the proposed method effectively solves the feature-selection problem for high-dimensional, small-sample data, and it also performs well for multicategory classification.
IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews (Volume: 41, Issue: 6)
Date of Publication: Nov. 2011
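
The abstract does not define its three selection indicators, so the following is only a rough sketch of the general idea of PLS-based global gene ranking, not the authors' method. It substitutes the standard variable importance in projection (VIP) score, computed from a NIPALS PLS fit on one-hot class labels, so that multicategory data are handled. The function name `pls_vip_scores`, the parameter choices, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def pls_vip_scores(X, labels, n_components=2, tol=1e-10, max_iter=500):
    """Hypothetical helper: VIP score per gene from a NIPALS PLS fit.

    X is (samples x genes); labels holds one class label per sample.
    VIP is a stand-in here, not one of the paper's three indicators.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # One-hot encode the classes so more than two categories are supported.
    Y = (labels[:, None] == classes[None, :]).astype(float)
    E = X - X.mean(axis=0)  # centered expression matrix
    F = Y - Y.mean(axis=0)  # centered class-indicator matrix
    n_genes = X.shape[1]
    W = np.zeros((n_genes, n_components))  # unit-norm PLS weight vectors
    ssy = np.zeros(n_components)           # class variance explained per component
    for a in range(n_components):
        u = F[:, np.argmax(F.var(axis=0))].copy()
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)
            t = E @ w                      # sample scores
            q = F.T @ t / (t @ t)          # class loadings
            u_new = F @ q / (q @ q)
            converged = np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = E.T @ t / (t @ t)              # gene loadings
        E = E - np.outer(t, p)             # deflate both blocks
        F = F - np.outer(t, q)
        W[:, a] = w
        ssy[a] = (t @ t) * (q @ q)
    # Each gene's squared weights, averaged over components in proportion
    # to the class variance each component explains.
    return np.sqrt(n_genes * (W ** 2 @ ssy) / ssy.sum())

# Synthetic stand-in for a microarray: 60 samples, 200 genes, 3 classes.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 20)
X = rng.normal(size=(60, 200))
X[labels == 1, 0:3] += 4.0  # genes 0-2 are shifted in class 1
X[labels == 2, 3:6] += 4.0  # genes 3-5 are shifted in class 2

vip = pls_vip_scores(X, labels, n_components=2)
selected = np.argsort(vip)[::-1][:6]  # indices of the six top-ranked genes
print(sorted(selected.tolist()))
```

Because every gene's score comes from a joint fit over all genes, the ranking reflects combined effects and inter-gene correlation rather than one gene at a time. The selected columns `X[:, selected]` could then be passed to a linear-kernel classifier, for example scikit-learn's `SVC(kernel="linear")`, to mirror the classification step the abstract describes.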