Many previous studies have demonstrated that microarray gene expression data are useful for disease classification and medical diagnosis. Cancer microarray data typically share one characteristic: the number of features (genes) greatly exceeds the number of instances (tissue samples). Selecting an appropriate number of relevant features to distinguish between types of cancer remains a challenge in bioinformatics. Feature selection approaches have been used in many previous studies to select useful gene sets from microarray data and thereby improve classification performance. In this paper, a hybrid approach that combines correlation-based feature selection and binary particle swarm optimization is used to select a small subset of genes, with the K-nearest neighbor method serving as the classifier to evaluate classification performance. The proposed approach is applied to six microarray gene expression data sets related to human cancer. The experimental results show that the proposed approach selects fewer features and obtains better classification accuracy.
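As an illustration only, the wrapper stage described above (binary particle swarm optimization searching over gene subsets, with a K-nearest neighbor classifier scoring each subset) might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in this paper: the correlation-based filter step is omitted, the data are synthetic, the function names (`make_toy_data`, `knn_loocv_accuracy`, `binary_pso`) are hypothetical, and the swarm size, iteration count, inertia weight, acceleration constants, velocity limit, and the tie-break in favor of smaller subsets are all assumed values chosen for the example.

```python
import math
import random


def make_toy_data(n_samples=20, n_features=8, seed=1):
    """Synthetic stand-in for expression data: only feature 0 separates the classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        row[0] += 10.0 * label  # feature 0 carries the class signal
        X.append(row)
        y.append(label)
    return X, y


def knn_loocv_accuracy(X, y, mask, k=1):
    """Leave-one-out accuracy of a K-nearest neighbor classifier on the masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        # Squared Euclidean distance from sample i to every other sample.
        dists = sorted(
            (sum((X[i][j] - X[m][j]) ** 2 for j in cols), y[m])
            for m in range(len(X)) if m != i
        )
        votes = [label for _, label in dists[:k]]
        predicted = max(sorted(set(votes)), key=votes.count)
        correct += predicted == y[i]
    return correct / len(X)


def binary_pso(X, y, n_particles=10, n_iter=20, w=0.9, c1=2.0, c2=2.0,
               v_max=6.0, seed=0):
    """Binary PSO with a sigmoid transfer function; all parameter values are assumptions."""
    rng = random.Random(seed)
    n_features = len(X[0])

    def fitness(mask):
        # Assumed tie-break: at equal accuracy, prefer the smaller subset.
        return (knn_loocv_accuracy(X, y, mask), -sum(mask))

    positions = [[rng.randint(0, 1) for _ in range(n_features)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in positions]
    pbest_fit = [fitness(p) for p in positions]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                v = (w * velocities[i][d]
                     + c1 * rng.random() * (pbest[i][d] - positions[i][d])
                     + c2 * rng.random() * (gbest[d] - positions[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp the velocity
                velocities[i][d] = v
                # The sigmoid of the velocity is the probability that the bit is 1.
                prob_one = 1.0 / (1.0 + math.exp(-v))
                positions[i][d] = 1 if rng.random() < prob_one else 0
            fit = fitness(positions[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = positions[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = positions[i][:], fit
    return gbest, gbest_fit


X, y = make_toy_data()
best_mask, (best_acc, neg_size) = binary_pso(X, y)
print("selected features:", [j for j, bit in enumerate(best_mask) if bit])
print("leave-one-out accuracy:", best_acc)
```

In this sketch each particle is a bit vector with one bit per gene, so the swarm searches directly over feature subsets, and the classifier is retrained implicitly on every evaluation through leave-one-out voting. On real microarray data the number of genes would make an initial filter step, such as the correlation-based selection named above, far more important than it is on this toy example.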