This investigation presents a cancer classification approach that combines clustering-based gene selection with artificial neural networks. To address the so-called "curse of dimensionality", a t-statistic feature selection method, one of the univariate filter techniques, is used to select the most informative genes. However, instead of selecting a small group of relevant genes from the whole data set at once, the genes are first clustered into a number of groups, and the intended gene subset is then formed from the top-ranked members of each group. This process is adopted not only to ensure that the most relevant and informative genes are selected but also to bring information diversity to the selected genes. Three clustering algorithms are used: K-means clustering, fuzzy C-means clustering, and the self-organizing map (SOM). Sample classification is then carried out using a multi-layer perceptron (MLP) neural network trained with the Levenberg-Marquardt algorithm. The performance of the approach is evaluated in terms of accuracy, sensitivity, and specificity and is found to be comparable with that of the non-clustering-based approach.
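The selection step described above (cluster the genes, rank them by t-statistic, keep the top-ranked members of each cluster) can be illustrated with a minimal sketch. It is not the authors' implementation. The following are assumptions made for illustration only: the function names `t_statistic`, `kmeans`, and `select_genes`, the parameters `k` and `per_cluster`, the use of the Welch (unequal-variance) form of the t-statistic, ranking by absolute value, and a plain Lloyd's K-means standing in for any of the three clustering algorithms. The abstract does not specify these details.

```python
import math
import random
from statistics import mean, variance


def t_statistic(values, labels):
    """Welch two-sample t-statistic for one gene (assumed form).

    values: expression of the gene in each sample.
    labels: 0/1 class label of each sample (at least two samples per class).
    """
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct points as initial centers
    assign = None
    for _ in range(iters):
        new_assign = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_assign == assign:  # assignments stable: converged
            break
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [mean(col) for col in zip(*members)]
    return assign


def select_genes(expr, labels, k, per_cluster, seed=0):
    """Return sorted indices of the top-ranked genes from each cluster.

    expr: list of genes, each a list of expression values across samples.
    Genes are clustered on their expression profiles, then ranked inside
    each cluster by |t| (ranking by absolute value is an assumption).
    """
    assign = kmeans(expr, k, seed=seed)
    scores = [abs(t_statistic(gene, labels)) for gene in expr]
    selected = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        members.sort(key=lambda i: scores[i], reverse=True)
        selected.extend(members[:per_cluster])
    return sorted(selected)
```

With `k = 1` the function reduces to ordinary top-N ranking, which corresponds to the non-clustering baseline mentioned in the abstract. The selected gene indices would then be used to build the reduced input for a classifier such as the MLP.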