Because gene expression data are important for cancer diagnosis and treatment, microarray gene expression data have attracted increasing attention from cancer researchers in recent years. However, in real-world computational analysis such data commonly suffer from the curse of dimensionality, because tens of thousands of gene expression levels are measured for only a small number of samples. Developing an effective clustering method for such high-dimensional datasets is therefore a challenging problem. Here, we use two-step feature filtering and dimensionality reduction methods to reduce the dimension of gene expression data. First, we extract a subset of genes using ReliefF and the Fast Correlation-Based Filter (FCBF). Then, k-means (KM), KM with principal component analysis (PCA), and KM with random projection (RP) are each applied to the reduced gene dataset to produce clusters of cancer samples. Experimental results on the small round blue-cell tumor (SRBCT) dataset demonstrate that two-step feature filtering significantly improves the performance of the KM clustering algorithm and facilitates the application of PCA and RP in high-dimensional space, and confirm the effectiveness and efficiency of the proposed scheme for high-dimensional gene expression data.
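The filtering-then-clustering pipeline described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the helper names (`relieff`, `fcbf`, `two_step_filter`, `kmeans`, `pca`, `random_projection`) are hypothetical, and the use of L1 distances, k = 3 ReliefF neighbours, tertile discretization for FCBF, a 20-gene ReliefF shortlist and a 2-dimensional PCA/RP target are all illustrative choices. The demo runs on synthetic data, not on SRBCT.

```python
import numpy as np

def relieff(X, y, k=3):
    """ReliefF gene weights (sketch: every sample used once, L1 distance on [0, 1]-scaled genes)."""
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                          # guard against constant genes
    Xn = (X - X.min(axis=0)) / span
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(Xn - Xn[i]).sum(axis=1)
        for c in classes:
            idx = np.where((y == c) & (np.arange(n) != i))[0]
            nearest = idx[np.argsort(dist[idx])[:k]]
            diff = np.abs(Xn[nearest] - Xn[i]).mean(axis=0)
            if c == y[i]:
                w -= diff / n                      # near hits: penalise within-class spread
            else:                                  # near misses: reward, weighted by class prior
                w += prior[c] / (1 - prior[y[i]]) * diff / n
    return w

def entropy(v):
    _, counts = np.unique(v, axis=0, return_counts=True)   # rows of v are joint symbols
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def su(a, b):
    """Symmetrical uncertainty 2*IG(a;b) / (H(a) + H(b)) for discrete vectors."""
    ha, hb = entropy(a), entropy(b)
    hab = entropy(np.stack([a, b], axis=1))
    return 0.0 if ha + hb == 0 else 2 * (ha + hb - hab) / (ha + hb)

def fcbf(Xd, y, delta=0.0):
    """FCBF on discretized genes: keep relevant genes, drop ones redundant with a stronger gene."""
    su_c = np.array([su(Xd[:, j], y) for j in range(Xd.shape[1])])
    order = [j for j in np.argsort(-su_c) if su_c[j] > delta]
    selected = []
    while order:
        p = order.pop(0)
        selected.append(p)
        order = [q for q in order if su(Xd[:, p], Xd[:, q]) < su_c[q]]
    return selected

def two_step_filter(X, y, n_relief=20):
    top = np.argsort(-relieff(X, y))[:n_relief]    # step 1: ReliefF shortlist (assumed size)
    # Assumption: discretize each shortlisted gene into tertiles before computing SU.
    Xd = np.column_stack([np.digitize(X[:, j], np.quantile(X[:, j], [1/3, 2/3])) for j in top])
    return top[fcbf(Xd, y)]                        # step 2: FCBF on the shortlist

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(0) if np.any(labels == j) else centers[j]
                        for j in range(k)])        # keep old center if a cluster empties
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def pca(X, m):
    Xc = X - X.mean(0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:m].T                           # project onto the top-m principal axes

def random_projection(X, m, seed=0):
    R = np.random.default_rng(seed).normal(size=(X.shape[1], m)) / np.sqrt(m)
    return X @ R                                   # Gaussian random projection to m dims

# Synthetic demo (hypothetical data): 3 classes x 20 samples, 200 genes, genes 0-4 informative.
rng = np.random.default_rng(42)
y = np.repeat(np.arange(3), 20)
X = rng.normal(size=(60, 200))
X[:, :5] += 3.0 * y[:, None]
genes = two_step_filter(X, y)
X_sel = X[:, genes]
labels_km = kmeans(X_sel, 3)                       # KM
labels_pca = kmeans(pca(X_sel, 2), 3)              # KM with PCA
labels_rp = kmeans(random_projection(X_sel, 2), 3) # KM with RP
```

ReliefF and FCBF are supervised filters, so class labels are used only for gene selection; the k-means step itself is unsupervised. Running FCBF on the ReliefF shortlist rather than on every gene keeps the pairwise symmetrical-uncertainty checks cheap.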