Data mining has become an important topic in the effective analysis of gene expression data because of its wide application in the biomedical industry. In this paper, the k-means clustering algorithm is studied extensively for gene expression analysis. Since our purpose is to demonstrate the effectiveness of the k-means algorithm on a wide variety of data sets, we have chosen two pattern recognition data sets and thirteen microarray data sets with both overlapping and non-overlapping cluster boundaries, where the number of features/genes ranges from 4 to 7129 and the number of samples ranges from 32 to 683. The number of clusters ranges from two to eleven. We use the clustering error rate (or clustering accuracy) as the evaluation metric to measure the performance of the k-means algorithm.
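As an illustration of the evaluation metric, the sketch below shows one common way to compute a clustering error rate. The abstract does not say how the rate is defined, so the definition here is an assumption: cluster ids are matched one-to-one to the known class labels, and the error is the fraction of samples misassigned under the best such matching. The function name clustering_error_rate and the brute-force search over matchings are illustrative choices, not the authors' procedure.

```python
from itertools import permutations

# Sentinel for clusters left without a class when there are more clusters
# than classes (hypothetical helper, not from the paper).
_UNMATCHED = object()


def clustering_error_rate(true_labels, cluster_labels):
    """Fraction of samples misassigned under the best one-to-one mapping
    of cluster ids to class labels.

    Assumed definition; brute force over mappings, so only practical
    when the number of clusters is small.
    """
    n = len(true_labels)
    if n == 0 or n != len(cluster_labels):
        raise ValueError("label sequences must be non-empty and equal in length")

    # Distinct labels in order of first appearance.
    classes = list(dict.fromkeys(true_labels))
    clusters = list(dict.fromkeys(cluster_labels))

    # Pad the class list so every cluster can receive a distinct target.
    classes += [_UNMATCHED] * max(0, len(clusters) - len(classes))

    best_correct = 0
    for assigned in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        correct = sum(
            mapping[c] == t for t, c in zip(true_labels, cluster_labels)
        )
        best_correct = max(best_correct, correct)

    return 1.0 - best_correct / n


if __name__ == "__main__":
    true_labels = [0, 0, 0, 1, 1, 1]
    cluster_labels = [1, 1, 0, 0, 0, 0]  # cluster ids are arbitrary
    error = clustering_error_rate(true_labels, cluster_labels)
    print(f"error rate = {error:.4f}, accuracy = {1.0 - error:.4f}")
```

Under this definition the clustering accuracy is one minus the error rate. The brute-force search grows factorially with the number of clusters, so for the larger cluster counts mentioned above (up to eleven) an assignment solver such as scipy.optimize.linear_sum_assignment would be a more practical choice.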