Skip to Main Content
Clustering the time series gene expression data is an important task in bioinformatics research and biomedical applications. Recently, some clustering methods have been adapted or proposed. However, some concerns still remain, such as the robustness of the mining methods, as well as the quality and the interpretability of the mining results. In this paper, we tackle the problem of effectively clustering time series gene expression data by proposing algorithm DHC, a density-based, hierarchical clustering method. We use a density-based approach to identify the clusters such that the clustering results are of high quality and robustness. Moreover, the mining result is in the form of a density tree, which uncovers the embedded clusters in a data set. The inner-structures, the borders and the outliers of the clusters can be further investigated using the attraction tree, which is an intermediate result of the mining. By these two trees, the internal structure of the data set can be visualized effectively. Our empirical evaluation using some real-world data sets show that the method is effective, robust and scalable. It matches the ground truth provided by bioinformatics experts very well in the sample data sets.
Date of Conference: 10-12 March 2003