DNA microarray technology yields large volumes of gene expression time series data, and clustering is an important approach to extracting biological information from them. This paper proposes a novel clustering method, HMM-based hierarchical clustering (HMM-HC), for analyzing gene expression time series data. We convert time-point data to discrete symbols, based on the fact that the logarithm of the data approximately follows a normal distribution, and build hidden Markov models (HMMs) from these symbols for the gene sequences. In a gene expression time series, the value at each time point is correlated with the values at other time points, and HMMs can take advantage of this correlation. We tested the method on two commonly used datasets. The results show that it produces high-quality clusters and identifies an appropriate number of clusters.
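To make the discretization step more concrete, the following is a minimal sketch of one way time-point data could be mapped to discrete symbols under the log-normal assumption. The abstract does not specify the binning rule or the number of symbols, so the equal-probability cut points, the default of three symbols, and the function name `discretize_series` are all illustrative assumptions rather than the authors' procedure.

```python
import math
from statistics import NormalDist, mean, stdev


def discretize_series(values, n_symbols=3):
    """Map positive expression values to integer symbols 0..n_symbols-1.

    Hypothetical helper: the binning rule and symbol count are assumptions,
    not taken from the paper.
    """
    # Work on the log scale, where the data is assumed to be roughly normal.
    logs = [math.log(v) for v in values]

    # Fit a normal distribution to the log-transformed values.
    dist = NormalDist(mean(logs), stdev(logs))

    # Assumed rule: cut points that split the fitted normal into
    # n_symbols bins of equal probability.
    cuts = [dist.inv_cdf(k / n_symbols) for k in range(1, n_symbols)]

    # A value's symbol is the number of cut points lying below it.
    return [sum(x > c for c in cuts) for x in logs]


# Toy expression levels for one gene across six time points (made-up values).
symbols = discretize_series([0.5, 0.9, 1.0, 1.1, 2.0, 4.0])
print(symbols)
```

The resulting symbol sequences could then serve as the observation sequences on which per-gene or per-cluster HMMs are trained; that stage is not sketched here because the abstract gives no details of the model topology or the merging criterion.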