As the capture and analysis of single-time-point microarray expression data becomes routine, investigators are turning to time-series expression data to investigate complex gene regulation schemes and metabolic pathways. These investigations are facilitated by algorithms that can extract and cluster related behaviors from the full population of time-series behaviors observed. Although traditional clustering techniques have been shown to be effective for certain types of expression analysis, they do not take the biological nature of the process into account and are therefore not optimized for this purpose. Moreover, the current approaches provide internal comparisons for the experiments used for clustering, but cross-comparisons between clustered results are qualitative and subjective. We present a combination of current and novel methods for the analysis of time-series gene expression data. We focus on an actual study we have performed for Haemophilus influenzae, which is a major cause of otitis media in children. We first perform a discretization of the gene expression data that takes both positive and negative correlations into consideration, and then develop a clustering algorithm optimized for such data that allows elucidation and searching of time-series patterns. The resulting approach allows time-series data to be usefully compared across multiple experiments. We demonstrate the success of our algorithm by showing that some of the genes it finds to be co-regulated are not detected by current methods. As a result, we are able to identify several signal pathways that initiate competence development, and to characterize the transcriptomes of wild-type and adenylate cyclase mutant (cya) strains under both nutrient-limiting and nutrient-complete growth conditions.
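The abstract does not spell out the discretization or the clustering step, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes each expression profile is reduced to a string of up/down/no-change symbols between consecutive time points, and that a pattern and its mirror image (a negatively correlated profile) share one cluster key. The function names, the gene names, and the `threshold` value are all hypothetical.

```python
from collections import defaultdict

# Translation table that swaps "up" and "down" symbols (mirror image of a pattern).
_INVERT = str.maketrans("UD", "DU")


def discretize(series, threshold=0.5):
    """Map each consecutive change to 'U' (up), 'D' (down), or 'N' (no change).

    The threshold is an assumed, illustrative cutoff, not a value from the paper.
    """
    symbols = []
    for prev, curr in zip(series, series[1:]):
        delta = curr - prev
        if delta > threshold:
            symbols.append("U")
        elif delta < -threshold:
            symbols.append("D")
        else:
            symbols.append("N")
    return "".join(symbols)


def canonical(pattern):
    """Return a key shared by a pattern and its inverted counterpart."""
    return min(pattern, pattern.translate(_INVERT))


def cluster(profiles, threshold=0.5):
    """Group genes whose discretized patterns are identical or mirror images.

    Each member is stored with '+' if its pattern equals the cluster key
    and '-' if it is the inverted form of the key.
    """
    groups = defaultdict(list)
    for gene, series in profiles.items():
        pattern = discretize(series, threshold)
        key = canonical(pattern)
        sign = "+" if pattern == key else "-"
        groups[key].append((gene, sign))
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical profiles: geneA rises while geneB falls in lockstep.
    profiles = {
        "geneA": [0.0, 1.0, 2.0, 2.0],
        "geneB": [3.0, 2.0, 1.0, 1.0],
        "geneC": [0.0, 0.0, 0.0, 1.0],
    }
    for key, members in cluster(profiles).items():
        print(key, members)
```

Because the cluster key is a plain string, the same pattern can be looked up in the results of a second experiment, which is one simple way discretized patterns could be compared across experiments.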
Date of Conference: 19-21 May 2004