Skip to Main Content
Serial analysis of gene expression (SAGE) is a powerful technique for global gene expression profiling, allowing simultaneous analysis of thousands of transcripts without prior structural and functional knowledge. Pattern discovery and visualization have become fundamental approaches to analyzing such large-scale gene expression data. From the pattern discovery perspective, clustering techniques have received great attention. However, due to the statistical nature of SAGE data (i.e. underlying distribution), traditional clustering techniques may not be suitable for SAGE data analysis. Based on the adaptation and improvement of Self-Organizing Maps and hierarchical clustering techniques, this paper presents two new clustering algorithms, namely PoissonS and PoissonHC, for SAGE data analysis. Tested on synthetic and experimental SAGE data, these algorithms demonstrate several advantages over traditional pattern discovery techniques. The results indicate that by incorporating statistical properties of SAGE data, PoissonS and PoissonHC, as well as a hybrid approach (neuro-hierarchical approach) based on the combination of PoissonS and PoissonHC, offer significant improvements in pattern discovery and visualization for SAGE data. Moreover, a userfriendly platform, which may improve and accelerate SAGE data mining, was implemented. The system is freely available on request from the authors for nonprofit use.