Integration of cluster ensemble and text summarization for gene expression analysis

Author: Xiaohua Hu, College of Information Science and Technology, Drexel University, Philadelphia, PA, USA

Generating high-quality gene clusters and identifying the biological mechanisms underlying each gene cluster are important goals of clustering-based gene expression analysis. To obtain high-quality clusters, most current approaches rely on choosing the clustering algorithm whose design biases and assumptions best match the underlying distribution of the data set. This approach has two problems: (1) the underlying distribution of a gene expression data set is usually unknown, and (2) so many clustering algorithms are available that choosing the proper one is very challenging. For textual summaries of gene clusters, the most explored approach is extractive summarization, which essentially builds on techniques borrowed from information retrieval, where the objective is to provide terms for query expansion rather than a stand-alone summary of the entire document set. Another drawback is that clustering quality and cluster interpretation are treated as two isolated research problems and studied separately, although they are closely related and must be addressed in a coherent, unified way. Relatively high-quality clusters are a prerequisite for a correct, informative biological explanation of a gene cluster; otherwise, the explanation will be incorrect or misleading, no matter how good or robust the text summarization technique is. Based on this consideration, we design and develop GE-Miner (Gene Expression Miner), a unified system that addresses these challenges in a principled and general manner by integrating cluster ensembles with text summarization, and that provides an environment for comprehensive gene expression data analysis. Experimental results demonstrate that our system can obtain high-quality clusters and provide concise, informative textual summaries of the gene clusters.
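The abstract does not specify how GE-Miner combines its base clusterings. As an illustration only, the sketch below shows one widely used cluster-ensemble technique, evidence accumulation over a co-association matrix; it is an assumption, not necessarily the method used in the paper. Each base clustering votes on whether two genes belong together, and genes whose co-association exceeds a threshold (0.5 here, an assumed default) are merged into one consensus cluster. The function names `co_association` and `consensus_clusters` are hypothetical.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of base clusterings that put items i and j in the same cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        if len(labels) != n:
            raise ValueError("all labelings must cover the same items")
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    m = len(labelings)
    # Normalize vote counts; an item is always co-clustered with itself.
    return [[1.0 if i == j else counts[i][j] / m for j in range(n)] for i in range(n)]


def consensus_clusters(labelings, threshold=0.5):
    """Link items whose co-association exceeds threshold; components become clusters."""
    matrix = co_association(labelings)
    n = len(matrix)
    parent = list(range(n))  # union-find forest over items

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


if __name__ == "__main__":
    # Three hypothetical base clusterings of six genes (label ids are arbitrary).
    runs = [
        [0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 2, 2],
        [1, 1, 1, 0, 0, 0],
    ]
    print(consensus_clusters(runs))
```

Because the consensus depends only on pairwise agreement, the base clusterings can come from different algorithms with different, even permuted, label ids, which is what lets an ensemble sidestep the problem of picking a single "right" algorithm.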

Published in: Proceedings of the Fourth IEEE Symposium on Bioinformatics and Bioengineering (BIBE 2004)

Date of Conference: 19-21 May 2004