An Iterative Data Mining Approach for Mining Overlapping Coexpression Patterns in Noisy Gene Expression Data

2 Author(s)
Ma, P.C.H.; Dept. of Comput., Hong Kong Polytech. Univ., Kowloon, China; Chan, K.C.C.

Clustering is concerned with the discovery of groupings of records in a database. Many clustering problems are defined as partitioning problems, in the sense that similar records are grouped into nonoverlapping partitions. However, clustering gene expression data to discover coexpressed genes may not always be meaningful if the problem is reduced to a partitioning problem. Due to the complexity of the underlying biological processes, a protein can interact with one or more other proteins belonging to different functional classes in order to perform a particular biological role. For this reason, when responding to different external stimulants, a gene that produces a particular protein can coexpress with more than one group of other genes, and can therefore belong to more than one group of coexpressed genes. This poses a challenge to many clustering algorithms, as they were not originally developed to discover overlapping clusters in noisy gene expression data. In this paper, we propose an iterative data mining approach that consists of two phases. In phase 1, a clustering algorithm is used to discover an initial, nonoverlapping partitioning of the gene expression profiles. In phase 2, the partition memberships of genes are redetermined iteratively by a pattern discovery technique to determine whether a gene should remain in the same partition, be moved to another partition, or also be grouped with genes in other partitions. The proposed approach has been tested with both artificial and real datasets. Experimental results show that it can improve the performance of existing clustering algorithms and can effectively discover overlapping clusters in noisy gene expression data.
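
The two-phase idea can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not specify the phase 1 clustering algorithm or the phase 2 pattern discovery technique, so the sketch assumes a simple correlation-based hard partitioning for phase 1 and substitutes a Pearson-correlation threshold against cluster centroids for phase 2. The function names (`phase1_partition`, `phase2_overlap`), the `threshold` parameter, and the toy profiles are all hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def centroid(profiles):
    """Element-wise mean of a non-empty list of profiles."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]

def phase1_partition(profiles, k, iters=20):
    """Phase 1 stand-in: hard, nonoverlapping partitioning (k-means style,
    correlation similarity, deterministic farthest-first initialization)."""
    centers = [profiles[0]]
    while len(centers) < k:
        # Next center: the profile least similar to all current centers.
        centers.append(min(profiles,
                           key=lambda p: max(pearson(p, c) for c in centers)))
    labels = [0] * len(profiles)
    for _ in range(iters):
        labels = [max(range(k), key=lambda c: pearson(p, centers[c]))
                  for p in profiles]
        for c in range(k):
            members = [p for p, l in zip(profiles, labels) if l == c]
            if members:
                centers[c] = centroid(members)
    return labels

def phase2_overlap(profiles, labels, k, threshold=0.5, max_iters=10):
    """Phase 2 stand-in: iteratively redetermine memberships. A gene joins every
    cluster whose centroid it correlates with at >= threshold (assumed rule),
    and always keeps its best-matching cluster, so it may stay, move, or overlap."""
    memberships = [{l} for l in labels]
    for _ in range(max_iters):
        centers = {}
        for c in range(k):
            members = [p for p, m in zip(profiles, memberships) if c in m]
            if members:
                centers[c] = centroid(members)
        updated = []
        for p in profiles:
            sims = {c: pearson(p, ctr) for c, ctr in centers.items()}
            chosen = {c for c, s in sims.items() if s >= threshold}
            chosen.add(max(sims, key=sims.get))
            updated.append(chosen)
        if updated == memberships:  # memberships stopped changing
            break
        memberships = updated
    return memberships

# Toy data (invented): two genes active early, two active late, one active in both.
profiles = [
    [1, 2, 3, 0, 0, 0],
    [2, 4, 6, 0, 0, 0],
    [0, 0, 0, 1, 2, 3],
    [0, 0, 0, 2, 4, 6],
    [1, 2, 3, 1, 2, 3],
]
labels = phase1_partition(profiles, k=2)
memberships = phase2_overlap(profiles, labels, k=2, threshold=0.3)
print(labels)       # hard partition from phase 1
print(memberships)  # the last gene ends up in both clusters
```

On this toy input, phase 1 forces the last gene into a single partition, while phase 2 lets it belong to both groups. The threshold of 0.3 is chosen only to make the toy example overlap; real data would need a principled criterion such as the pattern discovery technique the paper describes.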

Published in:

NanoBioscience, IEEE Transactions on (Volume: 8, Issue: 3)