Normalized EM algorithm for tumor clustering using gene expression data

2 Author(s)
Nguyen Minh Phuong ; Sch. of Electr. Eng. & Telecommun., Univ. of New South Wales, Kensington, NSW ; Nguyen Xuan Vinh

Most proposed clustering approaches are heuristic in nature, which makes it difficult to interpret the resulting clusterings from a statistical standpoint. Mixture model-based clustering has received much attention from the gene expression community because of its sound statistical foundation and its flexibility in data modeling. However, current clustering algorithms that follow the model-based framework suffer from two serious drawbacks. First, their performance depends critically on the starting values of their iterative clustering procedures. Second, they cannot work directly with very high dimensional data sets, whose dimension may reach the thousands. We propose a novel normalized Expectation-Maximization (EM) algorithm to tackle both challenges. The normalized EM is stable even when its iterative procedure is initialized randomly. Its stability is demonstrated through a performance comparison with related clustering algorithms such as the unnormalized EM (the conventional EM algorithm for Gaussian mixture model-based clustering) and spherical k-means. Furthermore, the normalized EM is the first mixture model-based clustering algorithm shown to be stable when working directly with very high dimensional microarray data sets in the sample clustering problem, where the number of genes is much larger than the number of samples. In addition, we uncover an interesting property of the convergence speed of the normalized EM with respect to the squared radius of the hypersphere in its corresponding statistical model.
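The abstract does not give the update equations of the normalized EM, so the following is not the authors' algorithm. As a rough illustration of the setting it describes (samples placed on a hypersphere, random initialization), here is a minimal sketch of spherical k-means, one of the baselines the abstract names. The function names, the toy data, and the choice to use plain Python lists are all assumptions made for this example.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean length (a point on the unit hypersphere)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a, b):
    """Inner product; for unit vectors this is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(samples, k, n_iter=50, seed=0):
    """Cluster samples by cosine similarity; returns (labels, centroids)."""
    rng = random.Random(seed)
    data = [normalize(s) for s in samples]
    # Random initialization: pick k distinct samples as starting centroids.
    centroids = [list(data[i]) for i in rng.sample(range(len(data)), k)]
    labels = [-1] * len(data)
    for _ in range(n_iter):
        # Assignment step: each sample joins the centroid it is most similar to.
        new_labels = [
            max(range(k), key=lambda j: dot(x, centroids[j])) for x in data
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the normalized sum of its members.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                summed = [sum(col) for col in zip(*members)]
                centroids[j] = normalize(summed)
    return labels, centroids


# Hypothetical toy data: two groups of samples pointing in different directions.
toy = [
    [5.0, 0.1, 0.2], [9.0, 0.3, 0.1], [2.0, 0.1, 0.0],
    [0.1, 4.0, 3.9], [0.2, 8.0, 8.1], [0.0, 1.0, 1.1],
]
labels, centroids = spherical_kmeans(toy, k=2)
```

Because every sample is rescaled to unit length first, only the direction of an expression profile matters, not its magnitude. Real microarray data would have thousands of coordinates per sample rather than three.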

Published in:

BioInformatics and BioEngineering, 2008. BIBE 2008. 8th IEEE International Conference on

Date of Conference:

8-10 Oct. 2008