Noting that unsupervised clustering may produce clusters that are irrelevant to the research hypotheses and interests, we generalize traditional unsupervised clustering into semi-supervised clustering based on our previously proposed message passing clustering (MPC). In semi-supervised MPC, prior knowledge, such as instance-level and attribute-level constraints, is used to guide the clustering process toward better and more interpretable partitions. We applied unsupervised MPC (background) to the phylogenetic analysis of Mycobacterium and semi-supervised MPC to the analysis of colon cancer microarray data. The results show that MPC is superior to the widely accepted neighbor-joining and hierarchical clustering methods, and that semi-supervised MPC is even more powerful in biological data analysis tasks such as gene selection and cancer diagnosis from microarray data.
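The abstract does not give MPC's merge rule, so the following is only an illustration of how instance-level constraints can steer a bottom-up clustering. It is a minimal sketch of constrained average-linkage agglomerative clustering, a swapped-in technique and not the authors' MPC. It assumes instance-level constraints take the common must-link / cannot-link form; the function name `constrained_agglomerative` and its parameters are hypothetical, and attribute-level constraints are not modeled.

```python
import math
from itertools import combinations


def constrained_agglomerative(points, k, must_link=(), cannot_link=()):
    """Hypothetical sketch: average-linkage clustering with must-link and
    cannot-link constraints (not the MPC algorithm from the paper)."""
    # Union-find used to pre-merge every must-link pair into one cluster.
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in must_link:
        parent[find(a)] = find(b)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    clusters = list(groups.values())

    # Cannot-link pairs stored order-independently.
    forbidden = {frozenset(pair) for pair in cannot_link}

    def violates(c1, c2):
        # Merging is forbidden if any cross pair is a cannot-link pair.
        return any(frozenset((i, j)) in forbidden for i in c1 for j in c2)

    def avg_dist(c1, c2):
        # Average-linkage distance between two clusters.
        total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            if violates(clusters[x], clusters[y]):
                continue  # constraint blocks this merge
            d = avg_dist(clusters[x], clusters[y])
            if best is None or d < best[0]:
                best = (d, x, y)
        if best is None:
            break  # no admissible merge remains
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return sorted(sorted(c) for c in clusters)


pts = [(0, 0), (1, 0), (10, 0), (11, 0)]
print(constrained_agglomerative(pts, 2))                        # [[0, 1], [2, 3]]
print(constrained_agglomerative(pts, 2, cannot_link=[(0, 1)]))  # [[0], [1, 2, 3]]
```

In the usage lines, a single cannot-link constraint is enough to change the partition that pure distance would produce, which is the kind of guidance toward hypothesis-relevant clusters the abstract describes. Contradictory constraints (a pair that is both must-linked and cannot-linked) are not checked in this sketch.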
Published in: Fifth IEEE Symposium on Bioinformatics and Bioengineering (BIBE 2005)
Date of Conference: 19-21 Oct. 2005