Clustering methods have been successfully applied in gene expression data analysis to address tumor classification. By grouping tissue samples into homogeneous subsets, tumors can be characterized more systematically and new tumor subtypes can be discovered. Central to cluster analysis is the notion of similarity between individual samples. In this paper, we propose latent structure models as a framework in which the dependence among genes, and thus the relationship between samples, can be modelled better in terms of topology and flexibility. A latent structure model is a Bayesian network whose structure contains at least a rooted tree including all variables, in which only the variables at the leaf nodes are observed, and whose structure after deleting all the observed variables is a rooted tree. The main gain from using latent structure models is that they provide a principled and systematic method for handling the dependence among genes. Latent structure models offer other benefits as well. They do not require prior knowledge for determining the tumor classes or choosing a similarity metric, two important issues associated with traditional clustering techniques. They are also computationally attractive because of the simplicity of their structures. We develop a search-based algorithm for learning latent structure models from microarray data. The effectiveness of the algorithm and of the proposed models is demonstrated on publicly available microarray data.
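To make the definition concrete, the following minimal sketch considers the simplest special case that satisfies it: a single latent root (a hypothetical tumor class variable) with discretized gene expression variables as observed leaves, assumed conditionally independent given the root. This is an illustration under those assumptions and not the search-based learning algorithm of the paper; the function name `posterior_over_classes` and all probabilities are invented for the example. It shows how a sample can be assigned to a cluster from the posterior over the latent variable, without a hand-picked similarity metric.

```python
from math import prod


def posterior_over_classes(prior, leaf_cpts, observation):
    """Return P(H = c | observed leaves) for a tree with one latent root H.

    prior[c]           : P(H = c)
    leaf_cpts[g][c][v] : P(X_g = v | H = c) for observed leaf (gene) g
    observation[g]     : observed discrete value of leaf g
    """
    # Joint probability of each latent class together with the observation.
    joint = [
        prior[c] * prod(leaf_cpts[g][c][v] for g, v in enumerate(observation))
        for c in range(len(prior))
    ]
    total = sum(joint)
    # Normalize to obtain the posterior over latent classes.
    return [j / total for j in joint]


# Hypothetical toy parameters: 2 latent classes, 2 binary "genes".
prior = [0.5, 0.5]
leaf_cpts = [
    [[0.9, 0.1], [0.2, 0.8]],  # gene 0: rows are classes, columns are values
    [[0.8, 0.2], [0.3, 0.7]],  # gene 1
]

sample = [1, 1]  # both genes observed in state 1
post = posterior_over_classes(prior, leaf_cpts, sample)
# Assign the sample to the most probable latent class.
assigned = max(range(len(post)), key=post.__getitem__)
```

In a richer latent structure model, several latent variables form the rooted tree above the observed leaves, and the same posterior would be obtained by standard inference on tree-structured Bayesian networks rather than by this direct product.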