This paper concerns the discovery of order-preserving clusters (OP-clusters) in microarray data, in each of which a subset of genes induces a similar linear ordering along a subset of conditions. By converting each gene expression vector into an ordered label sequence, we transform the problem into that of finding frequent orders appearing in the sequence set. We present two heuristic algorithms, growing prefix and suffix (GPS) and growing frequent position (GFP), to solve this problem. Their performance is evaluated empirically using synthetic and real microarray data. The results show that our approaches are effective and efficient, and that they outperform existing methods in many respects. Both GPS and GFP scale well with the dimensionality of the dataset and the size of the clusters. Their performance is comparable, although GPS achieves higher precision, whereas GFP has a lower computational cost.
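The conversion step described above can be illustrated with a minimal sketch. It is not the GPS or GFP algorithm, only the preprocessing idea and a brute-force support check. The function names (`to_label_sequence`, `supports_order`, `support_count`), the toy matrix, and the tie-breaking rule (ties resolved by condition index) are assumptions made for illustration and do not come from the paper.

```python
from typing import List, Sequence


def to_label_sequence(expression: Sequence[float]) -> List[int]:
    """Return condition indices ordered by ascending expression value.

    Assumption: ties are broken by condition index (sorted() is stable).
    """
    return sorted(range(len(expression)), key=lambda c: expression[c])


def supports_order(sequence: Sequence[int], order: Sequence[int]) -> bool:
    """True if `order` occurs in `sequence` as a (possibly non-contiguous) subsequence."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator, so labels must appear in order.
    return all(label in remaining for label in order)


def support_count(sequences: Sequence[Sequence[int]], order: Sequence[int]) -> int:
    """Count the genes whose label sequence preserves the given order."""
    return sum(supports_order(seq, order) for seq in sequences)


# Hypothetical toy data: rows are genes, columns are conditions 0..3.
expression_matrix = [
    [0.2, 1.5, 0.9, 3.1],
    [1.0, 2.0, 1.5, 0.1],
    [0.3, 4.0, 2.2, 9.9],
]

sequences = [to_label_sequence(row) for row in expression_matrix]
```

In this toy example, all three genes rank conditions 0, 2, 1 in that relative order, so they would form a candidate OP-cluster on those three conditions, while only the first and third genes preserve the full order 0, 2, 1, 3.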
Published in:
Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on
Date of Conference: 14-17 Oct. 2007