High-throughput distributed data analysis on computing clusters is becoming increasingly important in computational biology. This paper describes a parallel programming approach, and its software implementation using the Message Passing Interface (MPI), for parallelizing a computationally intensive algorithm that identifies cellular contexts. We report a successful implementation on a 1,024-processor Beowulf cluster, used to analyze microarray data consisting of hundreds of thousands of measurements from different datasets. A detailed performance evaluation shows that an analysis that could have taken months on a stand-alone computer was completed in less than a day.
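The abstract does not say how the work is divided among the processes. As a minimal sketch, assuming the analysis decomposes into independent work items that can be split statically, the following shows one common way to assign a contiguous block of items to each process. The function name `block_range` and the item count are hypothetical and do not come from the paper. In an MPI program the rank and process count would normally be obtained from `MPI_Comm_rank` and `MPI_Comm_size`; here they are plain arguments so the sketch runs without an MPI installation.

```python
def block_range(n_items: int, n_ranks: int, rank: int) -> range:
    """Return the contiguous block of item indices assigned to `rank`.

    Hypothetical helper: a static block decomposition, assumed here for
    illustration and not taken from the paper.
    """
    if n_ranks <= 0 or not 0 <= rank < n_ranks:
        raise ValueError("rank must satisfy 0 <= rank < n_ranks")
    base, extra = divmod(n_items, n_ranks)
    # The first `extra` ranks take one additional item each, so block
    # sizes differ by at most one.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


if __name__ == "__main__":
    # Illustrative numbers only: 1,024 ranks matches the cluster size in
    # the abstract; the item count is an assumption.
    n_items, n_ranks = 100_000, 1024
    sizes = [len(block_range(n_items, n_ranks, r)) for r in range(n_ranks)]
    print("smallest block:", min(sizes))
    print("largest block:", max(sizes))
    print("items covered:", sum(sizes))
```

A static split like this needs no communication to assign work, but it only balances the load when items cost roughly the same to process. If costs vary widely, a dynamic scheme in which one process hands out items on request may fit better.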
Published in: Genomic Signal Processing and Statistics, 2008. GENSiPS 2008. IEEE International Workshop on
Date of Conference: 8-10 June 2008