Skip to Main Content
The field of bioinformatics and computational biology is experiencing a data revolution - experimental techniques to procure data have increased in throughput improved in accuracy and reduced in costs. This has spurred an array of high profile sequencing and data generation projects. While the data repositories represent untapped reservoirs of rich information critical for scientific breakthroughs the analytical software tools that are needed to analyze large volumes of such sequence data have significantly lagged behind in their capacity to scale. In this paper we address homology detection which is a fundamental problem in large-scale sequence analysis with numerous applications. We present a scalable framework to conduct large-scale optimal homology detection on massively parallel super-computing platforms. Our approach employs distributed memory work stealing to effectively parallelize optimal pairwise alignment computation tasks. Results on 120,000 cores of the Hopper Cray XE6 supercomputer demonstrate strong scaling and up to 2.42 × 107 optimal pairwise sequence alignments computed per second (PSAPS) the highest reported in the literature.