Next Generation Sequencing (NGS) platforms typically produce short reads of 50-150 base pairs (bp), and a single run can yield up to 6 billion such reads. Aligning these short reads to a large genome is computationally challenging. In this paper, we address this problem by considering the design and optimization of parallel sequence alignment on GPU-based hybrid architectures. Even though sequence alignment is inherently data-parallel, several issues make an efficient GPU implementation difficult: (a) space-time trade-offs in the indexing scheme, (b) the need for fast candidate alignment location (CAL) search on the GPU, and (c) the need to maintain low thread divergence and a low memory footprint in the dynamic-programming-based local alignment. We present the design of our novel parallel algorithm, Graphics processor Accelerated BFAST (GrABFAST), for large-scale read alignment, which overcomes these challenges and demonstrates superior performance compared to Intel multi-core architectures. Using 5 large genomes, including those of human, maize, horse, dog and bacteria, we demonstrate a speedup of around 6x using Fermi Tesla C2070 GPUs over the BFAST algorithm on a 16-core Intel Xeon 5570 architecture.
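To make the dynamic-programming-based local alignment step concrete, the following is a minimal sketch of a Smith-Waterman score computation that keeps only two rows of the DP table, one common way to hold memory use down. It is an illustration only and not the GrABFAST kernel: the function name `local_alignment_score` and the scoring parameters (`match`, `mismatch`, `gap`) are assumptions chosen for the example, and the linear gap penalty is a simplification.

```python
def local_alignment_score(read, ref, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score.

    Hypothetical helper for illustration; scoring values are assumed.
    Only two DP rows are kept, so memory is O(len(ref)) instead of
    O(len(read) * len(ref)).
    """
    prev = [0] * (len(ref) + 1)  # DP row for read prefix of length i-1
    best = 0
    for i in range(1, len(read) + 1):
        curr = [0] * (len(ref) + 1)  # DP row for read prefix of length i
        for j in range(1, len(ref) + 1):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment here
                prev[j - 1] + sub,  # align read[i-1] with ref[j-1]
                prev[j] + gap,      # gap in the reference
                curr[j - 1] + gap,  # gap in the read
            )
            if curr[j] > best:
                best = curr[j]
        prev = curr
    return best
```

With these assumed defaults, a 4-base read that occurs exactly in the reference, such as `local_alignment_score("ACGT", "TTACGTTT")`, scores 8 (four matches at 2 each). In a data-parallel setting each read and candidate location pair can be scored independently, since the inner loops have a fixed iteration count for given input lengths.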