Parallelizing Big De Bruijn Graph Construction on Heterogeneous Processors | IEEE Conference Publication | IEEE Xplore

Abstract:

De Bruijn graph construction is the first step in de novo assemblers to connect input reads into a complete sequence without a reference genome. This step is both time- and memory-intensive. To address this problem, we develop ParaHash, a system that partitions the input data in a compact format, parallelizes the computation on both the CPUs and the GPUs in a single computer, and performs hash-based De Bruijn graph construction. This way, ParaHash utilizes all available processors to assemble big genomes that cannot fit into memory. Furthermore, we analyze the characteristics of genome data to set the hash table size, design concurrent hashing algorithms to handle the inherent multiplicity, and pipeline the data transfer and the computation for further efficiency. Our experiments on real-world genome datasets show that the workload was balanced across heterogeneous processors, and that ParaHash was able to construct billion-node graphs on a single machine with overall performance up to 20 times faster than the state-of-the-art shared-memory assemblers.
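The abstract does not specify ParaHash's data structures, so the following is only a minimal single-threaded Python sketch of hash-based De Bruijn graph construction under assumptions of our own. Each k-mer is a node packed into 2 bits per base. Each node's hash-table entry counts how often every base follows or precedes it, so a k-mer repeated across reads accumulates multiplicity instead of becoming a duplicate node. K-mers are split into partitions by a hypothetical `code % num_partitions` rule. The names `encode_kmer` and `build_de_bruijn` are illustrative. Reverse complements are ignored, and none of the CPU/GPU parallelism, concurrent hashing, or pipelining is modeled.

```python
from typing import Dict, List

# 2-bit code per nucleotide: a compact k-mer encoding (assumed, not taken from the paper).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer: str) -> int:
    """Pack a k-mer into an integer using 2 bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


def build_de_bruijn(reads: List[str], k: int, num_partitions: int = 1) -> List[Dict[int, dict]]:
    """Build one hash table per partition mapping encoded k-mer -> node record.

    Each node record keeps the k-mer's occurrence count and, for each of
    A/C/G/T, how many times that base followed ("out") or preceded ("in")
    the k-mer in the reads, i.e. the edge multiplicities.
    """
    partitions: List[Dict[int, dict]] = [{} for _ in range(num_partitions)]

    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing ambiguous bases such as 'N'.
            if any(base not in ENCODE for base in kmer):
                continue
            code = encode_kmer(kmer)
            # Hypothetical partitioning rule: route each k-mer by its encoded value.
            table = partitions[code % num_partitions]
            node = table.setdefault(code, {"count": 0, "out": [0] * 4, "in": [0] * 4})
            node["count"] += 1
            # Out-edge: the base immediately after this k-mer, if any.
            if i + k < len(read) and read[i + k] in ENCODE:
                node["out"][ENCODE[read[i + k]]] += 1
            # In-edge: the base immediately before this k-mer, if any.
            if i > 0 and read[i - 1] in ENCODE:
                node["in"][ENCODE[read[i - 1]]] += 1
    return partitions


if __name__ == "__main__":
    graph = build_de_bruijn(["ACGTACGT"], k=3)[0]
    print(len(graph), graph[encode_kmer("ACG")])
```

For example, `build_de_bruijn(["ACGTACGT"], k=3)` yields four nodes (`ACG`, `CGT`, `GTA`, `TAC`), and `ACG` has count 2 because it occurs twice, both times followed by `T`. Because each partition's table depends only on the k-mers routed to it, partitions could in principle be built independently, for instance one at a time when the whole graph does not fit in memory or on different processors, which is the kind of decomposition the abstract describes.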
Date of Conference: 05-08 June 2017
Date Added to IEEE Xplore: 17 July 2017
Print ISSN: 1063-6927
Conference Location: Atlanta, GA, USA