HMSPKmerCounter: Hadoop based Parallel, Scalable, Distributed Kmer Counter for Large Datasets

Abstract:

Counting the frequency of every distinct substring of length k in sequence reads is an initial step in many bioinformatics applications, such as genome assembly, correction of errors in sequencing reads, fast multiple sequence alignment, and detection of repeats. This is known as the k-mer counting problem. Although k-mer counting looks simple, when the input dataset of sequence reads is massive and the number of k-mers grows, single-node k-mer counting tools can exhaust the memory and disk capacity of a single computer. Hadoop is a scalable, parallel big data framework for data-intensive applications that processes large datasets on a cluster of low-cost commodity hardware. In this paper, a Hadoop-based k-mer counter using the Minimum Substring Partitioning method (HMSPKmerCounter) is developed and compared with the k-mer counting program of BioPig, the first Hadoop-based k-mer counter, and with KMC3, a recent single-node multithreaded k-mer counting tool. Our results show that HMSPKmerCounter outperforms the BioPig k-mer counting program for k = 28, 40, 55, and 65. The results also show that our implementation outperforms KMC3 as k increases.
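
To make the k-mer counting problem concrete, the following is a minimal single-machine sketch in Python. It is not the paper's Hadoop implementation, which this page does not show. The function names (`count_kmers`, `minimum_substring`, `partition_kmers`) and the parameters `p` and `num_partitions` are hypothetical, chosen for illustration. The partitioning step assumes the usual description of Minimum Substring Partitioning, in which each k-mer is routed to a partition according to its lexicographically smallest length-p substring; the sketch also ignores reverse complements, which practical tools often merge.

```python
import zlib
from collections import Counter


def count_kmers(reads, k):
    """Count every distinct length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def minimum_substring(kmer, p):
    """Return the lexicographically smallest length-p substring of a k-mer."""
    return min(kmer[i:i + p] for i in range(len(kmer) - p + 1))


def partition_kmers(reads, k, p, num_partitions):
    """Count k-mers per partition, choosing the partition from the
    k-mer's minimum p-substring (assumed partitioning rule)."""
    partitions = [Counter() for _ in range(num_partitions)]
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            key = minimum_substring(kmer, p)
            # crc32 gives a hash that is stable across runs, unlike hash(str).
            index = zlib.crc32(key.encode("ascii")) % num_partitions
            partitions[index][kmer] += 1
    return partitions


if __name__ == "__main__":
    reads = ["ACGTACGT", "CGTACG"]
    print(count_kmers(reads, 4))
    for idx, part in enumerate(partitions := partition_kmers(reads, 4, 2, 3)):
        print(idx, dict(part))
```

Because every occurrence of a given k-mer has the same minimum substring, all of its occurrences land in the same partition. Each partition can therefore be counted independently, for example on a separate cluster node, and the per-partition counts combined without conflicts.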

Published in: 2018 International Conference on Bioinformatics and Systems Biology (BSB)

Date of Conference: 26-28 October 2018

Date Added to IEEE Xplore: 25 July 2019

DOI: 10.1109/BSB.2018.8770594

Conference Location: Allahabad, India
