HErCoOl: High-Throughput Error Correction by Oligomers | IEEE Conference Publication | IEEE Xplore

HErCoOl: High-Throughput Error Correction by Oligomers


Abstract:

Next-generation sequencing (NGS) technologies are marking the foundations for a new paradigm in genomics and transcriptomics. Nowadays is possible to sequence any microbi...Show More

Abstract:

Next-generation sequencing (NGS) technologies are marking the foundations for a new paradigm in genomics and transcriptomics. Nowadays is possible to sequence any microbial organism or meta-genomic sample within hours, and to obtain a whole human genome in less than a month. The sequencing prices are decreasing dramatically, opening to actual personalised medicine. NGS technologies however are error-prone, and correcting errors is a challenge due to multiple factors, including the data sizes (gigabyte scale) and the machine-specific, non-at-random, characteristics of errors and error distributions. Several approaches have been proposed, but yet the problem is a challenge, especially when analysing mixtures of (closely related) species, e.g., highly variable viruses infecting in a host as a swarm, like hepatitis C or human immunodeficiency virus. This work presents a novel error correction algorithm based on k-mer strings with their associated overlap graph, along with an open-source, multi-threaded, implementation. The algorithm, named Her Cool (High-throughput Error Correction by Oligomers), needs minimal tuning, only an overall error rate and -optionally- information about the genome sizes. Her Cool was compared against other state-of-the art methods, using empirical NGS data obtained with Roche 454 technology, focusing the benchmarks on mixtures of related species. Results show that Her Cool improves significantly over the current methods, and the parallelisation scales well with the size of input NGS genome producing long sequence reads, such as Roche 454 or Ion Torrent. Her Cool provides a fast and efficient error correction of NGS data, especially for mixed samples. Its platform-independent, open-source, multi-threaded implementation assures flexibility for being employed and integrated in any NGS data analysis software.
Date of Conference: 27-29 May 2014
Date Added to IEEE Xplore: 25 August 2014
Electronic ISBN:978-1-4799-4435-4

ISSN Information:

Conference Location: New York, NY, USA

Contact IEEE to Subscribe

References

References is not available for this document.