Efficient and Scalable Alignment-Free Distributed Genotyping of SNPs and Short Indels | IEEE Journals & Magazine | IEEE Xplore



Abstract:

The growing volume of sequencing data and the ever-larger size of variant databases challenge genotyping procedures to handle massive genomics datasets efficiently. Recent alignment-free solutions rely exclusively on k-mer counts to speed up the analysis, but they must trade off this time gain against memory requirements to keep the computation feasible on a single workstation. In this paper, we present SparkGeno+, a novel alignment-free (AF) distributed pipeline for the fast and accurate genotyping of Single Nucleotide Polymorphisms (SNPs) and indels on a large scale. Starting from a previous pipeline, we identified and evaluated the performance bottlenecks that arise when genotyping with a standard AF approach, and then developed and implemented several innovations that better exploit the resources of a distributed system. We validated the effectiveness of our proposal through an experimental analysis on widely studied datasets. The results show that the accuracy of SparkGeno+ matches that of state-of-the-art alignment-free tools such as VarGeno and MALVA. Moreover, the execution time of SparkGeno+ scales well with the number of computing units, yielding execution times orders of magnitude smaller than those of classical genotyping tools. These results indicate that SparkGeno+ is a promising solution for large-scale genotyping applications.
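The abstract describes alignment-free genotyping only at a high level. The Python sketch below illustrates the general idea under simplifying assumptions: for a SNP, it builds the k-mers that overlap the variant for the reference and the alternate allele, counts read k-mers in hash-partitioned buckets (mimicking, in a single process, the key-based shuffle that a framework such as Apache Spark performs), and calls a genotype from the fraction of alternate-allele support. The function names, the CRC32 partitioning, and the allele-fraction thresholds are hypothetical illustrations, not SparkGeno+'s actual model, which the abstract does not describe; real tools also discard k-mers that occur elsewhere in the genome.

```python
import zlib
from collections import Counter
from typing import Callable, Iterable, Iterator, List, Set, Tuple


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq (nothing if len(seq) < k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def snp_kmers(ref: str, pos: int, alt: str, k: int) -> Tuple[Set[str], Set[str]]:
    """Return the reference and alternate k-mers that overlap a SNP at pos.

    Every k-mer starting in [pos - k + 1, pos] covers pos; the alternate set
    comes from the same window with ref[pos] replaced by the alt base.
    """
    start, end = max(0, pos - k + 1), min(len(ref), pos + k)
    ref_window = ref[start:end]
    alt_window = ref[start:pos] + alt + ref[pos + 1:end]
    return set(kmers(ref_window, k)), set(kmers(alt_window, k))


def partitioned_counts(reads: Iterable[str], k: int, n_parts: int) -> List[Counter]:
    """Count read k-mers, routing each k-mer to one of n_parts partitions by a
    stable hash (CRC32). Partitions own disjoint keys, so in a distributed
    setting each one could be counted on a separate worker (assumption)."""
    parts = [Counter() for _ in range(n_parts)]
    for read in reads:
        for km in kmers(read, k):
            parts[zlib.crc32(km.encode()) % n_parts][km] += 1
    return parts


def lookup(parts: List[Counter], km: str) -> int:
    """Fetch a k-mer count from the single partition that owns it (0 if absent)."""
    return parts[zlib.crc32(km.encode()) % len(parts)][km]


def genotype(count_of: Callable[[str], int], ref_kmers: Set[str], alt_kmers: Set[str],
             min_support: int = 2, het_low: float = 0.2, het_high: float = 0.8) -> str:
    """Call a VCF-style genotype from the alternate-allele fraction of k-mer
    support. The thresholds are illustrative assumptions."""
    ref_support = sum(count_of(km) for km in ref_kmers)
    alt_support = sum(count_of(km) for km in alt_kmers)
    total = ref_support + alt_support
    if total < min_support:
        return "./."  # not enough k-mer evidence to call
    alt_frac = alt_support / total
    if alt_frac < het_low:
        return "0/0"  # homozygous reference
    if alt_frac > het_high:
        return "1/1"  # homozygous alternate
    return "0/1"      # heterozygous


# Usage: a toy reference and a sample carrying a G>A SNP at position 10.
ref = "ACGTTGCAAGGCTTACGATC"
sample = ref[:10] + "A" + ref[11:]
ref_km, alt_km = snp_kmers(ref, 10, "A", k=5)
parts = partitioned_counts([ref, sample], k=5, n_parts=4)
print(genotype(lambda km: lookup(parts, km), ref_km, alt_km))  # 0/1
```

Summing counts over all k-mers of each allele is the simplest possible evidence score; a probabilistic model over per-k-mer counts would be more robust to sequencing errors and uneven coverage.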
Page(s): 1 - 12
Date of Publication: 09 January 2025
Electronic ISSN: 2998-4165
