Asynchronous Distributed Actor-Based Approach to Jaccard Similarity for Genome Comparisons | Prometeus GmbH Conference Publication | IEEE Xplore

Abstract:

The computation of genome similarity is important in computational biology applications, and is assessed by calculating the Jaccard similarity of DNA sequencing sets. However, it is challenging to find solutions that can compute Jaccard similarity with the efficiency and scalability needed to fully utilize the capabilities of modern HPC hardware. We introduce a novel approach for computing Jaccard similarity for genome comparisons, founded on an actor-based programming model. Our algorithm takes advantage of fine-grained asynchronous computations, distributed/shared memory, and the Fine-grained Asynchronous Bulk-Synchronous Parallelism execution model. Our performance results on the NERSC Perlmutter supercomputer demonstrate that this approach scales to 16,384 cores, showing average improvements of 4.94x in execution time at the largest scale and 5.5x in relevant hardware performance monitors at medium scale, compared to a state-of-the-art baseline. Our approach is also able to process much larger genomic datasets than this baseline. We make our source code publicly available at https://github.com/youssefelmougy/jaccard-selector/
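
For readers unfamiliar with the metric, the Jaccard similarity of two sets A and B is |A ∩ B| / |A ∪ B|. The following is a minimal sequential sketch of that definition only; it is not the actor-based, distributed algorithm described in the abstract. It assumes, for illustration, that each genome sample is represented as a set of k-mers (length-k substrings); the function names `kmers` and `jaccard`, the toy sequences, and the choice k = 3 are hypothetical and do not come from the paper or its repository.

```python
def kmers(sequence: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of a sequence.

    Representing a sample as a k-mer set is an illustrative assumption.
    """
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|.

    Two empty sets are treated as identical (1.0) by convention here.
    """
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Toy inputs (hypothetical, not real genomic data).
sample_a = kmers("ACGTACGT", k=3)
sample_b = kmers("ACGTTCGT", k=3)
similarity = jaccard(sample_a, sample_b)
print(f"Jaccard similarity: {similarity:.4f}")
```

Comparing n samples pairwise this way requires on the order of n² set intersections, which suggests why efficiency and scalability become the central concern at genomic scale.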
Date of Conference: 12-16 May 2024
Date Added to IEEE Xplore: 10 May 2024
Electronic ISBN: 978-3-9826336-0-2
Conference Location: Hamburg, Germany