Accelerating Large-Scale Molecular Similarity Search through Exploiting High Performance Computing | IEEE Conference Publication | IEEE Xplore

Abstract:

Molecular similarity search is a simple but powerful chemoinformatics tool for rapidly finding molecules in a large molecular database that are structurally similar to a known reference compound. A variety of indexing structures have been developed to improve the performance of similarity search over large compound databases. However, those algorithms often incur a large computational cost to build indices and process queries, especially for large-scale molecular datasets. We study the problem of accelerating similarity search using high performance computing (HPC) and design general algorithms to speed up existing indexing algorithms. We first propose a parallel algorithm based on data chunking that works with any indexing algorithm for similarity search. We theoretically analyze its computation cost and the relationship between the speedup and the number of data chunks. We further propose a parallel query algorithm for all graph-based indexing algorithms to accelerate their query processing on HPC systems. Both algorithms consistently offer a greater speedup than the baseline algorithms when evaluated with different datasets and parameter settings.
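The data-chunking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes fingerprints stored as Python integer bitsets, uses Tanimoto similarity as the measure, and substitutes a brute-force scan for the per-chunk index that the paper would build. All function names (`tanimoto`, `split_into_chunks`, `search_chunk`, `parallel_top_k`) are hypothetical. A thread pool is used only to keep the example portable; on a real HPC system each chunk would more plausibly go to a separate process or node.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as integer bitsets."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous, near-equal parts."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def search_chunk(chunk, query, k):
    """Top-k hits within one chunk.

    Assumption: a brute-force scan stands in for a per-chunk index.
    """
    scored = ((tanimoto(fp, query), mol_id) for mol_id, fp in chunk)
    return heapq.nlargest(k, scored)


def parallel_top_k(database, query, k, n_chunks):
    """Search every chunk concurrently, then merge the partial top-k lists."""
    chunks = split_into_chunks(database, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partial = pool.map(lambda c: search_chunk(c, query, k), chunks)
        # The global top-k is always contained in the union of per-chunk top-k lists.
        return heapq.nlargest(k, (hit for hits in partial for hit in hits))
```

Here `database` is a list of `(mol_id, fingerprint)` pairs, and the result is a list of `(similarity, mol_id)` tuples in descending order. Because each chunk returns its own top-k, the merge step only has to examine at most `k * n_chunks` candidates, regardless of database size.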
Date of Conference: 18-21 November 2019
Date Added to IEEE Xplore: 06 February 2020
Conference Location: San Diego, CA, USA