Accelerating the kernels of BLAST with an efficient PIM (processor-in-memory) architecture

3 Author(s)
Jung-Yup Kang (Dept. of Electr. Eng., Southern California Univ., Los Angeles, CA, USA); Gupta, S.; Gaudiot, J.-L.

BLAST is a widely used tool to search for similarities in protein and DNA sequences. However, the kernels of BLAST are not efficiently supported by general-purpose processors because of their special computational requirements. The kernels involve a large amount of computation with a high degree of potential parallelism that general-purpose processors can exploit only to a very limited extent. The kernels handle small (one-byte) operands, which general-purpose processors do not manipulate efficiently. The kernels entail only simple operations, whereas current general-purpose processors expend a significant proportion of their chip area to support complex operations, such as floating-point operations. The kernels perform a large number of memory accesses, which translates into severe penalties. In this paper, we propose an efficient PIM (processor-in-memory) architecture to effectively execute the kernels of BLAST. We propose not only to reduce the memory latencies and increase the memory bandwidth but also to execute the operations inside the memory where the data are located. We also propose to execute the operations in parallel by dividing the memory into small segments and having each of these segments execute operations concurrently. Our simulation results show that our computing paradigm provides a 242× performance improvement for the execution of the kernels and a 12× performance improvement for the overall execution of BLAST.
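
The gap between the two reported figures (242× for the kernels, 12× overall) can be related with a short back-of-the-envelope calculation. The sketch below is not from the paper: it assumes that Amdahl's law applies, that both figures refer to the same workload, and that only the kernels are accelerated. The function names are hypothetical, chosen for illustration.

```python
def implied_kernel_fraction(kernel_speedup: float, overall_speedup: float) -> float:
    """Solve Amdahl's law, 1/S = (1 - f) + f/k, for the accelerated fraction f.

    Assumption (not stated in the abstract): the non-kernel part of the
    run time is unchanged by the PIM architecture.
    """
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / kernel_speedup)


def amdahl_overall_speedup(fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when `fraction` of the run time is sped up by `kernel_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


# Figures quoted in the abstract: 242x for the kernels, 12x overall.
f = implied_kernel_fraction(242.0, 12.0)
print(f"implied kernel share of original run time: {f:.1%}")

# Even an infinitely fast kernel would be capped by the remaining serial part.
print(f"upper bound on overall speedup: {1.0 / (1.0 - f):.1f}x")
```

Under these assumptions, the kernels would account for roughly 92% of the original run time, which is one way to see why a very large kernel speedup translates into a much smaller overall gain.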

Published in:

Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings. 2004 IEEE

Date of Conference:

16-19 Aug. 2004