BLAST is a widely used tool for finding similarities in protein and DNA sequences. However, general-purpose processors do not support the kernels of BLAST efficiently because of the kernels' special computational requirements. The kernels involve large amounts of computation with a high degree of potential parallelism that general-purpose processors can exploit only to a very limited extent. They handle small (one-byte) operands, which general-purpose processors do not manipulate efficiently. They entail only simple operations, whereas current general-purpose processors expend a significant proportion of their chip area on complex operations, such as floating-point operations. They also perform a large number of memory accesses, which translates into severe penalties. In this paper, we propose a PIM (processor-in-memory) architecture that executes the kernels of BLAST efficiently. We propose not only to reduce the memory latencies and increase the memory bandwidth but also to execute the operations inside the memory where the data are located. We also propose to execute the operations in parallel by dividing the memory into small segments and having each segment execute operations concurrently. Our simulation results show that our computing paradigm provides a 242× performance improvement for the execution of the kernels and a 12× performance improvement for the overall execution of BLAST.
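The segment-level parallelism described above can be illustrated with a small software analogy. The sketch below is not the proposed hardware and not BLAST's actual kernel: it assumes a simplified exact word-match scan over one-byte symbols, and the names `scan_segment`, `find_word_hits`, `segment_size`, and `workers` are hypothetical. Each segment of the database is scanned independently, as each memory segment would be in a PIM design, and the per-segment hits are then concatenated. Python threads are used only to show the partitioning; they do not model the hardware's concurrency or its speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def scan_segment(database: bytes, word: bytes, start: int, end: int) -> List[int]:
    """Return every position p in [start, end) where `word` begins.

    The search window extends len(word) - 1 bytes past `end` so that a
    match starting inside this segment but finishing in the next one is
    still found, and found by exactly one segment.
    """
    limit = end + len(word) - 1
    hits = []
    pos = database.find(word, start, limit)
    while pos != -1:
        hits.append(pos)
        pos = database.find(word, pos + 1, limit)
    return hits


def find_word_hits(database: bytes, word: bytes,
                   segment_size: int = 4, workers: int = 4) -> List[int]:
    """Split `database` into fixed-size segments and scan them concurrently.

    Assumption: exact word matching stands in for the real BLAST kernels.
    """
    if not word or segment_size <= 0:
        raise ValueError("word must be non-empty and segment_size positive")
    starts = range(0, len(database), segment_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_segment = pool.map(
            lambda s: scan_segment(
                database, word, s, min(s + segment_size, len(database))
            ),
            starts,
        )
        # Segments are visited in order, so the merged hits are sorted.
        return [hit for hits in per_segment for hit in hits]
```

For example, `find_word_hits(b"ACGTACGTTACG", b"ACG")` scans three four-byte segments and reports the word at offsets 0, 4, and 9. The result does not depend on `segment_size`, which is the property a segmented memory needs for the partitioning to be transparent to the algorithm.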