BLAST is a widely used tool for finding similarities in protein and DNA sequences. However, general-purpose processors do not execute the BLAST kernels efficiently because of the kernels' special computational requirements. In this paper, we propose a PIM (Processor-In-Memory) architecture that executes the BLAST kernels efficiently. We propose not only to reduce memory latency and increase memory bandwidth but also to execute operations inside the memory, where the data are located. We also propose to execute operations in parallel by dividing the memory into small segments, each of which executes operations concurrently. Our simulation results show that this computing paradigm provides a 242x performance improvement for the execution of the kernels and a 12x performance improvement for the overall execution of BLAST.
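
The two reported speedups can be related with a short back-of-the-envelope calculation. The sketch below assumes that Amdahl's law applies, that is, that only the kernels are accelerated and the rest of BLAST runs at its original speed. This is an assumption for illustration, not a claim made by the paper, and the function names are hypothetical.

```python
def implied_kernel_fraction(kernel_speedup: float, overall_speedup: float) -> float:
    """Solve Amdahl's law, overall = 1 / ((1 - f) + f / kernel), for f.

    f is the fraction of the original run time spent in the accelerated
    kernels, under the assumption that nothing else is sped up.
    """
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / kernel_speedup)


def amdahl_overall_speedup(fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when `fraction` of the run time is sped up by `kernel_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


# Figures reported in the abstract: 242x for the kernels, 12x overall.
f = implied_kernel_fraction(kernel_speedup=242.0, overall_speedup=12.0)
print(f"implied kernel fraction of run time: {f:.3f}")

# Sanity check: plugging f back in should recover the overall figure.
print(f"round-trip overall speedup: {amdahl_overall_speedup(f, 242.0):.1f}x")
```

Under this assumption, the kernels would account for roughly 92% of the original run time, which would explain why a 242x kernel speedup yields a much smaller overall gain.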