Parallel programming has become one of the great challenges in the transition from single-core to multicore architectures. In this paper, we investigate the parallelization of Montgomery multiplication, a very common and time-consuming primitive in public-key cryptography. We present a scalable parallel programming scheme, called pSHS, that maps Montgomery multiplication onto a general multicore architecture. The pSHS scheme offers a considerable speedup: on 2-, 4-, and 8-core systems, a parallelized 2048-bit Montgomery multiplication achieves speedups of 1.98, 3.74, and 6.53, respectively. pSHS delivers stable performance, high portability, high throughput, and low latency across different multicore systems. These properties make pSHS a good candidate for public-key software implementations, including RSA, DSA, and ECC, on general multicore platforms. We present a detailed analysis of pSHS and verify it on dual-core, quad-core, and eight-core prototypes.
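For readers unfamiliar with the primitive, the following is a minimal sequential sketch of textbook Montgomery multiplication in Python. It is not the pSHS scheme, whose partitioning across cores is not described in this abstract. It only illustrates the operation being parallelized: computing a·b·R⁻¹ mod n without a division by n. The function names (`montgomery_setup`, `mont_mul`, `to_mont`, `from_mont`) and the choice R = 2^k are assumptions made for illustration, and the modular inverse via `pow(n, -1, r)` assumes Python 3.8 or later.

```python
def montgomery_setup(n, k):
    """Precompute constants for an odd modulus n with R = 2**k > n.

    Returns n_prime such that n * n_prime ≡ -1 (mod R).
    (Hypothetical helper, not taken from the paper.)
    """
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("n must be odd and smaller than 2**k")
    r = 1 << k
    return (-pow(n, -1, r)) % r


def mont_mul(a, b, n, k, n_prime):
    """Return a * b * R^{-1} mod n for 0 <= a, b < n, with R = 2**k."""
    mask = (1 << k) - 1
    t = a * b                              # full double-width product
    m = ((t & mask) * n_prime) & mask      # m = t * n' mod R
    u = (t + m * n) >> k                   # t + m*n is divisible by R
    return u - n if u >= n else u          # single conditional subtraction


def to_mont(x, n, k):
    """Map x into the Montgomery domain: x * R mod n."""
    return (x << k) % n


def from_mont(x, n, k, n_prime):
    """Map x back out of the Montgomery domain: x * R^{-1} mod n."""
    return mont_mul(x, 1, n, k, n_prime)


if __name__ == "__main__":
    # Illustrative 2048-bit odd modulus (not a real RSA modulus).
    k = 2048
    n = (1 << 2047) + 1
    n_prime = montgomery_setup(n, k)
    x, y = pow(3, 1000, n), pow(5, 900, n)
    z = from_mont(mont_mul(to_mont(x, n, k), to_mont(y, n, k), n, k, n_prime),
                  n, k, n_prime)
    print(z == (x * y) % n)
```

In a software implementation the operands are split into machine words and the product and reduction steps are interleaved word by word. That word-level loop structure is the kind of work a multicore scheme would distribute, though how pSHS does so is not stated here.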