Performance Evaluation of Instruction Set Extensions for Long Integer Modular Arithmetic on a SPARC V8 Processor
Grossschadl, J.
Tillich, S.
Szekely, A.
Inst. for Appl. Inf. Process. & Commun., Graz Univ. of Technol., Graz, Austria;
Abstract
Many important algorithms for public-key cryptography rely on computation-intensive arithmetic operations like modular exponentiation on very long integers, typically in the range of 512 and 2048 bits. Modular exponentiation is generally realized through a sequence of modular multiplications and spends the majority of execution time in simple inner loops. Speeding up these performance-critical inner loop operations with custom instructions has, therefore, a significant impact on the total execution time of public-key cryptosystems. In this paper we analyze the performance of instruction set extensions for long integer arithmetic on a SPARC V8 processor. We discuss various implementation options and optimization opportunities for both modular multiplication and exponentiation. In particular, we introduce a partial loop unrolling (PLU) technique for modular multiplication which allows to achieve large performance gains at the cost of a moderate increase in code size, while maintaining the full flexibility of a "rolled-loop" implementation. In addition, we study window methods for modular exponentiation and analyze their impact on performance and memory requirements. Our experimental results, obtained with an FPGA prototype of the LEON-2 SPARC V8 core, show that a full 1024-bit modular exponentiation can be performed in about 12.5 ldr 106 clock cycles, which is a reasonable value for embedded devices like smart cards or sensor nodes.
Index
Terms
Available to subscribers and IEEE members.
References
Available to subscribers and IEEE members.
Citing Documents
Available to subscribers and IEEE members.