Abstract:
With the growth of digital data and rising security concerns, techniques for privacy-preserving computation have become increasingly essential. Big integer multiplication, which is pivotal to these applications, is compute-intensive, yet it is difficult to accelerate on GPUs because of its complexity and the need for application-specific implementations. This paper presents IMCompiler, a compiler-like framework that automatically generates optimized GPU kernels for the integer multiplications used in cryptosystems. It has a frontend-IR-backend structure in which the Intermediate Representation (IR) employs a segmented integer multiplication algorithm to decouple architecture-specific optimizations from high-level parameters. The frontend can therefore readily translate integer multiplications with various high-level parameters into the IR, while the backend focuses on fine-tuning a single GPU kernel for each device, enabling automatic code generation. Moreover, we introduce a computation diagram to facilitate the analysis of parallelization strategies, which inspires several optimizations, including two-dimensional parallelization, a tailored caching strategy, index transposing, and lazy carrying. Experiments show that IMCompiler achieves a 4.47× speedup over the widely used baseline and a 1.42× speedup over NVIDIA's official library. The speedup is expected to be even higher for larger integers and higher-capacity GPUs.
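The abstract names segmented multiplication and lazy carrying without detailing them. As a rough illustration only, the Python sketch below shows one generic way these ideas can fit together. Operands are split into fixed-width segments (the 32-bit width is an assumption). Partial products are accumulated per output column with carries deferred, and a single carry pass runs at the end. This is not IMCompiler's IR or generated kernel, and all names (`segmented_mul`, `to_limbs`, `from_limbs`, `LIMB_BITS`) are hypothetical. Because every (i, j) partial product is independent, the grid of products suggests why a two-dimensional parallel mapping is natural.

```python
# Hypothetical sketch of segment-wise big integer multiplication with lazy
# carrying. Illustrative only; not IMCompiler's actual IR or GPU kernel.

LIMB_BITS = 32                      # assumed segment width
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(x: int, n: int) -> list[int]:
    """Split a non-negative integer into n little-endian segments."""
    return [(x >> (LIMB_BITS * i)) & LIMB_MASK for i in range(n)]


def from_limbs(limbs: list[int]) -> int:
    """Recombine little-endian segments into a single integer."""
    return sum(v << (LIMB_BITS * i) for i, v in enumerate(limbs))


def segmented_mul(a: int, b: int, n: int) -> int:
    """Multiply two integers of at most n segments each."""
    bound = 1 << (LIMB_BITS * n)
    if not (0 <= a < bound and 0 <= b < bound):
        raise ValueError("operands must be non-negative and fit in n segments")
    A, B = to_limbs(a, n), to_limbs(b, n)

    # Column k accumulates every partial product A[i] * B[j] with i + j == k.
    # Each (i, j) cell is independent; a parallel version could assign cells
    # to threads along two dimensions. Here we simply loop over the grid.
    cols = [0] * (2 * n)
    for i in range(n):
        for j in range(n):
            cols[i + j] += A[i] * B[j]  # carries are NOT propagated here

    # Lazy carrying: resolve all deferred carries in one final pass.
    out, carry = [], 0
    for c in cols:
        total = c + carry
        out.append(total & LIMB_MASK)
        carry = total >> LIMB_BITS
    # The product of two n-segment numbers always fits in 2n segments.
    assert carry == 0
    return from_limbs(out)


if __name__ == "__main__":
    x, y = 2**100 + 12345, 3**60
    print(segmented_mul(x, y, 4) == x * y)
```

Python integers never overflow, which hides a real constraint: on a GPU, deferring carries requires column accumulators wide enough to hold the sum of up to n two-segment products, that is, roughly 2 × `LIMB_BITS` + log2(n) bits.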
Date of Conference: 02-06 November 2024
Date Added to IEEE Xplore: 03 December 2024