RCoal: Mitigating GPU Timing Attack via Subwarp-Based Randomized Coalescing Techniques | IEEE Conference Publication | IEEE Xplore


Abstract:

Graphics processing units (GPUs) are becoming default accelerators in many domains such as high-performance computing (HPC), deep learning, and virtual/augmented reality. Recently, GPUs have also shown significant speedups for a variety of security-sensitive applications such as encryption. These speedups have largely benefited from the high memory bandwidth and compute throughput of GPUs. One of the key features to optimize memory bandwidth consumption in GPUs is intra-warp memory access coalescing, which merges memory requests originating from different threads of a single warp into as few cache lines as possible. However, this coalescing feature has also been shown to make GPUs prone to correlation timing attacks, as it exposes the relationship between the execution time and the number of coalesced accesses. Consequently, an attacker is able to correctly reveal an AES private key by repeatedly gathering encrypted data and execution time on a GPU. In this work, we propose a series of defense mechanisms to alleviate such timing attacks by carefully trading off performance for improved security. Specifically, we propose to randomize the coalescing logic such that the attacker finds it hard to guess the correct number of coalesced accesses generated. To this end, we propose to randomize: a) the granularity (called a subwarp) at which warp threads are grouped together for coalescing, and b) the threads selected by each subwarp for coalescing. Such randomization techniques result in three mechanisms: fixed-sized subwarp (FSS), random-sized subwarp (RSS), and random-threaded subwarp (RTS). We find that the combination of these security mechanisms offers a 24- to 961-times improvement in security against correlation timing attacks at the cost of 5 to 28% performance degradation.
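
The effect of subwarp-based coalescing on the number of memory accesses can be illustrated with a small software model. The Python sketch below is not the paper's hardware design; it only assumes a 32-thread warp, a 128-byte cache line, and that coalescing happens independently inside each subwarp. The function names, the way subwarp sizes are drawn, and the access pattern are all hypothetical choices made for illustration.

```python
import random

LINE_BYTES = 128  # assumed cache-line size
WARP_SIZE = 32    # assumed warp width


def coalesced_accesses(addresses, groups):
    """Count memory requests when coalescing happens only within each group."""
    return sum(len({addresses[t] // LINE_BYTES for t in group}) for group in groups)


def fixed_sized_subwarps(num_subwarps):
    """FSS-style grouping: contiguous, equally sized groups of thread ids.

    Assumes num_subwarps divides WARP_SIZE evenly.
    """
    size = WARP_SIZE // num_subwarps
    return [list(range(i, i + size)) for i in range(0, WARP_SIZE, size)]


def random_sized_subwarps(num_subwarps, rng):
    """RSS-style grouping: contiguous groups with randomly drawn, non-zero sizes."""
    cuts = sorted(rng.sample(range(1, WARP_SIZE), num_subwarps - 1))
    bounds = [0] + cuts + [WARP_SIZE]
    return [list(range(lo, hi)) for lo, hi in zip(bounds, bounds[1:])]


def random_threaded_subwarps(groups, rng):
    """RTS-style grouping: keep the group sizes, assign threads in random order."""
    threads = list(range(WARP_SIZE))
    rng.shuffle(threads)
    shuffled, start = [], 0
    for group in groups:
        shuffled.append(threads[start:start + len(group)])
        start += len(group)
    return shuffled


rng = random.Random(0)
# Hypothetical access pattern: even threads hit line 0, odd threads hit line 1.
addresses = [(t % 2) * LINE_BYTES for t in range(WARP_SIZE)]

baseline = coalesced_accesses(addresses, [list(range(WARP_SIZE))])  # whole warp: 2
fss = coalesced_accesses(addresses, fixed_sized_subwarps(4))        # 4 subwarps: 8
rss = coalesced_accesses(addresses, random_sized_subwarps(4, rng))
rts = coalesced_accesses(
    addresses, random_threaded_subwarps(fixed_sized_subwarps(4), rng)
)
print(baseline, fss, rss, rts)
```

In this model the whole-warp count depends only on the addresses, whereas the RSS and RTS counts also depend on the random grouping, which is the property the abstract relies on to weaken the correlation between execution time and the number of coalesced accesses. The higher counts under subwarps also hint at where the performance cost comes from.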
Date of Conference: 24-28 February 2018
Date Added to IEEE Xplore: 29 March 2018
ISBN Information:
Electronic ISSN: 2378-203X
Conference Location: Vienna, Austria

I. Introduction

Graphics Processing Units (GPUs) are becoming an integral part of every computing system because of their ability to provide fast and energy-efficient computation. Given this ability, GPUs are now also being used to accelerate a variety of cryptographic algorithms. For example, the popular Advanced Encryption Standard (AES) algorithm [21] is known to achieve significant speedups on GPUs compared to CPUs [6], [9], [17], [23], because it exposes abundant thread-level parallelism that can exploit the high memory bandwidth and compute throughput of GPUs. With the increasing popularity of GPUs for accelerating security-sensitive applications, it is imperative to keep GPUs secure against a variety of side-channel attacks and other security vulnerabilities.

References
1. A. Bakhoda, “Analyzing CUDA workloads using a detailed GPU simulator,” in 2009 IEEE International Symposium on Performance Analysis of Systems and Software, April 2009, pp. 163–174.
2. A. Bogdanov, “Differential cache-collision timing attacks on AES with applications to embedded CPUs,” in Topics in Cryptology - CT-RSA 2010, ser. Lecture Notes in Computer Science, J. Pieprzyk, Ed., 2010, vol. 5985, pp. 235–251.
3. J. Bonneau and I. Mironov, “Cache-collision timing attacks against AES,” in Cryptographic Hardware and Embedded Systems - CHES 2006, ser. Lecture Notes in Computer Science, L. Goubin and M. Matsui, Eds. Springer Berlin Heidelberg, 2006, vol. 4249, pp. 201–215.
4. GPGPU-Sim v3.2.1. Address mapping. Available: http://gpgpu-sim.org/manual/index.php5/GPGPU-Sim_3.x_Manual#Memory_Partition
5. D. Gullasch, “Cache games: Bringing access-based cache attacks on AES to practice,” in Proc. IEEE Symp. on Security and Privacy (S&P), 2011, pp. 490–505.
6. O. Harrison and J. Waldron, “AES encryption implementation and analysis on commodity graphics processing units,” in Proceedings of the 9th International Workshop on Cryptographic Hardware and Embedded Systems, ser. CHES '07, 2007.
7. Hynix. Hynix GDDR5 SGRAM Part H5GQ1H24AFR Revision 1.0. Available: http://www.hynix.com/datasheet/pdf/graphics/H5GQ1H24AFRRev1.0.pdf
8. G. Irazoqui, “Wait a minute! A fast, cross-VM attack on AES,” in Research in Attacks, Intrusions and Defenses, ser. Lecture Notes in Computer Science, A. Stavrou, Eds., 2014, vol. 8688, pp. 299–319.
9. K. Iwai, “AES encryption implementation on CUDA GPU and its analysis,” in 2010 First International Conference on Networking and Computing, Nov 2010, pp. 209–214.
10. Z. H. Jiang, “A complete key recovery timing attack on a GPU,” in HPCA, 2016.
11. Z. H. Jiang, “A novel side-channel timing attack on GPUs,” in Proceedings of the Great Lakes Symposium on VLSI 2017. ACM, 2017, pp. 167–172.
12. A. Jog, “OWL: Cooperative thread array aware scheduling techniques for improving GPGPU performance,” in ASPLOS, 2013.
13. O. Kayiran, “Neither more nor less: Optimizing thread-level parallelism for GPGPUs,” in PACT, 2013.
14. D. Kirk and W. W. Hwu, Programming Massively Parallel Processors. Morgan Kaufmann, 2010.
15. J. Kloosterman, “WarpPool: Sharing requests with inter-warp coalescing for throughput processors,” in MICRO, 2015.
16. J. Leng, “GPUWattch: Enabling energy optimizations in GPGPUs,” in ISCA, 2013.
17. Q. Li, “Implementation and analysis of AES encryption on GPU,” in High Performance Computing and Communication 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS), 2012 IEEE 14th International Conference on. IEEE, 2012.
18. X. Li, “Sapper: A language for hardware-level security policy enforcement,” in ASPLOS, 2014.
19. F. Liu and R. B. Lee, “Random fill cache architecture,” in MICRO, 2014.
20. S. Mangard, “Hardware countermeasures against DPA: A statistical analysis of their effectiveness,” in Cryptographers' Track at the RSA Conference. Springer, 2004, pp. 222–235.
21. F. P. Miller, Advanced Encryption Standard. Alpha Press, 2009.
22. M. Neve and J.-P. Seifert, “Advances on access-driven cache attacks on AES,” in Selected Areas in Cryptography, vol. 4356. Springer, 2006, pp. 147–162.
23. N. Nishikawa, “High-performance symmetric block ciphers on CUDA,” in 2011 Second International Conference on Networking and Computing, Nov 2011, pp. 221–227.
24. NVIDIA, “Programming Guide.” Available: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#axzz3DAGrtrOq
25. D. A. Osvik, “Cache attacks and countermeasures: The case of AES,” in Proceedings of the 2006 The Cryptographers' Track at the RSA Conference on Topics in Cryptology, ser. CT-RSA'06, 2006.
26. D. Page, “Partitioned cache architecture as a side-channel defense mechanism,” in Cryptology ePrint Archive, Report 2005/280, 2005. Available: http://eprint.iacr.org/2005/280.pdf
27. R. D. Pietro, “CUDA leaks: A detailed hack for CUDA and a (partial) fix,” ACM Trans. Embed. Comput. Syst., 2016.
28. M. Rhu, “A locality-aware memory hierarchy for energy-efficient GPU architectures,” in MICRO, 2013.
29. T. G. Rogers, “Cache-conscious wavefront scheduling,” in MICRO, 2012.
30. T. G. Rogers, “Divergence-aware warp scheduling,” in MICRO, 2013.
