
Program Optimization of Array-Intensive SPEC2k Benchmarks on Multithreaded GPU Using CUDA and Brook+

4 Author(s)
Guibin Wang; Tao Tang; Xudong Fang; Xiaoguang Ren (Nat. Lab. for Parallel & Distrib. Process., Nat. Univ. of Defense Technol., Changsha, China)

The Graphics Processing Unit (GPU), with many lightweight data-parallel cores, can provide substantial parallel computing power to accelerate general-purpose applications. Both AMD and NVIDIA provide their own high-performance GPUs and software platforms. As floating-point computing capacity continues to increase, the "memory wall" problem becomes more serious, especially for array-intensive applications. In this paper, we optimize and implement two SPEC2k benchmarks, mgrid and swim, on multithreaded GPUs using CUDA and Brook+. To reduce the pressure on off-chip memory, we exploit data locality across the multi-level memory hierarchy and hide long memory access latency with double buffering. To balance inter-thread parallelism against intra-thread locality, we further tune the thread granularity of each kernel and empirically study the best equilibrium point for this problem. Flow control instructions can significantly reduce the effective instruction throughput; to address this, we introduce a divergence elimination technique that converts conditional expressions into arithmetic operations. With all these optimizations, we achieve speedups of 10×-34× over the CPU implementation on the AMD and NVIDIA GPUs, respectively. Finally, we summarize and compare the AMD and NVIDIA GPUs in terms of hardware and software.
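The abstract does not show the kernels themselves, but the divergence-elimination idea it describes (replacing a data-dependent branch with an equivalent computation so that threads in a warp follow the same path) can be sketched in CUDA as follows. The clamp operation, kernel names, and parameters here are illustrative assumptions, not code from the paper.

```cuda
// Hypothetical sketch: eliminating a divergent branch by expressing the
// conditional as an arithmetic operation. Not taken from the paper.
#include <cuda_runtime.h>

// Branching version: threads within a warp may diverge on the condition,
// serializing the two paths and lowering effective instruction throughput.
__global__ void clamp_branch(const float *in, float *out, float limit, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        if (in[i] > limit)
            out[i] = limit;
        else
            out[i] = in[i];
    }
}

// Branch-free version: the same conditional is computed with fminf, so all
// threads in the warp execute an identical instruction stream.
__global__ void clamp_arith(const float *in, float *out, float limit, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = fminf(in[i], limit);
}
```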

Published in:

2009 15th International Conference on Parallel and Distributed Systems (ICPADS)

Date of Conference:

8-11 Dec. 2009