Recently, GPGPU has been widely adopted in the High Performance Computing (HPC) field. However, limited global memory bandwidth poses a great challenge to many GPGPU programmers trying to exploit parallelism on CPU-GPU heterogeneous platforms. In this paper, we choose SWIM, a typical memory-intensive application from the SPEC OMP 2001 benchmark suite, as a case study. We attempt to optimize the performance and energy consumption of the application using different memory access mechanisms, and we present optimization methods including matrix transposition and kernel fusion. Experimental results on a platform combining an Intel Core™ i920 CPU with a GeForce GTX 295 GPU show that the proposed optimization methods achieve a speedup of 8.7× over the original OpenMP program and reduce energy consumption by 83% for a problem size of 2048 × 2048.
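As a rough illustration of the two optimizations named above, the following CPU-side C++ sketch shows the general idea behind each: fusing two passes over the same data so that an intermediate array never travels through memory, and transposing a row-major matrix so that a column-wise traversal becomes a unit-stride one. It is not the paper's GPU code. The function names (`unfused`, `fused`, `transpose`) and the arithmetic are hypothetical and chosen only for illustration; they are not taken from SWIM.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: two separate passes, standing in for two kernels.
// The intermediate array tmp is written by pass 1 and read again by pass 2.
std::vector<double> unfused(const std::vector<double>& a,
                            const std::vector<double>& b) {
    const std::size_t n = a.size();
    std::vector<double> tmp(n), out(n);
    for (std::size_t i = 0; i < n; ++i) tmp[i] = a[i] + b[i];
    for (std::size_t i = 0; i < n; ++i) out[i] = 0.5 * tmp[i] * a[i];
    return out;
}

// Fused version: one pass, the intermediate value stays in a local variable,
// so tmp is never stored to or reloaded from memory.
std::vector<double> fused(const std::vector<double>& a,
                          const std::vector<double>& b) {
    const std::size_t n = a.size();
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double t = a[i] + b[i];
        out[i] = 0.5 * t * a[i];
    }
    return out;
}

// Transpose a rows x cols row-major matrix into a cols x rows row-major one.
// A loop that walked a column of m with stride cols can then walk a row of
// the result with stride 1.
std::vector<double> transpose(const std::vector<double>& m,
                              std::size_t rows, std::size_t cols) {
    std::vector<double> t(rows * cols);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            t[c * rows + r] = m[r * cols + c];
    return t;
}
```

On a GPU the same reasoning applies to global memory: a fused kernel avoids writing and re-reading intermediate arrays, and a transposed layout can let neighbouring threads touch neighbouring addresses. Whether either change pays off depends on the access pattern of the actual kernels.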