Abstract:
In this paper, we propose a framework to automatically map single-device OpenCL programs to heterogeneous multi-device platforms with performance concerns. Our framework ...Show MoreMetadata
Abstract:
In this paper, we propose a framework to automatically map single-device OpenCL programs to heterogeneous multi-device platforms with performance concerns. Our framework is based on the independence of work groups which built inside the OpenCL programming model and relies heavily on the knowledge of global memory access regions of work groups. So global memory access patterns of work groups are analyzed and an abstract representation CCRwS is designed to describe the exact memory access regions of each memory access statement in the kernels. A global memory access analyzer is designed to get CCRwSs by performing static program analysis on kernel codes. Based on CCRwSs, data transfer between multiple devices and host can be fully controlled by our framework. Then a kernel code regenerator is designed to distribute the workload and perform architecture specific optimizations by code transformation. Then we tested our framework on a platform with 2 Intel E5-2650 CPUs and 4 NVIDIA Tesla C2050 GPUs. Compared with the performance on single GPU, the kernels running on all the 6 devices can achieve about 4.5x faster.
Date of Conference: 13-15 November 2013
Date Added to IEEE Xplore: 12 June 2014
Electronic ISBN:978-0-7695-5088-6