Skip to Main Content
High-performance parallel programs that efficiently utilize heterogeneous CPU+GPU accelerator systems require tuned coordination among multiple program units. However, using current programming frameworks such as CUDA leads to tangled source code that combines code for the core computation with that for device and computational kernel management, data transfers between memory spaces, and various optimizations. In this paper, we propose a programming system based on the principles of Aspect-Oriented Programming, to un-clutter the code and to improve programmability of these heterogeneous parallel systems. Specifically, we use standard C++ to describe the core computations and aspects to encapsulate all other support parts. An aspect-weaving compiler is then used to combine these two pieces of code to generate a final program. The system modularizes concerns that are hard to manage using conventional programming frameworks such as CUDA, has a small impact on existing program structure as well as performance, and as a result, simplifies the programming of accelerator-based heterogeneous parallel systems. We also present an options pricing and an n-body simulation example program to demonstrate that programs written using this system can be successfully translated to CUDA programs for NVIDIA GPU hardware and to OpenCL programs for multicore CPUs with comparable performance. For both examples, the performance of the translated code achieved ~80% of the hand-coded CUDA programs.