By Topic

Object-oriented stream programming using aspects

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Mingliang Wang ; NSF Center for Autonomic Comput., Rutgers Univ., Piscataway, NJ, USA ; Parashar, M.

High-performance parallel programs that efficiently utilize heterogeneous CPU+GPU accelerator systems require tuned coordination among multiple program units. However, using current programming frameworks such as CUDA leads to tangled source code that combines code for the core computation with that for device and computational kernel management, data transfers between memory spaces, and various optimizations. In this paper, we propose a programming system based on the principles of Aspect-Oriented Programming, to un-clutter the code and to improve programmability of these heterogeneous parallel systems. Specifically, we use standard C++ to describe the core computations and aspects to encapsulate all other support parts. An aspect-weaving compiler is then used to combine these two pieces of code to generate a final program. The system modularizes concerns that are hard to manage using conventional programming frameworks such as CUDA, has a small impact on existing program structure as well as performance, and as a result, simplifies the programming of accelerator-based heterogeneous parallel systems. We also present an options pricing and an n-body simulation example program to demonstrate that programs written using this system can be successfully translated to CUDA programs for NVIDIA GPU hardware and to OpenCL programs for multicore CPUs with comparable performance. For both examples, the performance of the translated code achieved ~80% of the hand-coded CUDA programs.

Published in:

Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on

Date of Conference:

19-23 April 2010