Current technology trends favor hybrid architectures, typically with each node in a cluster containing both general-purpose and specialized accelerator processors. The typical model for programming such systems is host-centric: The general-purpose processor orchestrates the computation, offloading performance-critical work to the accelerator, and data are communicated only among general-purpose processors. In this paper, we propose a radically different hybrid-programming approach, which we call the reverse-acceleration model. In this model, the accelerators orchestrate the computation, offloading work that cannot be accelerated to the general-purpose processors. Data are communicated among accelerators, not among general-purpose processors. Our thesis is that the reverse-acceleration model simplifies porting codes to hybrid systems and facilitates performance optimization. We present a case study of a legacy neutron-transport code that we modified to use reverse acceleration and ran across the full 122,400 cores (general-purpose plus accelerator) of the Los Alamos National Laboratory Roadrunner supercomputer. Results indicate a substantial performance improvement over the unaccelerated version of the code.
Note: The Institute of Electrical and Electronics Engineers, Incorporated is distributing this Article with permission of the International Business Machines Corporation (IBM) who is the exclusive owner. The recipient of this Article may not assign, sublicense, lease, rent or otherwise transfer, reproduce, prepare derivative works, publicly display or perform, or distribute the Article.