High-performance computing is moving towards exascale computing. Graphics Processing Units (GPUs) have demonstrated capabilities beyond graphics rendering in general-purpose computing and are well suited to data-intensive applications. However, the communication bottleneck for data transfers between the GPU and the CPU has led to the design of AMD's Accelerated Processing Unit (APU), which combines the CPU and GPU on a single chip. This new architecture poses new challenges: algorithms must be redesigned to take advantage of it, and programming models differ between vendors, hindering the portability of algorithms across heterogeneous platforms. OpenCL has recently come to be regarded as the standard programming model for heterogeneous platforms. As general-purpose computing moves towards APUs, in this paper we study the design and implementation of two algorithms in OpenCL: the 0-1 knapsack problem and Gaussian elimination. These two algorithms exhibit similar synchronization behaviors, which enables a more direct comparison. We discuss the design and performance of both algorithms in OpenCL.
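One plausible reading of the shared synchronization behavior is that both algorithms have a sequential outer loop whose iterations depend on the previous one, while the work inside each iteration is independent. The sketch below illustrates that pattern. It is a plain sequential Python sketch, not the authors' OpenCL implementation. In an OpenCL version, the independent inner work would presumably be spread across work-items, with a barrier or a new kernel launch at each marked sync point. The function names are hypothetical, and the Gaussian elimination omits pivoting, so it assumes every pivot is nonzero.

```python
from typing import List


def knapsack_01(weights: List[int], values: List[int], capacity: int) -> int:
    """0-1 knapsack by dynamic programming, one DP row per item."""
    prev = [0] * (capacity + 1)
    for wt, val in zip(weights, values):  # sequential outer step: one item
        # Each capacity w reads only the previous row, so these
        # iterations are independent (candidate parallel work).
        curr = [
            max(prev[w], prev[w - wt] + val) if wt <= w else prev[w]
            for w in range(capacity + 1)
        ]
        prev = curr  # sync point: the row must be complete before the next item
    return prev[capacity]


def gaussian_solve(aug: List[List[float]]) -> List[float]:
    """Solve Ax = b given the augmented matrix [A | b] (no pivoting)."""
    n = len(aug)
    a = [list(map(float, row)) for row in aug]  # work on a copy
    for k in range(n - 1):  # sequential outer step: one pivot column
        pivot_row = a[k]
        # Each row below the pivot reads only the pivot row and writes
        # only itself, so these updates are independent.
        for i in range(k + 1, n):
            f = a[i][k] / pivot_row[k]
            a[i] = [x - f * p for x, p in zip(a[i], pivot_row)]
        # sync point: all rows must be updated before the next pivot step
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - s) / a[i][i]
    return x


if __name__ == "__main__":
    print(knapsack_01([1, 3, 4, 5], [1, 4, 5, 7], 7))
    print(gaussian_solve([[2, 1, -1, 8], [-3, -1, 2, -11], [-2, 1, 2, -3]]))
```

In both functions the number of sync points grows with the problem size (one per item or one per pivot column), which is the kind of step-wise dependency that makes the two algorithms comparable.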