1 Introduction
The evolution of General-Purpose Processing on GPU (GPGPU) has been aided by the emergence of parallel programming frameworks, especially the Compute Unified Device Architecture (CUDA) [8, 27] and the Open Computing Language (OpenCL) [6, 13]. These frameworks allow programmers to readily exploit the hardware resources of massively parallel processors, processing large amounts of data in far less time than previous Central Processing Unit (CPU)-based architectures. However, although GPUs provide high throughput, off-the-shelf devices demand high power levels to operate (200 W to 300 W per device) and have fixed designs that cannot be adapted to the specific needs of target applications.