Skip to Main Content
MapReduce is a programming model that enables efficient massive data processing in large-scale computing environments such as supercomputers and clouds. Such large-scale computers employ GPUs to enjoy its good peak performance and high memory bandwidth. Since the performance of each job is depending on running application characteristics and underlying computing environments, scheduling MapReduce tasks onto CPU cores and GPU devices for efficient execution is difficult. To address this problem, we have proposed a hybrid scheduling technique for GPU-based computer clusters, which minimizes the execution time of a submitted job using dynamic profiles of Map tasks running on CPU cores and GPU devices. We have implemented a prototype of our proposed scheduling technique by extending MapReduce framework, Hadoop. We have conducted some experiments for this prototype by using a K-means application as a benchmark on a supercomputer. The results show that the proposed technique achieves 1.93 times faster than the Hadoop original scheduling algorithm at 64 nodes (1024 CPU cores and 128 GPU devices). The results also indicate that the performance of map tasks, including both CPU and GPU tasks, is significantly affected by the overhead of map task invocation in the Hadoop framework.