Improving GPGPU Performance Using Efficient Scheduling


Abstract:

Graphics processing units (GPUs) are in demand for executing general-purpose parallel applications because of their massive computational power and latency-hiding capability. GPU programming models such as CUDA and OpenCL allow the programmer to create a large number of threads for a kernel. Parallelism is achieved by grouping the threads into fixed-size warps and executing many warps simultaneously on a GPU core. In spite of this parallel execution, the computational resources on GPU cores are still underutilized, which degrades the overall performance of GPUs. One of the reasons for this is the time disparity in execution among the warps of a thread block. To improve GPU performance, these resources must be utilized more efficiently by minimizing this disparity. The conventional loose round-robin (LRR) warp scheduler is inefficient at overlapping latency stalls; to eliminate its disadvantages we propose a Lazy Warp Scheduler (LWS). We observed a time disparity in execution among warps of the same Cooperative Thread Array (CTA), which degrades performance due to lazy warps. A lazy warp is a warp in the thread block that takes more time to execute, leading to the time disparity; this can be caused by branch divergence, imbalanced workload among parallel warps, memory contention, irregular memory access patterns, and similar factors. To overcome this problem, we propose LWS, which classifies warps into two categories: lazy warps (LW) and fast warps (FW). It schedules the fast warps first, giving them higher priority, followed by the lazy warps. We use counters to identify which warps take more time, i.e., the lazy warps. Further, we propose LW-aware CTA scheduling, in which a CTA is scheduled on a streaming multiprocessor (SM) on the basis of the behavior of the warps in the CTA. Our evaluation results show that LWS can improve the performance of GPGPU applications by 24% compared to the loose round-robin scheduling policy.
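
A minimal C++ sketch of the issue policy described above is given below, assuming a per-warp execution-cycle counter and a simple average-based threshold for classifying a warp as lazy. The names (Warp, is_lazy, select_warp) and the threshold heuristic are illustrative assumptions, not the paper's implementation.

#include <cstdint>
#include <vector>

struct Warp {
    int      id;
    bool     ready;          // warp has an issuable instruction this cycle
    uint64_t cycles_active;  // counter: cycles this warp has spent executing
};

// Assumed heuristic: a warp is "lazy" if its counter exceeds the CTA
// average by a chosen factor.
bool is_lazy(const Warp& w, double cta_avg_cycles, double factor = 1.5) {
    return static_cast<double>(w.cycles_active) > factor * cta_avg_cycles;
}

// Pick the next warp to issue: fast warps get priority over lazy warps.
// Returns the chosen warp id, or -1 if no warp is ready this cycle.
int select_warp(const std::vector<Warp>& warps) {
    double sum = 0.0;
    for (const Warp& w : warps) sum += static_cast<double>(w.cycles_active);
    const double avg = warps.empty() ? 0.0 : sum / warps.size();

    int lazy_candidate = -1;
    for (const Warp& w : warps) {
        if (!w.ready) continue;
        if (!is_lazy(w, avg)) return w.id;        // fast warp: issue first
        if (lazy_candidate < 0) lazy_candidate = w.id;
    }
    return lazy_candidate;                        // only lazy warps are ready
}
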
Date of Conference: 21-22 February 2019
Date Added to IEEE Xplore: 21 November 2019
Conference Location: Palladam, India

I. Introduction

Graphics processing units (GPUs) have been present in the computing industry for over 40 years. Their design allows many operations to be performed simultaneously. However, the resources on a GPU are limited, and some operations take much more time than others. To manage the execution of warps, a scheduler must keep a list of warps that can be issued at any given time and issue them in a way that maximizes resource utilization and overall GPU performance.
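
For contrast with the proposed scheduler, the following sketch shows a plain loose round-robin (LRR) issue loop of the kind the paper uses as its baseline: the scheduler keeps a list of resident warps and issues the next ready warp after the last one issued. The class and field names are assumptions made for illustration, not taken from any particular simulator.

#include <cstddef>
#include <utility>
#include <vector>

struct WarpSlot {
    int  id;
    bool ready;  // true if the warp has an instruction that can issue
};

class LooseRoundRobin {
    std::vector<WarpSlot> warps_;  // warps resident on this SM
    std::size_t next_ = 0;         // rotating start position

public:
    explicit LooseRoundRobin(std::vector<WarpSlot> warps)
        : warps_(std::move(warps)) {}

    // Return the id of the next ready warp after the last issued one,
    // or -1 if no warp can issue this cycle (a stall).
    int issue() {
        for (std::size_t i = 0; i < warps_.size(); ++i) {
            std::size_t idx = (next_ + i) % warps_.size();
            if (warps_[idx].ready) {
                next_ = (idx + 1) % warps_.size();
                return warps_[idx].id;
            }
        }
        return -1;
    }
};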
