
Improving GPGPU Performance Using Efficient Scheduling


Abstract:

Graphics processing units (GPUs) are in demand for executing general-purpose parallel applications because of their massive computational power and latency-hiding capability. GPU programming models such as CUDA and OpenCL allow the programmer to create an arbitrary number of threads for a kernel. Parallelism is achieved by grouping the threads into fixed-size warps and executing many warps simultaneously on a GPU core. Despite this parallel execution, the computational resources of GPU cores remain underutilized, which degrades overall GPU performance. One reason is the time disparity in thread-block execution among warps; to improve GPU performance, these resources must be utilized more efficiently by minimizing this disparity. The conventional loose round-robin (LRR) warp scheduler is inefficient at overlapping latency stalls. We observed a time disparity in execution among warps of the same Cooperative Thread Array (CTA), which degraded performance due to lazy warps. A lazy warp is a warp in a thread block that takes longer to execute than its peers, causing the time disparity; typical causes include branch divergence, imbalanced workload across parallel warps, memory contention, and irregular memory access patterns. To eliminate the disadvantages of the conventional LRR policy, we propose the Lazy Warp Scheduler (LWS), which classifies warps into two categories, lazy warps (LW) and fast warps (FW). LWS schedules the fast warps first, giving them high priority, followed by the lazy warps; counters are used to identify which warps take longer, i.e., the lazy warps. We further propose LW-aware CTA scheduling, in which a CTA is scheduled on a streaming multiprocessor (SM) based on the behavior of the warps in that CTA. Our evaluation shows that LWS improves the performance of GPGPU applications by 24% compared with the loose round-robin scheduling policy.
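The scheduling idea in the abstract can be sketched in a few lines: per-warp counters track how long each warp has executed, warps are split into fast (FW) and lazy (LW) classes, and fast warps are issued first. This is a hypothetical illustration only; the `Warp` class, `classify`/`next_warp` names, and the fixed classification threshold are assumptions, not the paper's actual hardware implementation.

```python
# Illustrative sketch of Lazy Warp Scheduler (LWS) classification and
# prioritization, as described in the abstract. Names and the threshold
# are assumptions for illustration, not the paper's implementation.

from dataclasses import dataclass


@dataclass
class Warp:
    warp_id: int
    cycles_executed: int = 0  # counter incremented each cycle this warp runs


def classify(warps, threshold):
    """Split warps into fast (FW) and lazy (LW) using their cycle counters."""
    fast = [w for w in warps if w.cycles_executed <= threshold]
    lazy = [w for w in warps if w.cycles_executed > threshold]
    return fast, lazy


def next_warp(warps, threshold):
    """Pick the next warp to issue: fast warps get priority over lazy warps."""
    fast, lazy = classify(warps, threshold)
    order = fast + lazy
    return order[0] if order else None
```

In this sketch, a warp whose counter exceeds the threshold is treated as lazy and deprioritized, so faster warps of the CTA are not held back behind it; a real scheduler would update the counters every cycle and re-evaluate priorities at issue time.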
Date of Conference: 21-22 February 2019
Date Added to IEEE Xplore: 21 November 2019
Conference Location: Palladam, India

