
Improving GPGPU Performance Using Efficient Scheduling


Abstract:

Graphics processing units (GPUs) are in demand for executing general-purpose parallel applications because of their massive computational power and latency-hiding capability. GPU programming models such as CUDA and OpenCL allow the programmer to create an arbitrary number of threads for a kernel. Parallelism is achieved by grouping the threads into fixed-size warps and executing many warps simultaneously on a GPU core. Despite this parallel execution, the computational resources of GPU cores remain underutilized, which degrades overall GPU performance. One reason is the time disparity in thread-block execution among warps; to improve GPU performance, these resources must be utilized more efficiently by minimizing this disparity. The conventional loose round-robin (LRR) warp scheduler is inefficient at overlapping latency stalls. We observed a time disparity in execution among warps of the same Cooperative Thread Array (CTA), which degraded performance due to lazy warps. A lazy warp is a warp in a thread block that takes longer to execute than its peers, causing the time disparity; typical causes include branch divergence, imbalanced workload across parallel warps, memory contention, and irregular memory access patterns. To eliminate the disadvantages of the conventional LRR policy, we propose the Lazy Warp Scheduler (LWS), which classifies warps into two categories, lazy warps (LW) and fast warps (FW). LWS schedules the fast warps first, giving them high priority, followed by the lazy warps; counters are used to identify which warps take longer, i.e., the lazy warps. We further propose LW-aware CTA scheduling, in which a CTA is scheduled on a streaming multiprocessor (SM) based on the behavior of the warps in that CTA. Our evaluation shows that LWS improves the performance of GPGPU applications by 24% compared with the loose round-robin scheduling policy.
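The scheduling idea in the abstract can be sketched in a few lines: per-warp counters track how long each warp has executed, warps are split into fast (FW) and lazy (LW) classes, and fast warps are issued first. This is a hypothetical illustration only; the `Warp` class, `classify`/`next_warp` names, and the fixed classification threshold are assumptions, not the paper's actual hardware implementation.

```python
# Illustrative sketch of Lazy Warp Scheduler (LWS) classification and
# prioritization, as described in the abstract. Names and the threshold
# are assumptions for illustration, not the paper's implementation.

from dataclasses import dataclass


@dataclass
class Warp:
    warp_id: int
    cycles_executed: int = 0  # counter incremented each cycle this warp runs


def classify(warps, threshold):
    """Split warps into fast (FW) and lazy (LW) using their cycle counters."""
    fast = [w for w in warps if w.cycles_executed <= threshold]
    lazy = [w for w in warps if w.cycles_executed > threshold]
    return fast, lazy


def next_warp(warps, threshold):
    """Pick the next warp to issue: fast warps get priority over lazy warps."""
    fast, lazy = classify(warps, threshold)
    order = fast + lazy
    return order[0] if order else None
```

In this sketch, a warp whose counter exceeds the threshold is treated as lazy and deprioritized, so faster warps of the CTA are not held back behind it; a real scheduler would update the counters every cycle and re-evaluate priorities at issue time.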
Date of Conference: 21-22 February 2019
Date Added to IEEE Xplore: 21 November 2019
Conference Location: Palladam, India

