GVARP: Detecting Performance Variance on Large-Scale Heterogeneous Systems | IEEE Conference Publication | IEEE Xplore


Abstract:

Performance variance is one of the nasty pitfalls of large-scale heterogeneous systems, which can lead to unexpected and unpredictable performance degradation for parallel programs. Such performance issues typically arise from various random hardware and software faults, making it exceedingly difficult to pinpoint the exact causes of performance variance in specific instances. In this paper, we propose GVARP, a performance variance detection tool for large-scale heterogeneous systems. GVARP employs static analysis to identify the performance-critical parameters of kernel functions. Additionally, GVARP segments the program execution with external library calls and asynchronous kernel operations. Then GVARP constructs a state transfer graph and estimates the workload of each program segment to identify and cluster instances of similar workloads, facilitating the detection of performance variance. Our evaluation results demonstrate that GVARP effectively detects performance variance at a large scale with acceptable overhead and provides intuitive insights to locate the sources of performance variance.
Date of Conference: 17-22 November 2024
Date Added to IEEE Xplore: 24 December 2024
Conference Location: Atlanta, GA, USA

