Skip to Main Content
Grid computing proves to be a successful paradigm for large-scale distributed data processing, and global eScience grids have been in production for years (e.g., LCG and OSG). The majority of applications running on these production environments can be characterized as massive CPU-intensive batch jobs (or ??bag-of-tasks??), sometimes considered as the ??killer?? application for the grid. A deep understanding of its main workload characteristics is not only necessary for realistic performance evaluation of the existing system, but also crucial to generate new insights into better resource allocation schemes. This paper presents a comprehensive statistical analysis of the workloads on production eScience grid environments. We focus on second-order statistics and the scaling behavior of main job characteristics, namely job arrivals and job runtimes. A range of autocorrelation structures is identified and analyzed, including pseudoperiodicity, short-range dependence (SRD), and long-range dependence (LRD). We further develop mathematical models that are able to capture these salient properties in the workloads. Workload models, in turn, enable us to quantitatively evaluate the performance impacts of autocorrelations in grid scheduling. The results indicate that autocorrelations in workloads result in system performance degradation, sometimes the difference can be as large as up to several orders of magnitude. Nevertheless, better performance can be achieved at the grid level under bursty local background workloads. Such effects of workloads on systems are extensively analyzed and explained.
Parallel and Distributed Systems, IEEE Transactions on (Volume:21 , Issue: 4 )
Date of Publication: April 2010