Towards A Better Understanding of Workload Dynamics on Data-Intensive Clusters and Grids

Authors: Hui Li (Leiden Inst. of Adv. Comput. Sci., Leiden Univ.); L. Wolters

This paper presents a comprehensive statistical analysis of workloads collected on data-intensive clusters and grids. The analysis is conducted at several levels, including the virtual organization (VO) level and the user level. An aggregation procedure and scaling analysis are applied to job arrival processes, leading to the identification of several basic patterns, namely pseudo-periodicity, long-range dependence (LRD), and (multi)fractals. It is shown that statistical measures based on interarrival times are of limited use for assessing correlations, and that count-based measures should be relied on instead. We also study workload characteristics such as job run time and memory consumption, together with the cross-correlations between these characteristics. A "bag-of-tasks" behavior is demonstrated empirically, strongly indicating temporal locality. We argue that pseudo-periodicity, LRD, and "bag-of-tasks" behavior are important workload properties on data-intensive clusters and grids that are not present in traditional parallel workloads. This study has important implications for workload modeling and performance prediction in data-intensive grid environments.
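To make the count-based view of a job arrival process concrete, the following is a minimal sketch rather than the authors' procedure, which the abstract does not specify. It bins arrival timestamps into a count series, aggregates that series over blocks of increasing size, and estimates the Hurst parameter H with the variance-time method, one common estimator for LRD. The function names, the bin width, and the block sizes are assumptions chosen for illustration. The estimator also assumes that every aggregated series has non-zero variance.

```python
import numpy as np


def count_process(arrivals, bin_width):
    """Turn job arrival timestamps into job counts per fixed-width time bin."""
    arrivals = np.asarray(arrivals, dtype=float)
    # Index of the bin each arrival falls into, measured from the first arrival.
    idx = np.floor((arrivals - arrivals.min()) / bin_width).astype(int)
    return np.bincount(idx)


def aggregate(counts, m):
    """Average the count series over non-overlapping blocks of m bins."""
    n = (len(counts) // m) * m  # drop the incomplete trailing block
    return np.asarray(counts[:n], dtype=float).reshape(-1, m).mean(axis=1)


def hurst_variance_time(counts, block_sizes):
    """Estimate H from the slope of log Var(X^(m)) against log m.

    For a self-similar count process Var(X^(m)) scales as m^(2H - 2),
    so H = 1 + slope / 2. H near 0.5 suggests short-range dependence,
    while H closer to 1 suggests long-range dependence.
    """
    variances = [aggregate(counts, m).var() for m in block_sizes]
    slope, _ = np.polyfit(np.log(block_sizes), np.log(variances), 1)
    return 1.0 + slope / 2.0


if __name__ == "__main__":
    # Synthetic Poisson arrivals (hypothetical data, not from the paper).
    rng = np.random.default_rng(0)
    arrivals = np.cumsum(rng.exponential(scale=0.2, size=50_000))
    counts = count_process(arrivals, bin_width=1.0)
    print(hurst_variance_time(counts, [1, 2, 4, 8, 16, 32, 64]))
```

Applied to a real trace, the same functions would take the job submission times as `arrivals`. An estimate of H well above 0.5 across a range of block sizes would be consistent with the LRD the abstract reports, although a variance-time fit alone is not conclusive evidence.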

Published in: Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International

Date of Conference: 26-30 March 2007