
A performance monitor based on virtual global time for clusters of PCs

2 Author(s)
Taufer, M.; Dept. of CSE, California Univ., San Diego, CA, USA; Stricker, T.

Debugging the performance of parallel and distributed systems remains a difficult task despite the widespread use of middleware packages for automatic distribution, communication and tasking in clusters. In this paper we present a performance monitoring tool for clusters of PCs that is based on two simple ideas: accounting for resource usage, and mapping all performance-related state of hardware performance counters and operating system variables back to the application level. In this way a monitoring tool can explain the most relevant performance metrics at a higher level that is easily understood by the application developer. The most important metric for distributed high-performance applications remains the total execution time vs. the number of compute nodes involved, since it translates into the scalability of an application. As a detailed contribution of this paper, we look closely at what is needed to reverse-map the low-level performance counters at each node back through the middleware layer responsible for the parallelization and distribution. The specific problems encountered and dealt with are the creation of a flexible notion of global time for time-stamping and reassembling performance data, and an appropriate communication mechanism that minimizes the monitoring intrusion caused by the monitor's additional network traffic. We show how our tool can be used to measure, explain and predict the performance and scalability of a distributed OLAP application running on clusters of PCs.
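To make the idea of a global time for time-stamping more concrete, the following is a minimal sketch of one common approach: estimate each node's clock offset from a single request/reply exchange, then shift locally time-stamped counter samples onto a shared timeline and merge them. It is not the mechanism described in the paper, which the abstract does not detail. The `Event` type, the function names, and the symmetric-delay assumption behind the offset estimate are all illustrative assumptions.

```python
from dataclasses import dataclass
import heapq


@dataclass(frozen=True)
class Event:
    """One counter sample, time-stamped with the node's own clock (hypothetical type)."""
    node: str
    local_time: float  # seconds on the node's local clock
    counter: str
    value: int


def estimate_offset(t_send: float, t_remote: float, t_recv: float) -> float:
    """Estimate (reference clock - remote clock) from one request/reply exchange.

    t_send and t_recv are read on the reference node, t_remote on the remote
    node. Assumes the network delay is the same in both directions.
    """
    return (t_send + t_recv) / 2.0 - t_remote


def merge_to_global(streams, offsets):
    """Merge per-node event lists into one list ordered by estimated global time.

    streams: dict mapping node name to events sorted by local_time
    offsets: dict mapping node name to the offset added to its local time
    Returns a list of (global_time, event) pairs.
    """
    def shifted(node, events):
        for e in events:
            yield (e.local_time + offsets[node], e)

    merged = heapq.merge(
        *(shifted(node, events) for node, events in streams.items()),
        key=lambda pair: pair[0],
    )
    return list(merged)


if __name__ == "__main__":
    # Node "b" runs roughly 100 s ahead of the reference node "a".
    offsets = {"a": 0.0, "b": -100.0}
    streams = {
        "a": [Event("a", 1.0, "cycles", 10), Event("a", 3.0, "cycles", 30)],
        "b": [Event("b", 102.0, "cycles", 20), Event("b", 102.5, "cycles", 25)],
    }
    for global_time, event in merge_to_global(streams, offsets):
        print(f"{global_time:6.2f}  {event.node}  {event.counter}={event.value}")
```

Because each per-node stream is already sorted by local time and a constant offset preserves that order, a k-way merge is enough to reassemble the samples. A real monitor would also need to handle clock drift and asymmetric delays, which this sketch ignores.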

Published in:

Cluster Computing, 2003. Proceedings. 2003 IEEE International Conference on

Date of Conference:

1-4 Dec. 2003
