By Topic

Lightweight Online Performance Monitoring and Tuning with Embedded Gossip

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
Zhu, J.W. ; Xilinx, Inc., Albuquerque, NM ; Bridges, P.G. ; Maccabe, A.

Understanding and tuning the performance of large-scale long-running applications is difficult, with both standard trace-based and statistical methods having substantial shortcomings that limit their usefulness. This paper describes a new performance monitoring approach called Embedded Gossip (EG) designed to enable lightweight online performance monitoring and tuning. EG works by piggybacking performance information on existing messages and performing information correlation online, giving each process in a parallel application a weakly consistent global view of the behavior of the entire application. To demonstrate the viability of EG, this paper presents the design and experimental evaluation of two different online monitoring systems and an online global adaptation system driven by Embedded Gossiping. In addition, we present a metric system for evaluating the suitability of an application to EG-based monitoring and adaptation, a general architecture for implementing EG-based monitoring systems, and a modified global commit algorithm appropriate for use in EG-based global adaptation systems. Together, these results demonstrate that EG is an efficient low-overhead approach for addressing a wide range of parallel performance monitoring tasks and that results from these systems can effectively drive online global adaptation.

Published in:

Parallel and Distributed Systems, IEEE Transactions on  (Volume:20 ,  Issue: 7 )