By Topic

Lightweight Task Graph Inference for Distributed Applications

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

4 Author(s)
Bin Xin ; Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN, USA ; Eugster, P. ; Xiangyu Zhang ; Jinlin Yang

Recent paradigm shifts in distributed computing such as the advent of cloud computing pose new challenges to the analysis of distributed executions. One important new characteristic is that the management staff of computing platforms and the developers of applications are separated by corporate boundaries. The net result is that once applications go wrong, the most readily available debugging aids for developers are the visible output of the application and any log files collected during their execution. In this paper, we propose the concept of task graphs as a foundation to represent distributed executions, and present a low overhead algorithm to infer task graphs from event log files. Intuitively, a task represents an autonomous segment of computation inside a thread. Edges between tasks represent their interactions and preserve programmers' notion of data and control flows. Our technique leverages existing logging support where available or otherwise augments it with aspect-based instrumentation to collect events of a set of predefined types. We show how task graphs can improve the precision of anomaly detection in a request-oriented analysis of field software and help programmers understand the running of the Hadoop Distributed File System (HDFS).

Published in:

Reliable Distributed Systems, 2010 29th IEEE Symposium on

Date of Conference:

Oct. 31 2010-Nov. 3 2010