Event Logging: Portable and Efficient Checkpointing in Heterogeneous Environments with Non-FIFO Communication Platforms

Authors: Z. Peng (Dept. of Computer Science, University College Dublin, Ireland); A. Lastovetsky

The Chandy-Lamport checkpointing algorithm is widely used in fault-tolerant implementations of MPI. However, it assumes that message passing has the FIFO property, which the MPI standard does not guarantee at the application level. Therefore, this algorithm cannot serve as the basis for an implementation-independent fault-tolerant MPI. In this paper, we present a variant of the Chandy-Lamport algorithm that does not rely on the FIFO property. This algorithm can be implemented on top of MPI and can therefore be used to develop a supplementary software component that makes any MPI implementation compliant with the MPI standard fault tolerant. We prove the correctness of the algorithm and analyze its performance. We also present experimental results demonstrating the efficiency of the algorithm.
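
Why the FIFO assumption matters can be seen on a single channel. The Python sketch below is illustrative only and is not the variant proposed in the paper, whose details the abstract does not give. It contrasts the classic Chandy-Lamport marker rule with a swapped-in, Lai-Yang-style rule in which every message carries the sender's checkpoint epoch. The names `Msg`, `MARKER`, `chandy_lamport_state`, and `epoch_tagged_state` are hypothetical. The model assumes a one-way channel whose receiver has already taken its local checkpoint at trace position `ckpt_at`, and it inspects a finished delivery trace; a real non-FIFO protocol must also tell the receiver when all pre-checkpoint messages have arrived, for example through message counts.

```python
from dataclasses import dataclass
from typing import List, Union

MARKER = "MARKER"  # hypothetical marker token, sent right after the sender's checkpoint


@dataclass(frozen=True)
class Msg:
    payload: str
    epoch: int  # hypothetical tag: 0 = sent before the sender's checkpoint, 1 = after


Delivery = Union[Msg, str]


def chandy_lamport_state(deliveries: List[Delivery], ckpt_at: int) -> List[str]:
    """Classic marker rule: record messages delivered at or after the receiver's
    checkpoint (trace index ckpt_at) and before the marker. Correct only if FIFO."""
    recorded = []
    for i, item in enumerate(deliveries):
        if not isinstance(item, Msg):
            break  # the marker ends recording for this channel
        if i >= ckpt_at:
            recorded.append(item.payload)
    return recorded


def epoch_tagged_state(deliveries: List[Delivery], ckpt_at: int) -> List[str]:
    """Lai-Yang-style rule (not the paper's algorithm): record messages sent before
    the sender's checkpoint (epoch 0) but delivered after the receiver's checkpoint.
    The marker's position in the trace is ignored, so arrival order does not matter."""
    return [
        item.payload
        for i, item in enumerate(deliveries)
        if isinstance(item, Msg) and i >= ckpt_at and item.epoch == 0
    ]


a = Msg("a", epoch=0)  # in transit when the sender checkpoints
b = Msg("b", epoch=1)  # sent after the sender's checkpoint
fifo = [a, MARKER, b]
reordered = [b, MARKER, a]  # b overtakes the marker; a arrives after it

for name, trace in [("FIFO", fifo), ("non-FIFO", reordered)]:
    print(name, chandy_lamport_state(trace, 0), epoch_tagged_state(trace, 0))
# prints: FIFO ['a'] ['a']
# prints: non-FIFO ['b'] ['a']
```

With FIFO delivery both rules record `a`, the one message in transit at the checkpoint. Once `b` overtakes the marker and `a` arrives after it, the marker rule records the post-checkpoint message `b` and loses `a`, while the epoch-tagged rule still records `a`.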

Published in: 19th IEEE International Parallel and Distributed Processing Symposium

Date of Conference: 04-08 April 2005