
Concurrent robust checkpointing and recovery in distributed systems

Authors: P.-J. Leu (Dept. of Computer Science, Purdue University, West Lafayette, IN, USA) and B. Bhargava

A checkpoint/rollback algorithm is presented for multiple processes in a distributed system that communicate by message passing. Any process in the system can initiate the algorithm autonomously. When only one instance of the algorithm is executing, it forces the minimal number of processes other than the initiator to take checkpoints (or roll back). The contributions of this research are as follows: (1) the algorithm allows concurrent execution of different global checkpointing and rollback instances initiated by several processes, and no deadlock or livelock occurs among these instances; (2) the algorithm is resilient to multiple process failures and handles network partitioning pessimistically; and (3) the algorithm does not require messages to be received in the order in which they were sent.
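As a rough illustration only, and not the authors' algorithm (which additionally handles concurrent instances, multiple failures, network partitioning, and out-of-order delivery), the Python sketch below shows one common way a single instance can limit participation to the processes linked to the initiator by message dependencies. It assumes a static, already-collected dependency map. The names `received_from`, `sent_to`, `checkpoint_set`, and `rollback_set` are hypothetical. For a checkpoint, if a process records a receipt, the sender must also record the send; for a rollback, undoing a send forces the receiver to undo the receipt. Both rules are followed transitively.

```python
from collections import deque
from typing import Mapping, Set


def _closure(start: str, edges: Mapping[str, Set[str]]) -> Set[str]:
    """Breadth-first transitive closure of `start` over `edges`, excluding `start` itself."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, set()):
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def checkpoint_set(initiator: str, received_from: Mapping[str, Set[str]]) -> Set[str]:
    # Hypothetical input: received_from[p] = processes p received a message from
    # since p's last checkpoint. If p checkpoints after such a receipt, the sender
    # must checkpoint too, or the message would be recorded as received but never sent.
    return _closure(initiator, received_from)


def rollback_set(initiator: str, sent_to: Mapping[str, Set[str]]) -> Set[str]:
    # Hypothetical input: sent_to[p] = processes p sent a message to since p's
    # last checkpoint. Rolling p back undoes those sends, so each receiver must
    # roll back as well to discard the now-orphaned receipts.
    return _closure(initiator, sent_to)


if __name__ == "__main__":
    deps = {"P1": {"P2"}, "P2": {"P3"}, "P3": set(), "P4": {"P1"}}
    print(sorted(checkpoint_set("P1", deps)))  # P4 is not forced to checkpoint
    sends = {"P1": {"P4"}, "P4": {"P5"}, "P2": {"P1"}}
    print(sorted(rollback_set("P1", sends)))  # P2 is not forced to roll back
```

In this simplified model, a process outside the dependency closure (P4 for the checkpoint, P2 for the rollback) is left undisturbed, which is the intuition behind involving as few processes as possible.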

Published in: Proceedings of the Fourth International Conference on Data Engineering, 1988

Date of Conference: 1-5 Feb 1988