An efficient coordinated checkpointing scheme for multicomputers

2 Author(s)
Sharma, D.D. ; Hewlett-Packard Co., Roseville, CA, USA ; Pradhan, D.K.

A new approach for checkpointing multicomputer applications is presented. The checkpointing is initiated and controlled by a checkpoint coordinator, residing either on one of the nodes running the application or on the host processor attached to the multicomputer. A message count is used to determine whether any messages are in transit. The proposed strategy is hardware-independent and can be implemented in any multicomputer system irrespective of the architecture, interconnection, and routing strategy. The scheme can be used with FIFO and non-FIFO channels, as well as with channels where messages can be lost. Measurement results obtained from our simulations indicate that the proposed strategy outperforms an existing scheme proposed for fixed-path wormhole-routed multicomputer systems. Although the proposed strategy is targeted at high-performance, massively parallel multicomputers, it can also be used in any general-purpose distributed system to reduce the checkpointing overhead.
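The abstract does not spell out how the message count is evaluated, so the following is only a minimal sketch of one plausible reading: each node reports how many application messages it has sent and received, and the coordinator treats the channels as empty when the system-wide totals match. The names `NodeReport`, `messages_in_transit`, and `can_commit_checkpoint` are hypothetical, and the sketch assumes nodes stop sending application messages while they report. Handling of lost messages, which the paper covers, is not modelled here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class NodeReport:
    """Counters one node reports to the coordinator (hypothetical format)."""
    node_id: int
    sent: int      # application messages this node has sent so far
    received: int  # application messages this node has received so far


def messages_in_transit(reports: Iterable[NodeReport]) -> int:
    """Return how many messages were sent but not yet received, system-wide."""
    reports = list(reports)
    total_sent = sum(r.sent for r in reports)
    total_received = sum(r.received for r in reports)
    return total_sent - total_received


def can_commit_checkpoint(reports: Iterable[NodeReport]) -> bool:
    """Assumed rule: commit only when no message is still in flight."""
    return messages_in_transit(reports) == 0


# One message from node 0 has not yet arrived at node 2.
before = [NodeReport(0, 5, 3), NodeReport(1, 2, 4), NodeReport(2, 3, 2)]
# The same system after node 2 has received that message.
after = [NodeReport(0, 5, 3), NodeReport(1, 2, 4), NodeReport(2, 3, 3)]

print(messages_in_transit(before), can_commit_checkpoint(before))
print(messages_in_transit(after), can_commit_checkpoint(after))
```

Comparing totals rather than per-channel counts keeps the check independent of message ordering, which fits the abstract's claim that the scheme works for both FIFO and non-FIFO channels.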

Published in:

Proceedings of the IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems, 1994

Date of Conference:

12-14 Jun 1994