Using checkpoints to localize the effects of faults in distributed systems


Authors: M. Ahamad (School of Information and Computer Science, Georgia Institute of Technology, Atlanta, GA, USA); L. Lin

A checkpointing scheme can be used to ensure forward progress of a computation (program) even when failures occur. In a distributed system, many autonomous programs can execute concurrently and obtain services from a set of shared servers. In such a system, it is desirable to restrict a checkpoint or rollback operation to a single program, so that the effects of a failure stay local, even when processes of different programs communicate with the same servers. When the system is deterministic, this can be achieved by a scheme based on message logging and consistent checkpoints. When the system (the communication network or the programs) is nondeterministic, the semantics of the server functions should be exploited to reduce the additional synchronization needed to ensure locality. The authors illustrate this by presenting efficient algorithms for a file server that do not require messages to be logged on stable storage.
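The following is a minimal sketch, not the authors' algorithm, of how message logging can confine a rollback to one program in the deterministic case. It assumes a hypothetical shared counter server (`Server.increment`) that logs each reply under the client's identifier and request sequence number, and a hypothetical client (`Client`) that checkpoints only its own sequence number and local results. When client A rolls back and deterministically re-issues the same request, the server answers from its log instead of re-executing it, so client B, which has already observed the server's later state, does not have to roll back. A plain in-memory dictionary stands in for the log; the abstract notes that the authors' file server algorithms avoid logging messages on stable storage, which this sketch does not attempt to model.

```python
from typing import Dict, List, Optional, Tuple


class Server:
    """Hypothetical shared server holding counters; logs replies per (client, seq)."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        # Stand-in for a message log: (client_id, seq) -> reply already sent.
        self.reply_log: Dict[Tuple[str, int], int] = {}
        self.executions = 0  # requests actually executed (not replayed)

    def increment(self, client_id: str, seq: int, key: str) -> int:
        # A replayed request gets its logged reply and is NOT re-executed,
        # so one client's rollback leaves the state seen by others untouched.
        if (client_id, seq) in self.reply_log:
            return self.reply_log[(client_id, seq)]
        self.state[key] = self.state.get(key, 0) + 1
        self.executions += 1
        reply = self.state[key]
        self.reply_log[(client_id, seq)] = reply
        return reply


class Client:
    """Hypothetical deterministic client that checkpoints only its own state."""

    def __init__(self, client_id: str, server: Server) -> None:
        self.client_id = client_id
        self.server = server
        self.seq = 0
        self.results: List[int] = []
        self._checkpoint: Optional[Tuple[int, List[int]]] = None

    def request(self, key: str) -> int:
        self.seq += 1  # same sequence numbers are reproduced on re-execution
        reply = self.server.increment(self.client_id, self.seq, key)
        self.results.append(reply)
        return reply

    def checkpoint(self) -> None:
        self._checkpoint = (self.seq, list(self.results))

    def rollback(self) -> None:
        # Only this client is rolled back; the server and other clients are not.
        if self._checkpoint is None:
            raise RuntimeError("no checkpoint taken")
        self.seq, saved = self._checkpoint
        self.results = list(saved)


server = Server()
a = Client("A", server)
b = Client("B", server)
a.checkpoint()
first = a.request("x")     # executed: x becomes 1
other = b.request("x")     # executed: x becomes 2
a.rollback()               # failure affects program A only
replayed = a.request("x")  # A's request 1 is answered from the log
print(first, other, replayed, server.state["x"], server.executions)  # 1 2 1 2 2
```

The design choice worth noting is that the server, not the client, filters duplicates: because A re-issues identical requests with identical sequence numbers, the server can recognize them without any coordination with B. If A's re-execution could diverge (a nondeterministic program), the logged replies would no longer match its requests, which is the case where the abstract turns to server semantics instead.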

Published in: Proceedings of the Eighth Symposium on Reliable Distributed Systems, 1989

Date of Conference: 10-12 Oct 1989