Manivannan, D.
Singhal, M.
Dept. of Comput. Sci., Kentucky Univ., Lexington, KY
This paper appears in: Parallel and Distributed Systems, IEEE Transactions on
Publication Date: Jul 1999
Volume: 10
Issue: 7
On page(s): 703-713
ISSN: 1045-9219
References Cited: 24
CODEN: ITDSEO
INSPEC Accession Number: 6329500
Digital Object Identifier: 10.1109/71.780865
Posted online: 2002-08-06 22:35:32.0
Abstract
Checkpointing algorithms are classified as synchronous and
asynchronous in the literature. In synchronous checkpointing, processes
synchronize their checkpointing activities so that a globally consistent
set of checkpoints is always maintained in the system. Synchronizing
checkpointing activity involves message overhead and process execution
may have to be suspended during the checkpointing coordination,
resulting in performance degradation. In asynchronous checkpointing,
processes take checkpoints without any coordination with others.
Asynchronous checkpointing provides maximum autonomy for processes to
take checkpoints; however, some of the checkpoints taken may not lie on
any consistent global checkpoint, thus making the checkpointing efforts
useless. Asynchronous checkpointing algorithms in the literature can
reduce the number of useless checkpoints by making processes take
communication-induced checkpoints besides asynchronous checkpoints. We
call such algorithms quasi-synchronous. In this paper, we present a
theoretical framework for characterizing and classifying such
algorithms. The theory not only helps to classify and characterize the
quasi-synchronous checkpointing algorithms, but also helps to analyze
the properties and limitations of the algorithms belonging to each
class. It also provides guidelines for designing and evaluating such
algorithms.
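
The notion of a useless checkpoint can be made concrete with a small sketch. The Python below is an illustration only, not the framework of the paper: it assumes a checkpoint is identified by the number of local events it covers, and it treats a global checkpoint as consistent when no message is recorded as received without also being recorded as sent. The names Message, is_consistent, and is_useless are hypothetical, and the brute-force search is only practical for tiny examples.

```python
from itertools import product
from typing import Dict, List, NamedTuple


class Message(NamedTuple):
    sender: str
    send_pos: int    # position of the send event in the sender's history (1-based)
    receiver: str
    recv_pos: int    # position of the receive event in the receiver's history


def is_consistent(cut: Dict[str, int], messages: List[Message]) -> bool:
    """A cut maps each process to the number of events its checkpoint covers.

    The cut is inconsistent if some message is received inside the cut
    but sent outside it (an orphan message).
    """
    for m in messages:
        received_inside = m.recv_pos <= cut[m.receiver]
        sent_outside = m.send_pos > cut[m.sender]
        if received_inside and sent_outside:
            return False
    return True


def is_useless(process: str, checkpoint: int,
               checkpoints: Dict[str, List[int]],
               messages: List[Message]) -> bool:
    """Return True if the checkpoint lies on no consistent global checkpoint.

    Tries every combination of the other processes' checkpoints.
    """
    others = [p for p in checkpoints if p != process]
    for combo in product(*(checkpoints[p] for p in others)):
        cut = dict(zip(others, combo))
        cut[process] = checkpoint
        if is_consistent(cut, messages):
            return False
    return True


# Two processes. Q sends m2 and later receives m1; P receives m2,
# takes a checkpoint, then sends m1.
messages = [
    Message(sender="Q", send_pos=1, receiver="P", recv_pos=1),  # m2
    Message(sender="P", send_pos=2, receiver="Q", recv_pos=2),  # m1
]

# Purely asynchronous: Q checkpoints only at the start and the end.
asynchronous = {"P": [0, 1, 2], "Q": [0, 2]}

# Q additionally takes a checkpoint before receiving m1, in the spirit
# of a communication-induced checkpoint.
with_induced = {"P": [0, 1, 2], "Q": [0, 1, 2]}

useless_before = is_useless("P", 1, asynchronous, messages)
useless_after = is_useless("P", 1, with_induced, messages)
```

In this example, P's checkpoint after its first event cannot be paired with either of Q's checkpoints in the asynchronous run, so useless_before is True. Once Q takes the extra checkpoint before receiving m1, the same checkpoint of P becomes part of a consistent global checkpoint and useless_after is False.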