Cart (Loading....) | Create Account
Close category search window
 

Efficient rollback-recovery technique in distributed computing systems

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Ge-Ming Chiu ; Dept. of Electr. Eng. & Technol., Nat. Taiwan Inst. of Technol., Taipei, Taiwan ; Cheng-Ru Young

We propose an approach for implementing rollback recovery in a distributed computing system. A concept of logical ring is introduced for the maintenance of information required for consistent recovery from a system crash. Message processing order of a process is kept by all other processes on its logical ring. Transmission of data messages are accompanied by the circulation of the associated order messages on the ring. The sizes of the order messages are small. In addition, redundant transmission of order information is avoided, thereby reducing the communication overhead incurred during failure free operation. Furthermore, updating of the order information and garbage collection task are simplified in the proposed mechanism. Our approach does not require information about message processing order be written to stable storage; in fact, the time consuming operations of saving information in stable storage are confined to the checkpointing activities. When failures occur, a surviving process need roll back only if some preceding order information is totally lost, which is relatively unlikely considering the ever growing speed of communication networks. It is shown that a system can recover correctly as long as there exists at least one surviving process

Published in:

Parallel and Distributed Systems, IEEE Transactions on  (Volume:7 ,  Issue: 6 )

Date of Publication:

Jun 1996

Need Help?


IEEE Advancing Technology for Humanity About IEEE Xplore | Contact | Help | Terms of Use | Nondiscrimination Policy | Site Map | Privacy & Opting Out of Cookies

A not-for-profit organization, IEEE is the world's largest professional association for the advancement of technology.
© Copyright 2014 IEEE - All rights reserved. Use of this web site signifies your agreement to the terms and conditions.