Rollback and Recovery Strategies for Computer Programs

Authors: K. M. Chandy (Department of Computer Sciences, University of Texas, Austin, Tex. 78712) and C. V. Ramamoorthy

Reliability is an important aspect of any system. On-line diagnosis, parity-check coding, triple modular redundancy, and other methods have been used to improve the reliability of computing systems. This paper explores another aspect of reliable computing: recovering error-free information when an error is detected at some stage in the processing of a program. If an error or fault is detected while a program is being processed and cannot be corrected immediately, it may be necessary to run the entire program again. The time spent rerunning the program may be substantial, and in some real-time applications it is critical. Recovery time can be reduced by saving the state of the program (all the information stored in registers, primary and secondary storage, etc.) at intervals as processing continues; if an error is detected, the program is restarted from its most recently saved state. Saving a state has a price, however: the time spent storing all the relevant information in secondary storage. Saving the state too often is therefore expensive, while not saving any state may cause an unacceptably large recovery time. The problem we solve is the following: determine the optimum points at which the state of the program should be saved so that it can recover after any malfunction.
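As a rough illustration of the trade-off described above, the Python sketch below chooses checkpoint positions for a program modeled as a straight-line sequence of segments. It is not the authors' method. The model and all names in it (`optimal_checkpoints`, `expected_interval_time`, `seg_times`, `ckpt_costs`, `rate`) are hypothetical and introduced only for illustration. It assumes errors arrive as a Poisson process at rate `rate`, that every error is detected immediately, that restarting costs nothing beyond the lost work, and that the state at program start is available for free. Under those assumptions, an interval of W time units that must be rerun from its start after each error takes (exp(rate * W) - 1) / rate time units on average, and dynamic programming over the segment boundaries selects the set of checkpoints with the smallest expected total time.

```python
import math
from typing import List, Tuple


def expected_interval_time(work: float, rate: float) -> float:
    """Expected time to finish `work` time units when errors arrive as a
    Poisson process with the given rate and each error forces the interval
    to restart from its beginning (assumption: zero restart overhead)."""
    if rate == 0:
        return work
    return math.expm1(rate * work) / rate  # (e^(rate*work) - 1) / rate


def optimal_checkpoints(seg_times: List[float], ckpt_costs: List[float],
                        rate: float) -> Tuple[float, List[int]]:
    """Hypothetical model: pick checkpoints by dynamic programming.

    seg_times[k]  : run time of segment k
    ckpt_costs[k] : time to save the state right after segment k
                    (the cost after the final segment is ignored)
    Returns (minimum expected total time, sorted indices of the segments
    after which the state is saved).
    """
    n = len(seg_times)
    prefix = [0.0]
    for t in seg_times:
        prefix.append(prefix[-1] + t)

    # best[j]: minimum expected time to complete segments 0..j-1 and, if
    # j < n, save the state after segment j-1.
    best = [math.inf] * (n + 1)
    prev = [-1] * (n + 1)
    best[0] = 0.0  # assumption: the initial state is saved for free
    for j in range(1, n + 1):
        save = ckpt_costs[j - 1] if j < n else 0.0
        for i in range(j):
            # Work between the checkpoint at boundary i and boundary j; a
            # failure during the save itself also forces a rerun.
            work = prefix[j] - prefix[i] + save
            cand = best[i] + expected_interval_time(work, rate)
            if cand < best[j]:
                best[j] = cand
                prev[j] = i

    # Walk back through the chosen boundaries to recover checkpoint positions.
    cuts = []
    j = prev[n]
    while j > 0:
        cuts.append(j - 1)  # checkpoint taken after segment j-1
        j = prev[j]
    return best[n], sorted(cuts)
```

For example, `optimal_checkpoints([10.0] * 6, [1.0] * 6, 0.0)` places no checkpoints, because with no errors every save is pure overhead. With a positive error rate such as `0.05`, intermediate checkpoints become worthwhile, since each one bounds how much work an error can destroy.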

Published in:

IEEE Transactions on Computers (Volume: C-21, Issue: 6)