By Topic

Reducing recovery time in a small recursively restartable system

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

6 Author(s)
Candea, G. ; Stanford Univ., CA, USA ; Cutler, J. ; Fox, A. ; Doshi, R.
more authors

We present ideas on how to structure software systems for high availability by considering MTTR/MTTF characteristics of components in addition to the traditional criteria, such as functionality or state sharing. Recursive restartability (RR), a recently proposed technique for achieving high availability, exploits partial restarts at various levels within complex software infrastructures to recover from transient failures and rejuvenate software components. Here we refine the original proposal and apply the RR philosophy to Mercury, a COTS-based satellite ground station that has been in operation for over 2 years. We develop three techniques for transforming component group boundaries such that time-to-recover is reduced, hence increasing system availability. We also further RR by defining the notions of an oracle, restart group and restart policy, while showing how to reason about system properties in terms of restart groups. From our experience with applying RR to Mercury, we draw design guidelines and lessons for the systematic application of recursive restartability to other software systems amenable to RR.

Published in:

Dependable Systems and Networks, 2002. DSN 2002. Proceedings. International Conference on

Date of Conference:

2002