By Topic

The Effects of an ARMOR-based SIFT environment on the performance and dependability of user applications

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

6 Author(s)
Whisnant, K. ; Sun MicroSysterms Inc., San Diego, CA, USA ; Iyer, R.K. ; Kalbarczyk, Z.T. ; Jones, P.H., III
more authors

Few, distributed software-implemented fault tolerance (SIFT) environments have been experimentally evaluated using substantial applications to show that they protect both themselves and the applications from errors. We present an experimental evaluation of a SIFT environment used to oversee spaceborne applications as part of the Remote Exploration and Experimentation (REE) program at the Jet Propulsion Laboratory. The SIFT environment is built around a set of self-checking ARMOR processes running on different machines that provide error detection and recovery services to themselves and to the REE applications. An evaluation methodology is presented in which over 28,000 errors were injected into both the SIFT processes and two representative REE applications. The experiments were split into three groups of error injections, with each group successively stressing the SIFT error detection and recovery more than the previous group. The results show that the SIFT environment added negligible overhead to the application's execution time during failure-free runs. Correlated failures affecting a SIFT process and application process are possible, but the division of detection and recovery responsibilities in the SIFT environment allows it to recover from these multiple failure scenarios. Only 28 cases were observed in which either the application failed to start or the SIFT environment failed to recognize that the application had completed. Further investigations showed that assertions within the SIFT processes-coupled with object-based incremental checkpointing-were effective in preventing system failures by protecting dynamic data within the SIFT processes.

Published in:

Software Engineering, IEEE Transactions on  (Volume:30 ,  Issue: 4 )