Faults, symptoms, and software fault tolerance in the Tandem GUARDIAN90 operating system
Lee, I.; Iyer, R.K.
Fault-Tolerant Computing, 1993. FTCS-23. Digest of Papers., The Twenty-Third International Symposium on
22-24 Jun 1993, Page(s): 20-29
Digital Object Identifier 10.1109/FTCS.1993.627304
Summary: The authors present a measurement-based study of software failures
and recovery in the Tandem GUARDIAN90 operating system using a
collection of memory dump analyses of field software failures. They
identify the effects of software faults on the processor state and trace
the propagation of the effects to other areas of the system. They also
evaluate the role of the defensive programming techniques and the
software fault tolerance of the process pair mechanism implemented in
the Tandem system. Results show that the Tandem system tolerates nearly
82% of reported field software faults, thus demonstrating the
effectiveness of the system against software faults. Consistency checks
made by the operating system detect 52% of software problems and prevent
any error propagation in 31% of software problems. Results also show
that 72% of reported field software failures are recurrences of known
software faults and 70% of the recurrence groups have identical
characteristics.