In recent years, various attempts have been made to combine software and hardware fault tolerance in critical computer systems. In these systems, software and hardware faults may occur in many different sequences. The problem of how to incorporate these sequences in a fault-tolerant system design has been largely neglected in previous work. In this paper, a uniform software-based fault tolerance method is proposed to distinguish and tolerate various sequences of software and hardware faults. This method can be based on the recovery block or the n-version programming scheme, two fundamental software fault tolerance schemes. The concept of a fault identification and system reconfiguration (FISR) tree is proposed to represent the procedure of fault identification and system reconfiguration in a systematic way
Published in:
Parallel and Distributed Processing, 1993. Proceedings. Euromicro Workshop on
Date of Conference: 27-29 Jan 1993