Fault tolerance in commercial computers
Siewiorek, D.P.
Computer
Volume 23, Issue 7, Jul 1990 Page(s):26 - 37
Digital Object Identifier 10.1109/2.56850
Summary: A taxonomy of fault tolerance in commercial computers is set
forth. It is organized around three orthogonal axes: the sources of
errors the computer tolerates, the computer's approach to tolerating
errors, and the computer's structure. Each of these is briefly
discussed. An example of each class in the taxonomy is presented, as
well as its approach to answering the following questions: (1) Is the
system to be highly reliable or highly available? (2) Do all outputs
have to be correct, or only data committed to long-term storage? (3) How
familiar must the user be with the architecture and software redundancy?
(4) Is the system dedicated so that attributes of the application can be
used to simplify fault tolerance techniques? (5) Is the system
constrained to use existing components? (6) Even if the design is new,
what cost and/or performance penalty does it impose on the user who does
not require fault tolerance? (7) Is the system stand-alone, or can other
processors be called upon to assist in times of failure? The computers
covered are the VAX 8600 and IBM 3090 uniprocessors, the Tandem,
Stratus, and VAXft 3000 multicomputers, and the Teradata and Sequoia
multiprocessors.