The MAFT architecture for distributed fault tolerance
Kieckhafer, R.M.; Walter, C.J.; Finn, A.M.; Thambidurai, P.M.
Computers, IEEE Transactions on
Volume 37, Issue 4, Apr 1988 Page(s):398 - 404
Digital Object Identifier 10.1109/12.2183
Summary: A description is given of the multicomputer architecture for fault
tolerance (MAFT), a distributed system designed to provide extremely
reliable computation in real-time control systems. MAFT is based on the
physical and functional partitioning of executive functions from
applications functions. The implementation of the executive functions in
a special-purpose hardware processor allows the fault-tolerance
functions to be transparent to the application programs and minimizes
overhead. Byzantine agreement and approximate agreement algorithms are
used for critical system parameters. MAFT supports the use of
multiversion hardware and software to tolerate built-in or generic
faults. Graceful degradation and restoration of the application workload
is permitted in response to the exclusion and readmission of nodes,
respectively.