Algorithm-based fault tolerance has been used for a number of years in the field of numerical processing. Its advantage over more 'explicit' fault-tolerant methods is that it operates concurrently with the application, thus reducing the time overhead associated with the added redundancy. Recovery blocks, like all fault-tolerant methods, depend critically on the detection of errors in the system. In the recovery block scheme, this error detection is performed by some form of acceptability check on the resultant data. Designing such a check is usually a non-trivial problem, and it is one of the major issues that prevent recovery block schemes from being used more widely. This paper describes how algorithm-based fault-tolerant methods could be used to assist the error detection process within the recovery block scheme and thus make it more appropriate for use in 'real' applications.
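As an illustration of the idea, the sketch below uses a checksum-matrix multiplication, a classic scheme from the algorithm-based fault tolerance literature, as the acceptability check of a recovery block. It is a minimal sketch under stated assumptions, not the method of this paper: the abstract does not say which checksum scheme or application is used, and every name here (`recovery_block`, `checksums_hold`, `faulty_matmul`, and so on) is hypothetical.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def with_column_checksum(a: Matrix) -> Matrix:
    """Append a row holding the sum of each column of a."""
    return [row[:] for row in a] + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b: Matrix) -> Matrix:
    """Append a column holding the sum of each row of b."""
    return [row[:] + [sum(row)] for row in b]


def checksums_hold(c_full: Matrix, tol: float = 1e-9) -> bool:
    """Acceptability check: the last row and last column of c_full must
    equal the column and row sums of the data part. The absolute
    tolerance is an assumption; real floating-point data would need a
    bound scaled to the magnitude of the values."""
    n_rows, n_cols = len(c_full) - 1, len(c_full[0]) - 1
    rows_ok = all(abs(sum(c_full[i][:n_cols]) - c_full[i][n_cols]) <= tol
                  for i in range(n_rows))
    cols_ok = all(abs(sum(c_full[i][j] for i in range(n_rows))
                      - c_full[n_rows][j]) <= tol
                  for j in range(n_cols))
    return rows_ok and cols_ok


def recovery_block(a: Matrix, b: Matrix,
                   variants: Sequence[Callable[[Matrix, Matrix], Matrix]]) -> Matrix:
    """Try each variant in turn; accept the first result whose checksums hold."""
    a_c, b_r = with_column_checksum(a), with_row_checksum(b)
    for multiply in variants:
        # The inputs are never mutated, so each alternate starts from the
        # same state, which stands in for restoring a checkpoint.
        c_full = multiply(a_c, b_r)
        if checksums_hold(c_full):
            return [row[:-1] for row in c_full[:-1]]  # strip the checksums
    raise RuntimeError("all alternates failed the acceptability check")


def faulty_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Hypothetical primary variant with an injected single-element error."""
    c = matmul(a, b)
    c[0][0] += 1.0
    return c


if __name__ == "__main__":
    result = recovery_block([[1, 2], [3, 4]], [[5, 6], [7, 8]],
                            [faulty_matmul, matmul])
    print(result)
```

Because the checksums are carried through the multiplication itself, the check costs only a few extra sums instead of a second, independent computation of the result. A single corrupted element breaks one row checksum and one column checksum, so the primary's result is rejected and the alternate is run.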
Published in:
EUROMICRO 96. Beyond 2000: Hardware and Software Design Strategies, Proceedings of the 22nd EUROMICRO Conference
Date of Conference: 2-5 Sep 1996