Spare processors in a processor array usually sit idle during normal operation. They are used only after a fault has been detected through periodic or on-line diagnosis and the array has been reconfigured to include them. In this paper we propose a design methodology in which the spare processors support a data-driven error detection scheme. The method attaches tags to data streams, so that data items carry their own control information. A checking processor changes the tags when it detects a disagreement among replicated computation results. The faulty processor can then be located from error information derived from two distinct data streams. We incorporate these techniques, using space and time redundancy, into a fault-tolerant processor array that provides different levels of fault tolerance according to the availability of fault-free processors. The scheme is also flexible in that it can trade error detection capability for added computational throughput.
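The tagging idea can be illustrated with a minimal Python sketch. It is not the paper's design: the abstract does not specify the array topology, the tag format, or the checker placement. The sketch assumes a 2-D grid in which each row and each column forms one data stream, a spare recomputes every stream, and exactly one processor is faulty. All names (`Tagged`, `check`, `locate_fault`, `reference_step`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Step = Callable[[int], int]


@dataclass
class Tagged:
    """A data item that carries its own control information (assumed format)."""
    value: int
    error: bool = False


def reference_step(x: int) -> int:
    """The computation every fault-free processor is assumed to perform."""
    return x + 1


def run_stream(x: int, processors: List[Step]) -> Tagged:
    """Pass one item through a chain of processors and return it untagged."""
    for step in processors:
        x = step(x)
    return Tagged(x)


def check(primary: Tagged, replica: Tagged) -> Tagged:
    """Checking processor: set the error tag if the replicated results disagree."""
    mismatch = primary.value != replica.value
    return Tagged(primary.value, primary.error or replica.error or mismatch)


def locate_fault(grid: List[List[Step]], x0: int = 0) -> Optional[Tuple[int, int]]:
    """Intersect the flagged row stream and the flagged column stream.

    Assumes at most one faulty processor; returns None if the tags do not
    single out exactly one row and one column.
    """
    rows, cols = len(grid), len(grid[0])
    bad_rows = []
    for r in range(rows):
        primary = run_stream(x0, grid[r])
        replica = run_stream(x0, [reference_step] * cols)  # spare's recomputation
        if check(primary, replica).error:
            bad_rows.append(r)
    bad_cols = []
    for c in range(cols):
        column = [grid[r][c] for r in range(rows)]
        primary = run_stream(x0, column)
        replica = run_stream(x0, [reference_step] * rows)  # spare's recomputation
        if check(primary, replica).error:
            bad_cols.append(c)
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    return None


def faulty_step(x: int) -> int:
    """A hypothetical fault: the processor adds 2 instead of 1."""
    return x + 2


healthy = [[reference_step] * 3 for _ in range(3)]
broken = [row[:] for row in healthy]
broken[1][2] = faulty_step  # inject a single fault at row 1, column 2
```

With these definitions, `locate_fault(broken)` singles out the injected fault because only row stream 1 and column stream 2 end with their error tags set, while `locate_fault(healthy)` returns `None`. Dropping the replica runs would free the spares for useful work at the cost of detection, which is one way to picture the trade-off between error detection capability and throughput.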
Published in:
Algorithms & Architectures for Parallel Processing, 1996. ICAPP 96. 1996 IEEE Second International Conference on
Date of Conference: 11-13 Jun 1996