In this paper we propose a new approach to fault-tolerant multiprocessor scheduling that exploits the implicit redundancy originally introduced by task duplication. The new scheduling algorithm adopts two strategies: (1) some processing elements (PEs) are reserved solely for realizing fault tolerance and are therefore not used to schedule the original tasks (reserved-scheduling); (2) the task set is partitioned into several small disjoint subsets, and the algorithm is applied to each subset incrementally (phased-scheduling). With these two devices, we can ensure that the finish times of the schedules remain small even in the case of a single PE failure. We then apply the new scheduling algorithm to practical task graphs (LU decomposition and a Laplace equation solver). The experimental results show that the obtained schedules can tolerate a single PE failure at the cost of a small degree of time redundancy.
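The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of how reserved- and phased-scheduling can combine to tolerate one PE failure; it is not the authors' method. Assumptions (ours, not the paper's): tasks are partitioned into phases by DAG depth, communication delays are ignored (so the implicit redundancy of duplication-based scheduling does not arise and every task gets an explicit backup copy), and each phase ends with a barrier. The names `topological_levels`, `phased_schedule`, and `tolerates_single_failure` are hypothetical.

```python
from collections import defaultdict


def topological_levels(tasks, preds):
    """Split tasks into disjoint phases by DAG depth (an assumed partition rule)."""
    level = {}

    def depth(t):
        if t not in level:
            level[t] = 1 + max((depth(p) for p in preds.get(t, ())), default=-1)
        return level[t]

    phases = defaultdict(list)
    for t in tasks:
        phases[depth(t)].append(t)
    return [phases[k] for k in sorted(phases)]


def phased_schedule(tasks, preds, cost, num_pes, num_reserved, with_backups=True):
    """Return (copies, makespan); copies maps task -> [(pe, start, finish), ...].

    Hypothetical sketch: requires at least one non-reserved PE. Communication
    delays are ignored, so no implicit (duplication-based) copies are modeled.
    """
    regular = range(num_pes - num_reserved)  # PEs that run primary copies
    reserved = set(range(num_pes - num_reserved, num_pes))  # kept for backups only
    copies = {}
    phase_start = 0
    for phase in topological_levels(tasks, preds):
        free = {pe: phase_start for pe in range(num_pes)}
        # Primary copies: longest task first, onto the earliest-free regular PE.
        for t in sorted(phase, key=lambda t: -cost[t]):
            pe = min(regular, key=lambda p: free[p])
            copies[t] = [(pe, free[pe], free[pe] + cost[t])]
            free[pe] += cost[t]
        # Backup copies: earliest-free PE other than the primary's PE;
        # on ties, reserved PEs win (False sorts before True).
        if with_backups:
            for t in sorted(phase, key=lambda t: -cost[t]):
                primary = copies[t][0][0]
                pe = min((p for p in range(num_pes) if p != primary),
                         key=lambda p: (free[p], p not in reserved))
                copies[t].append((pe, free[pe], free[pe] + cost[t]))
                free[pe] += cost[t]
        # Phase barrier: the next phase starts after every copy here finishes.
        phase_start = max(f for t in phase for _, _, f in copies[t])
    return copies, phase_start


def tolerates_single_failure(copies, num_pes):
    """True if, whichever single PE fails, every task keeps a copy elsewhere."""
    return all(any(pe != failed for pe, _, _ in placed)
               for failed in range(num_pes) for placed in copies.values())


# Hypothetical diamond-shaped task graph: a -> {b, c} -> d.
tasks = ["a", "b", "c", "d"]
preds = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
cost = {"a": 2, "b": 3, "c": 3, "d": 2}
ft_copies, ft_len = phased_schedule(tasks, preds, cost, num_pes=4, num_reserved=1)
base_copies, base_len = phased_schedule(tasks, preds, cost, num_pes=4,
                                        num_reserved=0, with_backups=False)
print(ft_len, base_len)  # 7 7
print(tolerates_single_failure(ft_copies, 4),
      tolerates_single_failure(base_copies, 4))  # True False
```

Because every phase waits for all copies of the previous phase, each task's inputs are available from at least one surviving PE, so in this sketch the fault-tolerant makespan holds whichever single PE fails. Comparing it with the no-backup baseline gives the time redundancy; in this small example the reserved PE absorbs the backups at no extra cost, which need not hold for larger or wider graphs.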
Published in:
Fault-Tolerant Computing, 1997. FTCS-27. Digest of Papers., Twenty-Seventh Annual International Symposium on
Date of Conference: 24-27 June 1997