One way to improve the reliability of parallel computers is to add spare processors and interconnections to the functional structure, so that faulty processors can be replaced while the network structure is preserved. This approach is called structural fault tolerance (SFT). Highly integrated parallel computers are one way to implement a parallel structure: the physical structure is then composed of many elementary blocks, such as ASICs or multi-chip modules (MCMs), each containing many processors. We show that earlier SFT methods fail to combine the different features, constraints, and requirements of such structures. This paper therefore introduces a new reconfiguration approach dedicated to highly integrated parallel computers.
Published in:
Dependable Computing, 1999. Proceedings. 1999 Pacific Rim International Symposium on
Date of Conference: 1999