Performance and reliability are two of the most crucial issues in today's high-performance instrumentation and measurement systems, which have improved their performance through parallel and distributed processing. High-speed, high-density Multistage Interconnection Networks (MINs) are widely used subsystems of parallel processing and communication systems. This paper proposes new performance models for evaluating fault-tolerant MINs, thereby establishing a sound foundation for assuring their performance and reliability with a high confidence level during parallel instrumentation. A concurrent fault detection and recovery scheme for MINs is introduced that enables a generic approach to fault tolerance by rerouting over redundant interconnection links. A switch architecture that realizes the concurrent testing and diagnosis is shown. The proposed performance models are used to evaluate the compound effect of fault-tolerant operations such as testing, diagnosis, and recovery on throughput and delay. Results are shown for single transient and permanent stuck-at faults on links and on storage units in switching elements. It is shown that the performance degradation caused by the fault-tolerance overhead is quite graceful, whereas the performance degradation without fault recovery is unacceptable.
Published in:
Instrumentation and Measurement Technology Conference, 2002. IMTC/2002. Proceedings of the 19th IEEE (Volume: 2)
Date of Conference: 2002