Several different models for predicting coverage in a fault-tolerant system, including models for permanent, intermittent, and transient errors, are discussed. Markov, semi-Markov, nonhomogeneous Markov, and extended stochastic Petri net models for computing coverage are developed. Two types of events that interfere with recovery are examined, and methods for modeling such events, whether they are deterministic or random, are given. The sensitivity of system reliability/availability to the coverage parameter and the sensitivity of the coverage parameter to various error-handling strategies are investigated. It is found that a policy of attempting transient recovery upon detection of an error (as opposed to automatically reconfiguring the affected component out of the system) can actually increase the unreliability of the system.
Published in: Computers, IEEE Transactions on (Volume: 38, Issue: 6)
Date of Publication: Jun 1989
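
The sensitivity of reliability to the coverage parameter mentioned in the abstract can be illustrated with a small sketch. This is not one of the paper's models. It is the textbook two-unit parallel system, under assumptions introduced here: exponentially distributed failures at rate `lam` per unit, no repair, and a constant coverage probability `c` that the first failure is handled successfully. Under those assumptions the Markov chain solves to R(t) = 2c·exp(-λt) + (1 - 2c)·exp(-2λt). The function names are hypothetical.

```python
import math


def duplex_reliability(t, lam, c):
    """Reliability at time t of a two-unit parallel system.

    Assumed model (not taken from the paper): each unit fails at
    constant rate lam, there is no repair, and the first failure is
    covered (recovered from) with probability c. An uncovered first
    failure, or a second failure, brings the system down.
    """
    return 2.0 * c * math.exp(-lam * t) + (1.0 - 2.0 * c) * math.exp(-2.0 * lam * t)


def coverage_sensitivity(t, lam):
    """Partial derivative dR/dc of the reliability above.

    R is linear in c, so the derivative does not depend on c.
    """
    return 2.0 * (math.exp(-lam * t) - math.exp(-2.0 * lam * t))


if __name__ == "__main__":
    lam = 1e-4   # assumed failure rate per hour
    t = 1000.0   # assumed mission time in hours
    for c in (0.90, 0.99, 1.00):
        unreliability = 1.0 - duplex_reliability(t, lam, c)
        print(f"c = {c:.2f}  unreliability = {unreliability:.6f}")
    print(f"dR/dc = {coverage_sensitivity(t, lam):.6f}")
```

With `c = 1` the expression reduces to the ideal parallel system, and with `c = 0` it reduces to a series system of two units, so even this simple model shows how strongly the coverage value can dominate the predicted unreliability.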