The problem of system recovery from transient faults is addressed using retry techniques. A probabilistic model for the activity of faulty periods and a fault analysis for deriving the optimum retry period are presented. Distribution functions are derived for two cases: a false alarm, in which a transient fault is flagged as permanent, and a miss, in which too many faults coexist and exceed the checker's capability to detect them. These derivations are compared with the results of a simulation program that implements the model. Other factors influencing the value of the retry period are also discussed.
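
As an illustration of the false-alarm case only, the following minimal sketch compares a closed-form probability with a Monte Carlo estimate. It is not the model or the simulation program described above: the abstract does not state the distribution of transient fault durations, so an exponential distribution is assumed here purely for illustration, and the names `false_alarm_probability`, `simulate_false_alarm`, `retry_period`, and `mean_duration` are hypothetical. The miss case (too many coexisting faults) is not modeled.

```python
import math
import random


def false_alarm_probability(retry_period, mean_duration):
    """Probability that a transient fault outlasts the retry period.

    ASSUMPTION: fault durations are exponentially distributed with the
    given mean. The abstract does not specify a distribution.
    """
    return math.exp(-retry_period / mean_duration)


def simulate_false_alarm(retry_period, mean_duration, trials=100_000, seed=0):
    """Monte Carlo estimate of the same probability.

    A false alarm is counted whenever a sampled transient is still
    active when the retry period expires, so it would be flagged as
    permanent.
    """
    rng = random.Random(seed)  # fixed seed for a repeatable estimate
    rate = 1.0 / mean_duration  # expovariate takes a rate, not a mean
    outlasting = sum(
        1 for _ in range(trials) if rng.expovariate(rate) > retry_period
    )
    return outlasting / trials


if __name__ == "__main__":
    # Illustrative values only: retry periods in units of the mean duration.
    for period in (0.5, 1.0, 2.0, 4.0):
        analytic = false_alarm_probability(period, 1.0)
        simulated = simulate_false_alarm(period, 1.0)
        print(f"retry={period:4.1f}  analytic={analytic:.4f}  simulated={simulated:.4f}")
```

Under this assumption a longer retry period lowers the false-alarm probability but lengthens the time spent retrying, which is the kind of trade-off an optimum retry period has to balance.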
Published in: Reliability, IEEE Transactions on (Volume: 37, Issue: 3)
Date of Publication: Aug 1988