Transient or soft errors caused by various environmental effects are a growing concern in micro- and nanoelectronics. We present a general framework for modeling and mitigating the logical effects of such errors in digital circuits. We observe that some errors have time-bounded effects: the system's output is corrupted for a few clock cycles, after which it recovers automatically. Because some applications can tolerate such erroneous behavior (that is, it is noncritical at the system level), we define the critical soft error rate (CSER) as a more realistic alternative to the conventional soft error rate (SER) measure. A simplified technology-independent fault model, the single transient fault (STF), is proposed for efficiently estimating the error probabilities associated with individual nodes in both combinational and sequential logic. STFs can be used to compute various other useful metrics for the faults and errors of interest, and the required computations can leverage the large body of existing methods and tools designed for (permanent) stuck-at faults. As an application of the proposed methodology, we introduce a systematic strategy for hardening logic circuits against transient faults. The goal is to achieve a desired level of CSER at minimum cost by selecting a subset of nodes for hardening against STFs. Exact and approximate algorithms to solve the node selection problem are presented. The effectiveness of this approach is demonstrated by experiments with the ISCAS-85 and ISCAS-89 benchmark suites, as well as some large (multimillion-gate) industrial circuits.
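To make the node selection problem concrete, the following is a minimal sketch of one possible approximate approach, not the algorithms presented in the paper. It assumes that each node has a known per-node CSER contribution and a positive hardening cost, that contributions are additive, and that hardening a node removes its contribution entirely. The names `Node`, `cser`, `cost`, and `select_nodes_greedy` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    cser: float  # assumed: this node's additive contribution to the circuit CSER
    cost: float  # assumed: cost of hardening this node (must be > 0)


def select_nodes_greedy(nodes, target_cser):
    """Greedy heuristic: harden nodes in decreasing order of CSER removed
    per unit cost until the residual CSER meets the target.

    Returns (chosen_nodes, residual_cser). The result is not guaranteed
    to be minimum cost; an exact method would be needed for that.
    """
    residual = sum(n.cser for n in nodes)
    chosen = []
    ranked = sorted(nodes, key=lambda n: n.cser / n.cost, reverse=True)
    for node in ranked:
        if residual <= target_cser:
            break  # target reached; stop spending on hardening
        chosen.append(node)
        residual -= node.cser  # assumption: hardening removes the contribution
    return chosen, residual
```

If the target cannot be met even after hardening every node, the function returns all nodes together with the remaining residual, so the caller can detect the shortfall by comparing the residual against the target.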