Because commercial off-the-shelf (COTS) components are widely used in system-on-chip (SoC) designs, which appear in products from cellular phones to personal computers, it is difficult to modify the hardware design to implement hardware fault-tolerance techniques and improve system reliability. This paper has two major concerns: (a) improving system reliability by detecting transient errors in hardware, and (b) reducing energy consumption by minimizing error-detection overhead. The objective of the new technique, selective procedure call duplication (SPCD), is to keep the system fault-secure (preserve data integrity) in the presence of transient errors, with minimum additional energy consumption. The basic approach is to duplicate computations and then compare their results to detect errors. There are three choices for duplicate computation: (1) duplicating every statement in the program and comparing results, (2) re-executing procedures through duplicated procedure calls and comparing results, and (3) re-executing the whole program and comparing the final results. SPCD combines choices (1) and (2). For a given program, SPCD analyzes the procedure-call behavior of the program, and then determines which procedures should have duplicated statements [choice (1)] and which procedure calls should be duplicated [choice (2)] to minimize energy consumption with reasonable error-detection latency. SPCD then transforms the original program into a new program that detects errors, with minimum additional energy consumption, by re-executing the statements or procedures. SPCD was simulated with benchmark programs; it requires less than 25% additional energy for error detection than previous techniques that do not consider energy consumption.
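Choices (1) and (2) above can be illustrated with a minimal sketch. It is not the SPCD transformation itself, which the abstract does not specify in detail. All names here (`scale_and_offset`, `call_duplicated`, `make_faulty`, `TransientErrorDetected`) are hypothetical, the procedure is assumed to be free of side effects so that re-execution is safe, and a transient error is modeled as a single bit flip in one returned value.

```python
class TransientErrorDetected(Exception):
    """Raised when two redundant computations disagree."""


def scale_and_offset(x, gain, offset):
    """Original procedure, with no error detection (hypothetical example)."""
    y = x * gain
    z = y + offset
    return z


def scale_and_offset_stmt_dup(x, gain, offset):
    """Choice (1): each statement is duplicated on shadow variables."""
    y = x * gain
    y_dup = x * gain            # duplicate of the first statement
    z = y + offset
    z_dup = y_dup + offset      # duplicate of the second statement
    # Compare before the value leaves the procedure.
    if z != z_dup:
        raise TransientErrorDetected("statement-level mismatch")
    return z


def call_duplicated(proc, *args):
    """Choice (2): call the unmodified procedure twice and compare results."""
    first = proc(*args)
    second = proc(*args)        # re-execution of the whole procedure
    if first != second:
        raise TransientErrorDetected("procedure-level mismatch")
    return first


def make_faulty(proc, fault_on_call, bit=0):
    """Test helper: flip one bit of the integer result on the n-th call."""
    calls = {"n": 0}

    def wrapper(*args):
        calls["n"] += 1
        result = proc(*args)
        if calls["n"] == fault_on_call:
            return result ^ (1 << bit)   # simulated transient error
        return result

    return wrapper


if __name__ == "__main__":
    print(scale_and_offset_stmt_dup(3, 4, 5))
    print(call_duplicated(scale_and_offset, 3, 4, 5))
    faulty = make_faulty(scale_and_offset, fault_on_call=1)
    try:
        call_duplicated(faulty, 3, 4, 5)
    except TransientErrorDetected as err:
        print("detected:", err)
```

In this sketch, statement duplication compares results inside the procedure, while call duplication leaves the procedure body untouched and performs a single comparison after the second call returns, so an error is reported only once both executions have finished. Per the abstract, SPCD decides for each procedure which of the two forms to apply, weighing energy consumption against error-detection latency.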