N-modular redundancy (NMR) and N-version programming (NVP) are two popular fault tolerance techniques that exploit hardware and software redundancy to mask faults. In both, the redundant hardware is conventionally used to improve fault tolerance rather than throughput. We introduce a scheme for combined hardware-software fault tolerance, derived from NMR and NVP, that shows how redundancy can also improve throughput by grouping the execution of several tasks. The scheme uses a dynamic task allocation algorithm with an optimistic execution policy that keeps the number of task executions close to the minimum required to produce fault-free results. For equivalent hardware and software resources, the proposed method is 50% to 100% more efficient in terms of throughput and latency.
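As a rough illustration of the optimistic execution idea, the following Python sketch runs only as many versions of a task as are needed to reach a majority, and runs further versions only when the earlier results disagree. It is a minimal sketch under stated assumptions, not the allocation algorithm of the paper: the names `majority` and `optimistic_vote` are hypothetical, versions run sequentially rather than on separate processors, and results are assumed to be hashable values that can be compared for equality.

```python
from collections import Counter


def majority(results, needed):
    """Return the value that occurs at least `needed` times, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= needed else None


def optimistic_vote(versions, task, n=3):
    """Run up to `n` versions of `task`, stopping once a majority agrees.

    Hypothetical helper: `versions` is a sequence of callables that each
    compute the same task independently (as in NVP).
    Returns (agreed_result_or_None, number_of_executions).
    """
    needed = n // 2 + 1  # majority threshold, e.g. 2 out of 3
    results = []
    for version in versions[:n]:
        results.append(version(task))
        # Optimistic step: vote as soon as a majority is possible.
        if len(results) >= needed:
            winner = majority(results, needed)
            if winner is not None:
                return winner, len(results)
    # No majority among the n executions: the fault is not masked.
    return None, len(results)


if __name__ == "__main__":
    good = lambda x: x * x        # a correct version
    faulty = lambda x: x * x + 1  # a version with an injected fault
    print(optimistic_vote([good, good, faulty], 4))
    print(optimistic_vote([good, faulty, good], 4))
```

In the fault-free case with n = 3, only two executions are needed, so the third execution slot is never used for this task. Under the assumptions above, that unused capacity is what a scheduler could give to other tasks, which is the intuition behind using redundancy for throughput as well as fault masking.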