Extreme CMOS scaling is expected to significantly impact the reliability of future microprocessors, prompting recent research efforts on low-cost hardware-software cross-layer reliability solutions. To evaluate such solutions, statistical fault injection (SFI) is often used to estimate the error coverage of the underlying method. Unfortunately, because a significant number of the errors injected by SFI are derated, the evaluation becomes less rigorous and less efficient. This paper makes the observation that many derated errors can be avoided, allowing the fault injection campaign to focus on likely non-derated faults that stress the method under test. We propose a biased injection framework called CriticalFault that employs vulnerability analysis to map out relevant faults for stress testing. Our results show that CriticalFault reduces the injection space by 29%, and that 59% of the biased injections cause either software aborts or silent data corruptions; both are improvements over SFI. Moreover, we characterize the different propagation behaviors of these non-derated faults and discuss the implications for designing future cross-layer solutions. Overall, not only is CriticalFault highly effective in identifying relevant test cases for current systems, but it also allows reliability researchers and engineers to conduct more in-depth and meaningful analysis when developing future reliability solutions.