Soft error reliability is becoming a first-order design concern for microprocessors as a result of higher transistor counts, shrinking device geometries, and lower operating voltages. It is important for designers to be able to validate whether the Soft Error Rate (SER) targets of their design have been met, and to help end users select the processor best suited to their reliability goals. Knowledge of the observable worst-case SER allows designers to select their design point and to bound the worst-case vulnerability at that design point. We highlight the lack of a methodology for evaluating the overall observable worst-case SER. Hence, there is a clear need for a so-called stress mark that can demonstrably approach the observable worst-case SER. The worst case thus obtained can be used to identify reliability bottlenecks, validate the safety margins used for reliability design, and identify inadequacies in the benchmark suites used to evaluate SER. Starting from a comprehensive study of how microarchitecture-dependent program characteristics affect soft errors, we derive the insights needed to develop an automated and flexible methodology for generating a stress mark that approaches the maximum SER of an out-of-order processor. We demonstrate how our methodology enables architects to quantify the impact of SER-mitigation mechanisms on the worst-case SER of the processor. The stress mark achieves 1.4X higher SER in the core, 2.5X higher SER in DL1 and DTLB, and 1.5X higher SER in L2 as compared to the highest SER induced by SPEC CPU2006 and MiBench programs.
Date of Conference: 4-8 Dec. 2010