Shrinking processor feature sizes, lower threshold voltages, and increasing clock frequencies make modern processors highly vulnerable to transient faults. The Architectural Vulnerability Factor (AVF) reflects the probability that a transient fault eventually causes a visible error in the program output, and it indicates a system's susceptibility to transient faults. Awareness of the AVF, especially at an early design stage, therefore helps designers trade off system performance against reliability. However, tracking the AVF during program execution is extremely costly, which makes accurate AVF prediction attractive to computer architects. In this paper, we propose using Boosted Regression Trees (BRT), a nonparametric tree-based predictive modeling scheme, to identify the correlation between a key processor structure's AVF and various performance metrics across workloads, execution phases, and processor configurations. The proposed method not only makes accurate predictions but also quantifies each performance variable's importance to the AVF. A quantitative comparison with conventional linear regression in terms of model stability shows that our model is more stable as the model size varies. Moreover, to reduce prediction complexity, we use the Patient Rule Induction Method (PRIM) to extract simple selection rules on the important metrics. Applying these rules at runtime quickly identifies execution intervals with a relatively high AVF. A case study that enables PRIM-based reorder buffer (ROB) redundancy demonstrates one possible application of the trained PRIM rules.
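To make the boosted-regression-tree idea concrete, the following is a minimal sketch, not the implementation used in this paper. It assumes squared-error loss and depth-1 trees (stumps), and it fits a purely synthetic target that merely stands in for an AVF value. The function names (`fit_stump`, `fit_brt`, `predict`), the three unnamed input metrics, and all parameter values are hypothetical choices made for illustration. The sketch also shows one common way to derive a relative importance score per input variable: accumulate the squared-error reduction credited to each variable's splits.

```python
# Illustrative sketch only: gradient boosting with regression stumps under
# squared loss. Data, names, and parameters are assumptions, not the paper's.
import random


def fit_stump(X, residuals):
    """Find the single split that most reduces squared error on the residuals.

    Returns (feature, threshold, left_mean, right_mean, gain) or None if no
    feature has two distinct values to split between.
    """
    n = len(X)
    total = sum(residuals)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between equal values
            right_sum = total - left_sum
            # Reduction in the sum of squared errors achieved by this split.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[4]:
                best = (j, lo, left_sum / k, right_sum / (n - k), gain)
    return best


def fit_brt(X, y, n_trees=200, learning_rate=0.1):
    """Fit an additive ensemble of stumps; return (base, stumps, importance)."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    importance = [0.0] * len(X[0])
    for _ in range(n_trees):
        # For squared loss, the negative gradient is the plain residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, thr, left, right, gain = stump
        stumps.append((j, thr, learning_rate * left, learning_rate * right))
        importance[j] += gain  # credit the error reduction to feature j
        pred = [p + learning_rate * (left if x[j] <= thr else right)
                for p, x in zip(pred, X)]
    total = sum(importance) or 1.0
    return base, stumps, [g / total for g in importance]


def predict(model, x):
    """Sum the base value and every stump's shrunken contribution."""
    base, stumps, _ = model
    return base + sum(l if x[j] <= thr else r for j, thr, l, r in stumps)


# Synthetic stand-in data: three hypothetical metrics, of which only the
# first two influence the target. This is NOT measured AVF data.
rng = random.Random(0)
X = [[rng.random() for _ in range(3)] for _ in range(200)]
y = [0.7 * x[0] + 0.2 * x[1] + rng.gauss(0, 0.01) for x in X]

model = fit_brt(X, y)
importance = model[2]
print("relative importance per metric:", importance)
```

The shrinkage factor (`learning_rate`) and the number of trees trade fitting speed against overfitting. Practical BRT models typically use deeper trees so that interactions between metrics can be captured, which stumps cannot do.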