Efficient Microarchitectural Vulnerabilities Prediction Using Boosted Regression Trees and Patient Rule Inductions


Abstract:

Shrinking processor feature sizes, lower threshold voltages, and increasing clock frequencies make modern processors highly vulnerable to transient faults. The Architectural Vulnerability Factor (AVF) reflects the probability that a transient fault eventually causes a visible error in the program output, and it indicates a system's susceptibility to transient faults. Awareness of the AVF, especially at an early design stage, is therefore very helpful in achieving a trade-off between system performance and reliability. However, tracking the AVF during program execution is extremely costly, which makes accurate AVF prediction highly attractive to computer architects. In this paper, we propose to use Boosted Regression Trees (BRT), a nonparametric tree-based predictive modeling scheme, to identify the correlation, across workloads, execution phases, and processor configurations, between a key processor structure's AVF and various performance metrics. The proposed method not only makes accurate predictions but also quantifies each performance variable's importance to the AVF. A quantitative comparison between our model and conventional linear regression is performed in terms of model stability, showing that our model is more stable when the model size varies. Moreover, to reduce the prediction complexity, we use the Patient Rule Induction Method (PRIM) to extract simple selection rules on important metrics. Applying these rules at runtime can quickly identify execution intervals with a relatively high AVF. A case study that enables PRIM-based ROB redundancy demonstrates a possible application of the trained PRIM rules.
Published in: IEEE Transactions on Computers ( Volume: 59, Issue: 5, May 2010)
Page(s): 593 - 607
Date of Publication: 22 September 2009


1 Introduction

Electronic noise, which usually comes from large power supplies, strong radiation, or high-energy particle strikes [32], may invert the state of a logic device once the resulting charge accumulates to a sufficient amount. The logic fault introduced in this way is termed a soft error or a transient fault [19]. Shrinking processor feature sizes, and in particular the exponential growth in the number of on-chip transistors, together with lower supply voltages and increasing clock frequencies, make modern processors extremely vulnerable to transient faults. Fortunately, not all such faults eventually affect the final program outcome. For example, a bit flip in an empty Reorder Buffer entry has no effect on program execution. Based on this observation, Li et al. [17] defined a structure's Architectural Vulnerability Factor (AVF) as the probability that a transient fault in the structure finally produces a visible error in the output of a program. At any point in time, a structure's AVF can be derived by counting all the bits in the structure that are required for Architecturally Correct Execution (ACE) and dividing that count by the total number of bits in the structure. Using the ACE analysis method, many publications (e.g., [19], [12], [13]) have reported a large masking effect of transient faults at the architectural level: a key processor structure usually shows an AVF below 40 percent, but with large variation over time.
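
The ACE-based definition above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the 32-bit structure, and the sample ACE bit counts are hypothetical, and averaging the per-cycle values over an interval is an assumption made here to show how a time-varying AVF might be summarized.

```python
from typing import Sequence


def instantaneous_avf(ace_bits: int, total_bits: int) -> float:
    """AVF at one point in time: ACE bits divided by total bits in the structure."""
    if total_bits <= 0:
        raise ValueError("total_bits must be positive")
    if not 0 <= ace_bits <= total_bits:
        raise ValueError("ace_bits must lie in [0, total_bits]")
    return ace_bits / total_bits


def interval_avf(ace_bits_per_cycle: Sequence[int], total_bits: int) -> float:
    """Mean of the per-cycle AVF values over an interval (an assumed summary)."""
    if not ace_bits_per_cycle:
        raise ValueError("need at least one cycle")
    per_cycle = [instantaneous_avf(a, total_bits) for a in ace_bits_per_cycle]
    return sum(per_cycle) / len(per_cycle)


if __name__ == "__main__":
    # Hypothetical structure: 4 entries x 8 bits = 32 bits in total.
    rob_bits = 32
    # Hypothetical ACE bit counts observed in four consecutive cycles.
    samples = [0, 8, 16, 8]
    print(instantaneous_avf(samples[1], rob_bits))
    print(interval_avf(samples, rob_bits))
```

A cycle in which every entry is empty contributes an AVF of 0, which matches the empty Reorder Buffer example: a bit flip in that cycle cannot reach the program output.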
