Demystifying Soft Error Assessment Strategies on ARM CPUs: Microarchitectural Fault Injection vs. Neutron Beam Experiments | IEEE Conference Publication | IEEE Xplore

Demystifying Soft Error Assessment Strategies on ARM CPUs: Microarchitectural Fault Injection vs. Neutron Beam Experiments


Abstract:

Fault injection in early microarchitecture-level simulation CPU models and beam experiments on the final physical CPU chip are two established methodologies to access the...Show More

Abstract:

Fault injection in early microarchitecture-level simulation CPU models and beam experiments on the final physical CPU chip are two established methodologies to access the soft error reliability of a microprocessor at different stages of its design flow. Beam experiments, on one hand, estimate the devices expected soft error rate in realistic physical conditions by exposing it to accelerated particles fluxes. Fault injection in microarchitectural models of the processor, on the other hand, provides deep insights on faults propagation through the entire system stack, including the operating system. Combining beam experiments and fault injection data can deliver deep insights about the devices expected reliability when deployed in the field. However, it is yet largely unclear if the fault injection error rates can be compared to those reported by beam experiments and how this comparison can lead to informed soft error protection decisions in early stages of the system design. In this paper, we present and analyze data gathered with extensive beam experiments (on physical CPU hardware) and microarchitectural fault injections (on an equivalent CPU model on Gem5) performed with 13 different benchmarks executed on top of Linux on an ARM Cortex-A9 microprocessor. We combine experimental data that cover more than 2.9 million years of natural exposure with the result of more than 80,000 injections. We then compare the soft error rate estimations that are based on neutron beam and fault injection experiments. We show that, for most benchmarks, fault injection can be very accurately used to predict the Silent Data Corruptions (SDCs) rate and the Application Crash rate. The System Crash rate measured with beam experiments, however is much larger than the one estimated by fault injection due to unknown proprietary parts of the physical hardware platform that can't be modeled in the simulator. Overall, our analysis shows that the relative difference between the total error rates o...
Date of Conference: 24-27 June 2019
Date Added to IEEE Xplore: 22 August 2019
ISBN Information:
Print on Demand(PoD) ISSN: 1530-0889
Conference Location: Portland, OR, USA

Contact IEEE to Subscribe

References

References is not available for this document.