Abstract:
Soft errors are a major concern in current and future computing systems because they significantly degrade system reliability. Existing solutions either incur unaffordable area and power overheads or fail to provide resilience against multiple-cell upsets. We present Aster, the first low-cost, compiler-based approach to handling multi-bit soft errors. Aster relies on acoustic wave detectors to detect soft errors and uses idempotent processing and checkpointing to provide a complete solution to multi-bit soft errors. Aster comes into action when a soft error is detected. Based on the location of the error, Aster determines its impact on program correctness. If there is no impact, the program continues to execute. Otherwise, Aster finds an optimal idempotent region from which the program can recover from the error completely and re-executes the program from that region. Experimental results show that the time overhead is within 5 percent, an approximately 60 percent reduction compared with state-of-the-art schemes that require expensive hardware support.
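
The recovery decision described in the abstract (continue if the error is harmless, otherwise re-execute from an idempotent region) can be illustrated with a toy model. This is only a sketch under stated assumptions, not Aster's actual algorithm: the `Region` type, the `choose_recovery` function, the liveness sets, and the reading of "optimal" as "the most recently entered region whose re-execution overwrites the corrupted location before reading it" are all hypothetical, as is the fallback to a checkpoint when no such region exists.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Region:
    """Hypothetical idempotent region (not a structure from the paper)."""
    start: int                # index of the first instruction in the region
    live_in: FrozenSet[str]   # locations that re-execution from `start`
                              # would read before overwriting them


def choose_recovery(
    regions: List[Region],
    pc: int,
    corrupted: str,
    live_now: Set[str],
) -> Tuple[str, Optional[Region]]:
    """Pick a recovery action for an error detected at instruction `pc`.

    Returns ("continue", None), ("reexecute", region) or ("checkpoint", None).
    """
    # No impact: the corrupted location is dead, so its value is never read.
    if corrupted not in live_now:
        return ("continue", None)

    # Assumed criterion: a region can recover the error if it has already
    # been entered and re-running it regenerates the corrupted location
    # without first reading the corrupted value.
    candidates = [
        r for r in regions
        if r.start <= pc and corrupted not in r.live_in
    ]
    if not candidates:
        # Assumed fallback: restore from a checkpoint instead.
        return ("checkpoint", None)

    # Assumed notion of "optimal": the latest start, i.e. least re-executed work.
    return ("reexecute", max(candidates, key=lambda r: r.start))


regions = [
    Region(0, frozenset({"r1"})),
    Region(10, frozenset({"r2"})),
    Region(20, frozenset({"r3"})),
]
action, region = choose_recovery(regions, pc=25, corrupted="r3", live_now={"r3"})
```

In this toy run the region starting at 20 reads `r3` before writing it, so it cannot repair the error; the sketch instead selects the region starting at 10, the latest one that regenerates `r3` from uncorrupted inputs.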
Published in: IEEE Transactions on Emerging Topics in Computing (Volume: 8, Issue: 4, 01 Oct.-Dec. 2020)