Abstract:
Resilience to errors in the underlying hardware is a key design objective for a large class of computing systems, from embedded systems all the way to the cloud. Sources ...Show MoreMetadata
Abstract:
Resilience to errors in the underlying hardware is a key design objective for a large class of computing systems, from embedded systems all the way to the cloud. Sources of hardware errors include radiation, circuit aging, variability induced by manufacturing and operating conditions, manufacturing test escapes, and early-life failures. Many publications have suggested that cross-layer resilience, where multiple error resilience techniques from different layers of the system stack cooperate to achieve cost-effective resilience, is essential for designing cost-effective resilient digital systems. This paper presents a comprehensive overview of crosslayer resilience by addressing fundamental cross-layer resilience questions, by summarizing insights derived from recent advances in cross-layer resilience research, and by discussing future crosslayer resilience challenges. CCS CONCEPTS : · General and reference → Reliability; · Hardware → Fault tolerance; · Computer systems organization → Reliability.
Published in: 2019 56th ACM/IEEE Design Automation Conference (DAC)
Date of Conference: 02-06 June 2019
Date Added to IEEE Xplore: 22 August 2019
ISBN Information:
Print on Demand(PoD) ISSN: 0738-100X
Conference Location: Las Vegas, NV, USA