Overcoming Early-Life Failure and Aging for Robust Systems

5 Author(s)
Li, Y.; Stanford Univ., Stanford, CA, USA; Young Moon Kim; Mintarno, E.; Gardner, D.S.

The prospect of system failure has increased because of device and chip-level effects in the late CMOS era. In this article, the authors present novel system-level architecture and design innovations to cope with these lifetime reliability challenges. At nanometer-scale geometries, several hardware failure mechanisms that were largely benign in the past are becoming visible at the system level. Moreover, recent studies indicate that, depending on the application, hardware failures can be significant contributors to overall system failure rates. Designing robust systems that ensure the required hardware reliability, although nontrivial, is achievable, but at high cost. Concurrent error detection during system operation is an extremely important aspect of such systems. Hardware reliability challenges arise from three major sources: early-life failures (also called infant mortality), radiation-induced soft errors, and circuit aging. Several techniques, such as Built-in Soft-Error Resilience (BISER), can be used effectively to correct radiation-induced transient (soft) errors. This article focuses on early-life failures (ELF) and circuit aging. The techniques it discusses exploit specific characteristics of these reliability mechanisms without incurring the high costs of traditional concurrent error detection.

Published in:

Design & Test of Computers, IEEE (Volume: 26, Issue: 6)