Dynamic Transfer of Computation to Processor Cache for Yield and Reliability Improvement

Authors: Somnath Paul (Department of EECS, Case Western Reserve University, Cleveland, Ohio, USA); Swarup Bhunia

VLSI systems in the nanometer regime suffer from high defect rates and large parametric variations, which lead to yield loss and reduced reliability of operation. An architectural framework that ensures proper system operation when a few functional units are defective or unreliable, whether from process-induced or temporal parametric variations, can be effective in improving manufacturing yield and overall system reliability. In this paper, we propose a novel memory-based computational framework that exploits on-chip memory to perform computation on demand using a lookup table (LUT)-based approach. The framework achieves reliable operation by transferring activity from a defective or unreliable functional unit to the embedded memory of a processor. This allows the die to run at a reduced (but acceptable) performance level instead of being discarded outright because of unit failure (in the case of a defective functional unit) or being throttled (in the case of temporal parameter variations, e.g., temperature-induced variations). We note that although the worst-case latency of memory-based computation can be considerably higher than regular operation latency, the average latency is only modestly higher because narrow-width operands are abundant. Furthermore, the operands for a specific instruction (e.g., integer add, multiply, or floating-point add) exhibit high locality of reference, so only part of the LUTs needs to be loaded into the cache. Simulation results for a set of benchmark applications show that the proposed scheme can significantly improve yield and reliability at the cost of only a small loss in performance (0.8% on average) and with 10× less area overhead than a hardware-duplication-based defect-tolerance approach.
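To make the idea of LUT-based computation concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a hypothetical 4-bit chunk width, a table `ADD_LUT` indexed by two operand chunks and a carry-in, and a helper `lut_add` that stops early once the remaining upper bits of both operands are zero. The early exit is one simple way to see why narrow-width operands would need fewer lookups than the worst case; the abstract does not specify the actual table layout or decomposition.

```python
# Illustrative sketch only: chunk width, table layout, and names are assumptions,
# not details taken from the paper.
CHUNK_BITS = 4                          # assumed operand-chunk width per lookup
CHUNK_MASK = (1 << CHUNK_BITS) - 1

# Precomputed table: (a_chunk, b_chunk, carry_in) -> (sum_chunk, carry_out).
# In a memory-based scheme this would live in on-chip memory instead of an adder.
ADD_LUT = {
    (a, b, c): ((a + b + c) & CHUNK_MASK, (a + b + c) >> CHUNK_BITS)
    for a in range(1 << CHUNK_BITS)
    for b in range(1 << CHUNK_BITS)
    for c in (0, 1)
}


def lut_add(x, y, width=32):
    """Add two unsigned integers using only table lookups.

    Returns (sum modulo 2**width, number of lookups performed).
    """
    result, carry, lookups, shift = 0, 0, 0, 0
    # Continue only while unprocessed operand bits or a pending carry remain,
    # so narrow operands finish in fewer lookups than wide ones.
    while shift < width and ((x >> shift) or (y >> shift) or carry):
        a = (x >> shift) & CHUNK_MASK
        b = (y >> shift) & CHUNK_MASK
        s, carry = ADD_LUT[(a, b, carry)]
        result |= s << shift
        lookups += 1
        shift += CHUNK_BITS
    return result & ((1 << width) - 1), lookups


if __name__ == "__main__":
    print(lut_add(5, 9))             # narrow operands: a single lookup
    print(lut_add(0xFFFFFFFF, 1))    # worst case for 32 bits: eight lookups
```

In this sketch a 32-bit addition costs at most eight lookups, while small operands finish after one or two, which mirrors the abstract's distinction between worst-case and average latency.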

Published in:

IEEE Transactions on Very Large Scale Integration (VLSI) Systems (Volume: 19, Issue: 8)