Chip Self-Organization and Fault Tolerance in Massively Defective Multicore Arrays

Author(s):
Collet, J.H. (Centre Nat. de la Rech. Sci., LAAS CNRS, Toulouse, France); Zajac, P.; Psarakis, M.; Gizopoulos, D.

We study chip self-organization and fault tolerance at the architectural level to improve the dependable continuous operation of multicore arrays in massively defective nanotechnologies. Architectural self-organization results from the conjunction of self-diagnosis and self-disconnection mechanisms (to identify and isolate most permanently faulty or inaccessible cores and routers), plus self-discovery of routes to maintain communication in the array. In the methodology presented in this work, chip self-diagnosis is performed in three steps, following an ascending order of complexity: interconnects are tested first, then routers through mutual test, and finally cores. The mutual testing of routers is especially important because faulty routers are disconnected by good ones with no assumption on the behavior of defective elements. Moreover, the disconnection of faulty routers is not physical (“hard”) but logical (“soft”): a good router simply stops communicating with any adjacent router diagnosed as defective. There is no physical reconfiguration in the chip and no need for spare elements. Ultimately, the multicore array may be viewed as a black box that incorporates protection mechanisms and self-organizes, while external control reduces to a simple chip validation test, which in the simplest cases amounts to counting the number of valid and accessible cores.
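
The final validation step described in the abstract, counting valid and accessible cores, can be illustrated with a toy model. The sketch below is not the authors' method: it assumes a rectangular mesh with one router and one core per node, takes the diagnosis results as given inputs, and uses a plain breadth-first search in place of the paper's route self-discovery. The function name `count_accessible_cores`, its parameters, and the single `io_port` entry point are all hypothetical.

```python
from collections import deque


def count_accessible_cores(rows, cols, bad_routers, bad_cores, bad_links,
                           io_port=(0, 0)):
    """Count good cores reachable from io_port in a toy mesh model.

    Assumptions (not from the paper): a rows x cols mesh, one router and
    one core per node, and diagnosis results supplied as sets.
    bad_links holds frozenset({node_a, node_b}) pairs.
    """
    if io_port in bad_routers:
        return 0  # no entry point into the array

    seen = {io_port}
    queue = deque([io_port])
    while queue:
        r, c = queue.popleft()
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue  # outside the mesh
            if nxt in seen:
                continue
            # "Soft" disconnection: a good router simply does not talk to
            # a neighbor diagnosed as defective.
            if nxt in bad_routers:
                continue
            # Links diagnosed as faulty are never used.
            if frozenset(((r, c), nxt)) in bad_links:
                continue
            seen.add(nxt)
            queue.append(nxt)

    # A core counts only if its router is reachable and the core is good.
    return sum(1 for node in seen if node not in bad_cores)
```

In this model, a 3 x 3 mesh with one defective router at (1, 1) and one defective core at (2, 2) still exposes 7 usable cores, because the remaining routers route around the isolated node. No spare elements or physical reconfiguration are modeled, consistent with the logical disconnection the abstract describes.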

Published in:
IEEE Transactions on Dependable and Secure Computing (Volume: 8, Issue: 2)

Date of Publication: March-April 2011
