Fault tolerance mechanism in chip many-core processors

4 Author(s)
Zhang, Lei ; Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China; Graduate University of Chinese Academy of Sciences, Beijing 100080, China ; Han, Yinhe ; Li, Huawei ; Li, Xiaowe

As semiconductor technology advances, a single chip will hold billions of transistors. Chip many-core processors are emerging to take advantage of these greater transistor densities to deliver greater performance. Effective fault tolerance techniques are essential to improve the yield of such complex chips. In this paper, a core-level redundancy scheme called N+M is proposed to improve the yield of N-core processors by providing M spare cores. In such an architecture, topology is an important factor because it greatly affects the processor's performance. The concept of logical topology and a topology reconfiguration problem are introduced; the goal is to transparently provide the target topology with the lowest performance degradation in the presence of faulty cores on-chip. A row rippling and column stealing (RRCS) algorithm is also proposed. Results show that RRCS can give solutions with an average of 13.8% degradation with negligible computing time.
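
To illustrate why M spare cores can raise yield, here is a minimal sketch of a textbook binomial yield model. It is not taken from the paper: it assumes every core is fault-free independently with the same probability, and the names `chip_yield` and `core_yield` are hypothetical. Under those assumptions, an N+M chip is usable whenever at least N of its N+M cores are good.

```python
from math import comb


def chip_yield(n: int, m: int, core_yield: float) -> float:
    """Probability that at least n of the n + m cores are fault-free.

    Assumption (not from the paper): cores fail independently, each
    being fault-free with probability core_yield.
    """
    total = n + m
    return sum(
        comb(total, good)
        * core_yield ** good
        * (1.0 - core_yield) ** (total - good)
        for good in range(n, total + 1)
    )


if __name__ == "__main__":
    # Compare a 16-core chip with 0, 1, and 2 spare cores at an
    # illustrative per-core yield of 0.95.
    for spares in range(3):
        print(spares, chip_yield(16, spares, 0.95))
```

The model ignores interconnect faults and the topology reconfiguration cost that the abstract discusses, so it only bounds how many chips have enough working cores, not how well they perform after reconfiguration.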

Published in:

Tsinghua Science and Technology (Volume: 12, Issue: S1)