Fault-tolerant message switching based on wormhole switching and backtracking

3 Author(s)
Sueishi, M. (Graduate Sch. of Sci. & Technol., Chiba Univ., Japan); Kitakami, M.; Ito, H.

Parallel computers are now widely used in applications that require many calculations. In a NO Remote memory Access (NORA) model parallel computer, many processors are connected by communication links, and calculation results are obtained through communication among the processors. The message switching method, which controls message transmission in the parallel computer, is one of the most important factors in the performance of the parallel computer. Since parallel computers include many processors, their failure rate is very high, and many fault-tolerant switching methods have been proposed. The existing methods have problems, however, such as low communication throughput, low fault-tolerant capability, and large hardware overhead. We propose a fault-tolerant switching method that improves on wormhole switching. The proposed method inserts dummy flits, which carry no information, after the header flit, which is the first flit of the packet. By overwriting a dummy flit with the header flit, backtracking is implemented without hardware overhead. Computer simulation shows that in a 16 by 16 2D torus, for example, the throughput of the proposed method is almost equal to that of existing methods that require large hardware overhead, provided the number of faulty nodes is fewer than 40.
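The dummy-flit idea in the abstract can be illustrated with a toy model. The following is a minimal sketch only, not the authors' implementation: the flit representation, the function names `make_packet`, `backtrack_one_hop`, and `torus_neighbors`, and the reading that each backtracking step consumes exactly one dummy flit are all assumptions made for illustration. The abstract does not specify routing, flow control, or how many dummy flits are inserted.

```python
# Toy model of a wormhole packet with dummy flits (illustrative assumptions only).
HEADER, DUMMY, DATA = "header", "dummy", "data"


def make_packet(dest, payload, n_dummy):
    """Build a packet: one header flit, n_dummy information-free flits, then data flits."""
    header = [(HEADER, dest)]
    dummies = [(DUMMY, None)] * n_dummy
    body = [(DATA, word) for word in payload]
    return header + dummies + body


def backtrack_one_hop(packet):
    """Assumed backtracking step: the header overwrites the dummy flit behind it.

    The header thereby retreats one hop and the packet becomes one flit shorter.
    Raises ValueError when no dummy flit is left directly behind the header.
    """
    if len(packet) < 2 or packet[0][0] != HEADER or packet[1][0] != DUMMY:
        raise ValueError("no dummy flit available behind the header")
    return [packet[0]] + packet[2:]


def torus_neighbors(x, y, size=16):
    """Four neighbors of node (x, y) in a size-by-size 2D torus with wraparound links."""
    return [
        ((x + 1) % size, y),
        ((x - 1) % size, y),
        (x, (y + 1) % size),
        (x, (y - 1) % size),
    ]


if __name__ == "__main__":
    pkt = make_packet(dest=(3, 5), payload=["a", "b"], n_dummy=2)
    print(pkt)
    print(backtrack_one_hop(pkt))
    print(torus_neighbors(0, 0))
```

Under this reading, the number of dummy flits bounds how many hops a header can retreat, and no data flit is ever overwritten because the step refuses to run once the flit behind the header is a data flit.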

Published in:

Dependable Computing, 2004. Proceedings. 10th IEEE Pacific Rim International Symposium on

Date of Conference:

3-5 March 2004