A concurrent test architecture for massively parallel computers and its error detection capability

Author(s): M. V. A. Hancu (Centre de Recherche Inf. de Montreal, Que., Canada); K. Iwasaki; Y. Sato; M. Sugie

Presents new principles for online monitoring of multiprocessors (especially massively parallel processors), then focuses on the effect of the aliasing probability on the error detection process. In the proposed test architecture, concurrent testing (or online monitoring) at the system level is accomplished by checking, at run time, the data and control dependences of the algorithm currently being executed on the parallel computer. To support this, each message contains both source and destination addresses. At each message source, the sequence of destination addresses of the outgoing messages is compressed on a block basis. At the same time, at each destination, the sequence of source addresses of all incoming messages is compressed, also on a block basis. Concurrent compression of the instructions executed by the PEs is also possible. This procedure creates an image of the data dependences and of the control flow of the currently running algorithm. At the end of each computational block, this image is compared with a reference image created at compilation time. The main results of this work are new principles for the online system-level testing of multiprocessor systems, based on signaturing and monitoring the data dependences together with the control dependences, and an analytical model and analysis of the address compression process used for monitoring the data routing process.
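The block-wise address compression described above can be illustrated with a minimal sketch. The abstract does not specify the compression function, so everything below is an assumption made for illustration: a 16-bit signature computed by bit-serial polynomial division (the CRC-16-CCITT polynomial), 8-bit addresses, and the hypothetical names `compress` and `check_block`. It is not the authors' implementation.

```python
from typing import Iterable

SIG_BITS = 16    # signature width (assumption, not from the paper)
POLY = 0x1021    # CRC-16-CCITT polynomial, chosen only for illustration
MASK = (1 << SIG_BITS) - 1


def compress(addresses: Iterable[int], addr_bits: int = 8) -> int:
    """Fold a sequence of addresses into one signature, bit by bit.

    The result depends on the order of the addresses, so a reordered,
    misrouted, or altered message changes the signature unless the
    error happens to alias to the same value.
    """
    sig = 0
    for addr in addresses:
        for i in reversed(range(addr_bits)):
            bit = (addr >> i) & 1
            msb = (sig >> (SIG_BITS - 1)) & 1
            sig = (sig << 1) & MASK
            if msb ^ bit:
                sig ^= POLY
    return sig


def check_block(observed: Iterable[int], reference_signature: int) -> bool:
    """Compare the run-time signature of one block with the reference."""
    return compress(observed) == reference_signature


# Reference image: destination addresses a source is expected to use in
# one computational block (hypothetical values standing in for
# compile-time output).
reference = compress([3, 5, 9])

ok = check_block([3, 5, 9], reference)         # expected traffic
reordered = check_block([3, 9, 5], reference)  # same addresses, wrong order
misrouted = check_block([3, 5, 8], reference)  # one wrong destination
```

In this sketch `ok` is true while `reordered` and `misrouted` are false. Because many address sequences share each 16-bit value, some erroneous sequences would still match the reference; under the common simplifying assumption that all error patterns are equally likely, this aliasing probability is roughly 2^-16 for a 16-bit signature. The paper's own analytical model of aliasing is not reproduced here.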

Published in:

IEEE Transactions on Parallel and Distributed Systems (Volume: 5, Issue: 11)