A gracefully degrading massively parallel system using the BSP model, and its evaluation

2 Author(s)
Savva, A.; Fujitsu Labs. Ltd., Kawasaki, Japan; Nanya, T.

The Bulk-Synchronous Parallel (BSP) model was proposed as a unifying model for parallel computation. By using Randomized Shared Memory (RSM), the model offers an asymptotically optimal emulation of the Parallel Random Access Machine (PRAM). Using the BSP model with RSM, we construct a gracefully degrading massively parallel system with a fault tolerance (FT) scheme that relies on memory duplication to ensure global memory integrity and to speed up reconfiguration. After a fault occurs, global reconfiguration restores the logical properties of the system. Work done during reconfiguration is shared equally among the live processors, with minimal coordination. We analyze, at the level of the BSP model, how the performance of a system may change as processors fail and the performance of the interconnection network degrades. We relate the change in overall system performance to the change in computation and communication load on the live processors. Further, we show how to estimate the overhead imposed by the FT scheme. We evaluate the reconfiguration time, the overhead, and the graceful degradation of the system experimentally by an implementation on a Massively Parallel Processor (MPP). We show that the predictions about the degradation of the system and the overhead cost of the scheme are accurate.
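
As a rough illustration of the kind of BSP-level analysis the abstract describes, the sketch below uses the standard BSP superstep cost, w + h·g + l (local work, plus the largest per-processor message volume times the per-word communication cost, plus the barrier cost). It is not the authors' model: the names `BspMachine`, `superstep_cost`, and `degraded_cost` are hypothetical, and the even redistribution of work and traffic over the live processors is a simplifying assumption taken only from the abstract's statement that load is shared equally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BspMachine:
    """Hypothetical container for the two standard BSP machine parameters."""
    g: float  # communication cost per word, in units of local operations
    l: float  # barrier synchronization cost per superstep


def superstep_cost(w: float, h: float, machine: BspMachine) -> float:
    """Standard BSP cost of one superstep: w + h*g + l."""
    return w + h * machine.g + machine.l


def degraded_cost(total_work: float, total_traffic: float,
                  total_procs: int, failed: int,
                  machine: BspMachine) -> float:
    """Superstep cost when work and traffic are split evenly over live processors.

    Assumption (not from the paper): load is perfectly balanced, so each live
    processor gets total_work / live operations and total_traffic / live words.
    Pass a machine with larger g or l to model a degraded network.
    """
    live = total_procs - failed
    if live <= 0:
        raise ValueError("at least one processor must remain live")
    return superstep_cost(total_work / live, total_traffic / live, machine)


if __name__ == "__main__":
    machine = BspMachine(g=2.0, l=100.0)
    healthy = degraded_cost(1000.0, 100.0, total_procs=10, failed=0, machine=machine)
    degraded = degraded_cost(1000.0, 100.0, total_procs=10, failed=5, machine=machine)
    print(f"healthy={healthy}, degraded={degraded}, slowdown={degraded / healthy:.3f}")
```

Under these assumptions, losing half the processors doubles the per-processor computation and communication terms but leaves the barrier term unchanged, so the slowdown is less than a factor of two. The effect of the FT scheme's own overhead is not modeled here.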

Published in:

Computers, IEEE Transactions on (Volume: 48, Issue: 1)