Transparent adaptive library-based checkpointing for master-worker style parallelism

Authors: G. Cooperman (College of Computer and Information Science, Northeastern University, Boston, MA, USA); J. Ansel; Xiaoqin Ma

We present a transparent, system-level checkpointing solution for master-worker parallelism that automatically adapts, upon restart, to the number of processor nodes available. This matters because nodes in a cluster fail. It also lets a computation adapt to multiple cluster partitions and to additional computational grid resources as they become available. Checkpointing a master-worker computation has the further advantage that only the master process needs to be checkpointed, which is both fast and economical of disk space. We demonstrate this by checkpointing Geant4, a million-line C++ program. Our solution has been implemented in the context of TOP-C (Task Oriented Parallel C/C++), a free, open-source parallel package, although it can easily be ported to other master-worker packages.
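
To make the master-only idea concrete, here is a minimal, sequential Python sketch. It is not TOP-C's API or the authors' implementation: `MasterState`, `do_task`, `run_master`, `save_checkpoint`, `load_checkpoint`, and the JSON checkpoint format are all hypothetical. It assumes workers are stateless and that each task's result depends only on its input, so the master's bookkeeping (the next task index and the accumulated result) is the only state worth saving. On restart, the master reloads that state and continues with however many workers are available at that time.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass

NUM_TASKS = 100  # hypothetical problem size


@dataclass
class MasterState:
    """Hypothetical master bookkeeping: the only state that gets checkpointed."""
    next_task: int = 0  # index of the next task to hand out
    total: int = 0      # accumulated results of completed tasks


def do_task(task: int) -> int:
    """Worker computation: a pure function of its input, so workers hold no state."""
    return task * task


def save_checkpoint(path: str, state: MasterState) -> None:
    # Write to a temporary file, then atomically rename it over the old
    # checkpoint, so a crash mid-write leaves the previous checkpoint intact.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(asdict(state), f)
    os.replace(tmp, path)


def load_checkpoint(path: str) -> MasterState:
    with open(path) as f:
        return MasterState(**json.load(f))


def run_master(state: MasterState, num_workers: int, max_rounds: int, ckpt: str) -> MasterState:
    """Each round gives one task to each of `num_workers` workers, then checkpoints.
    Returning after `max_rounds` rounds simulates a crash."""
    for _ in range(max_rounds):
        if state.next_task >= NUM_TASKS:
            break
        batch = range(state.next_task, min(state.next_task + num_workers, NUM_TASKS))
        # Sequential stand-in for dispatching tasks to workers and collecting results.
        state.total += sum(do_task(t) for t in batch)
        state.next_task = batch.stop
        save_checkpoint(ckpt, state)
    return state


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "master.ckpt")
    run_master(MasterState(), num_workers=4, max_rounds=5, ckpt=ckpt)  # "crash" after 20 tasks
    restored = load_checkpoint(ckpt)
    print("restarting at task", restored.next_task)
    # Restart with a different number of workers; only the master state was saved.
    final = run_master(restored, num_workers=7, max_rounds=NUM_TASKS, ckpt=ckpt)
    print(final.total == sum(t * t for t in range(NUM_TASKS)))
```

The sketch checkpoints only between rounds, so no task is ever in flight when state is saved. In a real parallel run, tasks outstanding at checkpoint time would not be recorded as complete and would simply be reissued after restart, to whatever number of workers is then available.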

Published in: Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID 06), 2006, Volume 1

Date of Conference: 16-19 May 2006