By Topic

Process migration for MPI applications based on coordinated checkpoint

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
Jiannong Cao ; Dept. of Comput., Hong Kong Polytech. Univ., Hong Kong, China ; Yinghao Li ; Minyi Guo

A lot of research has been done on fault-tolerance for MPI applications, some on checkpoint/restart, and some on network fault-tolerance. Process migration, however, has not gained widespread use due to the additional complexity of the requirement that the knowledge about the new location of a migrated process has to be made known to every other process in the application. Here we present a simple yet effective method of process migration based on coordinated checkpointing of MPI applications. Migration is achieved by checkpointing the application, modifying the process location information in the checkpoint files, and restarting the application. Checkpoint/restart and migration are transparent to MPI applications. Performance evaluation results showed that the additional checkpoint/restart capability has little impact on application performance, and the migration method scales well on a large number of nodes.

Published in:

11th International Conference on Parallel and Distributed Systems (ICPADS'05)  (Volume:1 )

Date of Conference:

20-22 July 2005