Portable checkpointing and recovery
This paper presents a checkpointing scheme implemented in a parallel library that runs on top of CHIMP/MPI. The main goals of the checkpointing mechanism are portability and efficiency: it runs in a machine-independent way on every platform supported by MPI. The scheme allows checkpoints to be migrated between machines and offers a flexible recovery mechanism based on data reconfiguration. Performance results are presented at the end of the paper, together with techniques that can be used to increase the efficiency of the checkpointing mechanism.
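The two properties named in the abstract, a machine-independent checkpoint and recovery through data reconfiguration, can be illustrated with a small sketch. This is not the paper's implementation, which is not described here. It assumes a hypothetical file format (a big-endian 64-bit length header followed by big-endian IEEE-754 doubles), a 1-D array block-distributed across processes, and that recovery may use a different number of processes than the run that wrote the checkpoint. Process-local blocks are simulated as Python lists rather than MPI ranks, and all function names are hypothetical.

```python
import struct
from typing import List

# Hypothetical portable format: ">" forces big-endian byte order and standard
# sizes, so a checkpoint written on any host decodes identically on any other,
# which is what allows the checkpoint to migrate between machines.


def write_checkpoint(local_blocks: List[List[float]]) -> bytes:
    """Concatenate per-process blocks (in rank order) into one portable blob."""
    data = [x for block in local_blocks for x in block]
    header = struct.pack(">Q", len(data))          # global element count
    body = struct.pack(f">{len(data)}d", *data)    # elements as big-endian doubles
    return header + body


def read_checkpoint(blob: bytes) -> List[float]:
    """Decode a blob produced by write_checkpoint back into the global array."""
    (n,) = struct.unpack_from(">Q", blob, 0)
    return list(struct.unpack_from(f">{n}d", blob, 8))


def block_range(n: int, nprocs: int, rank: int) -> range:
    """Global indices owned by `rank` when n items are block-distributed.

    The first n % nprocs ranks each receive one extra element.
    """
    base, extra = divmod(n, nprocs)
    start = rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return range(start, start + size)


def recover(blob: bytes, nprocs: int) -> List[List[float]]:
    """Data reconfiguration: redistribute a checkpoint over `nprocs` processes,
    which need not match the number of processes that wrote it."""
    data = read_checkpoint(blob)
    return [[data[i] for i in block_range(len(data), nprocs, r)]
            for r in range(nprocs)]


if __name__ == "__main__":
    # Checkpoint taken by 2 processes, recovered on 3.
    blob = write_checkpoint([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    print(recover(blob, 3))
```

Running the script prints `[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]`: the six elements written by two processes are repartitioned into three blocks of two. Fixing the byte order in the file format, rather than dumping raw memory, is what makes the checkpoint independent of the writing machine. Storing the data in its global order, rather than per-process images, is what lets recovery pick a new process count.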
Published in:
Proceedings of the Fourth IEEE International Symposium on High Performance Distributed Computing, 1995
Date of Conference: 2-4 Aug 1995