We present a checkpointing mechanism for a distributed shared memory (DSM) system that, despite being transparent to the programmer, is efficient and portable. It is efficient because it is nonblocking and coordinated, and therefore free of the domino effect. It offers some portability because it is built on top of MPI and uses only the services offered by MPI and a POSIX-compliant local file system. As far as we know, this is the first real implementation of such a scheme for DSM. Along with a description of the algorithms used, we present experimental results obtained on a cluster of workstations and discuss the insights gained from the implementation effort. We hope that our research shows that efficient, transparent, and portable checkpointing is viable for DSM systems.