The Design and Implementation of Checkpoint/Restart Process Fault Tolerance for Open MPI | IEEE Conference Publication