An optimal checkpoint/restart model for a large scale high performance computing system | IEEE Conference Publication | IEEE Xplore