A Checkpointing Strategy for Scalable Recovery on Distributed Parallel Systems | IEEE Conference Publication | IEEE Xplore