Concurrent robust checkpointing and recovery in distributed systems | IEEE Conference Publication | IEEE Xplore