Algorithm-based checkpoint-free fault tolerance for parallel matrix computations on volatile resources

IEEE Conference Publication | IEEE Xplore