Algorithm-based fault tolerance for floating-point operations in massively parallel systems

IEEE Conference Publication, IEEE Xplore