Skip to Main Content
Fault tolerance is a critical issue in the arena of large-scale computing. The fault-tolerant parallel algorithm (FTPA) is an application-level technique for tolerating hardware failures. FTPA achieves fast failure recovery making use of parallel recomputing. However, it complicates the coding of the application program. This paper uses compiler technology to automate the design of FTPA, and introduces the implementation of a tool called GiFT (Get it Fault-Tolerant). GiFT utilizes the extended data-flow analysis to choose the state needed by failure recovery, exploits the parallel recomputing time model to compute the optimal number of recomputing processes, and uses parallelization technologies to generate parallel recomputing codes. The experimental results show that original MPI programs can be transformed into the FTPA counterparts by GiFT correctly, and the performance of GiFT-generated FTPA programs is comparable to the performance of hand-modified FTPA programs.