By Topic

GiFT: Automating FTPA Implementation for MPI Programs

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

5 Author(s)
Hongyi Fu ; Sch. of Comput., Nat. Univ. of Defense Technol., Changsha ; Yunfei Du ; Panfeng Wang ; Jia Jia
more authors

Fault tolerance is a critical issue in the arena of large-scale computing. The fault-tolerant parallel algorithm (FTPA) is an application-level technique for tolerating hardware failures. FTPA achieves fast failure recovery making use of parallel recomputing. However, it complicates the coding of the application program. This paper uses compiler technology to automate the design of FTPA, and introduces the implementation of a tool called GiFT (Get it Fault-Tolerant). GiFT utilizes the extended data-flow analysis to choose the state needed by failure recovery, exploits the parallel recomputing time model to compute the optimal number of recomputing processes, and uses parallelization technologies to generate parallel recomputing codes. The experimental results show that original MPI programs can be transformed into the FTPA counterparts by GiFT correctly, and the performance of GiFT-generated FTPA programs is comparable to the performance of hand-modified FTPA programs.

Published in:

Parallel and Distributed Systems, 2008. ICPADS '08. 14th IEEE International Conference on

Date of Conference:

8-10 Dec. 2008