Application-Level Fault-Tolerance Solutions for Grid Computing

Authors:
Diaz, D. (Comput. Archit. Group, A Coruna Univ., A Coruna); Pardo, X.C.; Martin, M.J.; Gonzalez, P.

One of the key functionalities provided by Grid systems is the remote execution of applications. This paper introduces a research proposal on fault-tolerance mechanisms for the execution of sequential and message-passing parallel applications on the Grid. A service-based architecture called CPPC-G is proposed. The CPPC (Controller/Precompiler for Portable Checkpointing) framework is used to insert checkpointing instrumentation into the application code. CPPC-G services will be in charge of the submission and monitoring of the application execution, the management of checkpoint files generated by CPPC-enabled applications, and the detection and automatic restart of failed executions. The development of the CPPC-G architecture will involve research in several areas: storage and management of data files (checkpoint files); automatic selection of suitable computing resources; reliable detection of execution failures; and robustness issues, so that the architecture is itself fault-tolerant.
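
The checkpoint-and-restart cycle described in the abstract can be illustrated with a minimal, local sketch. It is not CPPC or CPPC-G code: the names `run_job`, `supervise`, and `SimulatedFailure`, the JSON checkpoint format, and the single injected fault are all assumptions made for illustration. The job stands in for an instrumented application that writes checkpoint files, and the supervisor stands in for a service that detects a failed execution and restarts it from the last checkpoint.

```python
import json
import os


class SimulatedFailure(Exception):
    """Hypothetical stand-in for a crashed or lost job on a Grid resource."""


def run_job(checkpoint_path, total_steps, fail_at=None):
    """Sum the integers 0..total_steps-1, checkpointing after every step.

    If a checkpoint file already exists, the job resumes from it instead of
    starting over. `fail_at` injects a simulated failure before that step.
    """
    state = {"step": 0, "acc": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise SimulatedFailure(f"failed before step {fail_at}")
        state["acc"] += state["step"]
        state["step"] += 1
        # Write to a temporary file and rename it, so a crash during the
        # write cannot leave a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)

    return state["acc"]


def supervise(checkpoint_path, total_steps, fail_at=None, max_restarts=3):
    """Run the job, restarting it from its last checkpoint after a failure.

    Returns a (result, restarts) tuple.
    """
    restarts = 0
    while True:
        try:
            return run_job(checkpoint_path, total_steps, fail_at), restarts
        except SimulatedFailure:
            if restarts >= max_restarts:
                raise
            restarts += 1
            fail_at = None  # assumption: the injected fault occurs only once
```

Calling `supervise(path, 10, fail_at=5)` on a fresh checkpoint path lets the job complete five steps, fail, and then resume from the saved state rather than from step 0. A real Grid service would additionally have to detect failures remotely and move checkpoint files between resources, which this sketch does not attempt.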

Published in:

Cluster Computing and the Grid, 2008. CCGRID '08. 8th IEEE International Symposium on

Date of Conference:

19-22 May 2008