By Topic

Intelligent Selection of Fault Tolerance Techniques on the Grid

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
Vanderster, D.C. ; Univ. of Victoria, Victoria ; Dimopoulos, N.J. ; Sobie, R.J.

The emergence of computational grids has lead to an increased reliance on task schedulers that can guarantee the completion of tasks that are executed on unreliable systems. There are three common techniques for providing task-level fault tolerance on a grid: retrying, replicating, and checkpointing. While these techniques are varyingly successful at providing resilience to faults, each of them presents a tradeoff between performance and resource cost. As such, tasks having unique urgency requirements would ideally be placed using one of the techniques; for example, urgent tasks are likely to prefer the replication technique, which guarantees timely completion, whereas low priority tasks should not incur any extra resource cost in the name of fault tolerance. This paper introduces a placement and selection strategy which, by computing the utility of each fault tolerance technique in relation to a given task, finds the set of allocation options which optimizes the global utility. Heuristics which take into account the value offered by a user, the estimated resource cost, and the estimated response time of an option are presented. Simulation results show that the resulting allocations have improved fault tolerance, runtime, profit, and allow users to prioritize their tasks.

Published in:

e-Science and Grid Computing, IEEE International Conference on

Date of Conference:

10-13 Dec. 2007