Fault-tolerance, malleability and migration for divide-and-conquer applications on the grid | IEEE Conference Publication | IEEE Xplore

Abstract:

Grid applications have to cope with dynamically changing computing resources as machines may crash or be claimed by other, higher-priority applications. In this paper, we propose a mechanism that enables fault-tolerance, malleability (i.e. the ability to cope with a dynamically changing number of processors) and migration for divide-and-conquer applications on the grid. The novelty of our approach is restructuring the computation tree, which eliminates redundant computation and salvages partial results computed by the processors leaving the computation. This enables the applications to adapt to dynamically changing numbers of processors and to migrate the computation without loss of work. Our mechanism is easy to implement and deploy in a grid environment. The overhead it incurs is close to zero. We have implemented our mechanism in the Satin system. We have evaluated the performance of our system on the DAS-2 wide-area system and on the testbed of the European GridLab project.
Date of Conference: 04-08 April 2005
Date Added to IEEE Xplore: 18 April 2005
Print ISBN: 0-7695-2312-9
Print ISSN: 1530-2075
Conference Location: Denver, CO, USA
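
The abstract's central idea, salvaging partial results of a divide-and-conquer computation so that work finished before a processor leaves is not redone, can be illustrated with a toy single-process model. This is only a sketch under stated assumptions and not the mechanism implemented in Satin: the names `dc_sum`, `saved`, `budget` and `WorkerLeft` are hypothetical, the "processor leaving" is simulated by an exception, and the shared result table stands in for whatever the real system uses to preserve results across the grid.

```python
class WorkerLeft(Exception):
    """Hypothetical signal: the simulated worker leaves mid-computation."""


def dc_sum(lo, hi, data, saved, stats, budget=None):
    """Sum data[lo:hi] by divide and conquer, reusing results kept in `saved`.

    `saved` maps a subtree id (lo, hi) to its finished result.
    `budget` is a one-element list holding how many leaves this run may
    still compute before the simulated worker leaves (None = unlimited).
    """
    key = (lo, hi)
    if key in saved:
        # A result salvaged from an earlier, interrupted run: no recomputation.
        stats["reused"] += 1
        return saved[key]
    if hi - lo == 1:
        if budget is not None:
            if budget[0] == 0:
                raise WorkerLeft()
            budget[0] -= 1
        stats["leaves"] += 1
        result = data[lo]
    else:
        mid = (lo + hi) // 2
        left = dc_sum(lo, mid, data, saved, stats, budget)
        right = dc_sum(mid, hi, data, saved, stats, budget)
        result = left + right
    saved[key] = result  # record the finished subtree so it can be salvaged
    return result


data = list(range(8))
saved = {}

# First run: the simulated worker leaves after computing five leaves.
first = {"leaves": 0, "reused": 0}
try:
    dc_sum(0, len(data), data, saved, first, budget=[5])
except WorkerLeft:
    pass

# Second run: finished subtrees are taken from `saved` instead of recomputed.
second = {"leaves": 0, "reused": 0}
total = dc_sum(0, len(data), data, saved, second)
print(total, first, second)
```

In this model the second run computes only the three leaves the first run never reached and picks up two whole finished subtrees from the table. A real grid system additionally has to detect departures and move results between machines, which this sketch does not attempt.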
