Theft-induced checkpointing for reconfigurable dataflow applications

4 Author(s)
Jafar, S. (Lab. ID-IMAG, Montbonnot Saint-Martin); Krings, A.W.; Gautier, T.; Roch, J.-L.

This paper defines a new checkpoint/recovery protocol, called theft-induced checkpointing, for dataflow computations in large heterogeneous environments. The protocol is especially useful in massively parallel multi-threaded computations, as found in cluster or grid computing, and relies on the principle of work-stealing to distribute work. Because the state of an execution is based on a macro dataflow graph, the protocol is highly flexible with respect to rollback. Specifically, it allows local rollback in dynamic heterogeneous systems, even when the number of processors and processes differs from that of the original execution. To maximize run-time efficiency, the overhead associated with checkpointing is shifted to the rollback operations whenever possible. Experimental results show that the induced overhead is very small.
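
The abstract does not spell out the protocol's mechanics, so the following is only a minimal illustrative sketch of the general idea it names: a checkpoint is recorded when a task is stolen, and recovery re-enqueues the logged tasks on a surviving worker. The names `Worker`, `steal`, `recover`, and `checkpoint_log`, and the representation of tasks as plain strings, are assumptions made for illustration and are not taken from the paper.

```python
from collections import deque


class Worker:
    """Hypothetical worker holding a local queue of pending dataflow tasks."""

    def __init__(self, name):
        self.name = name
        self.tasks = deque()


def steal(victim, thief, checkpoint_log):
    """Move the oldest pending task from victim to thief.

    Assumption for illustration: the theft itself triggers the checkpoint,
    so no checkpointing work is done while a worker runs its own tasks.
    """
    if not victim.tasks:
        return None
    task = victim.tasks.popleft()
    checkpoint_log.append((thief.name, task))  # record who took which task
    thief.tasks.append(task)
    return task


def recover(failed_name, checkpoint_log, survivor):
    """Re-enqueue on a survivor every task logged as stolen by the failed worker."""
    restored = [task for owner, task in checkpoint_log if owner == failed_name]
    survivor.tasks.extend(restored)
    return restored
```

In this sketch, recovery touches only the tasks logged for the failed worker, which loosely mirrors the local rollback the abstract describes; the survivor can be any remaining worker, so the number of workers after recovery need not match the original.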

Published in:

Electro Information Technology, 2005 IEEE International Conference on

Date of Conference:

22-25 May 2005