Skip to Main Content
Service level agreements (SLAs) have been introduced into the grid in order to build a basis for its commercial uptake. The challenge for Grid providers in agreeing and operating SLA-bound jobs is to ensure their fulfillment even in the case of failures. Hence, fault-tolerance mechanisms are an essential means of the provider's SLA management. The high utilization of commercial operated clusters leads to scenarios in which typically a job migration effects other jobs scheduled. The effects result from the unavailability of enough free resources which would be needed to catch all resource outages. Consequently before initiating a migration, its effects for other jobs have to be compared and the initiation of fault- tolerance (FT-) mechanisms have to be evaluated recursively. This paper presents a measurement for the benefit of initiating a FT-mechanism, the recursive evaluation, and termination condition. Performing such an impact evaluation of an initiated chain of FT-mechanisms is often more profitable than performing a single FT-mechanism and accordingly this is important for the Grid commercialization.