Skip to Main Content
The execution of long-running batch programs imposes severe reliability constraints on a computing system since the occurrence of a failure during its execution is more likely and that once occurred, a failure would destroy all the processing perfonned thus far. This paper studies the execution delay and machine resources consumed in supporting the running of large batch programs in a computing environment interrupted by failures. The effect of checkpoints and their optimal insertion are also considered. The results are applicable to arbitrary law of failure.