Volunteer-based grid computing resources are characteristically volatile: they frequently become unavailable because their owners retain autonomy over them. This volatility significantly affects the applications the resources host. Availability predictors can forecast unavailability and give schedulers information about reliability, which, combined with information about speed and load, helps them make better scheduling decisions. This paper studies the use of such predictions for deciding when to replicate jobs. In particular, our predictors forecast the probability that a job will complete uninterrupted, and our schedulers replicate the jobs that are least likely to do so. Our strategies outperform other comparable replication strategies, as measured by improved makespan and fewer redundant operations. We define a new "replication efficiency" metric and demonstrate, in a trace-based grid simulation, that our availability predictor provides information that makes our schedulers more efficient than the most closely related replication strategy across a variety of loads. Under low load, our techniques come within 6% of the makespan improvement of a previously proposed replication technique while creating 76.8% fewer replicas; under higher loads, they improve makespan marginally while creating 72.5% fewer replicas.
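The core idea of replicating only the jobs least likely to complete uninterrupted can be illustrated with a minimal sketch. It is not the paper's scheduler: the `Job` type, the `select_jobs_to_replicate` function, the fixed probability threshold, and the optional replica cap are all hypothetical choices made for illustration, and the completion probabilities are assumed to come from some availability predictor.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    name: str
    # Predicted probability that the job finishes uninterrupted on its
    # assigned resource (assumed to be supplied by an availability predictor).
    p_complete: float


def select_jobs_to_replicate(
    jobs: List[Job],
    threshold: float = 0.5,              # hypothetical cutoff, not from the paper
    max_replicas: Optional[int] = None,  # hypothetical cap on replicas created
) -> List[Job]:
    """Return the at-risk jobs, least likely to complete first."""
    at_risk = sorted(
        (j for j in jobs if j.p_complete < threshold),
        key=lambda j: j.p_complete,
    )
    if max_replicas is not None:
        at_risk = at_risk[:max_replicas]
    return at_risk


jobs = [Job("a", 0.95), Job("b", 0.30), Job("c", 0.60), Job("d", 0.10)]
chosen = select_jobs_to_replicate(jobs, threshold=0.5)
print([j.name for j in chosen])  # ['d', 'b']
```

Replicating only the selected jobs, rather than every job, is what keeps the number of redundant replicas low in a scheme of this kind; how the threshold or cap should be set is left open here.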