Skip to Main Content
Grid systems differ from traditional distributed systems in terms of their large scale, heterogeneity and dynamism. These factors contribute towards higher frequency of fault occurrences; large scale causes lower values of Mean Time To Failure (MTTF), heterogeneity results in interaction faults (protocol mismatches) between communicating dissimilar nodes and dynamism with dynamically varying resource availability due to resources autonomously entering and leaving the grid effects execution of jobs. Another factor that increases probability of failure of applications is that applications running on grid are long running computations taking days to finish. Incorporating fault tolerance in scheduling algorithms is one of the approaches for handling faults in grid environment. Genetic Algorithms are a popular class of meta-heuristic algorithms used for grid scheduling. These are stochastic search algorithms based on the natural process of fitness based selection and reproduction. This paper combines GA-based scheduling with fault tolerance techniques such as checkpointing (dynamic) by modifying the fitness function. Also certain scenarios such as checkpointing without migration for resources with different downtimes and autonomous nature of grid resource providers are considered in building fitness functions. The motivation behind the work is that scheduling-assisted fault tolerance would help in finding the appropriate schedule for the jobs which would complete in the minimum time possible even when resources are prone to failures and thus help in meeting job deadlines. Simulation results for the proposed techniques are presented with respect to makespan and flowtime and fitness value of the resultant schedule obtained. The results show improvement in makespan and flowtime of the adaptive checkpointing approaches over static checkpointing approach. Also the approach which takes into consideration the last failure times of resources perform better than the approach bas- d only on the mean failure times of resources.