
Incorporating fault tolerance in GA-based scheduling in grid environment


Authors (2): Neeraj Upadhyay (Electronics & Computer Engineering Department, Indian Institute of Technology Roorkee, Uttarakhand, India); Manoj Misra

Grid systems differ from traditional distributed systems in their large scale, heterogeneity and dynamism. These factors increase how often faults occur. Large scale lowers the Mean Time To Failure (MTTF). Heterogeneity causes interaction faults (protocol mismatches) between dissimilar communicating nodes. Dynamism, in the form of resource availability that varies as resources autonomously join and leave the grid, affects job execution. Applications are also more likely to fail because grid applications are typically long-running computations that take days to finish. Incorporating fault tolerance into scheduling algorithms is one approach to handling faults in a grid environment. Genetic Algorithms (GAs) are a popular class of meta-heuristic algorithms used for grid scheduling. They are stochastic search algorithms based on the natural process of fitness-based selection and reproduction. This paper combines GA-based scheduling with fault tolerance techniques such as (dynamic) checkpointing by modifying the fitness function. The fitness functions also account for scenarios such as checkpointing without migration on resources with different downtimes, and for the autonomous nature of grid resource providers. The motivation is that scheduling-assisted fault tolerance helps find a schedule in which the jobs complete in the minimum possible time even when resources are prone to failure, and so helps meet job deadlines. Simulation results for the proposed techniques are reported in terms of the makespan, flowtime and fitness value of the resulting schedule. The results show that the adaptive checkpointing approaches improve makespan and flowtime over the static checkpointing approach. In addition, the approach that considers the last failure times of resources performs better than the approach based only on their mean failure times.
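The abstract does not give the paper's fitness functions, so the following is only a minimal Python sketch of the general idea under stated assumptions. A GA chromosome maps each job to a resource. Each resource fails with exponentially distributed inter-failure times (mean MTTF). The fitness charges every job its *expected* run time under checkpointing instead of its failure-free time. The names `Resource`, `segment_time`, `expected_time`, `young_interval`, `evaluate` and `ga_schedule`, the weight `alpha`, and all numbers are hypothetical. The "adaptive" policy substitutes Young's first-order interval sqrt(2 × checkpoint cost × MTTF) for the paper's dynamic checkpointing scheme, which is not described here. The last-failure-time variant is not modelled.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Resource:
    speed: float      # work units per second (hypothetical unit)
    mttf: float       # mean time to failure in seconds; exponential failures assumed
    downtime: float   # seconds lost after each failure before the job restarts
    ckpt_cost: float  # seconds needed to write one checkpoint

def segment_time(w: float, r: Resource) -> float:
    """Expected wall time to finish w seconds of work that restarts from the
    start of the segment after every failure: (MTTF + D) * (e^(w/MTTF) - 1)."""
    return (r.mttf + r.downtime) * math.expm1(w / r.mttf)

def expected_time(work: float, r: Resource, interval: Optional[float]) -> float:
    """Expected run time of one job on r, checkpointing every `interval`
    seconds of computation (None = no checkpointing, restart from scratch)."""
    compute = work / r.speed
    if interval is None:
        return segment_time(compute, r)
    if interval <= 0:
        raise ValueError("checkpoint interval must be positive")
    full = int(compute // interval)
    rem = compute - full * interval
    # Each full segment pays the checkpoint cost; a failure rolls back only that segment.
    total = full * segment_time(interval + r.ckpt_cost, r)
    if rem > 0:
        total += segment_time(rem, r)  # final partial segment, no checkpoint written
    return total

def young_interval(r: Resource) -> float:
    """Young's first-order approximation of the optimal checkpoint interval."""
    return math.sqrt(2.0 * r.ckpt_cost * r.mttf)

Policy = Callable[[Resource], Optional[float]]

def evaluate(chrom: List[int], jobs: List[float], resources: List[Resource],
             policy: Policy, alpha: float = 0.75) -> Tuple[float, float, float]:
    """Return (fitness, makespan, flowtime); jobs on a resource run in index order.
    Fitness is a hypothetical weighted sum of makespan and mean flowtime; lower is better."""
    finish = [0.0] * len(resources)
    flowtime = 0.0
    for j, r_idx in enumerate(chrom):
        r = resources[r_idx]
        finish[r_idx] += expected_time(jobs[j], r, policy(r))
        flowtime += finish[r_idx]  # completion time of job j
    makespan = max(finish)
    return alpha * makespan + (1 - alpha) * flowtime / len(jobs), makespan, flowtime

def ga_schedule(jobs, resources, policy, pop_size=30, generations=100,
                p_mut=0.1, seed=0):
    """Plain generational GA: binary tournament, one-point crossover,
    per-gene random-reset mutation, and elitism of the single best schedule."""
    rng = random.Random(seed)
    n, m = len(jobs), len(resources)
    def fit(c):
        return evaluate(c, jobs, resources, policy)[0]
    def pick():
        a, b = rng.sample(pop, 2)
        return a if fit(a) <= fit(b) else b
    pop = [[rng.randrange(m) for _ in range(n)] for _ in range(pop_size)]
    best = min(pop, key=fit)
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: best fitness never gets worse
        while len(new_pop) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]
            for j in range(n):
                if rng.random() < p_mut:
                    child[j] = rng.randrange(m)
            new_pop.append(child)
        pop = new_pop
        best = min(pop, key=fit)
    return best, evaluate(best, jobs, resources, policy)

if __name__ == "__main__":
    jobs = [4000.0, 2500.0, 9000.0, 1200.0, 6000.0, 3000.0]
    resources = [
        Resource(speed=1.0, mttf=20000.0, downtime=300.0, ckpt_cost=60.0),
        Resource(speed=2.0, mttf=5000.0, downtime=600.0, ckpt_cost=60.0),
        Resource(speed=1.5, mttf=10000.0, downtime=300.0, ckpt_cost=30.0),
    ]
    policies = {"static": lambda r: 1800.0, "adaptive": young_interval}
    for name, policy in policies.items():
        best, (fitness, makespan, flowtime) = ga_schedule(jobs, resources, policy)
        print(name, best, round(makespan), round(flowtime))
```

Comparing a fixed checkpoint interval with a per-resource one only requires swapping the `policy` argument. The toy data above is illustrative and says nothing about which policy wins in the paper's simulations.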

Published in: 2011 World Congress on Information and Communication Technologies (WICT)

Date of Conference: 11-14 Dec. 2011