Grid computing pools heterogeneous, geographically distributed resources so that they can work on a task together. Because resource availability is dynamic, the grid infrastructure is prone to failure, so a fault-tolerance mechanism must be implemented to adapt to failures. Commonly used fault-tolerance techniques are checkpointing and load balancing. To make fault tolerance efficient, this paper proposes an optimal checkpointing algorithm that reduces the overhead caused by checkpointing. The proposed system uses job replication to ensure that work completes, and dynamic load balancing to avoid overloading any resource and to maximize resource utilization and throughput.
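The abstract does not say how the checkpoint interval is chosen, so the sketch below is not the paper's algorithm. It only illustrates the trade-off behind checkpointing overhead, using Young's well-known first-order approximation as a stand-in: checkpointing too often wastes time writing state, while checkpointing too rarely wastes time redoing lost work. The function names and the sample values (a 2-second checkpoint cost, a 100-second mean time between failures) are assumptions made for illustration.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order approximation of the checkpoint interval.

    checkpoint_cost: time to write one checkpoint (hypothetical input).
    mtbf: mean time between failures of the resource (hypothetical input).
    Both values must use the same time unit.
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def expected_overhead(interval: float, checkpoint_cost: float, mtbf: float) -> float:
    """First-order fraction of time lost for a given checkpoint interval.

    The first term is the cost of writing checkpoints; the second is the
    expected rework after a failure (on average half an interval is lost).
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


if __name__ == "__main__":
    cost, mtbf = 2.0, 100.0  # illustrative values only
    best = young_interval(cost, mtbf)
    for interval in (best / 2, best, best * 2):
        print(f"interval={interval:6.1f}s  overhead={expected_overhead(interval, cost, mtbf):.3f}")
```

Under this model, both halving and doubling the interval increase the estimated overhead, which is the kind of cost an optimal checkpointing scheme tries to minimize.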
Published in:
Computer Communication and Informatics (ICCCI), 2012 International Conference on
Date of Conference: 10-12 Jan. 2012