The availability of grid resources is dynamic, so grid applications must remain robust and effective under changing conditions. A major challenge in a distributed, dynamic grid is fault tolerance: the more resources and components involved, the more complicated and error-prone the system becomes. In a grid with potentially thousands of interconnected machines, the reliability of individual resources cannot be guaranteed. The probability of failure is therefore higher in grid computing, and a resource failure can be fatal to task execution, which makes a fault tolerance model essential. Grid services are also often expected to meet minimum levels of Quality of Service (QoS). Fault tolerance in computational grids is commonly achieved through checkpoint-recovery and through task replication on alternative resources in case of a system outage. However, existing fault tolerant load balancing strategies for grids suffer from several deficiencies. Models that rely on checkpoint-recovery increase the average wait time and hence the mean response time, while models that rely on task replication reduce grid efficiency in terms of optimal resource utilization under varying load. To address these deficiencies, an efficient fault tolerant load balancing model named “Optimal Neighbour” (OP) has been proposed. The model is dynamic, decentralized and symmetric initiated. Simulation results show that the “Optimal Neighbour” (OP) model yields better results than the novel fault tolerant load balancing model.