Computational Grids can serve as the main execution platform for high-performance distributed applications. Grid resources have heterogeneous architectures, are geographically distributed, and are interconnected via unreliable network media; they are therefore extremely complex and prone to many kinds of errors, failures and faults. The Grid is a layered architecture, and most fault-tolerance techniques developed for grids follow its strict layering approach. In this paper, we propose a cross-layer design for handling faults proactively. In a cross-layer design, the top-down and bottom-up approach is not strictly followed, and a middle layer can communicate with the layer below or above it. At each grid layer, a monitoring component decides, on the basis of predefined factors, whether the reliability of that particular layer is high, medium or low. Based on the Hardware Reliability Rating (HRR) and the Software Reliability Rating (SRR), the Middleware Monitoring Component / Cross-Layered Component (MMC/CLC) generates a Combined Rating (CR) using CR calculation matrix rules. Each participating grid node has a CR value generated through cross-layer communication among the HMC, MMC/CLC and SMC. All grid nodes keep their CR information in a CR table, and highly rated machines are selected for job execution on the basis of minimum CPU load, along with different intensities of checkpointing. Handling faults proactively at each grid layer using the cross-communication model would improve overall dependability and increase performance with less checkpointing overhead.
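The selection scheme described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the paper's implementation: the abstract does not give the CR calculation matrix rules, so the rule used here (CR is the lower of HRR and SRR) is an assumption, as are the names `Rating`, `Node`, `combined_rating`, `select_node` and the checkpoint intervals in `CHECKPOINT_INTERVAL_S`.

```python
from dataclasses import dataclass
from enum import IntEnum


class Rating(IntEnum):
    """Three-level reliability rating used by each monitoring component."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def combined_rating(hrr: Rating, srr: Rating) -> Rating:
    # ASSUMPTION: the real CR calculation matrix is not given in the abstract.
    # Here a node is taken to be only as reliable as its weaker layer.
    return Rating(min(hrr, srr))


@dataclass
class Node:
    name: str
    hrr: Rating       # Hardware Reliability Rating
    srr: Rating       # Software Reliability Rating
    cpu_load: float   # fraction of CPU in use, 0.0 to 1.0

    @property
    def cr(self) -> Rating:
        return combined_rating(self.hrr, self.srr)


# ASSUMPTION: hypothetical checkpoint intervals in seconds. A more reliable
# node is checkpointed less often, i.e. with lower checkpointing intensity.
CHECKPOINT_INTERVAL_S = {Rating.HIGH: 600, Rating.MEDIUM: 300, Rating.LOW: 60}


def select_node(cr_table: list[Node]) -> Node:
    """Pick the least-loaded node among those with the best available CR."""
    if not cr_table:
        raise ValueError("CR table is empty")
    best = max(node.cr for node in cr_table)
    candidates = [node for node in cr_table if node.cr == best]
    return min(candidates, key=lambda node: node.cpu_load)


cr_table = [
    Node("n1", Rating.HIGH, Rating.HIGH, cpu_load=0.70),
    Node("n2", Rating.HIGH, Rating.HIGH, cpu_load=0.20),
    Node("n3", Rating.HIGH, Rating.LOW, cpu_load=0.05),
]
chosen = select_node(cr_table)
interval = CHECKPOINT_INTERVAL_S[chosen.cr]
```

In this sketch the lightly loaded node `n3` is passed over because its low SRR pulls its CR down, which reflects the idea of rating reliability first and using CPU load only to choose among the highly rated machines.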
Date of Conference: 12-13 Nov. 2007