In grid and cloud computing systems, resources are shared among a large number of applications and users. These systems provide an efficient environment for executing long-running distributed applications. A failure during a critical long-running application can waste considerable time and cost. This paper proposes efficient job and resource monitoring services to attain the required degree of availability and reliability for long-running applications. In addition to quality of service, this work aims to minimize resource consumption and the cost of requested services in the economic grid. The dynamic nature of the proposed monitoring service improves the availability and reliability of grid resources and services while keeping resource consumption low. An analytical (Markov) approach is used to analyze the effect of the service on the availability and reliability of grid services and resources in the presence of permanent and transient faults.
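As a rough illustration of the kind of Markov analysis mentioned above, the sketch below computes the steady-state availability of a single resource under a simple three-state continuous-time Markov chain. This is not the model from the paper, whose details are not given here. The states (up, down after a transient fault, down after a permanent fault), the constant rates, and the function and parameter names are all assumptions made for illustration.

```python
def steady_state_availability(lambda_t, mu_t, lambda_p, mu_p):
    """Steady-state probability that a resource is up.

    Hypothetical three-state model (an assumption, not the paper's model):
      UP -> TRANSIENT_DOWN at rate lambda_t, back to UP at rate mu_t
      UP -> PERMANENT_DOWN at rate lambda_p, back to UP at rate mu_p
    Balance equations give pi_down_i = pi_up * lambda_i / mu_i, and the
    three probabilities sum to 1.
    """
    if lambda_t < 0 or lambda_p < 0:
        raise ValueError("failure rates must be non-negative")
    if mu_t <= 0 or mu_p <= 0:
        raise ValueError("recovery rates must be positive")
    return 1.0 / (1.0 + lambda_t / mu_t + lambda_p / mu_p)


if __name__ == "__main__":
    # Illustrative rates per hour; these numbers are made up.
    a = steady_state_availability(lambda_t=0.01, mu_t=1.0,
                                  lambda_p=0.001, mu_p=0.1)
    print(f"steady-state availability: {a:.4f}")
```

In a model of this shape, faster detection and recovery by a monitoring service would appear as larger values of `mu_t` and `mu_p`, which raises the computed availability.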