Reliability is of great practical importance in distributed computing systems (DCSs) because of its immediate impact on system performance, i.e., quality of service. The issue becomes particularly crucial for "cost-conscious" DCSs such as grids and clouds. Unreliability brings about additional, often excessive, capital and operating costs. In this study, resource failures are considered the main source of unreliability. We investigate the reliability of workflow execution in the context of scheduling and its effect on operating costs in DCSs, and present the reliability for profit assurance (RPA) algorithm as a novel workflow scheduling heuristic. The proposed RPA algorithm incorporates an (operating) cost-aware replication scheme to increase reliability. This cost awareness contributes greatly to replication decisions that are efficient in terms of profitability. To the best of our knowledge, this work is the first attempt to explicitly take (monetary) reliability cost into account in workflow scheduling.