This paper addresses the problem of maximizing the reliability of a heterogeneous distributed computing system (DCS) in which nodes can fail permanently at random. Service reliability is achieved when all tasks queued at the nodes are executed before all of the nodes fail. The paper presents a framework for characterizing the service reliability of a DCS in the presence of communication uncertainties and topological changes caused by node deletion. Because the DCS is heterogeneous, its nodes have different hardware and software characteristics, and the components of an application likewise have different hardware and software requirements. An application provides its desired functionality only when these requirements are satisfied. One way to improve the reliability of the DCS is to allocate tasks properly among the nodes. First, we determine, for each task, the candidate nodes that can satisfy its requirements. We then apply load-sharing policies to handle node failures and to maximize the service reliability of the DCS.
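The two-step idea described above (find candidate nodes, then share load among them) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the `Node` and `Task` classes, the representation of requirements as sets of feature names, and the least-loaded placement rule are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capabilities: frozenset  # hypothetical hardware/software features offered
    alive: bool = True       # False once the node has failed permanently


@dataclass
class Task:
    name: str
    requirements: frozenset  # hypothetical features the task needs


def candidate_nodes(task, nodes):
    """Return the working nodes whose capabilities cover the task's requirements."""
    return [n for n in nodes if n.alive and task.requirements <= n.capabilities]


def allocate(tasks, nodes):
    """Assumed placeholder policy: place each task on its least-loaded candidate.

    Tasks with no candidate node are mapped to None.
    """
    load = {n.name: 0 for n in nodes}
    placement = {}
    for task in tasks:
        candidates = candidate_nodes(task, nodes)
        if not candidates:
            placement[task.name] = None
            continue
        best = min(candidates, key=lambda n: load[n.name])
        placement[task.name] = best.name
        load[best.name] += 1
    return placement
```

After a node fails, setting its `alive` flag to `False` and calling `allocate` again moves its tasks to the remaining candidates, which is one simple way to picture load sharing under permanent failures. A real policy would also have to account for the communication delays and failure-time distributions that the abstract mentions.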