This paper proposes methods to improve fault tolerance in distributed systems, specifically in deterministic situations, through distributed decision-making and coordination. Providing a distributed system with fault tolerance is feasible but hard because of intrinsic aspects of such systems: independence, unpredictability, and communication problems. A multi-agent system, as an instance of a distributed system, can handle different kinds of faults using traditional fault-tolerance techniques, but this paper focuses on agent-based techniques. The proposed methods rely on agent-based help provision through distributed cooperation among helper agents. The helpers tune their normal roles so that they can also undertake the faulty agents' tasks. Each helper runs a sub-optimal task selection algorithm to decide whom to help. Importantly, there is no explicit interaction; instead, the helpers coordinate their decisions implicitly by adopting the most appropriate task in terms of their speed, their relative reliability, and the task's criticality coefficient. The proposed ideas are tested on a DCS-like testbed to improve its fault tolerance. The results illustrate the effectiveness of the approaches compared with the no-help case and the case of using purely redundant components.
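The abstract does not give the task selection algorithm itself, so the following is only an illustrative sketch of how implicit coordination of this kind could work, not the authors' method. It assumes that every helper has the same view of all helpers' speed and relative reliability and of each orphaned task's criticality coefficient, and that a helper's suitability can be ranked by the product speed × reliability. The names `Helper`, `Task`, `greedy_assignment`, and `choose_task` are hypothetical. Because each helper evaluates the same deterministic greedy rule locally, the helpers arrive at non-conflicting choices without exchanging messages.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Helper:
    name: str
    speed: float        # assumed metric: higher means faster
    reliability: float  # assumed scale: relative reliability in [0, 1]


@dataclass(frozen=True)
class Task:
    name: str
    criticality: float  # assumed: higher coefficient means more critical


def greedy_assignment(helpers: List[Helper], tasks: List[Task]) -> Dict[str, str]:
    """Pair the most capable helpers with the most critical tasks.

    Greedy and therefore sub-optimal in general. Ties are broken by name
    so that every helper computes exactly the same mapping.
    """
    ranked_helpers = sorted(helpers, key=lambda h: (-h.speed * h.reliability, h.name))
    ranked_tasks = sorted(tasks, key=lambda t: (-t.criticality, t.name))
    # zip stops at the shorter list: surplus helpers or tasks stay unpaired.
    return {h.name: t.name for h, t in zip(ranked_helpers, ranked_tasks)}


def choose_task(me: Helper, helpers: List[Helper], tasks: List[Task]) -> Optional[str]:
    """Run locally by each helper; returns the task it adopts, or None."""
    return greedy_assignment(helpers, tasks).get(me.name)


# Hypothetical example data: three helpers, two tasks left by faulty agents.
helpers = [
    Helper("A", speed=2.0, reliability=0.9),
    Helper("B", speed=3.0, reliability=0.5),
    Helper("C", speed=1.0, reliability=0.99),
]
orphaned = [Task("valve", criticality=0.4), Task("pump", criticality=0.9)]

for h in helpers:
    print(h.name, "->", choose_task(h, helpers, orphaned))
```

In this sketch the helper with the highest speed × reliability product adopts the most critical task, and a helper that receives `None` simply keeps its normal role. A weighted sum or a per-task suitability score could replace the product; the only requirement for implicit coordination is that all helpers apply the same rule to the same information.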