This paper presents a consensus-based approach to resolving conflict situations in fault-tolerant distributed systems. The processors of a distributed system may hold different pictures of the failure situation, because faulty processors may answer a common message incorrectly or even impersonate other processors; consequently, if the numbers of good and faulty processors cannot be estimated, the correct version of the failure situation is not known. We propose to determine the consensus of the versions held by the processors and to treat it as the most reliable version of the situation, to be used in further analysis. The paper formulates the consensus problem for this setting, states the postulates for consensus choice together with their analysis, and gives some algorithms for determining a consensus when the structure of the versions is known. Using consensus methods is a new approach to this kind of problem, and it is useful when the upper bound on the number of faulty processors is not known.
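
As a rough illustration of what "determining the consensus of versions" can mean, the sketch below assumes a version structure that the abstract does not specify: each processor reports a binary vector whose i-th entry is 1 if it believes processor i is faulty. The consensus is then taken to be the vector that minimizes the total Hamming distance to all reported versions, which for binary vectors reduces to a component-wise majority vote. Both the vector representation and the distance-minimizing criterion are assumptions made for this example, not necessarily the postulates or algorithms of the paper, and the names `consensus` and `total_distance` are hypothetical.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length versions disagree."""
    return sum(x != y for x, y in zip(a, b))


def total_distance(candidate: Sequence[int], versions: Sequence[Sequence[int]]) -> int:
    """Sum of Hamming distances from a candidate to every reported version."""
    return sum(hamming(candidate, v) for v in versions)


def consensus(versions: Sequence[Sequence[int]]) -> List[int]:
    """Component-wise majority over binary versions.

    Assumption: versions are 0/1 vectors of equal length. Ties are resolved
    towards 1 ("suspect faulty"), which is an arbitrary choice for this sketch.
    """
    if not versions:
        raise ValueError("at least one version is required")
    n = len(versions[0])
    if any(len(v) != n for v in versions):
        raise ValueError("all versions must have the same length")
    m = len(versions)
    return [1 if 2 * sum(v[i] for v in versions) >= m else 0 for i in range(n)]


# Four processors report their view of which of four processors are faulty.
versions = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],  # an outlying (possibly faulty) report
]
result = consensus(versions)
print(result, total_distance(result, versions))
```

Note that this criterion needs no bound on the number of faulty processors as input; it only aggregates the versions that were actually reported.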
Published in:
Cluster Computing and the Grid, 2001. Proceedings. First IEEE/ACM International Symposium on
Date of Conference: 2001