We propose a highly available replication control protocol tailored to environments where network partitions are always the result of a gateway failure. Our protocol divides the nodes holding replicas into two classes: local nodes, which can communicate directly with each other, and non-local nodes, which communicate with other nodes through one or more gateways. Local nodes are assumed to remain up to date as long as they do not crash. Non-local nodes, by contrast, must each maintain a volatile witness on the same network segment as the local nodes and must poll this witness before answering any user request. To speed up recovery from a total failure, each site maintains a list of the replicas that were available the last time the data were updated or a replica recovered from a crash. We use Markov models to compare the performance of our protocol with that of the dynamic-linear voting (DLV) protocol, which is the best existing replication control protocol that tolerates communication failures. We also observe that the placement of volatile witnesses has a strong impact on data availability and that gateway nodes are the best location for them.
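The abstract states the access and recovery rules only at a high level. The Python sketch below illustrates one possible reading of them and is not the paper's algorithm. Several details are assumptions: that a witness stores only a version number and no data, the hypothetical `Replica`/`Witness` classes, the hypothetical functions `write`, `rejoin` and `recover_after_total_failure`, and the rule for resuming after a total failure (waiting for every site in the transitive closure of the was-available lists, a technique borrowed from available-copy protocols). Message passing, concurrency control and the reinitialization of a crashed volatile witness are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Witness:
    # Hypothetical volatile witness: keeps a version number in memory, no data.
    up: bool = True
    version: int = 0


@dataclass
class Replica:
    name: str
    local: bool                        # on the same segment as the other local nodes
    witness: Optional[Witness] = None  # only non-local replicas own a witness
    gateway_up: bool = True            # path from a non-local node to its witness
    up: bool = True
    version: int = 0
    value: object = None
    was_available: Set[str] = field(default_factory=set)

    def can_answer(self) -> bool:
        """Local replicas answer directly; non-local ones must first poll their witness."""
        if not self.up:
            return False
        if self.local:
            return True
        w = self.witness
        # The poll fails if the gateway or the witness is down; a version
        # mismatch means this replica missed an update while cut off.
        return (w is not None and self.gateway_up and w.up
                and w.version == self.version)


def write(replicas: List[Replica], value: object) -> int:
    """Update every replica that can answer, plus every live witness."""
    avail = [r for r in replicas if r.can_answer()]
    if not avail:
        raise RuntimeError("no available replica")
    new_version = max(r.version for r in avail) + 1
    names = {r.name for r in avail}
    for r in avail:
        r.value, r.version, r.was_available = value, new_version, set(names)
    for r in replicas:
        # Witnesses sit on the local segment, so the writer reaches them
        # even when their non-local owner is partitioned away.
        if r.witness is not None and r.witness.up:
            r.witness.version = new_version
    return new_version


def rejoin(replicas: List[Replica], site: Replica) -> bool:
    """Non-total failure: a restarted or stale replica copies from an available one."""
    if not site.local and not (site.gateway_up and site.witness is not None
                               and site.witness.up):
        return False  # cannot reach the local segment yet
    avail = [r for r in replicas if r is not site and r.can_answer()]
    if not avail:
        return False  # total failure: use recover_after_total_failure instead
    src = max(avail, key=lambda r: r.version)
    site.up, site.value, site.version = True, src.value, src.version
    if site.witness is not None:
        site.witness.version = src.version
    names = {r.name for r in avail} | {site.name}
    for r in avail + [site]:
        r.was_available = set(names)  # lists are refreshed on recovery too
    return True


def closure(by_name: Dict[str, Replica], start: str) -> Set[str]:
    """Transitive closure of the was-available lists, starting from one site."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(by_name[n].was_available)
    return seen


def recover_after_total_failure(replicas: List[Replica],
                                recovered: Set[str]) -> Optional[Replica]:
    """Return the replica with the newest version once it is safe to resume, else None."""
    by_name = {r.name: r for r in replicas}
    for name in recovered:
        if closure(by_name, name) <= recovered:
            # Every site that may hold the latest version is back.
            return max((by_name[n] for n in recovered), key=lambda r: r.version)
    return None
```

For example, with local replicas `L1`, `L2` and a non-local `N1` owning a witness, a write made while `N1`'s gateway is down advances the witness but not `N1`. After the gateway returns, `N1.can_answer()` is still `False` until `rejoin` copies the current data. The witness is what lets a reconnected non-local node detect that it is stale without contacting the local replicas themselves.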
Published in: Proceedings of the 14th International Conference on Distributed Computing Systems, 1994
Date of Conference: 21-24 Jun 1994