Skip to Main Content
This work introduces the Distributed Network Reachability (DNR) algorithm, a distributed system-level diagnosis algorithm that allows every node of a partitionable arbitrary topology network to determine which portions of the network are reachable and unreachable. DNR is the first distributed diagnosis algorithm that works in the presence of network partitions and healings caused by dynamic fault and repair events. Both crash and timing faults are assumed, and a faulty node is indistinguishable of a network partition. Every link is alternately tested by one of its adjacent nodes at subsequent testing intervals. Upon the detection of a new event, the new diagnostic information is disseminated to reachable nodes. New events can occur before the dissemination completes. Any time a new event is detected or informed, a working node may compute the network reachability using local diagnostic information. The bounded correctness of DNR is proved, including the bounded diagnostic latency, bounded startup and accuracy. Simulation results are presented for several random and regular topologies, showing the performance of the algorithm under highly dynamic fault situations.
Parallel and Distributed Systems, IEEE Transactions on (Volume:23 , Issue: 8 )
Date of Publication: Aug. 2012