Overlay networks have emerged as a powerful and flexible platform for developing new, disruptive network applications. The attractive characteristics of overlay networks, such as planetary-scale distribution, user-level flexibility (e.g., overlay routing), and manageability, bring new challenges to overlay fault diagnosis. These include inaccessible underlying network information, incomplete and inaccurate observations of network status, dynamic symptom-fault causality relationships, and multi-layer complexity. To address these challenges, we propose EUDiag, a distributed, user-level overlay fault diagnosis technique based on belief revision. EUDiag passively uses the overlay symptoms reported by overlay monitoring agents to correlate and diagnose faults. Whenever necessary, it selects the least costly appropriate probing actions to refine the results of this passive fault reasoning. EUDiag adapts to changes in highly dynamic overlay networks by incrementally revising user beliefs as new overlay symptoms are observed. It can diagnose faults without relying on probabilistic quantifications of underlying network faults (e.g., prior fault probabilities). Simulations and experimental studies show that EUDiag localizes the root causes of overlay faults and problems efficiently (e.g., with low latency) and accurately, even when the observed symptoms are incomplete.
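The passive-then-active reasoning loop described above can be illustrated with a toy model. This is not EUDiag's algorithm. It is a minimal Python sketch that makes three assumptions: each overlay path is the set of overlay links it traverses, a path fails exactly when at least one of its links is faulty, and links on paths observed as working are healthy. In place of EUDiag's belief revision, the sketch keeps the set of minimum-cardinality fault hypotheses consistent with the observed symptoms, which requires no prior fault probabilities. It then picks the cheapest probe whose predicted outcome differs across those hypotheses. The function names `explanations` and `cheapest_discriminating_probe`, the node and link names, and the probe costs are all hypothetical.

```python
from itertools import combinations

# Hypothetical model (not EUDiag's): each overlay path is the set of components
# (here, overlay links) it traverses; a path fails if any of them is faulty.

def explanations(failed_paths, working_paths, max_faults=3):
    """Return all minimum-cardinality fault sets consistent with the observations.

    Parsimony (fewest faults) stands in for prior fault probabilities here.
    """
    # Assumption: a component on any path observed as working is healthy
    # (no intermittent faults, no lost measurements).
    cleared = set().union(*working_paths)
    suspects = sorted(set().union(*failed_paths) - cleared)
    for k in range(max_faults + 1):
        # A candidate explains the symptoms if it intersects every failed path.
        found = [set(c) for c in combinations(suspects, k)
                 if all(path & set(c) for path in failed_paths)]
        if found:
            return found
    return []  # no explanation with at most max_faults faults


def cheapest_discriminating_probe(hypotheses, probes):
    """Pick the lowest-cost probe whose predicted outcome differs across hypotheses.

    probes maps a probe name to (cost, set of components the probe traverses).
    Returns None when no probe can tell the remaining hypotheses apart.
    """
    best_name, best_cost = None, float("inf")
    for name, (cost, components) in probes.items():
        # Predicted outcome under each hypothesis: True means the probe fails.
        predictions = {bool(h & components) for h in hypotheses}
        if len(predictions) == 2 and cost < best_cost:
            best_name, best_cost = name, cost
    return best_name


if __name__ == "__main__":
    # Symptom: the overlay path A->C via B failed; nothing else is observed yet.
    failed, working = [{"A-B", "B-C"}], []
    beliefs = explanations(failed, working)
    print(beliefs)  # two equally parsimonious hypotheses: link A-B or link B-C
    probes = {"A->B": (1.0, {"A-B"}), "B->C": (1.5, {"B-C"}),
              "A->C": (2.0, {"A-B", "B-C"}), "D->C": (0.5, {"D-C"})}
    probe = cheapest_discriminating_probe(beliefs, probes)
    print(probe)  # A->B
    # Suppose the probe succeeds: revise beliefs with the new observation.
    working.append(probes[probe][1])
    print(explanations(failed, working))  # [{'B-C'}]
```

For simplicity, the sketch recomputes the hypothesis set from scratch after each new observation. An incremental scheme, closer in spirit to the belief revision described above, would update the existing hypotheses in place instead.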