By Topic

Efficient fault diagnosis using incremental alarm correlation and active investigation for internet and overlay networks

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)

Fault localization is the core element in fault management. Symptom-fault map is commonly used to describe the symptom-fault causality in fault reasoning. For Internet service networks, a well-designed monitoring system can effectively correlate the observable symptoms (i.e., alarms) with the critical network faults (e.g., link failure). However, the lost and spurious symptoms can significantly degrade the performance and accuracy of a passive fault localization system. For overlay networks, due to limited underlying network accessibility, as well as the overlay scalability and dynamics, it is impractical to build a static overlay symptom-fault map. In this paper, we firstly propose a novel active integrated fault reasoning (AIR) framework to incrementally incorporate active investigation actions into the passive fault reasoning process based on an extended symptom-fault-action (SFA) model. Secondly, we propose an overlay network profile (ONP) to facilitate the dynamic creation of an overlay symptom-fault-action (called O-SFA) model, such that the AIR framework can be applied seamlessly to overlay networks (called O-AIR). As a result, the corresponding fault reasoning and action selection algorithms are elaborated. Extensive simulations and Internet experiments show that AIR and O-AIR can significantly improve both accuracy and performance in the fault reasoning for Internet and overlay service networks, especially when the ratio of the lost and spurious symptoms is high.

Published in:

IEEE Transactions on Network and Service Management  (Volume:5 ,  Issue: 1 )