This paper proposes a framework for the distributed management of network faults by software agents. Intelligent network agents with advanced reasoning capabilities address many of the issues raised by distributing processing and control in network management. The agents detect and correlate the alarms generated in their domain and selectively seek a clear explanation for them. The causal relationship between faults and their effects is represented as a Bayesian network. As evidence (alarms) is gathered, the probability that any particular fault is present is strengthened or weakened. Agents with a narrower view of the network forward their findings to an agent with a much broader view of the network. Depending on the network's degree of automation, an agent can carry out local recovery actions. A prototype reflecting the ideas discussed in this paper is being implemented.
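The way gathered evidence strengthens or weakens belief in a fault can be illustrated with a single application of Bayes' rule. The sketch below is not the paper's model: it assumes one binary fault, alarms that are conditionally independent given the fault state, and made-up probabilities. The function name `update_fault_probability` and all numeric values are hypothetical.

```python
def update_fault_probability(prior, p_alarm_given_fault,
                             p_alarm_given_no_fault, alarm_observed):
    """Return P(fault | evidence) after one alarm observation (Bayes' rule).

    Assumption: a single binary fault and a single binary alarm; this is a
    simplification of a full Bayesian network, not the paper's method.
    """
    if alarm_observed:
        like_fault = p_alarm_given_fault
        like_no_fault = p_alarm_given_no_fault
    else:
        # Absence of an expected alarm is evidence too.
        like_fault = 1.0 - p_alarm_given_fault
        like_no_fault = 1.0 - p_alarm_given_no_fault
    numerator = like_fault * prior
    denominator = numerator + like_no_fault * (1.0 - prior)
    return numerator / denominator


# Hypothetical numbers, chosen only for illustration.
prior = 0.01

# An alarm that is likely under the fault is observed: belief is strengthened.
p_after_alarm = update_fault_probability(prior, 0.9, 0.05, alarm_observed=True)

# A second alarm that the fault usually triggers stays silent: belief is weakened.
p_after_silence = update_fault_probability(p_after_alarm, 0.8, 0.1,
                                           alarm_observed=False)

print(f"prior={prior:.4f}  after alarm={p_after_alarm:.4f}  "
      f"after missing alarm={p_after_silence:.4f}")
```

Chaining the update this way is only valid under the stated conditional-independence assumption; a full Bayesian network would instead propagate evidence through its causal structure.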
Published in: Electrical and Computer Engineering, 2003. IEEE CCECE 2003. Canadian Conference on (Volume: 2)
Date of Conference: 4-7 May 2003