We present the mechanisms for self-recovery in Konark, a mobile-agent-based system for monitoring network computing systems. The Konark system is implemented using Ajanta. An important aspect of our design is that it uses the monitoring system's own capabilities to detect failures of its components. The system achieves robustness by incorporating mechanisms for self-monitoring and self-configuration at different levels of its architecture. Its event detection, correlation, and notification mechanisms serve as the basic building blocks for failure detection. The design relies on continuous periodic detection and notification: a failure event is detected and reported repeatedly until the failed components causing it are repaired.
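
The idea of continuous periodic detection and notification can be illustrated with a minimal sketch. It is not Konark's or Ajanta's actual API: the names `SelfMonitor`, `FailureEvent`, `heartbeat`, `detection_round`, and the `timeout` parameter are all hypothetical, and heartbeat staleness is only an assumed stand-in for whatever failure condition the real system detects. The point it shows is that a failure is re-detected and re-reported on every round until the component is repaired.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FailureEvent:
    component: str      # name of the component believed to have failed
    detected_at: float  # time of the detection round that produced this event


@dataclass
class SelfMonitor:
    # Hypothetical parameter: maximum silence before a component counts as failed.
    timeout: float
    last_heartbeat: Dict[str, float] = field(default_factory=dict)
    listeners: List[Callable[[FailureEvent], None]] = field(default_factory=list)

    def heartbeat(self, component: str, now: float) -> None:
        """Record that a component was alive at `now` (a repair looks the same)."""
        self.last_heartbeat[component] = now

    def detection_round(self, now: float) -> List[FailureEvent]:
        """Run one periodic round: re-detect and re-notify every failed component."""
        events = [
            FailureEvent(name, now)
            for name, seen in sorted(self.last_heartbeat.items())
            if now - seen > self.timeout
        ]
        # Nothing is remembered between rounds, so a failure keeps being
        # reported until a fresh heartbeat shows the component was repaired.
        for event in events:
            for notify in self.listeners:
                notify(event)
        return events


if __name__ == "__main__":
    monitor = SelfMonitor(timeout=5.0)
    monitor.listeners.append(lambda e: print("failure:", e.component, "at", e.detected_at))
    monitor.heartbeat("detector-A", now=0.0)
    monitor.detection_round(now=10.0)   # detector-A is reported
    monitor.detection_round(now=20.0)   # still failed, so it is reported again
    monitor.heartbeat("detector-A", now=21.0)  # repair
    monitor.detection_round(now=22.0)   # nothing is reported
```

Because each round recomputes the failed set from current state instead of tracking which failures were already announced, a lost notification is harmless in this sketch: the next round repeats it.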