Skip to Main Content
Revealing anomalies at the operating system (OS) level to support online diagnosis activities of complex software systems is a promising approach when traditional detection mechanisms (e.g., based on event logs, probes and heartbeats) are inadequate or cannot be applied. In this paper we propose a configurable detection framework to reveal anomalies in the OS behavior, related to system misbehaviors. The detector is based on online statistical analyses techniques, and it is designed for systems that operate under variable and non-stationary conditions. The framework is evaluated to detect the activation of software faults in a complex distributed system for Air Traffic Management (ATM). Results of experiments with two different OSs, namely Linux Red Hat EL5 and Windows Server 2008, show that the detector is effective for mission-critical systems. The framework can be configured to select the monitored indicators so as to tune the level of intrusivity. A sensitivity analysis of the detector parameters is carried out to show their impact on the performance and to give to practitioners guidelines for its field tuning.