We address the problem of detecting faults in distributed software rapidly and accurately. We assume that the software is characterized by events or attributes that determine its operational modes, some of which may be identified as failures. We further assume that these events are known and that their probabilistic structure, as it evolves over time, is also known for a finite set of distinct operational modes. We propose and analyze a sequential algorithm that detects changes in operational mode rapidly and reliably. Furthermore, a threshold parameter of the algorithm effectively controls the resulting tradeoff among detection speed, correct detection, and false detection.
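To make the role of the threshold concrete, the following is a minimal sketch of one standard sequential change detector, a CUSUM-style log-likelihood-ratio test. It is not necessarily the algorithm analyzed here. It assumes only two modes (normal and faulty) and independent, identically distributed events within each mode, which is simpler than the general probabilistic structure described above. The function name `detect_change`, the event labels, and the probabilities are all hypothetical.

```python
import math


def detect_change(events, p_normal, p_fault, threshold):
    """CUSUM-style sketch (an assumption, not the paper's algorithm).

    events    : iterable of observed event labels
    p_normal  : dict mapping label -> probability under the normal mode
    p_fault   : dict mapping label -> probability under the faulty mode
    threshold : positive decision threshold

    Returns the 1-based index of the observation at which a change to the
    faulty mode is declared, or None if no change is declared.
    """
    stat = 0.0
    for n, event in enumerate(events, start=1):
        # Log-likelihood ratio of the faulty mode against the normal mode.
        llr = math.log(p_fault[event] / p_normal[event])
        # Accumulate evidence, resetting to zero when it turns negative.
        stat = max(0.0, stat + llr)
        if stat >= threshold:
            return n
    return None


# Hypothetical two-event model: errors are rare normally, common when faulty.
p_normal = {"ok": 0.9, "err": 0.1}
p_fault = {"ok": 0.5, "err": 0.5}
observed = ["ok"] * 5 + ["err"] * 3

alarm_low = detect_change(observed, p_normal, p_fault, threshold=3.0)
alarm_high = detect_change(observed, p_normal, p_fault, threshold=4.0)
```

In this sketch, raising the threshold delays the alarm but makes a false alarm during normal operation less likely, which is the kind of speed versus false-detection tradeoff the threshold parameter controls.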