Problem determination in a large and dynamic IT service is a challenging task. In this paper we propose a framework for problem determination based on monitoring the event streams generated by the different components of an IT service. We give a generic representation of a problem as a spatio-temporal pattern: a graph whose vertices capture the location and time of the matching events, and whose edges represent the spatio-temporal conditions between two matching events. The spatial conditions are based on the underlying system topology graph, and the temporal conditions are based on event timestamps. A practical implementation of this framework requires fast algorithms for detecting patterns. We present efficient algorithms for the cases where the pattern graph is a range or a tree, and then use them as building blocks in a hierarchical heuristic for detecting general patterns. Finally, we show through extensive numerical simulations that our algorithms perform well in practice.
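To make the pattern representation concrete, the following is a minimal sketch of checking a single pattern edge, that is, one spatial condition plus one temporal condition between two matching events. It is a brute-force illustration, not the efficient range or tree algorithms of the paper. The `Event` fields, the function names, the hop-count form of the spatial condition, and the sample topology and events are all assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    node: str    # component in the topology that emitted the event (assumed field)
    time: float  # event timestamp in seconds (assumed field)
    kind: str    # event type label (assumed field)


def hop_distance(topology, src, dst):
    """Breadth-first hop count between two nodes; None if dst is unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in topology.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def match_edge(events, topology, kind_a, kind_b, max_hops, max_delay):
    """Return all pairs (a, b) that satisfy one hypothetical pattern edge:
    b occurs within max_delay seconds after a (temporal condition) and
    b's node is within max_hops of a's node (spatial condition)."""
    matches = []
    for a in events:
        if a.kind != kind_a:
            continue
        for b in events:
            if b.kind != kind_b:
                continue
            if not (0 <= b.time - a.time <= max_delay):
                continue
            dist = hop_distance(topology, a.node, b.node)
            if dist is not None and dist <= max_hops:
                matches.append((a, b))
    return matches


# Hypothetical three-tier topology, stored as an undirected adjacency list.
topology = {"web": ["app"], "app": ["web", "db"], "db": ["app"]}

# Hypothetical event stream.
events = [
    Event("db", 10.0, "disk_error"),
    Event("app", 12.0, "timeout"),
    Event("web", 40.0, "timeout"),
]

# Pattern edge: a timeout at most 1 hop away and at most 5 s after a disk error.
matches = match_edge(events, topology, "disk_error", "timeout",
                     max_hops=1, max_delay=5.0)
```

Here only the `app` timeout matches the `db` disk error; the `web` timeout fails the temporal condition. This pairwise scan takes time quadratic in the number of events, which illustrates why faster detection algorithms are needed for a practical implementation.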