Fast and Exact Monitoring of Co-Evolving Data Streams | IEEE Conference Publication | IEEE Xplore

Abstract:

Given a huge stream of multiple co-evolving sequences, such as motion capture and web-click logs, how can we find meaningful patterns and spot anomalies? Our aim is to monitor data streams statistically and find subsequences that have the characteristics of a given hidden Markov model (HMM). For example, consider an online web-click stream, where massive amounts of access logs from millions of users are generated continuously every second. How can we find meaningful building blocks and typical access patterns, such as weekday/weekend patterns, and also detect anomalies and intrusions? In this paper, we propose StreamScan, a fast and exact algorithm for monitoring multiple co-evolving data streams. Our method has the following advantages: (a) it is effective, leading to novel discoveries and surprising outliers; (b) it is exact, and we theoretically prove that StreamScan guarantees the exactness of the output; (c) it is fast, and requires O(1) time and space per time-tick. Our experiments on 67GB of real data illustrate that StreamScan does indeed detect the qualifying subsequence patterns correctly and that it can offer great improvements in speed (up to 479,000 times) over its competitors.
Date of Conference: 14-17 December 2014
Date Added to IEEE Xplore: 29 January 2015
Conference Location: Shenzhen, China

I. Introduction

Data streams naturally arise in countless domains, such as medical analysis [10], online text [7], social activity mining [17], and sensor network monitoring [12]. For example, consider an online web-click stream, where a huge collection of log entries is generated every second, with information on millions of users and URLs. Web-site owners would like to detect intrusions or deliver targeted advertisements by investigating user-click patterns. In such a situation, the most fundamental requirement is the efficient monitoring of data streams. Since data streams arrive online at high bit rates and are potentially unbounded in size, the algorithm should handle ‘big data streams’ of billions (or even trillions [28]) of entries with fast response times; that is, it cannot afford any post-processing. In addition, since the sampling rates of streams frequently differ and their time periods vary in practical situations, the mechanism should be robust against noise and provide scaling of the time axis.
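
To make the one-pass requirement concrete, the following is a minimal Python sketch of the standard scaled forward recursion for a discrete HMM, updated one observation at a time. It is not the algorithm proposed in the paper and it does not locate subsequences; it only illustrates how a stream can be scored against a given HMM with work and memory per time-tick that do not grow with the stream length (O(k^2) time and O(k) memory for k states). The class name `OnlineForward`, the toy two-state model, and the symbols "a" and "b" are all assumptions made for illustration.

```python
import math


class OnlineForward:
    """Scaled forward recursion for a discrete HMM, one symbol per call.

    Hypothetical helper for illustration; only the filtered state
    distribution and a running log-likelihood are kept between ticks.
    """

    def __init__(self, initial, transition, emission):
        self.initial = list(initial)                      # p(state_1)
        self.transition = [list(r) for r in transition]   # p(state_t = j | state_{t-1} = i)
        self.emission = [dict(r) for r in emission]       # p(symbol | state)
        self.k = len(self.initial)
        self.alpha = None          # p(state_t | x_1..x_t); None before the first tick
        self.log_likelihood = 0.0  # log p(x_1..x_t)

    def update(self, symbol):
        """Consume one symbol and return log p(x_t | x_1..x_{t-1})."""
        k = self.k
        if self.alpha is None:
            prior = self.initial
        else:
            # Predict the next state distribution from the previous filter.
            prior = [
                sum(self.alpha[i] * self.transition[i][j] for i in range(k))
                for j in range(k)
            ]
        # Weight each state by how well it explains the new symbol.
        joint = [prior[j] * self.emission[j].get(symbol, 0.0) for j in range(k)]
        c = sum(joint)  # predictive probability of the new symbol
        if c == 0.0:
            # Symbol is impossible under the model: restart the filter and
            # report -inf without adding it to the running total.
            self.alpha = None
            return float("-inf")
        self.alpha = [v / c for v in joint]
        step = math.log(c)
        self.log_likelihood += step
        return step


# Toy two-state model (assumed values, not taken from the paper).
monitor = OnlineForward(
    initial=[0.5, 0.5],
    transition=[[0.9, 0.1], [0.1, 0.9]],
    emission=[{"a": 0.9, "b": 0.1}, {"a": 0.1, "b": 0.9}],
)
steps = [monitor.update(symbol) for symbol in "aab"]
```

Each call returns the log predictive probability of the newest symbol, so an unusually low value marks a tick that the model finds surprising. In the toy run, the final "b" arrives after two "a" symbols and scores lower than the second "a".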
