Clustering of data streams has received much attention in recent years. Many algorithms have been proposed to discover clusters in data streams efficiently and effectively. Most of these algorithms are devoted to discovering clusters with high density, high correlation, and low intra-cluster distance. In particular, many methods aim to balance memory use, time consumption, and clustering quality. However, few related works focus on an important characteristic of evolving data streams: the data items are densely distributed and locally concentrated along the timeline, and a data item usually influences other items only within a limited range of time. In this paper, we present a novel algorithm called MawStream (clustering with Multiple Adaptive sliding Window). In MawStream, each data item in a potential cluster is associated with an adaptive time window. Instead of using a fixed window length, the window length of a data item is computed according to its influence. Once a data item's window length becomes zero, the item is called inactive and needs no further processing. This mechanism avoids handling too much data when clustering data streams. MawStream can handle detailed correlations among a finite set of points in a time bucket and find high-quality clusters over a period of time. In addition, MawStream allows users to submit offline queries to determine whether some potential clusters are similar to a specified cluster, in order to cope with the indeterminate number of potential clusters. Experiments on both real and synthetic datasets show that our algorithm clusters data streams efficiently and with high quality.
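The per-item adaptive window described in the abstract can be illustrated with a minimal sketch. It is not the MawStream algorithm itself: the abstract does not say how influence is measured, so the rule used here (each active item within `radius` of a new arrival extends both windows by one time step) is an assumption made for illustration, as are the names `Item`, `step`, `run`, `radius`, `base_window`, and `max_window`.

```python
from dataclasses import dataclass


@dataclass
class Item:
    value: float
    window: int  # remaining window length in time steps (assumed unit)

    @property
    def active(self) -> bool:
        # An item whose window has reached zero is inactive.
        return self.window > 0


def step(items, new_value, radius=1.0, base_window=2, max_window=10):
    """Advance one time step and admit one new item (illustrative rule only)."""
    # Every existing window shrinks by one step as time advances.
    for it in items:
        it.window -= 1

    # Assumed influence rule: active items close to the arrival are reinforced.
    neighbours = [
        it for it in items
        if it.active and abs(it.value - new_value) <= radius
    ]
    for it in neighbours:
        it.window = min(max_window, it.window + 1)

    # The new item's window grows with the number of items that influence it.
    new_item = Item(new_value, min(max_window, base_window + len(neighbours)))

    # Inactive items are dropped and never processed again.
    survivors = [it for it in items if it.active]
    survivors.append(new_item)
    return survivors


def run(stream, **params):
    """Feed a sequence of values through step() and return the active items."""
    items = []
    for value in stream:
        items = step(items, value, **params)
    return items


if __name__ == "__main__":
    for item in run([0.0, 0.2, 5.0, 5.1, 5.2, 5.3]):
        print(item)
```

In this sketch the early items near 0.0 receive no further reinforcement once the stream moves to values near 5.0, so their windows run out and they are discarded, while the recent, mutually reinforcing items stay active. That is the behavior the abstract attributes to the adaptive windows: the amount of data kept for clustering stays bounded by what is still influential.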
Date of Conference: 9-10 Feb. 2010