Clustering of Evolving Data Stream with Multiple Adaptive Sliding Window

3 Author(s)
Hongbo Zhu ; DB&KE Lab., Sichuan Univ., Chengdu, China ; Yaqiang Wang ; Zhonghua Yu

Clustering of data streams has received much attention in recent years, and many algorithms have been proposed to discover clusters in data streams efficiently and effectively. Most of these algorithms aim to discover clusters with high density, high correlation, and low intra-cluster distance, and many of them try to balance memory use, running time, and clustering quality. However, few related works focus on an important characteristic of evolving data streams: the data items are densely distributed and partially concentrated on the timeline, and a data item usually influences other items only within a limited range of time. In this paper, we present a novel algorithm called MawStream (clustering with Multiple Adaptive sliding Window). In MawStream, each data item in a potential cluster is associated with an adaptive time window. Instead of using a fixed window length, MawStream computes the window length of a data item according to its influence. Once a data item's window length becomes zero, the item is called inactive and needs no further processing. This mechanism avoids processing too much data when clustering data streams. MawStream can handle detailed correlations among a finite number of points in a time bucket and find high-quality clusters within a period of time. MawStream also allows users to submit offline queries to determine whether some potential clusters are similar to a specified cluster, which helps cope with the fact that the number of potential clusters cannot be defined in advance. Experimental results on both real and synthetic datasets show that our algorithm can cluster data streams efficiently and with high quality.
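The per-item adaptive window described in the abstract can be illustrated with a small sketch. This is not the MawStream algorithm itself: the abstract does not give the influence measure or the window-length formula, so everything specific here is an assumption made for illustration. In particular, the names `Item` and `process_stream`, the parameters `radius`, `base`, and `cap`, the use of one-dimensional values, and the rule "window length = base + number of nearby active items, capped" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    value: float   # 1-D data value (assumption: real streams are multi-dimensional)
    arrival: int   # time step at which the item arrived


def process_stream(stream, radius=1.0, base=2, cap=5):
    """Toy per-item adaptive window (hypothetical rule, not the paper's formula).

    At every time step each active item gets a window length of
    min(cap, base + neighbours), where `neighbours` counts the other active
    items within `radius`. An item becomes inactive once its age reaches
    its window length, i.e. once its remaining window drops to zero.
    Returns (active_items, inactive_items).
    """
    active, inactive = [], []
    for now, value in enumerate(stream):
        active.append(Item(value, now))
        still_active = []
        for item in active:
            # Assumed influence measure: number of other active items nearby.
            neighbours = sum(
                1 for other in active
                if other is not item and abs(other.value - item.value) <= radius
            )
            length = min(cap, base + neighbours)
            remaining = max(0, length - (now - item.arrival))
            if remaining == 0:
                inactive.append(item)      # window closed: no further processing
            else:
                still_active.append(item)
        active = still_active
    return active, inactive


if __name__ == "__main__":
    active, inactive = process_stream([0.0, 0.1, 0.2, 10.0, 10.1, 0.15])
    print("active:  ", [i.value for i in active])
    print("inactive:", [i.value for i in inactive])
```

Under this assumed rule, an item surrounded by many nearby items keeps a longer window than an isolated one, so only the items that still influence their neighbours are revisited at each step.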

Published in:

Data Storage and Data Engineering (DSDE), 2010 International Conference on

Date of Conference:

9-10 Feb. 2010