Approximate Clustering on Distributed Data Streams

3 Author(s)
Qi Zhang (Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-3175, USA; zhangq@cs.unc.edu); Jinze Liu; Wei Wang

We investigate the problem of clustering on distributed data streams. In particular, we consider k-median clustering of stream data arriving at distributed sites that communicate through a routing tree. Distributed clustering on high-speed data streams is a challenging task due to the limited communication capacity, storage space, and computing power at each site. In this paper, we propose a suite of algorithms for computing (1 + ε)-approximate k-median clustering over distributed data streams under three different topology settings: topology-oblivious, height-aware, and path-aware. Our algorithms reduce the maximum per-node transmission to polylog N (as opposed to Ω(N) for transmitting the raw data). We have simulated our algorithms on a distributed stream system with both real and synthetic datasets composed of millions of data items. In practice, our algorithms are able to reduce the data transmission to a small fraction of the original data. Moreover, our results indicate that the algorithms are scalable with respect to the data volume, approximation factor, and number of sites.
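To make the setting concrete, the following is a minimal illustrative sketch of the general idea of summarizing data as it moves up a routing tree. It is not the algorithm proposed in the paper and carries no (1 + ε) guarantee. Each site replaces its own points and its children's summaries with at most k weighted centers, so it forwards k weighted points to its parent instead of its raw data. All names here (`dist`, `k_median_cost`, `summarize`, `aggregate`, and the dictionary-based tree layout) are hypothetical, and the brute-force center search is only practical for tiny inputs.

```python
from itertools import combinations


def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def k_median_cost(weighted_points, centers):
    """k-median objective: weighted sum of distances to the nearest center."""
    return sum(w * min(dist(p, c) for c in centers) for p, w in weighted_points)


def summarize(weighted_points, k):
    """Reduce weighted points to at most k weighted centers.

    Assumption: centers are chosen among the input points by exhaustive
    search, which is exponential in k and only meant for illustration.
    """
    candidates = sorted({p for p, _ in weighted_points})
    if len(candidates) <= k:
        centers = candidates
    else:
        centers = min(
            combinations(candidates, k),
            key=lambda cs: k_median_cost(weighted_points, cs),
        )
    # Each center inherits the total weight of the points assigned to it.
    totals = {c: 0 for c in centers}
    for p, w in weighted_points:
        nearest = min(centers, key=lambda c: dist(p, c))
        totals[nearest] += w
    return [(c, w) for c, w in totals.items() if w > 0]


def aggregate(node, tree, local_data, k):
    """Bottom-up aggregation over a routing tree (hypothetical layout).

    tree maps a node to its list of children; local_data maps a node to the
    raw points observed at that site. The return value is what the node
    would send to its parent: at most k weighted points.
    """
    merged = [(p, 1) for p in local_data.get(node, [])]
    for child in tree.get(node, []):
        merged.extend(aggregate(child, tree, local_data, k))
    return summarize(merged, k)


if __name__ == "__main__":
    tree = {"root": ["a", "b"]}
    local_data = {
        "a": [(0.0,), (1.0,), (10.0,)],
        "b": [(0.5,), (11.0,), (12.0,)],
    }
    print(aggregate("root", tree, local_data, k=2))
```

In this toy scheme the summarization error can compound at every level of the tree, which is one reason a topology-aware design of the kind the abstract describes (height-aware, path-aware) is of interest.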

Published in:

2008 IEEE 24th International Conference on Data Engineering

Date of Conference:

7-12 April 2008