
PGMCLU: A novel parallel grid-based clustering algorithm for multi-density datasets


5 Author(s)
Chen Xiaoyun ; Sch. of Inf. Sci. & Eng., Lanzhou Univ., Lanzhou, China ; Chen Yi ; Qi Xiaoli ; Yue Min

Clustering is one of the basic data mining tasks, and clustering high-dimensional, massive collections of data points is a particularly important problem in cluster analysis. However, some existing clustering algorithms are suitable only for small and medium-sized datasets. Clustering multi-density datasets is also very difficult for some clustering methods. To address these issues, this paper presents PGMCLU, a novel parallel grid-based clustering algorithm for multi-density datasets, built on data parallelism and the merging of local clusters. The algorithm uses a new measure, called grid compactness, which reflects how tightly the data points within a grid cell are packed. It also introduces the notion of a grid feature that summarizes the information about a grid cell, and proposes novel approaches to data partitioning, local clustering, and merging local clusters. Theoretical analysis and extensive experiments on both real and synthetic datasets show that PGMCLU is effective and scalable, with approximately linear speedup.
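The abstract does not define grid features, grid compactness, or the partition/merge procedure, so the following is only a minimal sketch of the general pattern under stated assumptions, not PGMCLU itself. Assumptions (all hypothetical): a grid feature is a count plus per-dimension linear and squared sums; "compactness" is taken as 1 / (1 + RMS distance to the cell centroid); cells are axis-aligned with a fixed `width`; and `local_features`, `parallel_grid_features`, `dense_cell_clusters`, and `min_pts` are illustrative names. Unlike the paper, which merges local clusters, this sketch merges the additive cell summaries from each partition first and then runs one connected-components pass over dense cells. Threads stand in for parallel workers.

```python
import math
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class GridFeature:
    """Hypothetical per-cell summary: point count, linear sum and squared sum per dimension."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim
        self.ss = [0.0] * dim

    def add(self, p):
        self.n += 1
        for i, x in enumerate(p):
            self.ls[i] += x
            self.ss[i] += x * x

    def merge(self, other):
        # The summaries are additive, so partial results from different partitions combine exactly.
        self.n += other.n
        for i in range(len(self.ls)):
            self.ls[i] += other.ls[i]
            self.ss[i] += other.ss[i]

    def compactness(self):
        # Hypothetical stand-in for "grid compactness": 1 / (1 + RMS distance to the cell centroid).
        var = sum(max(0.0, s / self.n - (l / self.n) ** 2) for l, s in zip(self.ls, self.ss))
        return 1.0 / (1.0 + math.sqrt(var))


def cell_of(p, width):
    # Map a point to the integer coordinates of its grid cell.
    return tuple(math.floor(x / width) for x in p)


def local_features(points, width):
    """Build grid features for one data partition (the work done by a single worker)."""
    feats = {}
    for p in points:
        key = cell_of(p, width)
        if key not in feats:
            feats[key] = GridFeature(len(p))
        feats[key].add(p)
    return feats


def parallel_grid_features(points, width, workers=4):
    # Data parallelism: split the points into contiguous chunks and summarize each chunk independently.
    size = max(1, math.ceil(len(points) / workers))
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(local_features, chunks, [width] * len(chunks)))
    # Merge step: combine the per-partition summaries cell by cell.
    merged = {}
    for part in partials:
        for key, f in part.items():
            if key in merged:
                merged[key].merge(f)
            else:
                merged[key] = f
    return merged


def dense_cell_clusters(feats, min_pts):
    """Give face-adjacent dense cells (n >= min_pts) a shared cluster id via breadth-first search."""
    dense = {k for k, f in feats.items() if f.n >= min_pts}
    labels, next_id = {}, 0
    for start in dense:
        if start in labels:
            continue
        labels[start] = next_id
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for d in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:d] + (cell[d] + step,) + cell[d + 1:]
                    if nb in dense and nb not in labels:
                        labels[nb] = next_id
                        queue.append(nb)
        next_id += 1
    return labels
```

Because the summaries are additive, the merged result matches a single sequential pass up to floating-point rounding, which is what makes this kind of scheme scale with the number of workers. A single global density threshold (`min_pts`) is exactly what struggles on multi-density data, so a per-cell measure such as the paper's grid compactness would replace or refine that test in a real multi-density method. Because of CPython's global interpreter lock, threads here illustrate the structure rather than deliver CPU speedup; a real deployment would use processes or separate machines.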

Published in:

Web Society, 2009. SWS '09. 1st IEEE Symposium on

Date of Conference:

23-24 Aug. 2009