
Gradual clustering algorithms

2 Author(s)
Fei Wu (PRiSM Lab., Versailles Univ., Versailles, France); Gardarin, G.

Clustering is an important technique in data mining. Its objective is to group objects into clusters such that objects within a cluster are more similar to each other than to objects in other clusters. The similarity between two objects is defined by a distance function, e.g., the Euclidean distance, which satisfies the triangle inequality. Distance calculation is computationally expensive, and many algorithms have been proposed to reduce this cost. This paper considers the gradual clustering problem. In practice, we noticed that the user often begins clustering on a small number of attributes, e.g., two. If the result is partially satisfactory, the user continues clustering on a larger number of attributes, e.g., ten. We refer to this as the gradual clustering problem; it can be viewed as vertically incremental clustering. We propose approaches to solve it. The main idea is to reduce the number of distance calculations by using the triangle inequality. Our method first stores in an index the distances between a representative object and the objects in n-dimensional space. These pre-computed distances are then used to avoid distance calculations in (n+m)-dimensional space. Two experiments on real data sets demonstrate the added value of our approaches. The implemented algorithms are based on the DBSCAN algorithm with an associated M-Tree as the index tree. However, the principles of our idea can also be integrated with other tree structures such as the MVP-Tree and R*-Tree, and with other clustering algorithms.
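
The pruning idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes Euclidean distance, a single representative object, and that the first n attributes of each object are the ones used in the earlier clustering run. A flat list stands in for the M-Tree index, and the function names (`euclidean`, `build_index`, `range_query`) are hypothetical. The sketch relies on two facts: adding attributes never decreases a Euclidean distance, and the triangle inequality in n dimensions gives d_n(q, p) >= |d_n(q, r) - d_n(p, r)| for a representative r. Together these give a lower bound on the (n+m)-dimensional distance that costs one subtraction.

```python
import math


def euclidean(a, b, dims):
    """Euclidean distance between a and b over their first `dims` attributes."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(dims)))


def build_index(points, rep, n):
    """Pre-compute the n-dimensional distance from each point to the
    representative object `rep` (assumed: one representative, flat list)."""
    return [euclidean(p, rep, n) for p in points]


def range_query(points, index, q_idx, eps, total_dims):
    """Find all points within `eps` of points[q_idx] in `total_dims` dimensions.

    Returns (neighbour indices, number of full distance computations).
    The query point itself is included, as in a DBSCAN eps-neighbourhood.
    """
    q = points[q_idx]
    neighbours = []
    computed = 0
    for i, p in enumerate(points):
        # Lower bound on the (n+m)-dimensional distance:
        #   d_{n+m}(q, p) >= d_n(q, p) >= |d_n(q, r) - d_n(p, r)|
        # If the bound already exceeds eps, p cannot be a neighbour.
        if abs(index[q_idx] - index[i]) > eps:
            continue
        computed += 1
        if euclidean(q, p, total_dims) <= eps:
            neighbours.append(i)
    return neighbours, computed
```

A caller would build the index once after the n-dimensional run, then pass it to every neighbourhood query of the (n+m)-dimensional run. The pruning test only skips objects that cannot be neighbours, so the result matches a brute-force scan; how many computations it saves depends on the data and on the choice of representative.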

Published in:

Database Systems for Advanced Applications, 2001. Proceedings. Seventh International Conference on

Date of Conference:

21 April 2001