Skip to Main Content
Density-based clustering has the advantages for: 1) allowing arbitrary shape of cluster and 2) not requiring the number of clusters as input. However, when clusters touch each other, both the cluster centers and cluster boundaries (as the peaks and valleys of the density distribution) become fuzzy and difficult to determine. We introduce the notion of cluster intensity function (CIF) which captures the important characteristics of clusters. When clusters are well-separated, CIFs are similar to density functions. But, when clusters become closed to each other, CIFs still clearly reveal cluster centers, cluster boundaries, and degree of membership of each data point to the cluster that it belongs. Clustering through bump hunting and valley seeking based on these functions are more robust than that based on density functions obtained by kernel density estimation, which are often oscillatory or oversmoothed. These problems of kernel density estimation are resolved using level set methods and related techniques. Comparisons with two existing density-based methods, valley seeking and DBSCAN, are presented which illustrate the advantages of our approach.
Pattern Analysis and Machine Intelligence, IEEE Transactions on (Volume:28 , Issue: 6 )
Date of Publication: June 2006