Clustering is an important data exploration task. Several algorithms for clustering large data sets have been proposed in the literature, using different methodologies, that can detect arbitrarily shaped clusters, where clusters are defined as dense regions separated by low-density regions. DBSCAN is an important density-based clustering algorithm, but its two density thresholds (ε, MinPts) are difficult to set properly, and a large amount of main memory must be available for it to run smoothly. In this paper, a new DBSCAN variant based on k-nearest neighbors (KNN) is proposed, which merges KNN and DBSCAN to enhance DBSCAN. First, the window width of each data point is determined, and the whole data set is partitioned into fuzzy clusters (FCs) by KNN based on kernel density estimation (KDE). Next, the local parameters (ε, MinPts) of each FC are determined in an unsupervised manner according to entropy theory. Finally, each local ε is mapped to the global ε, and each FC is clustered separately. The experimental results show that the proposed method yields higher-quality clusterings and that the results are not sensitive to the parameter k.
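To make the role of the two thresholds concrete, the following is a minimal Python sketch, not the method proposed in the paper. The abstract does not specify the KDE-based partitioning or the entropy criterion, so the sketch substitutes a simpler, commonly used heuristic: it takes each point's distance to its k-th nearest neighbor as a local width, uses the median of those distances as a single ε, and sets MinPts to k + 1. The function names (`knn_distances`, `region_query`, `dbscan`) and the sample data are assumptions made for illustration.

```python
import math
import statistics


def knn_distances(points, k):
    """Distance from each point to its k-th nearest neighbor (itself excluded)."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < len(points)")
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


# Hypothetical sample data: two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
k = 2
eps = statistics.median(knn_distances(points, k))  # assumed heuristic, not the paper's
labels = dbscan(points, eps, min_pts=k + 1)
print(eps, labels)
```

A single ε derived this way works here because both groups have the same density. When densities differ between regions, one global value fits poorly, which is the situation the per-cluster local parameters described in the abstract are meant to address.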
Date of Conference: 13-15 June 2005