The majority of data available in most disciplines is unlabeled and unclassified. The amount of data is often massive, so scalable processing methods are required. One way to give structure to unlabeled data is to group it by clustering. Density-based methods discover the number of clusters from the data and can find clusters of irregular shape. In this paper we examine a version of DBSCAN modified to use fuzzy membership functions (FN-DBSCAN). FN-DBSCAN was implemented using the WEKA data mining framework, and a scalable variant (SFN-DBSCAN) was simulated within the same framework. Experimental results show that SFN-DBSCAN can be over three times as fast as FN-DBSCAN on small to medium-sized data sets. Its cluster assignments match those of FN-DBSCAN at an average rate of 90%. SFN-DBSCAN's speed increases in proportion to the number of subsets, but the agreement between its cluster assignments and those of FN-DBSCAN degrades as the number of subsets increases.
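To illustrate how fuzzy membership can replace DBSCAN's hard neighbor count, here is a minimal Python sketch. It is not the WEKA implementation described above. The abstract does not specify the membership function or thresholds, so the linear membership and the names `linear_membership`, `fn_dbscan`, `eps`, and `min_card` are assumptions made for illustration. A point is treated as a core point when the sum of its neighbors' membership degrees (its fuzzy cardinality, counting the point itself) reaches `min_card`.

```python
import math
from collections import deque


def linear_membership(dist, eps):
    """Assumed membership: 1 at distance 0, falling linearly to 0 at eps."""
    return max(0.0, 1.0 - dist / eps)


def fn_dbscan(points, eps, min_card):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise.

    Hypothetical sketch of a fuzzy-neighborhood DBSCAN, O(n^2) in time.
    """
    n = len(points)
    neighbors = []    # indices with non-zero membership, per point
    cardinality = []  # fuzzy cardinality (sum of memberships), per point
    for i in range(n):
        nbrs, card = [], 0.0
        for j in range(n):
            m = linear_membership(math.dist(points[i], points[j]), eps)
            if m > 0.0:
                nbrs.append(j)
                card += m  # includes the point itself (membership 1)
        neighbors.append(nbrs)
        cardinality.append(card)

    labels = [-1] * n
    cluster = 0
    for i in range(n):
        # Only unlabeled core points can seed a new cluster.
        if labels[i] != -1 or cardinality[i] < min_card:
            continue
        labels[i] = cluster
        queue = deque([i])
        while queue:
            p = queue.popleft()
            if cardinality[p] < min_card:
                continue  # border point: joins the cluster, does not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    queue.append(q)
        cluster += 1
    return labels
```

With `eps=1.0` and `min_card=2.0`, two tight groups of three points each form separate clusters, and an isolated point is left as noise because its fuzzy cardinality is only 1 (its own membership). A scalable variant along the lines of SFN-DBSCAN would presumably run such a routine on subsets of the data, but the abstract does not describe how the subsets are formed or merged, so that step is not sketched here.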