We propose a novel clustering algorithm that is similar in spirit to classification trees. The data is split recursively using a criterion that applies a discrete curve evolution method to the histogram of distances. The algorithm can be depicted as a tree diagram with triple splits. Leaf nodes represent either clusters or sets of observations that cannot yet be clearly assigned to a cluster. After the tree is constructed, unclassified data points are mapped to their closest clusters. The algorithm has several advantages. First, it deals effectively with observations that cannot be unambiguously assigned to a cluster by allowing a "margin of error". Second, it determines the number of clusters automatically: apart from the margin of error, the user only needs to specify the minimal cluster size, not the number of clusters. Third, its running time is linear in the number of data points, which makes it suitable for very large data sets. Experiments on both simulated and real data from different domains show that the proposed method is effective and efficient.
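The splitting criterion rests on discrete curve evolution (DCE) applied to a histogram. The following is a minimal sketch, not the authors' implementation. It treats the histogram as an open polyline and repeatedly deletes the interior vertex with the smallest DCE relevance K = β·l1·l2/(l1 + l2), following Latecki and Lakämper, where β is the turning angle at the vertex and l1, l2 are the normalized lengths of its two adjacent segments. The abstract does not say which distances are histogrammed, how many bins are used, when evolution stops, or how the surviving vertices become the three branches of a split. The names `histogram_polyline`, `dce_simplify`, `bins`, and `keep` are therefore hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def histogram_polyline(distances: Sequence[float], bins: int) -> List[Point]:
    """Hypothetical helper: equal-width histogram as (bin center, count) vertices."""
    if not distances:
        raise ValueError("need at least one distance")
    lo, hi = min(distances), max(distances)
    width = (hi - lo) / bins or 1.0  # guard against all-equal distances
    counts = [0] * bins
    for d in distances:
        k = min(int((d - lo) / width), bins - 1)  # max value goes in the last bin
        counts[k] += 1
    return [(lo + (k + 0.5) * width, float(c)) for k, c in enumerate(counts)]


def _relevance(prev: Point, cur: Point, nxt: Point, total_len: float) -> float:
    """DCE relevance K = beta * l1 * l2 / (l1 + l2) of the vertex `cur`."""
    ax, ay = cur[0] - prev[0], cur[1] - prev[1]
    bx, by = nxt[0] - cur[0], nxt[1] - cur[1]
    l1 = math.hypot(ax, ay) / total_len
    l2 = math.hypot(bx, by) / total_len
    if l1 == 0.0 or l2 == 0.0:
        return 0.0  # duplicate vertex: contributes nothing to the shape
    # Absolute turning angle between incoming and outgoing directions.
    beta = abs(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
    return beta * l1 * l2 / (l1 + l2)


def dce_simplify(points: Sequence[Point], keep: int) -> List[Point]:
    """Open-curve DCE: drop the least relevant interior vertex until `keep` remain.

    Endpoints are never removed (an assumption for open polylines; classic DCE
    is defined on closed polygons).
    """
    if keep < 2:
        raise ValueError("keep must be at least 2 (the two endpoints)")
    pts = list(points)
    while len(pts) > keep:
        # Normalizing by the current length scales all scores equally,
        # so it does not change which vertex is removed in this step.
        total = sum(math.hypot(q[0] - p[0], q[1] - p[1])
                    for p, q in zip(pts, pts[1:])) or 1.0
        scores = [_relevance(pts[i - 1], pts[i], pts[i + 1], total)
                  for i in range(1, len(pts) - 1)]
        del pts[1 + scores.index(min(scores))]
    return pts
```

For example, `dce_simplify([(0, 0), (1, 0), (2, 0), (3, 5), (4, 0), (5, 0), (6, 0)], keep=3)` discards the flat and shallow vertices first and retains the endpoints and the peak at `(3, 5)`. Salient vertices of this kind are what a split criterion could inspect, but the mapping from them to the paper's triple splits is not specified here.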