FREDPC: A Feasible Residual Error-Based Density Peak Clustering Algorithm With the Fragment Merging Strategy

The most common issues for many clustering algorithms include the slow convergence, requirement for pre-specification of a number of parameters, and the lack of robustness when dealing with anomalies. Recently, the density peak clustering (DPC) algorithm was proposed to discover the centers of clusters by finding the density peaks in a dataset based on their local densities. The DPC needs neither an iterative process nor a large number of parameters, and it supports a heuristic approach, known as the decision graph, to manually select cluster centroids. However, the selection of the key parameters of the DPC was not systematically investigated. In this paper, we propose the feasible residual error-based density peak clustering algorithm with the fragment merging strategy, where the local density within the neighborhood region is measured through the residual error computation and the resulting residual errors are then used to generate residual fragments for cluster formation. The model parameters are then able to be calculated from the equations with statistical theoretical justification. We also develop a semi-automatic cluster identification method to eliminate the iterative process of manual centroid selection. The robustness and effectiveness of the proposed algorithm compared to the DPC and other clustering algorithms are demonstrated through experiments on standard benchmark datasets. The proposed method named feasible residual error-based density peak clustering (FREDPC) algorithm with the fragment merging strategy only needs to perform in one single step without any iteration and thus it is fast and has a great potential to be applied on a wide range of applications.


I. INTRODUCTION
Data clustering, as an unsupervised learning technique, plays an important role in data mining. Specifically, it aims to organize finite unlabeled data points into disjoint groups on the basis of their intrinsic similarity. Over the last three decades, several strategies have been proposed for clustering; however, they may differ significantly in their definition of
The associate editor coordinating the review of this manuscript and approving it for publication was Noor Zaman.
Density-based clustering algorithms excel in detecting arbitrary-shaped clusters, even in the presence of noise in a large problem space. Clusters are areas of higher density, or sets of data points that are more densely connected than the remainder of the dataset. Density is estimated as the number of points in a local environment. Among the many density-based spatial clustering algorithms that deal with noise, DBSCAN [31] is one of the most well-known, and it uses the concept of local density. In DBSCAN, with the optimal parameter setting, high-density connected regions are merged into a single cluster, and noise is detected as data points whose density is lower than the threshold value. Nevertheless, the task of optimizing the parameters of DBSCAN can be non-trivial [32]. Recently, Rodriguez and Laio proposed a density-based clustering algorithm called density peak clustering (DPC) [33], which adopted the idea of local density maxima from the mean-shift method [34] and the concept of using only a single parameter, the distance between data points, from K-Medoids [35]. DPC has distinctive features such as i) being able to detect non-spherical clusters based on the generated decision graph, ii) fewer control parameters, and iii) relatively low computational complexity. Much research has been carried out on this method [36]-[40].
However, the effectiveness of DPC depends greatly on the cutoff distance parameter $C_d$, which is used to estimate the density from the distances between data points. Concretely, DPC uses the heuristic approach of a decision graph to manually select cluster centers, and this graph is governed by the value of $C_d$ (see Section II-C). Manual selection of cluster centroids is a major limitation of DPC in many applications.
For example, as illustrated in Fig. 1(a), only two cluster centroids (labeled '1' and '2') are clearly identified by the decision graph generated by DPC on the three-cluster Iris dataset. Moreover, as shown in Fig. 1(b), two cluster centroids (labeled '6' and '4') are misidentified as a single centroid because they overlap in the decision graph generated for the six-cluster Glass dataset, which makes it extremely difficult for a user to select exactly six clusters. Note that the values of $C_d$ for both the Iris and Glass datasets are assigned in a systematic manner (cutoff at 1% of the sorted distances among all data points; see Section II-C for more details). A better choice of cluster centroids depends on the user's observation of the nature of the dataset. As such, the performance of DPC is sometimes limited by the manual identification of cluster centroids. To the best of our knowledge, robust methods for calculating accurate densities are not available [41], [42], and different methods are required to estimate density depending on the nature of the dataset.
Furthermore, DPC lacks robustness when dealing with anomalies. Anomalies are abnormal patterns found in the dataset, and their presence can indicate malicious activities that may lead to performance degradation [43]. For example, as illustrated in Fig. 2, it is difficult for DPC to obtain natural clusters if local densities are randomly distributed [44]: the two anomalies in the top left corner are always considered part of a larger cluster regardless of the $C_d$ value used, because there is no "noise-signal cutoff" in DPC [33]. In such cases, DPC has difficulty identifying the outliers even with varying $C_d$ values, and it may not be able to identify clusters of small sizes or clusters consisting (relatively speaking) of outliers only. To improve this capability, Parmar et al. [7] proposed the Residual Error-based Density Peak Clustering (REDPC) algorithm, which measures local density within a neighborhood region by adopting residual error computation so that the generated decision graph is better suited for cluster centroid identification. Furthermore, REDPC treats low-density data points as halo points and further processes them to detect anomalies. However, the limitation of manual selection of cluster centroids still exists in REDPC.
To overcome the aforementioned issues, in this research we propose the Feasible Residual Error-based Density Peak Clustering (FREDPC) algorithm with the fragment merging strategy. FREDPC adopts residual error computation to better estimate the local density of a dataset; the generated residual errors are used to form residual fragments (see Section III-B for more details), which are further processed to identify cluster centroids without the heuristic approach of the decision graph. Specifically, unlike DPC, FREDPC takes the reverse approach to cluster formation. FREDPC first forms clusters by merging residual fragments with higher similarities (see Section III-C for more details). It then identifies the cluster centroids as the data points with relatively low residual error (see Section III-D for more details), which eliminates the need for a decision graph.
The term density fragment was originally defined by Jiang et al. [45] as "a set of data points that consist of density decreasing points with relatively nearby distance". In this paper, we adopt a similar usage: a residual fragment refers to a data point linked with its adjoin points, together with their respective neighborhood points, which requires further analysis to detect natural clusters and centroids (see Section III-B for more details). Owing to the further analysis applied to residual fragments, FREDPC is better able to identify and handle various types of anomalies manifested in different patterns in different datasets (see Section III-D for more details).
In order to assess the performance of FREDPC, we compare FREDPC with K-Means [46], affinity propagation (AP) [21], DBSCAN [31], and DPC [33] on twelve UCI datasets and seven synthetic datasets (three of the synthetic datasets are self-defined but publicly available online). Experimental results show that our algorithm achieves the best results on eighteen out of nineteen datasets and the second best on the remaining dataset.
Our main contributions are as follows:
1) We implement the residual error computation so as to compute local densities in underlying datasets within a neighborhood region. The generated residual errors are used to form residual fragments, which are further processed to identify clusters and cluster centroids without using the heuristic approach of the decision graph.
2) We perform further analysis on residual fragments after obtaining the intermediate clustering results. As such, anomalies are effectively identified.
3) We present experimental results on nineteen datasets.
In addition, we experimentally show that our proposed FREDPC clustering method performs better than DPC and other benchmark clustering algorithms.
The rest of the paper is structured as follows: Section II presents a brief introduction of DBSCAN, SCAN, and DPC as related work to ours. In Section III, we present our proposed residual error-based fragment merging clustering method. In Section IV, to measure the performance of FREDPC, extensive experiments on both real-world and synthetic datasets are conducted with comparisons and discussions. Finally, conclusions and possible further work are given in Section V.

II. RELATED WORK
In this section, to introduce some basic ideas and concepts used in our method, we briefly review the technical concepts and detailed steps of three density-based clustering methods: DBSCAN [31], SCAN [47], and DPC [33].

A. DBSCAN: DENSITY-BASED CLUSTERING APPROACH WITH NOISE
DBSCAN [31] is the first and the most well-known representative of the density-based clustering algorithms, and it has been demonstrated to be effective in many real-world applications. It is popular for the following reasons: i) it is capable of identifying arbitrary-shaped clusters; ii) a priori specification of clusters is not required; iii) a smaller number of control parameters is needed; iv) it is scalable to large datasets [38]; and v) it is robust against noise. The rationale of this algorithm is to obtain high-density regions as possible clusters, ensuring that the density, represented by the number of objects in the neighborhood, exceeds certain specified thresholds. Regions with relatively lower density are isolated from the clusters and denoted as noise.
In DBSCAN, the definitions of direct density-reachability, density-reachability, and density-connectivity (Definitions 2-4) [31] are used during cluster formation; they induce asymmetric and symmetric relations between the data points of each individual cluster. DBSCAN mainly uses two pre-determined density parameters, ε and MinPts. If the ε-neighborhood of a data point contains at least MinPts data points, a new cluster with that point as a core point (i.e., a high-density data point within a cluster) is created, and DBSCAN then gathers the data points that are density-reachable from these core points. When no new data points can be added to any cluster, DBSCAN terminates.
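The expansion procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the reference implementation of [31]: the function name `dbscan_sketch`, the convention that a point counts itself in its own ε-neighborhood, and the label -1 for noise are assumptions made for this example.

```python
import numpy as np

def dbscan_sketch(X, eps, min_pts):
    """Return cluster ids 0, 1, ... per point; -1 marks noise (assumed convention)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances (O(n^2) memory; fine for a small demo).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    # Assumption: a point counts itself when its eps-neighborhood is sized.
    is_core = np.array([len(nb) >= min_pts for nb in neighbors])

    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster from an unassigned core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels

X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [50, 50]]
labels = dbscan_sketch(X, eps=1.5, min_pts=3)
print(labels)
```

In this toy input the two tight triples each form a cluster, while the isolated point has no dense neighborhood and keeps the noise label.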
DBSCAN has two major advantages in identifying arbitrary-shaped clusters with outlier detection, namely, the formation of a chain structure of high-density data points (i.e., core points) and the identification of outliers as low-density data points. Nevertheless, DBSCAN has distinct limitations: i) the clustering performance depends heavily on the user-defined parameter values, and it is sometimes difficult to estimate appropriate values for various datasets without prior knowledge; ii) it is sensitive to the order of the input data: different orderings of the data points in the same dataset can lead to different results [48]; and iii) adjacent clusters of different densities cannot be properly identified, possibly due to the use of global density parameters [49].

B. SCAN: A STRUCTURAL NETWORK CLUSTERING APPROACH
SCAN [47] is a well-known graph partitioning clustering method for understanding the elementary notions of structures present in graphs. It has been successfully applied in many applications due to its two distinctive features: i) it detects not only densely connected data points as clusters but also sparsely connected data points as hubs or outliers, using the structure and the connectivity of the vertices as clustering criteria; and ii) it is fast, with relatively low computational complexity on a given graph. SCAN is based on the notion that vertices sharing a certain quantity of neighbors should be grouped into one cluster, whereas hubs and outliers should be isolated.
During cluster formation, SCAN identifies data points that have many neighbors with a highly dense connection, i.e., the core points, and then uses the vertex structure [47] to evaluate density. The vertex structure (see (1)) of a data point is a set of data points composed of the data point itself and all its neighborhood data points.
Definition 1 (Vertex Structure [47]): Let $v \in V$. The structure of $v$ is defined by its structural neighborhood, denoted by $\Gamma(v)$:

$$\Gamma(v) = \{ w \in V \mid (v, w) \in E \} \cup \{ v \} \tag{1}$$

where $E$ is the edge set of the graph. The density of neighborhood nodes is computed from the common nodes in the vertex structures. SCAN counts the common neighborhood data points of two vertex structures and normalizes this count by the geometric mean of the sizes of the two vertex structures. This measure is called structural similarity and is defined as follows:

Definition 2 (Structural Similarity [47]): The structural similarity between data points $v$ and $w$, denoted by $\sigma(v, w)$, is defined as follows:

$$\sigma(v, w) = \frac{|\Gamma(v) \cap \Gamma(w)|}{\sqrt{|\Gamma(v)| \, |\Gamma(w)|}} \tag{2}$$

where $\Gamma(v)$ is defined in (1). When neighborhood data points share many components of their vertex structures, their structural similarity is high. SCAN detects core points by evaluating the structural similarities of all neighborhoods according to Definition 2. SCAN identifies not only clusters but also outliers. However, its performance depends heavily on sensitive input parameters, and it assumes that the network is homogeneous and that the adjacency matrix is already defined. An inspiration drawn from SCAN is that the role of each vertex in a graph can be efficiently measured by structural similarity, and hence graph partitioning could be an efficient way to aggregate clusters.
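As a small worked illustration of Definitions 1 and 2, the sketch below computes the structural similarity as the number of shared members of two vertex structures divided by the geometric mean of their sizes. The adjacency dictionary `adj` and the function names are hypothetical and chosen only for this example.

```python
import math

def vertex_structure(adj, v):
    """Vertex structure of v: the vertex itself plus its adjacent vertices."""
    return set(adj[v]) | {v}

def structural_similarity(adj, v, w):
    """Shared members of both vertex structures, normalized by the
    geometric mean of the two structure sizes."""
    gv, gw = vertex_structure(adj, v), vertex_structure(adj, w)
    return len(gv & gw) / math.sqrt(len(gv) * len(gw))

# Hypothetical toy graph: a triangle a-b-c with a pendant vertex d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
sim_ab = structural_similarity(adj, "a", "b")  # identical structures {a, b, c}
sim_cd = structural_similarity(adj, "c", "d")  # share only {c, d}
print(sim_ab, sim_cd)
```

Vertices `a` and `b` have identical vertex structures and therefore reach the maximum similarity, whereas `c` and `d` share two members out of structures of sizes four and two.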

C. DENSITY PEAK CLUSTERING (DPC)
DPC is based on a straightforward idea about cluster centroids: i) cluster centroids are characterized by a higher density than their neighborhood points, and ii) cluster centroids are positioned at a relatively large distance from other data points with high density. A decision graph is generated for centroid selection based on two basic attributes of each data point: i) $\rho_i$ (local density) and ii) $\delta_i$ (distance between a data point and its nearest neighbor with higher $\rho$).
For example, consider a dataset $X_{P \times Q} = [x_1, x_2, \ldots, x_P]^T$, where $x_i = [x_{1i}, x_{2i}, \ldots, x_{Qi}]$ is a vector with $Q$ attributes and $P$ denotes the total number of data points. Initially, the distance matrix of the dataset needs to be computed. Let $d(x_i, x_j)$ denote the Euclidean distance between data points $x_i$ and $x_j$, computed as follows:

$$d(x_i, x_j) = \sqrt{\sum_{q=1}^{Q} (x_{qi} - x_{qj})^2} \tag{3}$$

For a data point $x_i$, the local density $\rho_i$ is defined as follows:

$$\rho_i = \sum_{j} \chi\big(d(x_i, x_j) - C_d\big) \tag{4}$$

where

$$\chi(x) = \begin{cases} 1, & x < 0 \\ 0, & \text{otherwise} \end{cases}$$

and the cutoff distance is

$$C_d = D_{P_d \times \frac{c}{100}} \tag{5}$$

where $P_d = \binom{P}{2}$ and $D_{P_d \times \frac{c}{100}} \in D = \{d_1, d_2, \ldots, d_{P_d}\}$, wherein $D$ is the set of all distances between every two data points in the dataset, ordered from smallest to largest, and $c$ denotes the user-specified cutoff percentile.
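The cutoff distance and the local density can be illustrated with a short NumPy sketch. It follows the description above (sorted pairwise distances, a percentile cutoff $c$, and a count of points closer than $C_d$) under two stated assumptions: the percentile position is rounded up to an integer index, and a point is not counted in its own density, which only removes a constant self-term. All function names are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between all rows of X."""
    X = np.asarray(X, dtype=float)
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

def cutoff_distance(dist, c):
    """C_d: the sorted pairwise distance found at the c-th percentile position."""
    iu = np.triu_indices_from(dist, k=1)   # each unordered pair once: P(P-1)/2 values
    D = np.sort(dist[iu])
    # Assumption: round the position up and clamp it to a valid 1-based index.
    pos = int(np.ceil(len(D) * c / 100.0))
    pos = min(max(pos, 1), len(D))
    return D[pos - 1]

def local_density(dist, c_d):
    """rho_i: number of other points strictly closer than C_d."""
    within = dist < c_d
    np.fill_diagonal(within, False)        # assumption: drop the constant self-term
    return within.sum(axis=1)

X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6]]
dist = pairwise_distances(X)
c_d = cutoff_distance(dist, c=40)          # c=40 only because the toy set is tiny
rho = local_density(dist, c_d)
print(c_d, rho)
```

Note that every point contributes to every density value here, which is the global behavior of DPC that the later sections contrast with a neighborhood-restricted measure.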
$\delta_i$ denotes the minimum distance between the data point $x_i$ and any other data point with higher density. The data points with the highest density, locally or globally, will have larger values of $\delta$. $\delta_i$ is computed as follows:

$$\delta_i = \begin{cases} \min\limits_{j:\, \rho_j > \rho_i} d(x_i, x_j), & \text{if } \exists\, j \text{ such that } \rho_j > \rho_i \\ \max\limits_{j} d(x_i, x_j), & \text{otherwise} \end{cases} \tag{6}$$

After calculating the values of $\rho_i$ and $\delta_i$ for each data point in a dataset, DPC generates a decision graph (see Fig. 3), which is plotted with $\rho_i$ as the x-axis and $\delta_i$ as the y-axis, and asks a user to identify cluster centroids. According to the guideline, only those points with large $\rho$ and large $\delta$ compared to other data points in the dataset are considered as cluster centroids (see data points '1' and '10' in Fig. 3(b)). However, as shown previously in Fig. 1, because DPC considers all the data points during the computation of local density (see (4)), it may not perform well on overlapping clusters.

FIGURE 3. An example of DPC's decision graph (excerpted from [33]).
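To make the role of the decision graph concrete, the following sketch computes $\delta_i$ for a toy dataset and then picks the points that combine a large $\rho$ with a large $\delta$. The thresholds `rho_min` and `delta_min` are hypothetical stand-ins for the user's manual selection on the graph, and the fixed cutoff value is chosen by hand for this example; neither comes from the original method.

```python
import numpy as np

def delta_distance(dist, rho):
    """delta_i: distance to the nearest point with strictly higher density."""
    n = len(rho)
    delta = np.empty(n)
    for i in range(n):
        higher = np.flatnonzero(rho > rho[i])
        if higher.size:
            delta[i] = dist[i, higher].min()
        else:
            # No denser point exists: fall back to the largest distance.
            delta[i] = dist[i].max()
    return delta

# Toy data: a small group around (0, 0) and a larger one around (5, 5).
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [4, 5]], dtype=float)
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
c_d = 1.2                                   # hand-picked cutoff for the example
rho = (dist < c_d).sum(axis=1) - 1          # subtract 1 to ignore the point itself
delta = delta_distance(dist, rho)

# Hypothetical thresholds that mimic a user picking the top-right region
# of the decision graph.
rho_min, delta_min = 2, 3.0
centroids = np.flatnonzero((rho >= rho_min) & (delta > delta_min))
print(rho, delta, centroids)
```

Only the two group centers stand out in both coordinates; all other points have a denser neighbor at distance 1 and therefore a small $\delta$.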
Furthermore, DPC identifies the border region for each cluster, which contains data points that are part of the underlying cluster and also fall within the $C_d$ range of another cluster. Moreover, DPC traces the data point with maximum density within the border region of the cluster and denotes its density as $\rho_b$. The data points of the cluster whose density is higher than $\rho_b$ are considered part of the cluster core, and the others are considered part of the cluster halo (suitable to be considered as noise or outliers) [33]. DPC may not be able to process certain low-density data points when they are far from other identified clusters because, according to the definition, a halo point has to be close to at least one data point belonging to another cluster. Hence, as shown previously in Fig. 2, the two data points in the top left corner are always part of the nearest identified cluster regardless of the value of $C_d$ in use.

III. FREDPC: FEASIBLE RESIDUAL ERROR-BASED DENSITY PEAK CLUSTERING ALGORITHM WITH THE FRAGMENT MERGING STRATEGY
In this section, we introduce our proposed clustering algorithm, named Feasible Residual Error-based Density Peak Clustering (FREDPC) with the fragment merging strategy, for better identification of cluster centroids and detection of anomalies. The proposed FREDPC algorithm inherits the strength of density estimation from DPC, density-connectivity within a neighborhood from DBSCAN, the structural similarity measure from SCAN, and the density measure from residual error theory.
The overall process of FREDPC consists of the following four stages and each stage is elaborated in the following subsections, respectively.
1) Preprocessing: Firstly, the residual error of each data point is computed as a local density measurement (see the following subsection III-A).
2) Residual fragment generation: Secondly, residual fragments are generated based on the identification of the adjoin points of the respective data points, along with their link structure and their respective neighborhood points.
3) Residual fragment aggregation: Based on the principle of structural similarity (see (2)), the structure similarity index (SSI) between each pair of residual fragments is computed. The higher the structural similarity value, the higher the probability of aggregation between the fragments, and clusters are formed from fragments with high structural similarity. Moreover, the cluster centroid of each generated cluster is identified as the data point with the lowest residual error.
4) Final refinements: Finally, anomalies are isolated as the fragments with the least structural similarity to other residual fragments, and the final clustering results are presented (with anomalies represented using special symbols).

A. PREPROCESSING
For a more accurate estimation of local density, which may lead to better cluster formation and centroid identification, we adopt the residual error computation to measure the density of each data point within its neighborhood region. Specifically, the residual error $e_{ij}$ between data point $x_i$ and its neighbor $x_j$ is computed according to (7), where $N$ denotes the neighborhood size. $N$ is a user-defined constant parameter used to find the $N$ nearest neighbors of $x_i$, wherein the Euclidean distance is used, the same as in DPC (see (3)). Furthermore, the residual error of $x_i$ is computed according to (8). Comparing (8) with (4), it is clear that by adopting the residual error computation, FREDPC only takes the data points within the neighborhood into consideration when measuring the local density. DPC, on the other hand, takes all the data points in the entire dataset into consideration. By only considering the local regional density, FREDPC is capable of measuring local density efficiently for better clustering results (see Section IV).
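The neighborhood restriction can be illustrated as follows. The sketch only shows how the $N$ nearest neighbors of each point are found under the Euclidean distance; it does not reproduce the residual error formulas (7) and (8). The score returned by `placeholder_local_score` (mean distance to the $N$ nearest neighbors) is purely an assumed stand-in for a neighborhood-based density measure, and all names are hypothetical.

```python
import numpy as np

def knn_neighborhood(X, n_neighbors):
    """Indices and distances of the N nearest neighbours of every point."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)          # a point is not its own neighbour
    idx = np.argsort(dist, axis=1)[:, :n_neighbors]
    return idx, np.take_along_axis(dist, idx, axis=1)

def placeholder_local_score(X, n_neighbors):
    """Placeholder only, NOT the residual error of the paper:
    mean distance to the N nearest neighbours (lower = denser)."""
    _, d = knn_neighborhood(X, n_neighbors)
    return d.mean(axis=1)

X = [[0, 0], [0, 1], [1, 0], [1, 1], [8, 8]]
idx, d = knn_neighborhood(X, n_neighbors=2)
score = placeholder_local_score(X, n_neighbors=2)
print(idx[0], score)
```

Unlike the global count used by DPC, each score here depends on only $N$ points, so a distant outlier such as the last point cannot influence the scores of the dense group.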
The detailed steps of computing and sorting $e_{ij}$ are summarized in Algorithm 1.

Algorithm 1 The Preprocessing

B. RESIDUAL FRAGMENT GENERATION
After preprocessing, we first identify the adjoin points and neighborhood points of each data point in order to generate residual fragments for cluster formation and later anomaly detection (in the final stage). To identify the adjoin points and neighborhood points, we need to determine the value of the cutoff parameter $C_d$. Similar to DPC, a cutoff residual value $C_d$ is predefined, and selecting $C_d$ amounts to selecting the average number of neighbors of all data points in the dataset. In FREDPC, $C_d$ can be defined the same as in DPC (see (5)). FREDPC then generates residual fragments in four phases:
1) Adjoin point identification: from the set of residual errors sorted in ascending order, sortd_e, the nearest neighbor $x_j$ of a point $x_i$ can be an adjoin point of $x_i$ only if the distance between $x_i$ and $x_j$ is less than the cutoff threshold $C_d$. Moreover, once the adjoin points of data point $x_i$ are identified, $x_i$ is excluded from being identified as an adjoin point of its own adjoin points. The identified adjoin points are stored in the adjoin point set aps.
2) Neighborhood points within $C_d$ identification: from the set of residual errors sorted in ascending order, sortd_e, the neighborhood points of each data point within the range of $C_d$ are identified. Also, once the neighborhood points of data point $x_i$ are identified, $x_i$ is excluded from being identified as a neighborhood point of its own neighborhood points. The identified neighborhood points are stored in nneighset.
3) Adjoin points link generation: based on aps, once the adjoin points of each data point $x_i$ are identified, each data point is connected to its adjoin points to form a link, i.e., $x_i + 1$ within $C_d$, following the principle of density-reachability. The resulting link adjptlink is stored in adjptlinkset (see Fig. 7(e)).
4) Residual fragment generation: a single residual fragment is a structural network composed of the link structure of a data point and its identified adjoin points, i.e., adjptlink, together with their respective neighbors in nneighset. All residual fragments are generated in the same way for further processing.

C. RESIDUAL FRAGMENT AGGREGATION
After the generation of residual fragments, the residual fragments can be aggregated based on the principles of structural similarity (see (2)) and a priori likelihood. The structural similarity is a score varying from 0% to 100% that indicates the degree to which structural neighborhoods match. When adjacent data points share many members of their structural neighborhoods, their structural similarity becomes high. Analogously, a structural network similarity $eF_{sim}(x, y)$ is defined between each pair of residual fragments, where $eF_x$ and $eF_y$ refer to the residual fragments of $x$ and $y$, respectively, and $eF_{sim}(x, y)$ is the structural similarity index (SSI) between the two residual fragments. The larger the value of $eF_{sim}(x, y)$, the higher the probability of aggregation between $eF_x$ and $eF_y$ for each pair of residual fragments. The threshold value for $eF_{sim}(x, y)$ is denoted as $si_t$.
The threshold value $si_t$ is heuristically set to 25%. If the value of $eF_{sim}(x, y)$ between any pair of residual fragments is greater than $si_t$, those residual fragments are aggregated under an a priori likelihood: the residual fragments with the lowest residual have a higher priority to amalgamate with other residual fragments to form a cluster. After cluster formation, the cluster centroid of each generated cluster is identified as the data point with the relatively lowest residual error, and the cluster labels Cl of the remaining data points are assigned according to the identified cluster centroid. The detailed steps of the residual fragment aggregation procedure in FREDPC are summarized in Algorithm 3.
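A simplified sketch of threshold-based fragment aggregation is given below. It rests on several assumptions that are not confirmed by the text: the SSI between two fragments is taken to have the same form as (2) (shared members over the geometric mean of the fragment sizes), fragments are represented as plain sets of point indices, merging is transitive (implemented with a union-find structure), and the priority rule for low-residual fragments is not modelled. The names `fragment_similarity` and `aggregate_fragments` are hypothetical.

```python
import math
from itertools import combinations

def fragment_similarity(fx, fy):
    """Assumed SSI: shared members over the geometric mean of the sizes."""
    return len(fx & fy) / math.sqrt(len(fx) * len(fy))

def aggregate_fragments(fragments, residual, si_t=0.25):
    """Merge fragments whose assumed SSI exceeds si_t.

    fragments: list of sets of point indices
    residual:  dict mapping point index -> residual error (lower = denser)
    Returns a sorted list of (centroid, sorted members) per cluster.
    """
    parent = list(range(len(fragments)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for a, b in combinations(range(len(fragments)), 2):
        if fragment_similarity(fragments[a], fragments[b]) > si_t:
            parent[find(b)] = find(a)

    clusters = {}
    for k, frag in enumerate(fragments):
        clusters.setdefault(find(k), set()).update(frag)

    result = []
    for members in clusters.values():
        # Centroid: the member with the lowest residual error.
        centroid = min(members, key=lambda p: residual[p])
        result.append((centroid, sorted(members)))
    return sorted(result)

fragments = [{0, 1, 2}, {2, 3}, {3, 4}, {5, 6}, {6, 8}, {7}]
residual = {0: 0.9, 1: 0.2, 2: 0.5, 3: 0.4, 4: 0.8,
            5: 0.3, 6: 0.1, 7: 2.0, 8: 0.6}
result = aggregate_fragments(fragments, residual)
print(result)
```

In this toy input, overlapping fragments chain into two clusters, and the fragment `{7}` shares no members with any other fragment and therefore remains a single-point cluster.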

D. FINAL REFINEMENTS
Algorithm 3 (excerpt)
        for each eF_sim(x, y) do
            if eF_sim(x, y) > si_t then
                Aggregate residual fragment x and y as one cluster;
            else
                Generate new cluster;
            end if
        end for
    end for
    for each generated cluster do
        find the data point with the lowest residual error (cluster centroid);
        assign x_i with the cluster label of identified centroid of respective cluster;
        update Cl accordingly;
    end for

Anomaly (outlier) detection is a common problem for clustering algorithms in data analysis. The anomalous data points in the dataset can be defined as a deviation from normal behavior and can be associated with erroneous conditions or malevolent activities that may evolve gradually over time. Therefore, in FREDPC, we further detect the anomalies and highlight them during visualization.
After residual fragment aggregation based on $eF_{sim}(x, y)$, the generated clusters that consist of only a single data point are considered anomalies (e.g., see Fig. 7(f) and Fig. 8(f) in Section IV-B). All the anomalies are collected in a set called anoset.
Furthermore, for each detected anomaly, we carry out a further investigation to find its most probable cluster label. First, we find the nearest neighbors of each anomaly in anoset, with the same neighborhood size $N$ as defined in (7). If other anomalies exist within the neighborhood of an anomaly, we remove them from the neighborhood, as their cluster labels are yet to be decided. Finally, we assign each anomaly the majority cluster label in its neighborhood (if there is a tie, we assign the cluster label of the nearest data point belonging to any of the tying clusters). After the anomaly refinement process, the clustering results may be improved, and the detected anomalies are also highlighted visually for human inspection (see Section IV-B).
The detailed steps of anomaly refinement procedures in FREDPC are summarized in Algorithm 4.
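The refinement steps described in this subsection can be sketched as follows. This is an illustrative reading of the prose, not a transcription of Algorithm 4: the function name `refine_anomalies`, the brute-force neighbor search, and the choice to leave an anomaly unchanged when no labelled neighbor remains are assumptions.

```python
import numpy as np
from collections import Counter

def refine_anomalies(X, labels, n_neighbors):
    """Detect single-point clusters and give each the majority label of its
    N nearest neighbours, ignoring neighbours that are anomalies themselves."""
    X = np.asarray(X, dtype=float)
    lab = list(labels)
    counts = Counter(lab)
    anoset = [i for i, l in enumerate(lab) if counts[l] == 1]
    ano = set(anoset)
    new_labels = list(lab)
    for i in anoset:
        d = np.linalg.norm(X - X[i], axis=1)
        nearest = [j for j in np.argsort(d).tolist() if j != i][:n_neighbors]
        # Other anomalies are dropped: their labels are still undecided.
        neigh = [j for j in nearest if j not in ano]
        if not neigh:
            continue  # assumption: keep the anomaly as it is
        votes = Counter(lab[j] for j in neigh)
        best = max(votes.values())
        tied = {l for l, v in votes.items() if v == best}
        # neigh is ordered by distance, so the first match breaks ties
        # in favour of the nearest point of a tying cluster.
        new_labels[i] = next(lab[j] for j in neigh if lab[j] in tied)
    return anoset, new_labels

X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [1.5, 1.5], [2, 2]]
labels = [0, 0, 0, 1, 1, 1, 2, 3]        # clusters 2 and 3 are single points
anoset, new_labels = refine_anomalies(X, labels, n_neighbors=3)
print(anoset, new_labels)
```

Here the two single-point clusters lie next to each other; each ignores the other when voting and ends up with the label of the nearby three-point cluster, while `anoset` still records them for highlighting.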

E. COMPLEXITY ANALYSIS
The detailed steps of FREDPC are depicted in Fig. 4, wherein the information flow among the underlying dataset, the user inputs, and the FREDPC algorithm is explicitly shown. The computational complexity of FREDPC for each of the stages, i.e., Preprocessing, Initial assignments, Residual fragment aggregation, and Final Refinement, is shown in Table 1, wherein $n$ denotes the number of data points in the underlying dataset. Table 2 shows that FREDPC has a moderate level of complexity, i.e., $O(n^2)$, in comparison with the other clustering algorithms benchmarked in this paper, where $I$ denotes the number of iterations and $K$ denotes the user-predefined number of clusters. Furthermore, in the following experiment section, we also compare the computational time taken by all the clustering algorithms.

IV. EXPERIMENTS
To evaluate the robustness and test the feasibility of the proposed FREDPC, we compare its performance with K-Means [46], AP [21], DBSCAN [31], and DPC [33] on twelve UCI datasets, namely Iris, Thyroid, Liver, Ecoli, Pima, Breast, Glass, Wine, Vehicle, German, Ionosphere, and Sonar; four widely used synthetic datasets, namely Flame, Aggregation, Spiral, and R15; and three self-defined datasets, namely Twenty, D1, and D2. The properties of all nineteen datasets are listed in Table 3. To demonstrate the effectiveness of our FREDPC algorithm, we use the F-score to assess the quality of the clustering results. In Table 4, we compare the performance of our proposed algorithm with all benchmarking models (average of 10 independent runs) and visualize the results in Fig. 5. A number highlighted in bold indicates that the corresponding algorithm has the best performance on the corresponding dataset. As we can see, the F-scores obtained by FREDPC are the best on eighteen out of nineteen datasets. FREDPC achieves the second best F-score on Aggregation, with a minor difference of 1.000 - 0.9880 = 0.012. After a further investigation in terms of correctly labeled data points, we find that the difference between DPC and FREDPC is three data points out of 788. Although the clustering accuracy of DPC is slightly better than that of FREDPC on this dataset, such a small difference may not be significant. In the following subsections, we further examine the capability of FREDPC in various aspects.
Moreover, we also compare the computational time taken by each algorithm (average of 10 runs for all the methods), as shown in Table 5. The comparison results are consistent with Table 2, indicating that the algorithm with the lowest computational complexity, such as DBSCAN, requires the minimum computational time, and the algorithms with a moderate level of computational complexity, such as DPC and FREDPC, require a moderate amount of computational time. Although DPC and FREDPC both have the same computational complexity of $O(n^2)$, overall FREDPC is approximately 60 ms slower. This is because of the additional anomaly refinement procedure implemented by FREDPC for better handling of anomalies (see Algorithm 4). However, this extra computation time significantly improves FREDPC's performance (see Table 4). The detailed parameter settings used for each method to evaluate performance and computational time are reported in Table 6 for the UCI datasets and in Fig. 7 to Fig. 13 for the synthetic and self-defined datasets. We implemented all clustering algorithms using MATLAB 2016, and the experiments were conducted on the same 64-bit desktop computer with an Intel(R) Core(TM) i3-4160 CPU at 3.60 GHz and 8 GB RAM.

A. DETECTING CLUSTER CENTROIDS
As mentioned earlier, the performance of DPC is sometimes limited by its heuristic approach of the decision graph and the manual selection of cluster centroids. In comparison, FREDPC has a relative advantage in automatic centroid detection. The reason is that FREDPC measures the local density by employing residual error computation, which facilitates the generation of residual fragments, and the clusters are then generated by aggregating residual fragments. The cluster centroid is identified as the data point of the cluster with the lowest residual error. The cluster labels Cl of the remaining data points of the cluster are assigned according to the cluster centroid. As shown in Fig. 6, FREDPC achieves satisfactory results without human intervention.

B. DETECTING CLUSTERS WITH ANOMALIES
Anomaly detection is a fundamental feature of a clustering algorithm. The Flame and D2 datasets are adopted to test the capability of FREDPC in anomaly detection in Fig. 7 and Fig. 8, respectively. In the Flame dataset, the two anomalies are located in the top left corner, and in the D2 dataset, the five anomalies are located in the center. As clearly shown in Fig. 7 and Fig. 8, neither K-Means nor AP is capable of identifying anomalies. While DBSCAN adopts MinPts and density-reachability to identify anomalies, its overall performance in anomaly detection is unsatisfactory. As previously discussed, DPC fails in anomaly detection. However, as illustrated in Fig. 7(f) and Fig. 8(f), only FREDPC can correctly identify all the possible anomalies.

C. DETECTING CLUSTERS OF ARBITRARY SHAPES
As discussed in the literature review (see Section II-A), density-based clustering algorithms are capable of identifying arbitrary-shaped clusters. As illustrated in Fig. 7, [...] sizes in Fig. 9 and 12, respectively. It is clearly evident that K-Means and AP are unable to process these two datasets. The performance of DBSCAN is better, but it still misidentifies some borderline points as anomalies. When compared with DPC, FREDPC shows equal capability in clustering clusters of varying sizes.

E. DETECTING CLUSTERS OF DIFFERENT DENSITIES
To demonstrate the ability of FREDPC to identify clusters of varying density, we adopt the D1 dataset in Fig. 12. It is evident from this figure that K-Means and AP do not perform well on a dataset of varying densities. DBSCAN identifies only the two lower clusters and misidentifies the top cluster of very low density as anomalies, because its points do not fulfill the density-reachability definition (see Section II-A). In comparison, both DPC and FREDPC correctly identify all the clusters.
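The DBSCAN behavior described above can be reproduced on a toy example. The following is a minimal sketch of the standard core-point and noise test only, not of a full DBSCAN implementation; the 1-D data and the `eps` and `min_pts` values are hypothetical and are unrelated to the D1 dataset.

```python
def dbscan_noise_points(points, eps, min_pts):
    """Return the indices that the standard DBSCAN rule would mark as noise.

    A point is a core point if at least `min_pts` points (itself included)
    lie within distance `eps`. A non-core point is noise if no core point
    lies within `eps` of it. Points are 1-D for brevity.
    """
    def neighbors(i):
        return [j for j, q in enumerate(points) if abs(points[i] - q) <= eps]

    core = {i for i in range(len(points)) if len(neighbors(i)) >= min_pts}
    return [
        i for i in range(len(points))
        if i not in core and not any(j in core for j in neighbors(i))
    ]


# Hypothetical data: one dense group and one very sparse group.
dense = [0.0, 0.1, 0.2, 0.3, 0.4]
sparse = [10.0, 11.0, 12.0, 13.0]
points = dense + sparse

noise = dbscan_noise_points(points, eps=0.25, min_pts=3)
print(noise)  # [5, 6, 7, 8]
```

With a single global `eps` tuned to the dense group, every point of the sparse group fails the core-point test and is reported as noise, which mirrors the misidentification of the low-density cluster discussed above.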

F. OVERALL COMPARISON OF FREDPC OVER BENCHMARKING MODELS
Compared to the other benchmark clustering algorithms, FREDPC performs better in detecting clusters with the various properties examined in the previous subsections. The clustering performance of all clustering algorithms is summarized in Table 7. As shown in the table, the proposed FREDPC method is effective and works well across the performance evaluation aspects considered.

V. CONCLUSION
In this paper, we propose the Feasible Residual Error-based Density Peak Clustering (FREDPC) algorithm, inspired by the idea of residual error and residual fragments. This method measures the local density within a neighborhood region through residual error computation and further processes the residual errors to generate residual fragments. The generated residual fragments are then merged based on the principle of structural similarity, which improves cluster centroid identification without human intervention and better identifies possible anomalies. The experimental results on the classical UCI and synthetic datasets show that FREDPC is effective compared to DPC and other algorithms.
In future work, we aim to further reduce the runtime complexity and apply our algorithm to more complex and high-dimensional datasets.

FIGURE 1. Determination of cluster centroids from the decision graph generated by DPC on the Iris (a) and Glass (b) datasets with C_d = 0.3000 and C_d = 0.5230, respectively.

FIGURE 2. Visualizations of clusters identified in the Flame dataset by DPC with different C_d parameter values.

Algorithm 3 The Residual Fragment Aggregation Procedures in FREDPC
Input: Residual fragment set fragmentset obtained from Algorithm 2, si_t
Output: Cluster labels assigned to all the data points Cl
for each residual fragment do
    Compute structural similarity index based on (12);

FIGURE 4. The work flow of the overall FREDPC algorithm.

FIGURE 5. Visualization of performance comparison on nineteen datasets.

Fig. 9, Fig. 10, Fig. 11, and Fig. 13, K-Means and AP partition some of the natural clusters and hence they are unable to perform well on these datasets. DBSCAN can correctly identify

Algorithm 2 The Residual Fragment Generation Procedure in FREDPC
Input: DM, e_ij, and sortd_e obtained from Algorithm 1, C_d
Output: Residual fragment set fragmentset
for each data point x_i in sortd_e do
    if adjoin point identification criterion is met (see (9)) then
        update aps accordingly;
    for each neighborhood point of data point x_i do
        if nneigh identification criterion is met (see (10)) then
            update nneighset accordingly;
        end if
    end for
    connect each data point to its adjoin points to form a link, i.e., x_i + 1 and generate adjptlink;
    update adjptlinkset accordingly;
    generate residual fragments based on (11);
    update fragmentset accordingly;

TABLE 3. Properties of the UCI and synthetic datasets.

TABLE 5. Comparisons on computational time spent (in ms).

TABLE 6. Parameter settings.

Cluster formation based on the adjoin point links generated by FREDPC on the Iris dataset.

TABLE 7. Overall performance comparison on different cluster features.