An Adaptive Density-Sensitive Similarity Measure Based Spectral Clustering Algorithm and Its Parallelization

The clustering effect of the spectral clustering algorithm depends on the calculation of the similarity between samples. Although a good clustering effect can be obtained by using the Gaussian kernel function to calculate the similarities between samples, it relies on an appropriate setting of the kernel parameter. Therefore, an adaptive density-sensitive similarity measure based spectral clustering (DSSC) algorithm is proposed to improve the clustering effect. Specifically, firstly, the Euclidean distances between samples are calculated to get the nearest neighbors of each sample. Secondly, the standard deviation of the distances between each sample and its nearest neighbors is calculated as the density parameter. Thirdly, the density-sensitive distances between each sample and its nearest neighbors are calculated. Finally, the similarities between each sample and its nearest neighbors are calculated to construct a similarity matrix. In addition, the proposed DSSC algorithm is parallelized on the Dask distributed parallel computing platform with CPU+GPU, which can improve the computational efficiency of the DSSC algorithm by taking full advantage of the CPU and GPU resources. A series of experiments are conducted to verify the effectiveness of the proposed DSSC algorithm on several synthetic datasets and UCI datasets, and the results show that the DSSC algorithm not only achieves satisfactory clustering results but also obtains higher efficiency in performing large-scale clustering analysis.


I. INTRODUCTION
The clustering algorithm [1] is one of the unsupervised learning algorithms commonly used for data mining, and its purpose is to group samples of the same class into the same cluster as far as possible. Clustering algorithms can be divided into density-based clustering, partition clustering, hierarchical clustering, grid-based clustering, and graph-based clustering. The spectral clustering algorithm [2] belongs to graph-based clustering and has a good clustering effect. It has been successfully applied to fault diagnosis [3], image segmentation [4], text classification [5], healthcare [6], and other fields.
In recent years, the spectral clustering algorithm has remained a popular research topic in the field of data mining [7]. Xie et al. [8] proposed a local standard deviation spectral clustering algorithm, which uses the standard deviations of distances between samples and their nearest neighbors as the kernel parameter of the Gaussian kernel function. Du et al. [9] used the local covariance to construct an adjacency matrix to improve the traditional spectral clustering, which ensures that the adjacency matrix is not affected by intersection points. Ye and Sakurai [10] adopted the probability neighborhood measure to calculate the similarities between samples, which can improve the adaptability to complex data and the clustering accuracy of the spectral clustering algorithm. Park and Zhao [11] used a symmetric doubly stochastic similarity matrix to construct a Laplacian matrix, which was applied to single-cell RNA sequencing with good results. Zhu et al. [12] employed a local representation method to remove the interference of some outliers and sparsification to remove redundant features, so that a robust similarity matrix is constructed. Yuan and Zhu [13] developed a spectral clustering algorithm based on fast search of natural neighbors, which can adaptively determine the number of natural neighbors of each sample and the number of clusters.
The above research has improved the clustering effect of the spectral clustering algorithm, but the Euclidean distance measure is utilized to calculate the similarities between samples, which cannot reflect the distribution of samples well.
Recently, the density-sensitive similarity measure is often used to improve the clustering effect of the spectral clustering algorithm. Zhang et al. [14] proposed a spectral clustering algorithm based on local density adaptive similarity, which adopts the common near neighbor measure to construct a similarity matrix. Yang et al. [15] proposed a spectral clustering algorithm based on density-sensitive similarity, which uses an adjustable line segment length measure to calculate the distances between samples to construct a similarity matrix, and a random matrix is constructed based on the Markov chain. Yan et al. [16] improved the spectral clustering algorithm by using the density-sensitive similarity measure and optimizing the selection of the initial clustering centers of the K-Means clustering algorithm. Wang et al. [17] adopted the density-sensitive similarity measure to construct a similarity matrix and the affinity propagation algorithm to replace the K-Means clustering algorithm for the final clustering. The above research has achieved good clustering results, but the parameter values are generally determined by experience, and the constructed similarity matrix is a dense matrix.
Although the spectral clustering algorithm can provide a high clustering accuracy, it usually needs to solve for the eigenvalues and eigenvectors of the Laplacian matrix. This process is not only time-consuming but also takes up a lot of memory, which is not conducive to performing large-scale clustering analysis. To reduce the running time of the spectral clustering algorithm for large-scale data, the Nyström method is adopted in [18], [19]. Recently, several distributed parallel computing frameworks have been used to parallelize the spectral clustering algorithm. In [20], [21], Hadoop MapReduce is used to parallelize the spectral clustering algorithm. Taloba et al. [22] designed an efficient spectral clustering algorithm based on Spark for large-scale graph processing. Huo et al. [23] proposed an efficient parallel spectral clustering algorithm based on Julia. Although the above research can effectively reduce the running time of the spectral clustering algorithm, how to fully utilize all available computing resources of a cluster to improve the efficiency of performing large-scale clustering analysis is still a challenge.
In this paper, an adaptive density-sensitive similarity measure based spectral clustering algorithm is proposed, which can better calculate the similarities between samples and their nearest neighbors to improve the clustering effect to a certain extent. Aiming at the problems of long running time and high memory occupancy for the spectral clustering algorithm, the proposed DSSC algorithm is parallelized on Dask distributed parallel computing platform, which can improve the efficiency of the DSSC algorithm for performing large-scale clustering analysis.
The main contributions of the paper are as follows.
• An adaptive density-sensitive similarity measure method for the spectral clustering algorithm is proposed to improve the clustering effect. At first the nearest neighbors of each sample are determined according to the Euclidean distances between samples, then the density-sensitive distances between each sample and its nearest neighbors are adaptively calculated, and finally the similarities between each sample and its nearest neighbors are calculated to construct a similarity matrix.

• The proposed DSSC algorithm is parallelized on the Dask distributed parallel computing platform with CPU+GPU, which can improve the computational efficiency of the DSSC algorithm by taking full advantage of the CPU and GPU resources.
• A series of experiments are conducted to verify the effectiveness of the proposed DSSC algorithm on several synthetic datasets and UCI datasets, and the results show that the DSSC algorithm not only achieves satisfactory clustering results, but also obtains better efficiency in performing large-scale clustering analysis.

The rest of the paper is organized as follows. Section II outlines the NJW algorithm and Dask. Section III describes the proposed DSSC algorithm and its parallelization. Section IV presents the experimental results and analysis. Section V gives the conclusion.

II. NJW ALGORITHM AND DASK

A. OVERVIEW OF NJW ALGORITHM
Spectral clustering originates from graph theory, and in essence it transforms a clustering problem into a graph cut problem. It can be divided into 2-way spectral clustering and multi-way spectral clustering according to the graph cut criteria. The Ng-Jordan-Weiss (NJW) algorithm [2], which belongs to the multi-way spectral clustering, is widely used for data mining. The NJW algorithm is described in Algorithm 1, which includes the following steps.
Step 1: Construct a similarity matrix S. Assuming that a sample set X = (x_1, x_2, ..., x_n) is given, the similarity s_{i,j} between the sample x_i and the sample x_j can be calculated by

s_{i,j} = \exp\left( -\frac{\| x_i - x_j \|^2}{2\sigma^2} \right), (1)

where σ is a scale parameter.
Step 2: Construct a degree matrix D. The degree matrix is a diagonal matrix, so the value of the i-th element on the diagonal can be calculated by

d_i = \sum_{j=1}^{n} s_{i,j}. (2)

Step 3: Construct a normalized Laplacian matrix L_{sym}. At first the Laplacian matrix L can be calculated as follows: L = D − S, and then the normalized Laplacian matrix can be constructed by

L_{sym} = D^{-1/2} L D^{-1/2}. (3)

Step 4: Perform eigen-decomposition for the normalized Laplacian matrix L_{sym}, and the eigenvectors corresponding to the first c minimum eigenvalues are selected to construct a new matrix V = (v_1, v_2, ..., v_c), where V is an n × c matrix. A new matrix Y can be obtained by standardizing the matrix V as follows:

y_{i,j} = v_{i,j} \Big/ \left( \sum_{j=1}^{c} v_{i,j}^2 \right)^{1/2}. (4)

Step 5: The K-Means clustering algorithm is used to perform clustering analysis for the matrix Y.

Algorithm 1 NJW Algorithm
Input: The scale parameter σ, the number of clusters c
Output: The clustering results
1: Construct a similarity matrix S by (1);
2: Construct a degree matrix D by (2);
3: Construct a Laplacian matrix: L = D − S;
4: Normalize L by (3) and obtain a normalized Laplacian matrix L_{sym};
5: Perform eigen-decomposition for L_{sym} and select the eigenvectors corresponding to the first c minimum eigenvalues to construct a new matrix V;
6: Standardize V by (4) to obtain a new matrix Y;
7: Perform clustering analysis for Y using K-Means clustering algorithm;
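To make the above steps concrete, the following is a minimal NumPy/scikit-learn sketch of the NJW pipeline; the function name njw, the dense distance computation, and the use of scikit-learn's KMeans are our illustrative choices, not part of [2].

```python
import numpy as np
from sklearn.cluster import KMeans

def njw(X, sigma, c):
    # Step 1: similarity matrix by (1), Gaussian kernel with scale sigma
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    S = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(S, 0.0)
    # Step 2: degrees by (2)
    d = S.sum(axis=1)
    # Step 3: normalized Laplacian by (3): D^{-1/2} (D - S) D^{-1/2} = I - D^{-1/2} S D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    # Step 4: eigenvectors of the c smallest eigenvalues (eigh returns ascending order)
    vals, vecs = np.linalg.eigh(L_sym)
    V = vecs[:, :c]
    Y = V / np.linalg.norm(V, axis=1, keepdims=True)  # row standardization by (4)
    # Step 5: K-Means on the embedded rows
    return KMeans(n_clusters=c, n_init=10).fit_predict(Y)
```

The dense pairwise-distance array is quadratic in n and is only suitable for small sample sets; the later sections of the paper address exactly this scalability issue.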

B. OVERVIEW OF DASK
Dask [24] is a lightweight distributed parallel computing platform based on Python, which provides two types of APIs: the low-level APIs and the high-level APIs. The low-level APIs include Delayed and Future, and the high-level APIs include Array, Dataframe, and Bag. Users can construct a task graph using these APIs. Furthermore, Dask also provides three kinds of schedulers: multithreading, multiprocessing, and distributed. The multithreading or multiprocessing scheduler can only be used on a single node via threads or processes. The distributed scheduler can be used on a cluster with multiple workers. Fig. 1 depicts the flow of processing a task on a Dask cluster. Firstly, the client creates a task using APIs provided by Dask and submits the task to the distributed scheduler. Secondly, the distributed scheduler divides the task into several subtasks. Thirdly, these subtasks are assigned to several worker nodes. Each worker node processes the subtasks assigned to it independently, and the obtained subresults are returned to the distributed scheduler. Finally, the obtained results are returned to the client.
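As a minimal illustration of the flow in Fig. 1, the sketch below builds a graph of delayed subtasks, submits it to a distributed scheduler, and gathers the results; the scheduler address and the square function are placeholders.

```python
import dask
from dask.distributed import Client

# Connect to a running distributed scheduler (address is a placeholder)
client = Client("tcp://scheduler-address:8786")

@dask.delayed
def square(x):
    return x * x

# Build a task graph of independent subtasks; the scheduler splits the
# graph into subtasks and assigns them to the worker nodes
tasks = [square(i) for i in range(8)]
results = dask.compute(*tasasks := tasks) if False else dask.compute(*tasks)
print(results)  # (0, 1, 4, 9, 16, 25, 36, 49)
```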

III. THE PROPOSED DSSC ALGORITHM AND ITS PARALLELIZATION

A. THE PROPOSED DSSC ALGORITHM
The key of the spectral clustering algorithm is the calculation of the similarity between samples, and a good similarity measure method can improve the clustering effect of the spectral clustering algorithm. Therefore, an adaptive density-sensitive similarity measure method is proposed to optimize the construction of the similarity matrix for the spectral clustering algorithm. Assuming that the number of nearest neighbors of each sample is k, the sample x_i has k nearest neighbors x_{i.1}, x_{i.2}, ..., x_{i.k}, and the density-sensitive distance between the sample x_i and its l-th nearest neighbor can be calculated by

d(x_i, x_{i.l}) = \sum_{t=0}^{k-1} als(p_t, p_{t+1}), (5)

where p_0 = x_i. In (5), p_t represents a path point between the sample x_i and its l-th nearest neighbor, where 1 ≤ t ≤ k. Specifically, the path point p_1 is the one among the remaining k − 1 nearest neighbors, excluding the l-th nearest neighbor, that has the shortest Euclidean distance from the sample x_i. The path point p_t is the one among the remaining k − t nearest neighbors, excluding the path points p_1, p_2, ..., p_{t−1} and the l-th nearest neighbor, that has the shortest Euclidean distance from the path point p_{t−1}, where 2 ≤ t ≤ k − 1. The path point p_k is the l-th nearest neighbor of the sample x_i. als(p_t, p_{t+1}) is the adjustable line segment length between the path point p_t and the path point p_{t+1}, which can be calculated as follows:

als(p_t, p_{t+1}) = \rho_i^{dist(p_t, p_{t+1})} - 1. (6)

In (6), dist(p_t, p_{t+1}) is the Euclidean distance between the path point p_t and the path point p_{t+1}, and ρ_i, the standard deviation of the distances between the sample x_i and its k nearest neighbors, is an adaptive density parameter. Because the density parameter ρ_i of the sample x_i is determined according to the standard deviation of the distances between the sample x_i and its k nearest neighbors, it overcomes the weakness of manually setting the density parameter. Therefore, the similarity between the sample x_i and its l-th nearest neighbor can be calculated by

s_{i,l} = \frac{1}{d(x_i, x_{i.l}) + 1}. (7)
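A minimal single-sample sketch of this measure is given below, assuming the reconstructed forms of (5)-(7) above; density_sensitive_row and its greedy path construction are our illustration, not code from the paper.

```python
import numpy as np

def density_sensitive_row(x_i, neighbors):
    """Similarities between x_i and its k nearest neighbors (rows of
    `neighbors`, assumed sorted by ascending Euclidean distance from x_i)."""
    k = len(neighbors)
    d_nn = np.linalg.norm(neighbors - x_i, axis=1)
    rho = d_nn.std()  # adaptive density parameter rho_i (local standard deviation)
    als = lambda p, q: rho ** np.linalg.norm(p - q) - 1.0  # segment length by (6)
    sims = np.empty(k)
    for l in range(k):
        # Greedy path from x_i through the other k-1 neighbors, ending at the l-th one
        remaining = [t for t in range(k) if t != l]
        path, cur = [x_i], x_i
        while remaining:
            nxt = min(remaining, key=lambda idx: np.linalg.norm(neighbors[idx] - cur))
            remaining.remove(nxt)
            cur = neighbors[nxt]
            path.append(cur)
        path.append(neighbors[l])  # p_k is the l-th nearest neighbor itself
        d = sum(als(path[t], path[t + 1]) for t in range(len(path) - 1))  # by (5)
        sims[l] = 1.0 / (d + 1.0)                                         # by (7)
    return sims
```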
The proposed adaptive density-sensitive similarity measure based spectral clustering algorithm is described in Algorithm 2, which includes the following steps.

Step 1: Construct a similarity matrix. Supposing that there is a sample set X = (x_1, x_2, ..., x_n), the calculation of the similarities between each sample and its nearest neighbors can be described as follows. Firstly, the Euclidean distance dist(x_i, x_j) between the sample x_i and the sample x_j is calculated as follows:

dist(x_i, x_j) = \| x_i - x_j \|_2. (8)

Secondly, the k nearest neighbors of each sample are determined according to the shortest Euclidean distance. Thirdly, the k − 1 path points of the path from the sample x_i to its l-th nearest neighbor are determined according to the shortest Euclidean distance. Fourthly, the density-sensitive distance d(x_i, x_{i.l}) between the sample x_i and its l-th nearest neighbor is calculated by (5), where 1 ≤ l ≤ k. Fifthly, the similarity s_{i,l} between the sample x_i and its l-th nearest neighbor is calculated by (7), where 1 ≤ l ≤ k, and the similarities between the sample x_i and its non-nearest neighbors are set to 0. Finally, the similarity matrix S_{n×n} is obtained.

Step 2: Construct a degree matrix. The value of the i-th element on the diagonal is calculated by (2), and the degree matrix D_{n×n} is obtained, where 1 ≤ i ≤ n.
Step 3: Construct a normalized Laplacian matrix. At first the Laplacian matrix L is constructed as follows: L = D_{n×n} − S_{n×n}, and then the normalized Laplacian matrix L_{sym} is obtained by (3).
Step 4: Select the eigenvectors to construct a new matrix. Firstly, the eigen-decomposition is performed for the matrix L_{sym} to obtain z eigenvalues and z eigenvectors. Secondly, the eigenvectors corresponding to the first c minimum eigenvalues are selected from the z eigenvectors to construct a new matrix V_{n×c}. Finally, the matrix V_{n×c} is standardized by (4) to obtain a new matrix Y_{n×c}.

Step 5: Perform clustering analysis. The K-Means clustering algorithm is used to perform clustering analysis for the matrix Y_{n×c}, and the clustering results are obtained.

B. PARALLELIZATION OF THE DSSC ALGORITHM
1) ANALYSIS OF THE PARALLELIZATION STRATEGY
Table 1 presents the running time of the four main stages of the proposed DSSC algorithm obtained using one CPU core, eight CPU cores, and one GPU on a single worker node for a synthetic dataset, respectively. The four main stages include constructing a similarity matrix (stage 1), constructing a degree matrix and a normalized Laplacian matrix (stage 2), performing eigen-decomposition and selecting eigenvectors to construct a new matrix (stage 3), and performing clustering analysis using K-Means clustering algorithm (stage 4).

Algorithm 2 The Proposed DSSC Algorithm
Input: n samples, the number of nearest neighbors k, the number of clusters c
Output: The clustering results
1: for i ← 1 to n do
2:   for j ← 1 to n do
3:     Calculate the Euclidean distance between the sample x_i and the sample x_j;
4:   end for
5: end for
6: Determine k nearest neighbors of each sample;
7: for i ← 1 to n do
8:   for l ← 1 to k do
9:     Determine the path point p_k, which is the l-th nearest neighbor of x_i;
10:     Determine the path point p_1, which is the one among the remaining k − 1 nearest neighbors of x_i that has the shortest Euclidean distance from x_i;
11:     for t ← 2 to k − 1 do
12:       Determine the path point p_t, which is the one among the remaining k − t nearest neighbors of x_i that has the shortest Euclidean distance from the path point p_{t−1};
13:     end for
14:     Calculate the density-sensitive distance d(x_i, x_{i.l}) by (5);
15:     Calculate the similarity s_{i,l} by (7);
16:   end for
17: end for
18: Construct a degree matrix D_{n×n} by (2);
19: Construct a Laplacian matrix L ← D_{n×n} − S_{n×n};
20: Get a normalized Laplacian matrix L_{sym} by (3);
21: Perform eigen-decomposition for L_{sym} and select the eigenvectors corresponding to the first c minimum eigenvalues to construct the matrix V_{n×c};
22: Standardize V_{n×c} by (4) to get the matrix Y_{n×c};
23: Perform clustering analysis for Y_{n×c} using K-Means clustering algorithm;

It can be seen from Table 1 that the running times of these four stages obtained using eight CPU cores are decreased by 68.14%, 67.76%, 89.56%, and 17.65% compared with those obtained using one CPU core, respectively. The results show that it is necessary to parallelize the DSSC algorithm.
As shown in Table 1, the running time of stage 1 obtained using eight CPU cores is 21.03% less than that obtained using one GPU; this is because the 8-core CPU is more suitable than the GPU for performing the complicated three-layer for-loop used for constructing a similarity matrix (see lines 7-17 in Algorithm 2). The running times of stage 2, stage 3, and stage 4 obtained using one GPU are decreased by 99.81%, 32.80%, and 35.71% compared with those obtained using eight CPU cores, respectively, which shows that stage 2, stage 3, and stage 4 are more suitable to be executed on the GPU. Therefore, the parallelization strategy of the proposed DSSC algorithm is as follows: stage 1 is parallelized on the CPU, and stage 2, stage 3, and stage 4 are parallelized on the GPU.
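This stage split can be sketched with NumPy arrays on the CPU and CuPy arrays on the GPU; build_similarity_matrix below is a hypothetical stand-in for stage 1 (Algorithm 2, lines 1-17), and c is an example cluster count. The copy points mirror the strategy above; this is a hedged sketch, not the paper's implementation.

```python
import cupy as cp

c = 3                                      # number of clusters (example value)
S_cpu = build_similarity_matrix(X)         # stage 1 on CPU (hypothetical helper)
S = cp.asarray(S_cpu)                      # copy the similarity matrix from CPU to GPU

# Stage 2 on GPU: degree matrix and normalized Laplacian by (2) and (3)
d_inv_sqrt = 1.0 / cp.sqrt(S.sum(axis=1))
L_sym = cp.eye(S.shape[0]) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]

# Stage 3 on GPU: eigen-decomposition, keep eigenvectors of the c smallest eigenvalues
vals, vecs = cp.linalg.eigh(L_sym)
V = vecs[:, :c]
Y = V / cp.linalg.norm(V, axis=1, keepdims=True)   # standardization by (4)

Y_cpu = cp.asnumpy(Y)  # or keep Y on the GPU for a GPU K-Means (stage 4)
```

A GPU K-Means implementation (for example, cuML's KMeans) could then consume Y directly for stage 4 without an intermediate copy back to the CPU.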

2) IMPLEMENTATION OF THE PARALLEL DSSC ALGORITHM
According to the parallelization strategy discussed in Section III-B1, the DSSC algorithm is parallelized on Dask distributed parallel computing platform with CPU+GPU, as shown in Fig. 2. Firstly, the similarity matrix is constructed on CPU of each worker node in parallel, and the obtained similarity matrix is copied from CPU to GPU. Secondly, the degree matrix and the normalized Laplacian matrix are constructed on GPU of each worker node in parallel. Thirdly, the eigen-decomposition is performed and the eigenvectors are selected to construct a new matrix on GPU of each worker node in parallel. Fourthly, the K-Means clustering is performed on GPU of each worker node in parallel, and the clustering results are copied from GPU to CPU. Finally, the clustering results are gathered from each worker node to the master node. The parallel DSSC algorithm is described in Algorithm 3, which includes the following steps.
Step 1: Divide the sample set. Supposing that the number of samples is n and the block size is m, n samples are divided into n/m blocks, n/m blocks are distributed to r worker nodes evenly, and all CPU threads or GPU threads are used to process (n/m)/r blocks in parallel on each worker node.
Algorithm 3 The Parallel DSSC Algorithm
Input: n samples, the block size m, the number of nearest neighbors k, the number of clusters c
Output: The clustering results
1: Divide the n samples into n/m blocks;
2: Distribute the n/m blocks to r worker nodes evenly;
3: for all r worker nodes with CPU in parallel do
4:   for ω ← 1 to n/m do
5:     for i ← 1 to m do
6:       for j ← 1 to m do
7:         Calculate the distance between x^ω_i and x^ω_j;
8:       end for
9:     end for
10:     Determine k nearest neighbors of each sample in the ω-th block;
11:     for i ← 1 to m do
12:       for l ← 1 to k do
13:         Determine the path point p_k, which is the l-th nearest neighbor of x^ω_i;
14:         Determine the path point p_1, which is the one among the remaining k − 1 nearest neighbors of x^ω_i that has the shortest distance from x^ω_i;
15:         for t ← 2 to k − 1 do
16:           Determine the path point p_t, which is the one among the remaining k − t nearest neighbors of x^ω_i that has the shortest distance from p_{t−1};
17:         end for
18:         Calculate the density-sensitive distance d(x^ω_i, x^ω_{i.l}) by (5);
19:         Calculate the similarity s^ω_{i,l} by (7);
20:       end for
21:     end for
22:     Copy S^ω_{m×m} from CPU to GPU;
23:   end for
24: end for
25: for all r worker nodes with GPU in parallel do
26:   for ω ← 1 to n/m do
27:     Construct a degree matrix D^ω_{m×m} by (2);
28:     Construct a Laplacian matrix L^ω ← D^ω_{m×m} − S^ω_{m×m};
29:     Get a normalized Laplacian matrix L^ω_{sym} by (3);
30:     Perform eigen-decomposition for L^ω_{sym} and select the eigenvectors corresponding to the first c minimum eigenvalues to construct the matrix V^ω_{m×c};
31:     Standardize V^ω_{m×c} by (4) to get the matrix Y^ω_{m×c};
32:     Perform clustering analysis for Y^ω_{m×c} using K-Means clustering algorithm;
33:     Copy the clustering results from GPU to CPU;
34:   end for
35: end for
36: Gather the clustering results from each worker node to the master node;

Step 2: Parallelly construct n/m similarity matrices on r worker nodes with CPU. For the sample set of the ω-th block, firstly, the Euclidean distance dist(x^ω_i, x^ω_j) between the sample x^ω_i and the sample x^ω_j is calculated by (8). Secondly, the k nearest neighbors x^ω_{i.1}, x^ω_{i.2}, ..., x^ω_{i.k} of the sample x^ω_i are determined according to the shortest Euclidean distance. Thirdly, the k − 1 path points of the path from the sample x^ω_i to its l-th nearest neighbor are determined according to the shortest Euclidean distance. Fourthly, the density-sensitive distance d(x^ω_i, x^ω_{i.l}) between the sample x^ω_i and its l-th nearest neighbor is calculated by (5), where 1 ≤ l ≤ k. Fifthly, the similarity s^ω_{i,l} between the sample x^ω_i and its l-th nearest neighbor is calculated by (7), where 1 ≤ l ≤ k, and the similarities between the sample x^ω_i and its non-nearest neighbors are set to 0. Finally, the ω-th similarity matrix S^ω_{m×m} is obtained and copied from CPU to GPU.
Step 3: Parallelly construct n/m degree matrices on r worker nodes with GPU. The value of the i-th element on the diagonal is calculated by (2), and the ω-th degree matrix D^ω_{m×m} is obtained, where 1 ≤ i ≤ m and 1 ≤ ω ≤ n/m.
Step 4: Parallelly construct n/m normalized Laplacian matrices on r worker nodes with GPU. At first the ω-th Laplacian matrix L^ω is constructed as follows: L^ω = D^ω_{m×m} − S^ω_{m×m}, and then the ω-th normalized Laplacian matrix L^ω_{sym} is obtained by (3), where 1 ≤ ω ≤ n/m.
Step 5: Parallelly select eigenvectors to construct n/m new matrices on r worker nodes with GPU. Firstly, the eigen-decomposition is performed for the matrix L^ω_{sym} to obtain z eigenvalues and z eigenvectors, where 1 ≤ ω ≤ n/m. Secondly, the eigenvectors corresponding to the first c minimum eigenvalues are selected from the z eigenvectors to construct a new matrix V^ω_{m×c}. Finally, the matrix V^ω_{m×c} is standardized by (4) to get a new matrix Y^ω_{m×c}.
Step 6: Parallelly perform clustering analysis on r worker nodes with GPU. The K-Means clustering algorithm is used to perform clustering analysis for the matrix Y^ω_{m×c}, where 1 ≤ ω ≤ n/m, and the clustering results are obtained and copied from GPU to CPU.
Step 7: Gather the clustering results from each worker node to the master node.
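Putting Steps 1-7 together, the following is a minimal sketch of the per-block worker task and the scatter/gather steps on Dask; dssc_block, build_similarity_matrix, and gpu_kmeans are hypothetical helpers (the latter standing in for a GPU K-Means such as cuML's), the scheduler address is a placeholder, and X, n, m, k, c are assumed given. Workers are assumed to have a GPU available.

```python
import numpy as np
import cupy as cp
from dask.distributed import Client

def dssc_block(X_block, k, c):
    # Steps 2-6 for one m-sample block: similarity on CPU, the rest on GPU
    S = cp.asarray(build_similarity_matrix(X_block, k))  # hypothetical stage-1 helper
    d_inv_sqrt = 1.0 / cp.sqrt(S.sum(axis=1))
    L_sym = cp.eye(S.shape[0]) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    vals, vecs = cp.linalg.eigh(L_sym)
    V = vecs[:, :c]
    Y = V / cp.linalg.norm(V, axis=1, keepdims=True)
    labels = gpu_kmeans(Y, c)                            # hypothetical GPU K-Means
    return cp.asnumpy(labels)                            # copy results from GPU to CPU

client = Client("tcp://scheduler-address:8786")          # placeholder address
blocks = np.array_split(X, n // m)                       # Step 1: divide into n/m blocks
futures = client.map(dssc_block, blocks, k=k, c=c)       # distribute blocks to workers
labels_per_block = client.gather(futures)                # Step 7: gather to the master
```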

C. ANALYSIS OF TIME COMPLEXITY
The time complexities of the four main stages of the proposed DSSC algorithm are analyzed as follows.
In stage 1, the time complexity of calculating the Euclidean distances between samples is O(n^2), the time complexity of determining the k nearest neighbors of each of the n samples is O(n^2 log n), and the time complexity of calculating the similarities between the n samples and their k nearest neighbors (see lines 7-17 in Algorithm 2) is O(nk^2). Therefore, the time complexity of constructing the similarity matrix is O(n^2 + n^2 log n + nk^2), and the time complexity of parallelly constructing the similarity matrix on r worker nodes with θ available CPU threads each is O((n^2 + n^2 log n + nk^2)/(rθ)).

In stage 2, the time complexity of constructing the normalized Laplacian matrix is O(n^3), and the time complexity of parallelly constructing the normalized Laplacian matrix on r worker nodes with γ available GPU threads each is O(n^3/(rγ)).
In stage 3, the time complexity of performing eigen-decomposition for the normalized Laplacian matrix is O(n^3), and the time complexity of selecting the eigenvectors corresponding to the first c minimum eigenvalues is O(n log n). Therefore, the time complexity of performing eigen-decomposition and selecting eigenvectors to construct a new matrix is O(n^3 + n log n), and the time complexity of parallelly performing this stage on r worker nodes with γ available GPU threads each is O((n^3 + n log n)/(rγ)).

In stage 4, the time complexity of performing clustering analysis for an n × c matrix using K-Means clustering algorithm with c clustering centers is O(nc^2 τ), where τ is the number of iterations of K-Means clustering algorithm. The time complexity of parallelly performing clustering analysis using K-Means clustering algorithm on r worker nodes with γ available GPU threads each is O((nc^2 τ)/(rγ)).
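Collecting the four stages (and assuming k, c, and τ are small relative to n), the serial and parallel costs can be summarized as follows; this compilation is ours, not a formula from the paper:

```latex
% Serial DSSC: stages 2 and 3 dominate for large n
T_{\mathrm{serial}} = O\!\bigl(n^{2} + n^{2}\log n + nk^{2}\bigr)
                    + O\!\bigl(n^{3}\bigr)
                    + O\!\bigl(n^{3} + n\log n\bigr)
                    + O\!\bigl(nc^{2}\tau\bigr)
                    = O\!\bigl(n^{3}\bigr)

% Parallel DSSC on r workers, \theta CPU threads and \gamma GPU threads each
T_{\mathrm{parallel}} = O\!\left(\frac{n^{2} + n^{2}\log n + nk^{2}}{r\theta}\right)
                      + O\!\left(\frac{n^{3} + n\log n + nc^{2}\tau}{r\gamma}\right)
                      = O\!\left(\frac{n^{3}}{r\gamma}\right)
```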

IV. EXPERIMENTAL RESULTS AND ANALYSIS

A. EXPERIMENTAL SETUP
In order to verify the effectiveness of the proposed DSSC algorithm, K-Means clustering algorithm [25], DBSCAN clustering algorithm [26], NJW algorithm [2], DSC algorithm [15], and the DSSC algorithm are evaluated on six synthetic datasets and four UCI datasets [27], respectively. Furthermore, the computational efficiency of the parallel DSSC algorithm is evaluated on the Dask cluster with three synthetic datasets used for large-scale clustering analysis.
The hardware configurations and software configurations of the experimental platform are listed in Table 2 and Table 3, respectively. Note that the CPU and the GPU of each worker node contain 8 CPU cores and 2560 GPU cores, respectively.

B. CLUSTERING ANALYSIS ON SYNTHETIC DATASETS
In order to more intuitively prove the superiority of the proposed DSSC algorithm, the experiments are conducted with K-Means clustering algorithm, DBSCAN clustering algorithm, NJW algorithm, DSC algorithm, and the DSSC algorithm on six different synthetic datasets. The descriptions of six synthetic datasets are listed in Table 4, and the parameter settings of different clustering algorithms on six synthetic datasets are listed in Table 5.
As shown in Fig. 3, the K-Means clustering algorithm obtains a good clustering effect on the fiveClusters dataset, but the clustering effects obtained on the other five non-convex datasets are not good. This is because the K-Means clustering algorithm uses the Euclidean distance measure to calculate the similarities between samples, so it cannot effectively perform clustering analysis for non-convex datasets. Besides, the clustering effect is also affected by the selection of the initial clustering centers and the number of iterations.
As shown in Fig. 4, the DBSCAN clustering algorithm has an inferior clustering effect only on the fiveClusters dataset. This is because each class of samples in the fiveClusters dataset has a different density, so it is difficult to reasonably set the neighborhood distance threshold ε and the number of neighborhood samples MinPts.
As shown in Fig. 5, the NJW algorithm obtains satisfactory clustering effects on the six different datasets, but this depends on finding an appropriate value of the scale parameter σ.

As shown in Fig. 6, the DSC algorithm achieves satisfactory clustering effects on the first five datasets, whereas the clustering effect obtained on the twoUnbalanceSpirals dataset is inferior. This is because the densities of the two classes of samples in the twoUnbalanceSpirals dataset are quite different, so it is difficult to find an appropriate value of the density parameter ρ to achieve a good clustering effect.
As shown in Fig. 7, the proposed DSSC algorithm achieves satisfactory clustering effects on all six datasets. This is because the DSSC algorithm adaptively determines an appropriate value of the density parameter ρ for each sample, and thus it can obtain good clustering effects even on complex datasets.

C. CLUSTERING ANALYSIS ON UCI DATASETS
In order to further verify the effectiveness of the proposed DSSC algorithm, the DSSC algorithm is compared with K-Means clustering algorithm, DBSCAN clustering algorithm, NJW algorithm, and DSC algorithm on four different UCI datasets in terms of four performance evaluation indexes: the clustering accuracy, adjusted Rand index (ARI) [28], Fowlkes-Mallows index (FMI) [29], and normalized mutual information (NMI) [30]. The descriptions of the four UCI datasets are listed in Table 6, and the parameter settings of the different clustering algorithms on these four UCI datasets are listed in Table 7.

Table 8 shows the clustering accuracies of the different clustering algorithms obtained on the four UCI datasets. It can be seen from Table 8 that the clustering accuracies of the five clustering algorithms obtained on the Iris dataset are higher than those obtained on the other three datasets. In particular, the clustering accuracy of the DSSC algorithm obtained on the Iris dataset is 5.81%, 22.55%, and 16.49% higher than that obtained on the Seeds, Wine, and Zoo datasets, respectively. It also can be seen from Table 8 that the average clustering accuracy of the DSSC algorithm on the four datasets is 3.65%, 25.63%, 6.46%, and 6.11% higher than that of K-Means clustering algorithm, DBSCAN clustering algorithm, NJW algorithm, and DSC algorithm, respectively. The results indicate that the proposed DSSC algorithm can offer a satisfactory clustering accuracy.
It can be seen from Table 9 that the average value of ARI of the DSSC algorithm obtained on the four UCI datasets is 3.89%, 22.08%, 3.75%, and 8.13% higher than that of K-Means clustering algorithm, DBSCAN clustering algorithm, NJW algorithm, and DSC algorithm, respectively. It can be seen from Table 10 that the average value of FMI of the DSSC algorithm obtained on the four UCI datasets is 2.97%, 13.17%, 0.46%, and 6.39% higher than that of K-Means clustering algorithm, DBSCAN clustering algorithm, NJW algorithm, and DSC algorithm, respectively. It also can be seen from Table 11 that the average value of NMI of the DSSC algorithm obtained on the four UCI datasets is 3.01%, 10.46%, 1.32%, and 4.31% higher than that of K-Means clustering algorithm, DBSCAN clustering algorithm, NJW algorithm, and DSC algorithm, respectively. The results further prove that the proposed DSSC algorithm can achieve satisfactory clustering effects on different datasets. This is because the DSSC algorithm can adaptively determine the value of the density parameter to construct a more robust similarity matrix.
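For reference, the ARI, FMI, and NMI indexes can be computed directly with scikit-learn; the label arrays below are toy placeholders for real clustering output.

```python
from sklearn.metrics import (adjusted_rand_score, fowlkes_mallows_score,
                             normalized_mutual_info_score)

# Toy ground-truth and predicted labels (placeholders)
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 2, 0]

print("ARI:", adjusted_rand_score(y_true, y_pred))
print("FMI:", fowlkes_mallows_score(y_true, y_pred))
print("NMI:", normalized_mutual_info_score(y_true, y_pred))
```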

D. ANALYSIS OF PARAMETER SENSITIVITY
In the proposed DSSC algorithm, the number of nearest neighbors needs to be set manually. In order to explore whether the number of nearest neighbors has an impact on the clustering accuracy of the DSSC algorithm, comparative experiments are carried out with different numbers of nearest neighbors on four different UCI datasets.
As shown in Fig. 8, as the number of nearest neighbors increases, the clustering accuracies of the DSSC algorithm obtained on the four datasets change only slightly. The maximum clustering accuracies of the DSSC algorithm are obtained with different numbers of nearest neighbors on the four datasets.
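When sweeping the number of nearest neighbors k as in Fig. 8, the clustering accuracy is usually computed with a Hungarian matching between predicted clusters and true classes; a sketch follows, where dssc_cluster is a hypothetical stand-in for the DSSC pipeline and X, y_true, and c are assumed given.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """Best-match clustering accuracy via Hungarian assignment."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = max(y_true.max(), y_pred.max()) + 1
    w = np.zeros((n, n), dtype=np.int64)   # contingency matrix
    for t, p in zip(y_true, y_pred):
        w[p, t] += 1
    rows, cols = linear_sum_assignment(-w)  # maximize correctly matched samples
    return w[rows, cols].sum() / len(y_true)

# Sweep the number of nearest neighbors (range of k is an example choice)
for k in range(5, 31, 5):
    acc = clustering_accuracy(y_true, dssc_cluster(X, k=k, c=c))
    print(f"k={k}: accuracy={acc:.4f}")
```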

E. ANALYSIS OF COMPUTATIONAL EFFICIENCY
To evaluate the computational efficiency of the parallel DSSC algorithm, three synthetic datasets of different sizes used for large-scale clustering analysis are adopted to carry out experiments on the Dask cluster with CPU+GPU. The descriptions of these three synthetic datasets are shown in Table 12.

Table 13 presents the running time of the parallel DSSC algorithm obtained with different numbers of worker nodes. For the three datasets, as the number of worker nodes increases, the running time of the parallel DSSC algorithm gradually decreases. For the DB-1, DB-2, and DB-3 datasets, the running times of the parallel DSSC algorithm obtained with four worker nodes are 71.62%, 73.21%, and 74.08% less than those obtained with one worker node, respectively. Compared with one worker node, the running times of the parallel DSSC algorithm obtained with 2, 3, and 4 worker nodes on the three datasets are decreased by 48.59%, 64.94%, and 72.97% on average, respectively. The results show that the computational efficiency of the parallel DSSC algorithm for large-scale clustering analysis can be improved to a certain extent by properly increasing the number of worker nodes.
The speedup is often used to evaluate the computational efficiency of a parallel algorithm. In this experiment, the absolute speedup is used to evaluate the computational efficiency of the parallel DSSC algorithm, and it can be calculated by

S_p = \frac{T_s}{T_p}, (9)

where T_s is the running time of the serial DSSC algorithm and T_p is the running time of the parallel DSSC algorithm obtained on the Dask cluster with p worker nodes.

Fig. 9 shows the speedups of the parallel DSSC algorithm obtained with different numbers of worker nodes on the three datasets. As illustrated in Fig. 9, as the number of worker nodes increases, the speedups of the parallel DSSC algorithm obtained on the three datasets gradually increase. For example, the speedup of the parallel DSSC algorithm obtained on the DB-3 dataset increases from 1.99× to 7.69× when the number of worker nodes increases from 1 to 4, which shows that the parallel DSSC algorithm has good parallelism. It also can be seen from Fig. 9 that the speedups of the parallel DSSC algorithm obtained with four worker nodes are 5.64×, 6.34×, and 7.69× for the DB-1, DB-2, and DB-3 datasets, respectively. The results demonstrate that the parallel DSSC algorithm achieves a high speedup in performing large-scale clustering analysis, because it can make full use of the computing resources of a Dask cluster with CPU+GPU.

F. EVALUATION ON DIFFERENT DASK CLUSTERS
In order to better evaluate the computational efficiency of the parallel DSSC algorithm, for the three datasets of different sizes listed in Table 12, experiments are conducted on the Dask cluster with CPU and the Dask cluster with CPU+GPU, respectively. It is worth noting that both Dask clusters contain four worker nodes.
As depicted in Fig. 10, as the dataset size increases, the running times of the parallel DSSC algorithm obtained on both Dask clusters increase gradually. For the three datasets, the running time of the parallel DSSC algorithm obtained on the Dask cluster with CPU is longer than that obtained on the Dask cluster with CPU+GPU. For the DB-1, DB-2, and DB-3 datasets, the running times of the parallel DSSC algorithm obtained on the Dask cluster with CPU+GPU are 36.78%, 37.23%, and 38.68% shorter than those obtained on the Dask cluster with CPU, respectively. The results demonstrate that the parallel DSSC algorithm can fully utilize the available CPU and GPU resources of a Dask cluster to improve the computational efficiency.

V. CONCLUSION
In this paper, an adaptive density-sensitive similarity measure based spectral clustering algorithm is proposed to improve the clustering effect of the spectral clustering algorithm. First, the Euclidean distances between samples are calculated to determine the nearest neighbors of each sample; then the density-sensitive distances between samples and their nearest neighbors are adaptively calculated; finally, the similarities between samples and their nearest neighbors are calculated to construct a similarity matrix. In order to improve the efficiency of the DSSC algorithm in performing clustering analysis, the DSSC algorithm is parallelized on the Dask distributed parallel computing platform with CPU+GPU. Experiments on six synthetic datasets and four UCI datasets show that the proposed DSSC algorithm obtains more satisfactory clustering results compared with K-Means clustering algorithm, DBSCAN clustering algorithm, NJW algorithm, and DSC algorithm. For example, the DSSC algorithm obtains a clustering accuracy of 95.33% on the Iris dataset. In addition, the parallel DSSC algorithm obtains better efficiency in performing large-scale clustering analysis. For example, the running time of the DSSC algorithm obtained with four worker nodes on the DB-3 dataset is reduced by 87.01% and 74.08% compared with that obtained with one CPU core and with one worker node, respectively.
Compared with some existing clustering algorithms, the proposed DSSC algorithm has the following advantages: 1) it can improve the clustering effect to a certain extent by adaptively calculating the similarities between each sample and its nearest neighbors; 2) it is more suitable for performing large-scale clustering analysis through its parallelization on the Dask distributed parallel computing platform with CPU+GPU. However, the parallelization of the DSSC algorithm on GPU has the following limitations: 1) the parallel efficiency of constructing a similarity matrix on GPU is low, because the logic of constructing the similarity matrix is complex; 2) the size of each block is limited by the GPU RAM size when the dataset is divided into multiple blocks.
In future work, in order to make the DSSC algorithm more suitable for performing larger-scale clustering analysis, a variant of the DSSC algorithm with lower time complexity will be explored. Moreover, fusing the results of multiple DSSC runs into a better one through ensemble clustering techniques will be considered to further improve the clustering effect. Applying the DSSC algorithm to mechanical fault diagnosis and other fields is also worth investigating.