Clustering Algorithm-Based Data Fusion Scheme for Robust Cooperative Spectrum Sensing

A centralized cooperative spectrum sensing (CSS) system is vulnerable to malicious users (MUs) that send fraudulent sensing data, which can severely degrade CSS performance. To solve this problem, we propose sensing data fusion schemes based on the K-medoids and Mean-shift clustering algorithms that resist fraudulent sensing data from MUs. The cognitive users (CUs) send their local energy vectors (EVs) to the fusion center, which fuses them into a single robust EV using the proposed data fusion method. Specifically, this method takes the medoid of all EVs as an initial value and iteratively searches for a high-density EV, which serves as a representative statistical feature that is robust to malicious EVs from MUs. The method does not need to distinguish MUs from the other CUs at any point in the CSS process, and it accounts for constraints imposed by the CSS system, such as the lack of information about the primary user (PU) and about the number of MUs. Furthermore, we propose a global decision framework based on the fast K-medoids or Mean-shift clustering algorithm, which requires no knowledge of the distributions of the PU signal and the environment noise. Notably, this framework avoids the derivation of a decision threshold. The simulation results demonstrate the robustness of the proposed CSS scheme.


I. INTRODUCTION
Cognitive radio (CR) is a promising technology to boost spectrum utilization and alleviate the spectrum shortage. The basic idea of CR is that cognitive users (CUs) are allowed to access licensed spectrum bands when primary users (PUs) are absent [1]-[4]. Under this regulation, spectrum sensing, which senses a spectrum band to find spectrum holes, is a crucial technique within CR. There are many single-CU spectrum sensing methods, such as energy detection, matched filter detection, and cyclostationary feature detection [5]-[7]. However, the detection performance of these methods is susceptible to noise, hidden terminals, path loss, shadowing, and multipath fading, which may cause a single CU to produce incorrect sensing results [8]. To solve this problem, cooperative spectrum sensing (CSS) methods have attracted a lot of interest and have been verified to be more reliable than single-CU spectrum sensing.
The associate editor coordinating the review of this manuscript and approving it for publication was Julien Le Kernec.
In CSS, each CU independently collects sensing data or performs local spectrum sensing for a particular spectrum band. The CUs periodically send their data or results to the fusion center (FC). The FC fuses the received data or results by a fusion mechanism and makes a global decision. However, due to the openness of the low-layer protocol stacks, CSS is vulnerable to many severe attacks from malicious users (MUs). This paper considers the spectrum sensing data falsification (SSDF) attack, in which MUs tamper with their locally collected sensing data and send the falsified data to mislead the FC into making a wrong decision, which may seriously degrade the performance of the CSS system.
Many techniques have been developed to detect MUs and eliminate the damage caused by their attacks in CSS [9]-[11]. However, these techniques rely on assumptions that are easily violated in realistic spectrum sensing: 1) The attack strategies are assumed to be identical and fixed in [9]. 2) The underlying distributions of the PU signal and noise are assumed to be known in [10]. 3) Most existing techniques first identify MUs and then exclude the reports of the identified MUs through a hard fusion mechanism [9], [11]. This may degrade the detection performance of the CSS system, because the reports of MUs may sometimes be normal.
In this paper, we consider a centralized CSS system with a soft fusion mechanism, where each CU independently collects sensing data and calculates an energy vector (EV) for a particular spectrum band. The CUs then periodically send their EVs to the FC. The FC fuses the received EVs by a sensing data fusion scheme and makes a global decision. To resist attacks from MUs, we propose two soft-fusion sensing data fusion methods based on clustering algorithms. These methods fuse the EVs from the CUs into a single robust EV that represents the status of the PU. We assume that the number of MUs is smaller than the number of honest users (HUs). We also propose a CSS framework that treats spectrum sensing as a two-class classification problem solved by a clustering algorithm. The proposed CSS scheme does not need any prior information about the attack strategy of the MUs. The contributions of this paper can be briefly summarized as follows.
1) Two sensing data fusion methods are proposed. The first, the DF-medoids method, is based on the K-medoids clustering algorithm. The second, the DFMS-medoids method, is based on the Mean-shift and K-medoids clustering algorithms.
2) To avoid threshold derivation, two robust CSS methods are developed, based on the fast K-medoids and Mean-shift clustering algorithms, respectively.
3) In the simulation section, the performance of the DF-medoids and DFMS-medoids methods is verified and the detection performance of the robust CSS methods is analyzed. The simulation results show that the proposed robust CSS methods improve spectrum sensing performance when MUs attack the CSS system.
The rest of this paper is organized as follows. Section II reviews related work. Section III introduces the system model of CSS. Section IV proposes two sensing data fusion methods, DF-medoids and DFMS-medoids. Section V proposes two spectrum sensing methods based on the fast K-medoids and Mean-shift clustering algorithms. Section VI simulates the proposed robust CSS methods and shows that they improve the robustness of spectrum sensing. Section VII concludes the paper.

II. RELATED WORKS
Attacks by MUs are one of the main threats to CSS systems in CR networks, because they can destabilize the system, for example by increasing the probability of false alarm and decreasing the probability of detection. Many weight-based approaches for defending against MU attacks have been reported in the recent literature. In [10], each CU was given a Kullback-Leibler divergence (KLD) score according to the difference between its energy and the average energy of all CUs, and the score was used to weight the sensing reports of the CUs before data fusion at the FC. MUs were assigned low weights, whereas HUs were assigned high weights. In [12], a reputation-based CSS method assisted by trusted nodes was proposed. The CUs were divided into three states, i.e., reliable, pending, and discarded. The decision of each CU was weighted by its reputation, and a CU in the discarded state was removed from CSS. Generally, such a defense framework includes two processes, i.e., defense reference establishment and reputation evaluation.
Robust spectrum sensing schemes based on outlier detection techniques were proposed in [13]-[16]. In [13], a modified version of the Grubbs test was used to detect a single outlier in normally distributed data. In [14], the authors investigated outlier detection techniques to identify MUs, in which an outlier factor was assigned to each CU to distinguish MUs from the other CUs. However, this scheme needs to know the maximum number of MUs. In [15], an outlier detection technique was introduced to pre-filter abnormal sensing data, and a trust factor was then used as the weight of each CU. This method merely considers simple attacks such as 'always yes' or 'always no' and does not cope with more realistic MU attacks. In [16], support vector data description (SVDD) was applied in the sensing procedure to mitigate the SSDF attack. The SVDD algorithm distinguishes MUs from HUs and omits outliers from the global decision process. It is a kind of one-class classification that considers one set of sensing data as the target class and the rest of the sensing data as outliers. However, this method also considers only simple MUs ('always yes' or 'always no').
Many different metrics have been introduced to distinguish MUs from HUs [11], [17], [18]. In [11], a robust CSS method based on double-sided neighbor distance and frequency check was proposed to detect MUs. In [17], the maximum mean discrepancy (MMD) was used as a distance metric on the sensing reports to distinguish MUs from HUs. In this scheme, genuine reports from MUs may still be used for sensing data fusion. A Kruskal-Wallis test based MU detection scheme was proposed in [18]. However, these methods are built upon the assumption that only a small fraction of CUs are MUs.
To the best of our knowledge, the idea of applying clustering-based data analysis to sensing data fusion remains an open issue that requires further investigation.

III. SYSTEM MODEL
This section presents the system model and the attack model used in this paper.

A. SYSTEM MODEL OF COGNITIVE RADIO NETWORK
The centralized CSS scenario is illustrated in Fig. 1. We consider C multi-antenna CUs that take part in CSS for a licensed spectrum band. In each time slot, each CU collects sensing data from a specified channel, calculates the energy, and sends its local energy to the FC. Note that all CUs need to be time-synchronized. The FC then makes a global decision and informs the CUs whether the licensed spectrum can be accessed. The communication channels between the CUs and the FC are assumed to be dedicated and reliable.
We assume that there are M MUs and that the remaining CUs are HUs. It is reasonable to assume that M < C, because studying a network that consists entirely of MUs is meaningless. The FC does not know any information about the MUs, such as their number or identities. We assume that the MUs collect sensing data, make local decisions by a local detection method (such as energy detection), and send falsified energy values to the FC to mislead it into making a wrong global decision.
Assume that the PU is either idle or active throughout the sensing period. The signal received by the lth antenna of the ith CU can then be formulated as a binary hypothesis test [19]:
$$x_i^l(k) = \begin{cases} n_i^l(k), & H_0 \\ h_i^l(k)\,p(k) + n_i^l(k), & H_1 \end{cases} \tag{1}$$
where k = 1, 2, ..., N, N is the number of sampling points, l = 1, 2, ..., L, L is the number of antennas, i = 1, 2, ..., C, and C is the number of CUs. The following assumptions are made:
1) $x_i^l(k)$ is the signal received at the lth antenna of the ith CU at time k, $p(k)$ is the signal from the PU, and $h_i^l(k)$ is the channel gain between the lth antenna of the ith CU and the PU.
2) $n_i^l(k)$ is white Gaussian noise (WGN) with zero mean and variance $\sigma^2$.
3) Each CU has a certain computing ability and can perform simple data processing.
According to (1), the energy $e_i^l$ observed at the lth antenna of the ith CU under $H_0$ and $H_1$ is given by (2). The EV of the ith CU can therefore be represented as
$$e_i = [e_i^1, e_i^2, \ldots, e_i^L]^T \tag{3}$$
where $e_i \in \mathbb{R}^{L \times 1}$. In spectrum sensing, the detection probability, the missed detection probability, and the false alarm probability are three key indices for evaluating the performance of spectrum sensing schemes [20]. The detection probability is defined as
$$P_d = P(\hat{H}_1 \mid H_1) \tag{4}$$
where $\hat{H}_1$ denotes the decision that the PU is using the authorized spectrum and $H_1$ denotes the actual state in which the PU is busy. The missed detection probability is given by
$$P_m = P(\hat{H}_0 \mid H_1) \tag{5}$$
where $\hat{H}_0$ denotes the decision that the PU signal is absent. The false alarm probability is given by
$$P_f = P(\hat{H}_1 \mid H_0) \tag{6}$$
where $H_0$ denotes the actual state in which the PU is absent.
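The signal model and the per-antenna energy computation can be sketched in Python as follows. This is only an illustrative sketch: the function names are hypothetical, the channel gain is assumed constant over the N samples, and the energy is assumed to be the mean of the squared magnitudes, which may differ from the normalization used in (2).

```python
import numpy as np


def received_signal(p, h, sigma2, pu_active, rng):
    """Simulate the samples of one CU, returned with shape (L, N).

    p: PU signal samples, shape (N,)
    h: per-antenna channel gains, shape (L,), held constant over the
       N samples (a simplifying assumption of this sketch)
    """
    p = np.asarray(p, dtype=float)
    h = np.asarray(h, dtype=float)
    noise = rng.normal(0.0, np.sqrt(sigma2), size=(h.size, p.size))
    if not pu_active:                        # H0: noise only
        return noise
    return h[:, None] * p[None, :] + noise   # H1: signal plus noise


def energy_vector(x):
    """Energy per antenna, assumed here to be the mean of |x|^2 over N samples."""
    x = np.asarray(x)
    return np.mean(np.abs(x) ** 2, axis=1)


# Example: one CU with L = 2 antennas and N = 1000 samples under H1.
rng = np.random.default_rng(0)
x = received_signal(np.ones(1000), [1.0, 0.5], sigma2=1.0, pu_active=True, rng=rng)
e_i = energy_vector(x)   # length-L energy vector of this CU
```

Each CU would send its vector `e_i` to the FC, which stacks the C vectors into an array of shape (C, L).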

B. ATTACK MODEL
The presence of MUs may severely degrade the detection performance of the CSS system [21]. MUs attack the CSS system for one of two purposes: to disrupt the spectrum sensing system or to obtain benefits for themselves [22], [23]. If the goal of the MUs is to disrupt the CSS system, then when the PU is using the licensed spectrum, the MUs report to the FC that the PU is idle, which causes the FC to make a wrong decision. As a result, the normal communication of the PU is interfered with, and the trust between the PU and the CUs is damaged. If the goal of the MUs is to obtain benefits for themselves, i.e., to monopolize the spectrum by illegitimate means, the MUs report to the FC that the PU is using the licensed spectrum when the spectrum is actually idle. The FC then considers the PU to be active, and consequently the HUs cannot access the licensed spectrum.
VOLUME 8, 2020
In this paper, the HUs send their original local EVs to the FC, whereas the MUs send falsified EVs. For example, when the PU is absent and the MUs find by local judgment that it is absent, they send falsified EVs that are higher than the real EVs to the FC. If the FC consequently misjudges the state of the PU, the licensed spectrum can be accessed by the MUs. When the PU is active and the MUs find by local judgment that it is present, they send falsified EVs that are lower than the real EVs to the FC. If the FC then incorrectly believes that the licensed spectrum is idle and allows the CUs to access it, the normal communication of the PU will be interfered with.
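The falsification behaviour described above can be illustrated with a small Python sketch. The local decision rule (comparing the average energy with `local_threshold`) and the fixed offset `delta` are hypothetical choices made only for illustration; the attack model itself does not prescribe them.

```python
import numpy as np


def falsify_ev(ev, local_threshold, delta):
    """Return the EV that an MU reports instead of its real EV.

    Assumed local rule: the MU decides that the PU is active when the
    average energy exceeds local_threshold. It then lowers its EV if the
    PU seems active and raises it if the PU seems absent.
    """
    ev = np.asarray(ev, dtype=float)
    pu_seems_active = ev.mean() > local_threshold
    if pu_seems_active:
        return np.maximum(ev - delta, 0.0)   # report lower energies, never negative
    return ev + delta                         # report higher energies
```

An HU, in contrast, would simply forward `ev` unchanged.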

IV. SENSING DATA FUSION BASED ON CLUSTERING ALGORITHM
In this section, we propose two sensing data fusion methods based on clustering algorithms to defend against MU attacks.
The mean fusion method is very susceptible to the falsified data from MUs, so the fused data may not accurately reflect the real state of the PU. Inspired by [24], we propose a sensing data fusion method based on the K-medoids clustering algorithm, called the DF-medoids method, since the medoid is more robust than the mean. We further propose a sensing data fusion approach based on the DF-medoids method and the Mean-shift clustering algorithm, namely DFMS-medoids, which uses the medoid as an initial value and then performs sensing data fusion iteratively. Note that the FC does not need to identify the MUs at any point in the data fusion process. The proposed sensing data fusion methods only need to fuse the EVs received by the FC into a single EV.
After each CU uploads its EV to the FC, let E be the set of received EVs, i.e.,
$$E = \{e_1, e_2, \ldots, e_C\} \tag{7}$$
Note that E includes the honest EVs from HUs and the falsified EVs from MUs. Define an ideal EV as the average of the EVs from all HUs, such that
$$e_{ideal} = \frac{1}{C - M} \sum_{i \in \mathcal{H}} e_i \tag{8}$$
where $\mathcal{H}$ is the index set of the HUs. The mean of the EVs from all CUs is calculated by
$$e_{means} = \frac{1}{C} \sum_{i=1}^{C} e_i \tag{9}$$
From (8) and (9), we can conclude that $e_{means}$ deviates from $e_{ideal}$ ($e_{means} \neq e_{ideal}$) when MUs send falsified sensing data. Therefore, $e_{means}$ cannot reflect the real state of the PU, and the FC may make a wrong decision. An effective sensing data fusion method is thus necessary, and two such methods are proposed in the following subsections.
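The size of this deviation can be made explicit. Writing $\bar{e}_{MU}$ for the average of the M falsified EVs (a symbol introduced here only for illustration), the definitions of the ideal EV and of the mean EV give

$$e_{means} = \frac{1}{C}\Big[(C - M)\,e_{ideal} + M\,\bar{e}_{MU}\Big] \quad\Longrightarrow\quad e_{means} - e_{ideal} = \frac{M}{C}\,\big(\bar{e}_{MU} - e_{ideal}\big)$$

The bias of the mean therefore grows linearly with the fraction M/C of MUs and with the offset between the falsified EVs and the honest average. For example, if 20 of 100 received EVs are falsified, one fifth of that offset appears in $e_{means}$.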

A. SENSING DATA FUSION BASED ON K-MEDOIDS CLUSTERING ALGORITHM
In this subsection, the DF-medoids method for sensing data fusion is proposed. Unlike the mean of all EVs, the medoid found by K-medoids [26] is an existing EV in the set E. The medoid is the EV for which the sum of distances to all EVs in E is smallest.
Define the objective function D(·) of DF-medoids as
$$D(e_i) = \sum_{j=1}^{C} \lVert e_i - e_j \rVert \tag{10}$$
By minimizing the objective function $D(e_i)$, we obtain a robust medoid of all EVs from the CUs, such that
$$e_{medoids} = \arg\min_{e_i \in E} D(e_i) \tag{11}$$
Then, $e_{medoids}$ is used as a feature vector for CSS.
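A minimal Python sketch of the DF-medoids fusion step is given below. It assumes that the distance is Euclidean and that the EVs are stacked in an array of shape (C, L); the function name `df_medoids` is hypothetical.

```python
import numpy as np


def df_medoids(evs):
    """Return the EV with the smallest summed Euclidean distance to all EVs."""
    evs = np.asarray(evs, dtype=float)            # shape (C, L)
    diffs = evs[:, None, :] - evs[None, :, :]     # all pairwise differences
    dist = np.linalg.norm(diffs, axis=2)          # (C, C) distance matrix
    return evs[np.argmin(dist.sum(axis=1))]       # row with the minimal distance sum


# Three honest EVs near (1, 1) and one falsified EV at (5, 5).
evs = [[1.0, 1.0], [1.1, 1.0], [0.9, 1.0], [5.0, 5.0]]
e_medoids = df_medoids(evs)
e_means = np.mean(evs, axis=0)
```

In this toy example the mean is pulled to (2, 2) by the single falsified EV, whereas the medoid stays at the honest EV (1, 1).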

B. SENSING DATA FUSION BASED ON MEAN-SHIFT CLUSTERING ALGORITHM
The Mean-shift algorithm is a density-based clustering algorithm that is robust to falsified data [27]. In this subsection, the Mean-shift clustering algorithm is applied to sensing data fusion. Based on it, the DFMS-medoids fusion algorithm is proposed, which obtains the fused data iteratively by using the medoid $e_{medoids}$ from (11) as the initial value, i.e., $e_{meanshift} = e_{medoids}$. The DFMS-medoids method searches for a center with relatively high density by repeatedly updating the current center to a new one.
Define a neighborhood $S_h(e_i)$ with $e_{meanshift}$ as the center and r as the bandwidth, such that
$$S_h(e_i) = \{ y \mid (y - e_{meanshift})^T (y - e_{meanshift}) < r \} \tag{12}$$
where the bandwidth r is given by (13). From the differences between $e_{meanshift}$ and the EVs $e_i \in S_h$, the mean-shift vector u is obtained as
$$u = \frac{1}{B} \sum_{e_i \in S_h} (e_i - e_{meanshift}) \tag{14}$$
where B is the number of EVs in $S_h$. According to the mean-shift vector calculated by (14), the new center is obtained by
$$e_{meanshift} = e_{meanshift} + u \tag{15}$$
Then, $e_{meanshift}$ is used as a feature vector for spectrum sensing.
The data fusion procedure based on Mean-shift is shown in Algorithm 1.
Step 4: Calculate the neighborhood $S_h$ with $e_{meanshift}$ as the center and r as the bandwidth.
Step 7: If the algorithm converges, then go to Step 7; otherwise, go to Step 3.
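The DFMS-medoids iteration can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' implementation: the bandwidth `r` is passed in by the caller and compared with the squared distance as in (12), the medoid is computed with Euclidean distances, and convergence is declared when the shift is smaller than a hypothetical tolerance `tol`.

```python
import numpy as np


def dfms_medoids(evs, r, max_iter=100, tol=1e-9):
    """Fuse EVs by mean-shift iterations started from the medoid."""
    evs = np.asarray(evs, dtype=float)                        # shape (C, L)
    # Initial value: the medoid of all received EVs.
    dist = np.linalg.norm(evs[:, None, :] - evs[None, :, :], axis=2)
    center = evs[np.argmin(dist.sum(axis=1))].copy()
    for _ in range(max_iter):
        sq_dist = np.sum((evs - center) ** 2, axis=1)
        inside = evs[sq_dist < r]                             # neighborhood of the center
        if len(inside) == 0:                                  # guard: nothing to average
            break
        u = inside.mean(axis=0) - center                      # mean-shift vector
        center = center + u                                   # move to the new center
        if np.linalg.norm(u) < tol:                           # converged
            break
    return center


# One-dimensional toy data with a far outlier at 20.
fused = dfms_medoids([[0.0], [1.0], [3.0], [4.0], [20.0]], r=2.0)
```

In this example the medoid is 3, the neighborhood contains 3 and 4, and the center moves to 3.5, where it stays. The outlier at 20 never enters the neighborhood and so has no influence on the fused value.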

V. ACHIEVING COOPERATIVE SPECTRUM SENSING BASED ON CLUSTERING ALGORITHMS
In spectrum sensing, samples merely need to be clustered into two classes [28], [29] by an unsupervised clustering algorithm. In this section, the fast K-medoids and Mean-shift clustering algorithms are used to achieve spectrum sensing.
The detailed flow of CSS based on a clustering algorithm is shown in Fig. 2. It is divided into two parts, i.e., the training part and the spectrum sensing part. In the training part, the CUs observe a particular authorized spectrum, collect sensing data, and send their EVs to the FC, which uses a data fusion method to obtain a sufficient number of fused EVs $e_g$, g ∈ {means, medoids, meanshift}. These $e_g$ are assumed to cover both states of the PU, so they can be clustered by the clustering algorithm to train a classifier. In the spectrum sensing part, the CUs collect sensing data from a particular authorized spectrum in which the status of the PU is unknown and send these data to the FC. The FC then performs data fusion by the proposed data fusion method to obtain $\hat{e}_g$. Finally, $\hat{e}_g$ is used as the input of the classifier, which gives a decision about the status of the PU.
By observing the licensed spectrum, the FC obtains a sufficient number of samples from the CUs after data fusion. Denote the set of all samples by
$$S = \{e_g^1, e_g^2, \ldots, e_g^J\} \tag{16}$$
where g ∈ {means, medoids, meanshift}, $e_g^j$ is the jth EV after data fusion, and J is the number of training EVs.
In CSS, the set S is clustered into two subsets. Let $S_k$ denote the kth class, as defined in (17), where k = 1, 2. Clustering algorithms are then used to cluster the samples into these two classes.

A. CSS BASED ON FAST K-MEDOIDS CLUSTERING ALGORITHM
In previous work [28], [30], the K-means clustering algorithm was introduced to achieve spectrum sensing. Unfortunately, although K-means is computationally efficient, it is sensitive to outlier samples. Thus, the K-medoids clustering algorithm, which uses medoids instead of means, is introduced. The medoids are actual samples of their classes and are less sensitive to outlier samples. However, compared with K-means, K-medoids is computationally more complex. Hence, a fast K-medoids clustering algorithm is introduced in this paper, which runs like the K-means clustering algorithm while remaining robust to outlier samples.
The dissimilarity between samples j and q is measured by the Euclidean distance, such that
$$d_{jq} = \lVert e_g^j - e_g^q \rVert_2 \tag{18}$$
Then, a distance matrix can be obtained as
$$D = [d_{jq}], \quad j, q = 1, 2, \ldots, J \tag{19}$$
where $D \in \mathbb{R}^{J \times J}$, which is used for finding the new medoids at each iteration. For each class $S_k$, the representative medoid, which has the smallest sum of distances to the samples in its class, should be found. The index $a_k$ of the medoid is found by
$$a_k = \arg\min_{j \in I_k} \sum_{q \in I_k} d_{jq} \tag{20}$$
where $I_k$ is the set of indices in S of the samples in $S_k$. Then, the medoid $\Psi_k$ is updated by $\Psi_k = e_g^{a_k}$. The fast K-medoids clustering algorithm uses the total Euclidean distance as the objective function $\Gamma(\cdot)$, such that
$$\Gamma(I_1, I_2, \Psi_1, \Psi_2) = \sum_{k=1}^{2} \sum_{j \in I_k} \lVert e_g^j - \Psi_k \rVert_2 \tag{21}$$
The goal of the fast K-medoids clustering algorithm is to minimize the objective function, i.e.,
$$\min_{(I_1, I_2, \Psi_1, \Psi_2)} \Gamma \tag{22}$$
The training procedure based on the fast K-medoids clustering algorithm is shown in Algorithm 2.

Algorithm 2 Training Procedure Based on Fast K-Medoids Clustering Algorithm
Step 1: Input training data $S = \{e_g^1, e_g^2, \ldots, e_g^J\}$.
Step 4: Find two new medoids $\Psi_1$ and $\Psi_2$ based on the distance matrix D, i.e., the samples that minimize the total distance to the other samples in their classes.
Step 5: Assign each sample to the nearest medoid and obtain classes $S_1$ and $S_2$.
Step 6: Calculate the objective function $\Gamma$ by (21). If $\Gamma$ has not changed, then go to Step 7; otherwise, go to Step 4.
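The training loop of Algorithm 2 can be sketched in Python for two classes as follows. The initialization is an assumption of this sketch (the indices of two starting medoids are supplied by the caller through the hypothetical parameter `init_idx`), since the initialization steps are not listed above. The loop itself follows Steps 4 to 6: update the medoids from the precomputed distance matrix, reassign the samples, and stop when the total distance no longer changes.

```python
import numpy as np


def fast_k_medoids(samples, init_idx=(0, 1), max_iter=100):
    """Two-class K-medoids using a precomputed distance matrix."""
    s = np.asarray(samples, dtype=float)                          # shape (J, L)
    D = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=2)     # (J, J) distances
    medoids = np.array(init_idx)                                  # indices of the medoids
    labels = np.argmin(D[:, medoids], axis=1)                     # initial assignment
    cost = np.inf
    for _ in range(max_iter):
        # Step 4: in each class, pick the sample with the smallest distance sum.
        for k in range(2):
            idx = np.flatnonzero(labels == k)
            if idx.size:
                medoids[k] = idx[np.argmin(D[np.ix_(idx, idx)].sum(axis=1))]
        # Step 5: assign every sample to its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        # Step 6: total distance of the samples to their medoids.
        new_cost = D[np.arange(len(s)), medoids[labels]].sum()
        if new_cost == cost:                                      # unchanged: stop
            break
        cost = new_cost
    return s[medoids], labels, cost


data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, labels, cost = fast_k_medoids(data)
```

For this toy data the two medoids settle at (0, 0) and (10, 10), even though both initial medoids lie in the first group.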

B. CSS BASED ON MEAN-SHIFT CLUSTERING ALGORITHM
The Mean-shift clustering algorithm [27], [31] is a hill-climbing algorithm based on density estimation, which can be used for clustering, image segmentation, and tracking. In this subsection, it is used to cluster the EVs obtained by the proposed sensing data fusion methods.
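Each class $S_k$ has a corresponding center, which is defined as
$$\Psi_k = \frac{1}{\mathrm{num}(S_k)} \sum_{e_g^j \in S_k} e_g^j \tag{23}$$
where num($S_k$) denotes the number of $e_g^j$ belonging to the class $S_k$. Define a neighborhood $S_k(e_g^j)$ with $\Psi_k$ as the center and $r_k$ as the bandwidth, such that
$$S_k(e_g^j) = \{ y \mid (y - \Psi_k)^T (y - \Psi_k) < r_k \} \tag{24}$$
where the bandwidth $r_k$ is given by (25). From the differences between $\Psi_k$ and the samples $e_g^j$ in this neighborhood, the mean-shift vector $u_k$ is obtained as
$$u_k = \frac{1}{B_k} \sum_{e_g^j \in S_k(e_g^j)} (e_g^j - \Psi_k) \tag{26}$$
where $B_k$ is the number of samples in the neighborhood. Then, the center is updated by
$$\Psi_k = \Psi_k + u_k \tag{27}$$
The training procedure based on the Mean-shift clustering algorithm is shown in Algorithm 3.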

Algorithm 3 Training Procedure Based on Mean-Shift Clustering Algorithm
Step 1: Input training data $S = \{e_g^1, e_g^2, \ldots, e_g^J\}$.
Step 3: Assign each sample to the nearest of the centers $\Psi_1$ and $\Psi_2$ and obtain the new classes $S_1$ and $S_2$.
Step 4: Calculate the bandwidth $r_k$ by (25).
Step 5: Calculate the neighborhood $S_k$ with $\Psi_k$ as the center and $r_k$ as the bandwidth.
Step 8: If the algorithm converges, then go to Step 9; otherwise, go back to Step 3.
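A possible Python sketch of this two-class Mean-shift training loop is given below. Several choices are assumptions of the sketch rather than part of Algorithm 3: the initial centers and the bandwidths `r` are supplied by the caller, the neighborhood of each center is restricted to the samples currently assigned to that class, and convergence is declared when the total shift is smaller than a hypothetical tolerance `tol`.

```python
import numpy as np


def mean_shift_two_class(samples, init_centers, r, max_iter=100, tol=1e-9):
    """Cluster samples into two classes by shifting two centers toward dense regions."""
    s = np.asarray(samples, dtype=float)               # shape (J, L)
    centers = np.array(init_centers, dtype=float)      # shape (2, L), copied

    def assign(c):
        # Index of the nearest center for every sample.
        d = np.linalg.norm(s[:, None, :] - c[None, :, :], axis=2)
        return np.argmin(d, axis=1)

    for _ in range(max_iter):
        labels = assign(centers)                       # Step 3: nearest-center assignment
        shift = np.zeros_like(centers)
        for k in range(2):
            members = s[labels == k]
            sq_dist = np.sum((members - centers[k]) ** 2, axis=1)
            inside = members[sq_dist < r[k]]           # neighborhood of center k
            if len(inside) > 0:
                shift[k] = inside.mean(axis=0) - centers[k]   # mean-shift vector
        centers = centers + shift                      # update both centers
        if np.linalg.norm(shift) < tol:                # converged
            break
    return centers, assign(centers)


data = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
centers, labels = mean_shift_two_class(data, [[0.0], [12.0]], r=(4.0, 4.0))
```

Starting from the extreme samples 0 and 12, the two centers move to 1 and 11, the middle of each group, within three iterations.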

C. ACHIEVING CSS BASED ON TRAINED CLASSIFIER
After training, we obtain a classifier for spectrum sensing, whose specific form is given by (28), where $\hat{e}_g$ represents the EV processed by the DFMS method. Note that the state of the PU corresponding to $\hat{e}_g$ is unknown. If Classifier($\hat{e}_g$) > ε, the PU is using the authorized spectrum; otherwise, the authorized spectrum can be accessed by the CUs.
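To make the decision step concrete, the following Python sketch uses one plausible classifier form: the ratio of the distance to the idle-class center over the distance to the busy-class center, compared with ε. This ratio form is an assumption chosen for illustration and is not taken from (28). The helper names are hypothetical, and the busy class is assumed to be the one whose center has the larger total energy.

```python
import numpy as np


def order_centers(centers):
    """Return (idle_center, busy_center), ordered by total energy."""
    c = np.asarray(centers, dtype=float)               # shape (2, L)
    idle, busy = sorted(c, key=lambda v: v.sum())      # smaller energy first
    return idle, busy


def pu_is_active(ev, idle_center, busy_center, eps=1.0):
    """Decide H1 when d(ev, idle) / d(ev, busy) > eps (assumed classifier form)."""
    ev = np.asarray(ev, dtype=float)
    d_idle = np.linalg.norm(ev - idle_center)
    d_busy = np.linalg.norm(ev - busy_center)
    return bool(d_idle > eps * d_busy)                 # written without a division


idle, busy = order_centers([[3.0, 3.0], [1.0, 1.0]])
decision_busy = pu_is_active([2.9, 3.0], idle, busy)   # fused EV close to the busy center
decision_idle = pu_is_active([1.1, 1.0], idle, busy)   # fused EV close to the idle center
```

With ε = 1 this reduces to a nearest-center rule. Under the assumed form, a larger ε requires the fused EV to lie closer to the busy center before the PU is declared active, which trades detection probability against false alarm probability.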

VI. SIMULATION
In this section, the performance of the proposed robust CSS methods is illustrated through computer simulation. The simulation platform is Matlab. An AM signal is chosen as the PU signal p(k). First, the performance of the developed sensing data fusion methods is presented. Then, the classification effect of the clustering algorithms is analyzed. Finally, the performance of the proposed robust CSS method is given.

A. THE PERFORMANCE ANALYSIS OF THE DATA FUSION METHOD
The performance of the DFMS method is analyzed in this subsection. Let the number of CUs be C = 100, the number of EVs received by the FC be REV = 100, the number of honest EVs be HEV = 80, and the number of falsified EVs be MEV = 20. Remark: The robust CSS methods proposed in this paper operate only on the received EVs and do not depend on the attack mode of the MUs. This means that the proposed robust CSS methods can defend against 'always no', 'always yes', and smart attacks from MUs.
At SNR = −10 dB and N = 1000, Fig. 3 shows the EVs from all HUs and MUs as 'x' points. In Fig. 4, the 'x' points represent falsified local EVs from MUs and the '*' points represent honest local EVs from HUs. The pentagram represents the ideal mean value $e_{ideal}$, which is calculated from the honest EVs only. The square indicates the mean value $e_{means}$ of the EVs from both HUs and MUs. The diamond represents the medoid $e_{medoids}$ of all EVs, which is calculated by the DF-medoids method. The triangle represents the value $e_{meanshift}$ calculated by the DFMS-medoids method. From Fig. 4, we can conclude that the DFMS-medoids method is more robust than the other data fusion methods, i.e., Means and DF-medoids.
In the cognitive radio network (CRN), when MUs find that the PU is using the authorized spectrum, they report low EVs to the FC to disturb its decision. When MUs find that the PU is not using the authorized spectrum, they report high EVs to the FC, which may then make an incorrect decision about the PU signal and inform the CUs that the authorized spectrum is busy and cannot be accessed.
As shown in Figs. 5 and 6, the distance between $e_{meanshift}$ and the ideal EV $e_{ideal}$ obtained by the DFMS-medoids method is shorter than the corresponding distances for the Means and DF-medoids methods, which indicates that the DFMS-medoids method can effectively restrain attacks from MUs. In Figs. 5 and 6, the parameters are SNR = −8 dB, HEV = 80, MEV = 20, L = 2, and N = 1000.

B. CLASSIFICATION EFFECT ANALYSIS OF CLUSTERING ALGORITHM
In this subsection, we present the clustering effect of the fast K-medoids and Mean-shift clustering algorithms. The size of the training set S is J = 10000. Note that S includes both states of the PU, i.e., PU active and PU absent. Fig. 7 shows the unclassified samples collected at SNR = −10 dB, HEV = 80, MEV = 20, L = 2, and N = 1000.
FIGURE 6. The distance between fused EV and ideal EV when MUs send many higher EVs.
Fig. 8 shows the result of clustering the unclassified samples by the Mean-shift clustering algorithm. The red '*' points are samples classified as PU active, and the blue 'x' points are samples classified as PU absent. Fig. 9 displays the result of clustering the unclassified samples by the fast K-medoids clustering algorithm. It is difficult to distinguish the performance of the two clustering algorithms from Figs. 8 and 9. Fig. 10 therefore presents the loss function. Comparing the loss values of the fast K-medoids and Mean-shift clustering algorithms, we can conclude that Mean-shift performs better in this case. Thus, in the subsequent sections, the Mean-shift clustering algorithm is chosen to achieve spectrum sensing.

C. THE PERFORMANCE ANALYSIS OF ROBUST CSS
In this subsection, the spectrum sensing performance is verified. Figs. 11 and 12 present the performance of the different methods. The parameters are set as N = 1000, L = 2, and SNR = −15 dB.
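The three performance indices used in this comparison can be estimated empirically from simulated trials. The following Python sketch, with the hypothetical function name `sensing_metrics`, assumes that the true PU state and the FC decision are recorded as booleans for every trial and that both PU states occur at least once.

```python
import numpy as np


def sensing_metrics(true_active, decided_active):
    """Estimate detection, missed detection, and false alarm probabilities."""
    t = np.asarray(true_active, dtype=bool)       # actual PU state per trial
    d = np.asarray(decided_active, dtype=bool)    # decision of the FC per trial
    p_d = float(np.mean(d[t]))                    # decided active given PU active
    p_m = float(np.mean(~d[t]))                   # decided absent given PU active
    p_f = float(np.mean(d[~t]))                   # decided active given PU absent
    return p_d, p_m, p_f


# Eight trials: PU active in the first four, absent in the last four.
p_d, p_m, p_f = sensing_metrics([1, 1, 1, 1, 0, 0, 0, 0],
                                [1, 1, 1, 0, 1, 0, 0, 0])
```

Here three of the four active trials are detected and one idle trial raises a false alarm, so the estimates are 0.75, 0.25, and 0.25. By construction, the detection and missed detection estimates sum to one.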
In Fig. 11, the MUs send lower EVs to prevent the FC from making a correct decision. In this case, the detection probability decreases for a given false alarm probability. From Fig. 11, we can observe that the robust CSS methods based on DFMS-medoids and DF-medoids perform better than the Means method, and that the DFMS-medoids method performs best among these methods.
In Fig. 12, the MUs send higher EVs to mislead the FC into incorrectly deciding that the licensed spectrum is busy when it is actually idle. The MUs can thus avoid competing with the HUs and access the licensed spectrum by themselves. In this case, the false alarm probability increases for a given missed detection probability. From Fig. 12, we can see that the robust CSS methods based on DFMS-medoids and DF-medoids perform better than Means, and that the DFMS-medoids method again performs best among these methods.
In general, the robust CSS method that combines the DFMS-medoids data fusion method with the Mean-shift clustering algorithm has better sensing performance, since DFMS-medoids is more robust than the DF-medoids and Means methods and Mean-shift clusters more effectively than K-medoids. This conclusion can be verified from Figs. 5, 6, and 10-12.

VII. CONCLUSION
In this paper, we propose the DF-medoids and DFMS-medoids data fusion methods to defend against the SSDF attack for CSS in CRNs. These data fusion methods effectively suppress the impact of malicious data on the decision made by the FC. Unlike existing defense schemes, which need to identify the MUs and prohibit them from joining data fusion, our methods directly fuse the sensing data from all CUs, and the results are robust. To avoid deriving a threshold in spectrum sensing, the fast K-medoids and Mean-shift clustering algorithms are adopted to obtain a classifier.
The simulation results show that the proposed CSS methods are robust when MUs attack the CSS system.