Research and Experiment of Radar Signal Support Vector Clustering Sorting Based on Feature Extraction and Feature Selection

The result of radar signal sorting directly affects the performance of electronic reconnaissance equipment. Sorting method based on intra-pulse features has become a research focus in recent years. However, as the number of extracted features increases, the dimension of the feature vector becomes higher and higher. And too many dimensional feature vectors would make the complexity of the sorting algorithm increase geometrically. In this way, feature selection becomes more and more necessary. Combining the latest research on fuzzy rough sets, this paper proposes two feature selection methods, namely two-steps attribute reduction based on fuzzy dependency (TARFD) algorithm and fuzzy rough artiﬁcial bee colony (FRABC) algorithm. The TARFD method uses the candidate attribute set as starting point, according to the deﬁnition of the redundant attribute set. Then the less important attributes are successively eliminated. The FRABC method starts from the dependence degree of fuzzy rough set, and constructs a ﬁtness function that reﬂects the importance of the attribute subset and the reduction rate. Based on this function, the artiﬁcial bee colony algorithm is used to reduce the attributes of the dataset. Using the TARFD and FRABC algorithms, the extracted feature sets, including entropy feature set, Zernike moment feature set, pseudo Zernike feature set, gray level co-occurrence matrix (GLCM) feature set, and Hu-invariant moment feature set are processed, then an optimal feature subset was obtained and a sorting test was performed. The results show the effectiveness of the extracted intra-pulse features and the efﬁciency of the feature selection algorithm.


I. INTRODUCTION
At present, with the widespread application of complex radars, the modern electronic countermeasure environment is becoming more and more complex. The signal density of the electromagnetic threat environment has reached millions of orders [1], which mean that the pulse streams from the same direction are also densely complex. In this case, the performance is greatly depressed when only the RF and PW The associate editor coordinating the review of this manuscript and approving it for publication was Mohammad Zia Ur Rahman . parameters are used for sorting. In view of this problem, it is necessary to extract new features for characterizing the intra-pulse data of the radiation source signal, that is, to augment the conventional parameter set, and to describe the intra-pulse modulation difference of the radar signal with new features to solve the sorting problem caused by complex radar system and the diverse waveform. Radar signal sorting is difficult with conventional parameters because these parameters are usually from the single direction and can be changed in each parameter domain, and even overlap. In recent years, many new intra-pulse feature extraction methods have been proposed, such as wavelet analysis [2], [3], atomic decomposition [4], time-frequency analysis [5]- [8], and high-order statistical analysis methods [9]- [11], for example, extracts entropy feature set [11], Zernike moment feature set [12], pseudo Zernike feature set [13], [14], gray level co-occurrence matrix (GLCM) feature set [15] and so on.
However, not all the extracted intra-pulse features contain necessary information for classifying and sorting radar signals. Too many features may not only depress the sorting effect, but also may reduce the sorting efficiency. Therefore, in view of changing rule, distribution form, anti-noise performance and other aspects of the effectiveness of each feature, we need to study the feature selection methods, that is, select a set of reliable feature subsets to reduce the intra-pulse feature dimension effectively and maximize the classification effect meanwhile [16]. In recent years, the feature selection (attribute reduction) based on fuzzy rough set (FRS) has received much attention, and has achieved many research results [17]- [20]. These methods mainly focus on the use of distinguishing matrix, dependency and reduction methods with optimization algorithms.
In this paper, the theoretical and experimental analysis of the above problems is carried out, that is, the sorting method based on intra-pulse features is studied, and the minimum intra-pulse feature subset which could better characterize the radar signal is obtained, and the intra-pulse feature subset is utilized. The sorting experiment verifies the effectiveness of the extracted features and feature selection methods with the proposed sorting method of modification cone mapping support vector clustering (MCMSVC) with similitude entropy (SE) index (called SE-MSVC) in [21].

II. FEATURE EXTRACTION AND SIGNAL SORTING BASED ON FEATURE SET
Rough set theory is a mathematical tool capable of handling inaccurate and incomplete information. Let I = (U , A ∪ Q) be an information system, where U represents a non-empty finite set of objects, namely the domain. And A represents a non-empty finite set of conditional attributes. Unless otherwise specified, attributes refer to conditional attributes. In the information system, Q represents a non-empty finite set of decision attributes, that is, a category attribute set. The attribute set P ⊆ A corresponds to an indistinguishable equivalent relationship. Without ambiguity, this equivalent relationship is abbreviated as P, and the following research is carried out on the premise of this information system.

A. FEATURE SELECTION METHOD 1) FUZZY DEPENDENCY ANALYSIS
In the domain U , objects x may belong to more than one fuzzy equivalent class determined by fuzzy attributes. Therefore, the fuzzy equivalent class determined by multiple fuzzy attributes, that is, the Cartesian product of fuzzy equivalent classes determined by a single fuzzy attribute.
For the distinct rough sets, U /P determines a cluster set consisting of different sets composed of all indistinguishable objects. Supposing that the attribute set P contains m attributes {a 1 , a 2 , . . . , a m }, for the sets B and C, the operator ⊗ is defined as: (1) Then U /P can be calculated by: Consider the situation at m = 2: This means: The sets X a 1 1 , X a 1 2 , . . . and X a 2 1 , X a 2 2 , . . . represent the divisions of U determined by the attributes {a 1 } and {a 2 } respectively. In the distinct rough set, distinct set is where ∀X i ∈ X , and the membership function is used to represent the relationship with x ∈ U and X i , then: When the continuous attribute interval is divided according to actual needs, in order to determine the fuzzy equivalent class of each fuzzy attribute, each sub-interval divided by the fuzzy condition attribute corresponds to a fuzzy equivalent class of the attribute.
In the fuzzy rough set, a single attribute {a i } still corresponds to the equivalent relationship of an indistinguishable relationship. It is still briefly described as {a i }. The calculation of U /P in the fuzzy rough set is similar to the distinct rough set. The difference is that the sets X a 1 1 , X a 1 2 , . . . and X a 2 1 , X a 2 2 , . . . represent a fuzzy division of U determined by attributes {a 1 } and {a 2 } respectively.
The symbol X a i j represents the jth interval obtained by segmenting attribute a i by some attribute interval segmentation algorithm. All objects in this interval are indistinguishable. To distinguish them from distinct rough sets, let F a i j = X a i j . Let the fuzzy division of U determined by P is U /P = {F 1 , F 2 , . . . , F n }, since ∀F i ∈ U /P are all fuzzy sets, the degree to which object x k belongs to F i can be obtained by calculating the degree to which object x k belongs to the intersection of all fuzzy sets forming F i .
Considering the situation at m = 2 as well, let's set F i = F a g r ∩ F a h t , where g, h, r, t = 1 or 2, and use the membership function to represent the relationship between x and F i , which is: If m > 2, then the relationship between x and F i is similar to the membership function obtained at m = 2.
The distinct positive field in classical rough set theory is defined as the lower approximation union set. This concept is extended to the fuzzy positive field to obtain the membership degree of the object x with respect to the fuzzy positive field: According to the meaning of membership and the definition of fuzzy positive domain, the dependence of decision attribute Q on conditional attribute set P under fuzzy rough set conditions can be found [17]: where |U | represents the potential of U , that is, the number of objects in U . Similar to the classic rough set dependency, the above formula represents the proportion of distinguishable objects in the entire data set when only the information in P is used. According to the above formula, the potential of µ POS P(Q) is divided by the potential of the universe U . The above formula also indicates the degree to which Q depends on P. P determines the value of Q in proportion k , that is, P determines the classification effect. The larger the value k , the better the effect obtained when classifying signals based on attribute P.
The dependency described by formula (8) can be generalized to the dependency between attributes, taking P and Q as a single attribute set, that is, P = a i , Q = a j , ∀a i , a j ∈ A, then: Equation (9) characterizes the degree to which the attribute a j depends on the attribute a i . And a i determines the value of a j in proportion. The larger the value k , the greater the ability of a i to determine the value of a j .
The following description uses a simple decision table as an example to explain the calculation of the fuzzy dependency. Supposing the original decision table is shown in Table 1, where a 1 , a 2 and a 3 denote the conditional attributes, and Q denote the decision attributes. Firstly, the fuzzy lower approximation of a 1 , a 2 , a 3 are calculated, and then the dependence degrees of each attribute by formula (9) are calculated. The attribute values in this table are continuous. Under the fuzzy similarity relationship, it is still required to divide the continuous attribute interval to determine the fuzzy equivalent class of each fuzzy attribute. Each sub-interval divided by the fuzzy condition attribute corresponds to a fuzzy equivalence class of the attribute. The membership function of the equivalence class generally includes a trapezoidal function, a trigonometric function, and a normal distribution function. After the decision table 1 is processed, the interval division results are obtained: It can be seen that in each two-dimensional Cartesian sub-interval determined by the above results, the objects have the same decision value. So, each condition attribute is divided into two fuzzy equivalence classes.
For convenience, the fuzzy subset F a 1 1 of a 1 is treated the same as its membership function µ F a 1 1 , and the membership degree of element x with respect to F a 1 1 is represented by µ F a 1 1 . For other fuzzy subsets, they are processed in the same way. Figure 1 shows the membership function corresponding to each fuzzy equivalent class of the attribute determined by the trigonometric function and the trapezoidal function. The membership functions of fuzzy equivalent classes F a 1 1 , F a 1 2 of attribute a 1 are defined as: This gives: The attribute reduction algorithm based on fuzzy rough set dependency includes two steps. The first one is to determine the candidate attribute set, and the second one is to determine whether there are redundant attributes in the candidate attribute set. Based on the above analysis, this paper gives two definitions. The attribute reduction based on fuzzy dependency is mainly carried out by these two definitions.

2) REDUCTION METHOD 1
Definition 1: Supposing a j ∈ A, for a i , it satisfies: a i = max{a j |γ a j (Q), 1, 2. . . m }. For a given threshold T h , when calculating γ a j (Q) (j = i), if γ a j (Q) > T h , then a j is called as the first candidate attribute. However, if γ a j (Q) ≤ T h and , then a j is called a second candidate attribute. The set consisting of the first and second candidate attributes is called a candidate attribute set.
Definition 2: Let the determined candidate attribute set be C, supposing a i ∈ C, for ∀a j ∈ C (i = j), if γ a i ,a j = 1, γ a j ,a i = 0 or γ a i ,a j ≈ 1, γ a j ,a i ≈ 0, then the attribute a j is called the first redundant attribute, otherwise a j is called a dependent attribute. For the dependent attribute a j , if γ a j (Q) ≤ γ a i (Q) and γ a j ,a i > γ a i ,a j , then a j is a nonredundant attribute. And if γ a j (Q) ≤ γ a i (Q) and γ a j ,a i ≤ γ a i ,a j , then a j is called the second redundant attribute. The set of the first and second redundant attributes is called a redundant attribute set.
Based on the above analysis, a two-steps attribute reduction based on fuzzy dependency (TARFD) algorithm is proposed. The algorithm starts with the candidate attribute set and deletes the less important attributes one by one according to the redundant attributes obtained in Definition 2 until the termination condition is met. From this we can see that the feature reduction algorithm of TARFD belongs to filter method.

Algorithm TARFD
Step 1 Let T h = 0, and determine the candidate attribute set C by definition 1. For attribute a i ∈ C, calculate the dependency γ a i (Q) of decision attribute Q on a i ; Step 2 Sort {γ a i (Q)|i = 1, 2. . . m, m = |C|} in descending order to get a set of attributes, denoted as T , T = {γ a tl (Q)|γ a t1 (Q) ≥ γ a t2 (Q) · · · ≥ γ a tl (Q) · · · ≥ γ a tm (Q)}, where {t1, t2, . . . tl. . . , tm} is one arrangement of i; Step 3 Let k = 1; Step 4 Let j = 1, if j ≥ m−k, terminate the algorithm and go to step 7; Step 5 Calculate the dependency between attributes. If γ a tj (a t(j+k) ) ≥ γ a t(j+k) (a tj ), remove the attribute at a t(j+k) , that is, T = T \ {a t(j+k) }; Step 6 Let j = j + 1, if j < m − k, go to step 5, otherwise let k = k + 1, go to step 4; Step 7 Finally the T is a relative reduction of conditional attribute P relative to Q.

3) COMPLEXITY ANALYSIS OF TARFD
As we known, Jensen proposed a quick reduction algorithm for attribute reduction based on fuzzy rough dependency, and achieved good reduction results with a small dataset dimension [17]. However, the computational complexity of the algorithm increases sharply with the increase of the condition attribute set. Assuming that the attribute set P contains m attributes, and the partition of U determined by each attribute includes n fuzzy classes, then the partition of U determined by attribute set P includes n m fuzzy classes. The approximate computational complexity of the dependency of decision attribute set P can be expressed as O((n m n)) = O(N (n m+1 )) in the worst case, which means the algorithm does not terminate until the last attribute is calculated.
For example, suppose P = {a, b}, and the partition determined by each attribute contains 3 fuzzy classes, then the dependency of Q with respect to P needs to calculate 9 fuzzy classes. In fact, it is normal for the number of attribute sets to be greater than 10. Therefore, if we set |P| = 15, It is needed to calculate 3 15 fuzzy classes for the dependency of Q with respect to P. It can be seen that the computational complexity of this algorithm is quite large.
For the TARFD algorithm, it only needs to calculate the single attribute dependency in the attribute set and the dependency of any two attribute combinations. That is, the complexity of the dependency of the attribute set P in the worst case can be expressed as O( . If |P| = 15 is set as mentioned above, then it needs to calculate 1935 fuzzy classes. So, compared with quick reduction algorithm, TARFD algorithm improves the execution efficiency by 3 15 /1935 = 7415 times in the worst case, indicating that the algorithm is efficient.

4) ANOTHER CALCULATION OF FUZZY ROUGH APPROXIMATION
Let (U , P) be the fuzzy approximation space, that is, P is the fuzzy equivalence relation on the universe U . Assuming I represents the edge implication operator and T represents the t-module, then (I , T )-fuzzy rough approximation on the fuzzy approximation space (U , P) represents such a mapping: For ∀X ∈ F(U ), Apr I ,T (X ) = P I X ,P T X is holded. Where F(U ) represents the whole of the fuzzy subset on U . The fuzzy set P I X andP T X are respectively called the I -low fuzzy rough approximation and the T -up fuzzy rough approximation for X in the fuzzy approximation space (U , P), and their membership functions are determined by equations (14) and (15) respectively: For ∀y ∈ U , if y ∈ X , then µ X (y) is 1, otherwise it is 0. µ P (x, y) indicates the similarity between the objects x and y determined by the fuzzy relationship P. The implication operator and t-module are taken respectively: Based on the concept of fuzzy equivalence relation P, the fuzzy equivalence class of an object can be defined.
Then µ P (x, y) can be determined by the iterative formula (18): For attribute a ∈ P, µ a (x, y) could be determined by the fuzzy similarity relationship, which is expressed by formula (20), (21) and (22): In the formula, a(x) and a(y) represent the values of objects x and y on feature a respectively, and σ a represents the standard deviation of the values of all objects on feature a. After many experiments, the equation (22) is used to calculate the fuzzy similarity relationship between the objects.

5) ARTIFICIAL BEE COLONY ALGORITHM
The artificial bee colony (ABC) algorithm is an intelligent optimization algorithm for the actual parameter clusters [22], [23]. This algorithm is based on the foraging behavior of bee swarms. In the bee swarms, the employed, onlookers and scouts could maximize the nectar through different functions. Based on this principle, the optimal solution of the objective function can be solved. Compared with ant colony algorithm (ACA) [24], [25], ABC algorithm has faster convergence speed and higher estimation accuracy. In the ABC algorithm, the position of the solution in the optimization problem is equivalent to the food source, and its value is determined by the fitness value. To indicate that the initial solution is equal to the number of leading bees or following bees, of which the leading bee and the following bees each account for 50% of the colony, and the detection bee is converted by the leading bee after abandoning a food source. The main search process is as follows: Firstly, randomly generate N initialization solutions, each solution x i is composed of a d-dimensional vector, denoted as where d represents the number of parameters included in the solution for the optimization problem, i = 1, 2, . . ., N ; Secondly, set the maximum number of cycles of the ABC algorithm C max , and set the maximum searchable number of feasible solutions at a location t. If the updated solution at this position still cannot improve the fitness value after searching t times, then the feasible solution at this position would be discarded. Let the number of leading bees and following bees in each cycle be N , the number of cycles C = 1, start the cycle according to the following steps: Step 1 Let i = 1, and perform a search for lead bee E i according to step 2 until i = N ; Step 2 Correct the food source solution x i according to the randomly updated solution v i . If v i improves the fitness value, replace the original solution with v i , and also record the replaced solution as x i ; otherwise, record the position of solution x i which is one of the N initialized solutions, and the solution at this position is called candidate drop solution. v i is determined by (23): where x ij represents the jth parameter randomly selected in the solution x i , v ij represents the parameter of the candidate solution v i generated by modifying x ij , x kj represents the jth parameter in the randomly selected solution x k , k is in the neighborhood of i, and k = i, ij is a random number related to x ij , ij ∈[−1,1].
Step 3 Calculate the probability p i of the following bee selecting the food source x i . The larger the value of p i is, the better the fitness value could be obtained by x i , and the greater the probability of being selected by the following bee is. p i is determined by (24): where fit i represents the fitness value proportional to the amount of nectar of the food source x i , that is, the value of the objective function, it is meaning the fitness value of the ith solution. In the formula, N denotes the number of solutions.
Step 4 Let j= 1, remember that the food source selected by the follower O j is x i , and perform a search on O j according to step 2 until j = N , where i ∈{1, 2, . . . , N }; Step 5 The detection bee compares the relationship between the number of records h i and t of each candidate discarded solution in step 2. If h i > t, the solution would be discarded, also let h i = 0 and a new random feasible solution is generated according to formula (25): where r ij represents the jth replacement parameter in the solution x i , x max i and x min i represent the maximum and minimum values of x i , respectively, and φ ij denotes the random number associated to r ij , φ ij ∈[0,1], i ∈{1, 2, . . . , N }, j = 1, 2, . . ., N .
Step 6 Record the optimal solution at this time; Step 7 Determine whether the algorithm has reached the maximum number of cycles or meets the termination conditions. If any of the conditions are met, the algorithm is terminated, otherwise C = C+ 1 and the cycle is restarted from step 1.
Finally, when the algorithm terminates, the solution obtained in step 6 is the global optimal solution.

6) FITNESS FUNCTION DESIGN
The fitness function is the most important element in the ABC algorithm. A food source, that is, the fitness value of the solution can be understood as the degree to which the solution is close to the objective function. For function optimization problems, the fitness function is the objective function. For attribute reduction problem, there is no ready-made objective function available, so it is necessary to design a fitness function suitable for the problem.
The purpose of attribute reduction is to find the smallest set of conditional attribute whose dependency is equivalent to the original conditional attribute set, which means the maximum dependence and the smallest number of attributes are required. Therefore, the increase of the dependency in the fitness function or the decrease of the number of attributes should be beneficial to the increase of the overall fitness. In order to make the contributions to the fitness function of the dependency of the attribute set and the number of attributes in the solution equal, it is necessary to constrain the contributions of these two conditions to the fitness to the same order of magnitude. The fitness function in the design attribute reduction problem is as follows: where where fit i represents the fitness value of the attribute subset P i determined by the leading bee E i , f i represents the target value determined by the subset P i , and according to P i = ∅, it can be known that 0< f i <1. In formula (27) mentioned above, (|A| − |P i |) |A| represents the reduction rate, and γ P i (Q) γ A (Q) represents the normalization dependency of decision attribute Q to the subset P i . And where α represents the weight factor, is used to adjust the flexibility of the fitness function. Equation (27) shows that reduction rate and normalized dependency are positively related to f i , that is, as the reduction rate and the normalized dependency increase, the f i increases. However, there is not a positive correlation between the reduction rate and the normalized dependency. So how to obtain the maximum dependency and the minimum number of attributes on the premise of maximizing the target value is exactly the problem that this article is to solve.

7) REDUCTION METHOD 2
Based on the above analysis, this paper uses the ABC algorithm to complete the attribute reduction of the dataset. This method is based on the bee colony randomly selecting a subset of attributes. By calculating its fitness function through fuzzy rough sets, finally the best subset could be found during the iteration process. It is called the fuzzy rough artificial bee colony (FRABC) algorithm. By its processing we can see that the feature reduction algorithm of FRABC belongs to filter method. The specific steps of the algorithm are as follows.
Step 1 The hired bee randomly generates a subset of attributes. Supposing the attribute set P contains d attribute values {a 1 , a 2 ,. . . , a d }, the number of hired bees is N , and the number of randomly generated food source is N , and also the food source is considered as the initial solutions. x i is composed of a d-dimensional vector, that is, , d represents the number of attributes included in the attribute set P, and i = 1, 2, . . ., N .
Step 2 Perform the operations of rounding and merging the same items for the generated solutions. For example, if d = 8, the randomly generated solutions are: {5.75, 6.30, 6.20, 3.75, 5.59, 2.20, 5.94, 1.22}. The same items are combined into: {6,4,2,1}, which represents the 6th, 4th, 2nd, and 1st attributes in the attribute set P are selected to form a subset of attributes.
Step 3 Calculate the dependency of the N attribute subsets on the decision attributes according to the formula (8), that is, the importance of the attribute subsets, so as to obtain the fitness function of the ABC algorithm.
Step 4 Run the ABC algorithm based on the fitness function and N subsets of attributes.
The global optimal solution is the reduced attribute subset of the data set.

B. FEATURE EXTRACTION METHOD
In paper [15], [26] and [27], we extracted several feature sets, including gray level co-occurrence matrix (GLCM) feature set, entropy feature set, Zernike moment feature set and pseudo Zernike feature set, they are introduced as follows.
For the entropy feature set, the sample entropy S e and fuzzy entropy F e are extracted as follows [26]: (1) Pre-processing the intra-pulse data of the radiation source signal to reduce the entropy features affected by the carrier frequency and noise; (2) To avoid the influence of the difference in signal length on the entropy feature extraction, the signal is normalized and resampled; (3) Determine the parameters used in estimating the sample entropy S e , that is, take the scale parameter m = 2 and the tolerance parameter r = 0.2 √ 2 · σ ; (4) Obtain the optimal parameters a, b, c of the fuzzy entropy F e and the value of the sum by obtaining the entropy maximum of the fuzzy set; (5) Calculate S e and F e use the sample entropy estimation algorithm and the fuzzy entropy formula, respectively.
The normalized energy entropy features P e can be extracted as follows [27]: (1) Endpoint extension of the data sequence according to the mirror closure extension method and resampling; (2) Determine an EMD component of the signal sequence according to a convergence criterion; (3) Find the normalized energy of each IMF component according to the formula; (4) Calculate the normalized energy entropy P e of the radiation source signal using the Shannon formula.
Finally, the entropy feature set F E = [S e , F e , P e ] is obtained.
The bispectral quadratic feature extraction algorithm is mainly performed as follows [15]: (1) Resample the intra-pulse data to maintain the same length of each signal; (2) Use the bispectrum estimation direct method to solve the bispectrum of the intra-pulse sequence of the radiation source signal; (3) Use the symmetry of the real-signal bispectrum, determine the following regions containing all the information of the bispectrum: In the condition, w 1 and w 2 represent the frequency axis of the bispectrum, the subsequent algorithm of the bispectral quadratic feature extraction is performed in this region; (4) Convert the bispectrum in the above region into a grayscale image; (5) Calculate the feature sets, including Zernike moment feature set F [12] Z , pseudo Zernike feature set F [16] P , gray level co-occurrence matrix (GLCM) feature set [15], and Hu-Invariant moment feature set F ϕ for comparison, respectively, based on the calculation formulas of Zernike moments, pseudo-Zernike moments, gray-scale co-occurrence matrices and Hu-invariant moments.
It is worth noting that when calculating the features of the gray level co-occurrence matrix, the co-occurrence matrix of the four main directions is first obtained, and then the smoothing process is performed to extract the features, and the step size is taken as 1.
Before the sorting experiment, the data set is needed to be pre-processed, that is, the attributes of the data set would be blurred. For the sake of brevity, the standard ambiguity technique is used to obtain the TARFD fuzzy equivalent class determined by a single attribute, and the trapezoidal function and the trigonometric function are used to determine the membership of the objects in each data set with respect to the single attribute. Thus the similarity between objects of a single attribute are obtained, and then the membership degree of multiple fuzzy attributes in the TARFD and FRABC algorithms are determined by formulas (6) and (18), respectively. Based on the above analysis, the block diagram of radar signal sorting method based on feature extraction and feature selection is shown in Figure 2.
In order to verify the effectiveness of the extracted features and the algorithms of TARFD and FRABC, 6 typical radar signals widely used in international literature are selected for simulation experiments. These signals are: conventional wave radar signals (CW), linear frequency modulated radar signals (LFM), nonlinear frequency-modulated radar signals (NLFM), two-phase coded radar signals (BPSK), four-phase coded radar signals (QPSK), and frequency-coded radar signals (FSK). The signal carrier frequency is 850MHz, and its pulse width and sampling frequency are 10.8µs and 2.4GHz respectively. The frequency offset of the linear frequency modulated radar signal (LFM) is set to 45MHz. The frequency of the non-linear frequency modulated radar signal (NLFM) is sinusoidally modulated. The coded radar signal (BPSK) uses a 31-bit pseudorandom code, the four-phase coded radar signal (QPSK) uses a Huffman code, and the FSK signal uses a Barker code. For these 6 type radar signals, 100 samples were generated within a typical SNR range (SNR = 15dB), for a total of 600 samples. The 200 samples generated were randomly selected for feature extraction and used for classifier training, and the remaining 400 samples were used for feature extraction and used as a test set for signal classification. The sample distribution is shown in Table 2. The experiments were performed on a Pentium (R) dualcore E5300 personal computer. The computer configuration is: CPU frequency is 2.6GHz, memory is 2GB, and hard disk is 250GB.
In order to ensure the data is comparable, each dimension feature sequence needs to be normalized. Here, it is processed according to the interval value method represented by formula (29).
In view of the fact that the radar radiation source training database is usually not complete and applied in the actual 93328 VOLUME 8, 2020 battlefield environment, it is necessary to adopt a classifier that is suitable for training with fewer training samples, and has a faster training and classification speed, and is not easy to fall into local minimums. The support vector machine (SVM) has these advantages. Therefore, the SVM is selected here as a learning algorithm for classification and recognition. The SVM can not only solve many problems existing in previous machine learning methods, such as practical problems of nonlinearity, small samples, high dimensions, local minimum points, over-learning, etc. Compared with the neural network learning (NNL) algorithm based on the principle of empirical risk minimization (ERM), the support vector machine (SVM) has better generalization ability and stronger theoretical basis. The SVM uses a kernel function to complete the mapping of the input feature vector from a low-dimensional feature space to a high-dimensional feature space. With this method, the nonlinear separable problem of the original space can be transformed into the linear separable problem in a high-dimensional space. Through this process, the purpose of classification and identification of the radiation source signal can be achieved.
The comparison of classification accuracy is shown in Table 3. In Table 3, the AN represents the number of condition attributes, the Acc represents the classification accuracy, and the ET represents the execution time of reduction method. The results in Table 3 show that the TARFD algorithm can obtain the smallest average feature subset without significantly reducing the average classification accuracy rate. The average number of features in the subset is 2.2. For the FRABC algorithm, the average number is 2.6. In addition, compared with the TARFD reduction algorithm, the FRABC algorithm has a higher average accuracy rate, reaching 87.20%. For the execution time of the algorithms, the average execution time of the TARFD algorithm is 9.293s and of the FRABC algorithm is 10.055s. Compared with the FRABC algorithm, the TARFD is more efficient.
In addition, from the comparison of the classification accuracy of different intra-pulse feature subsets in Table 3, it can be seen that the classification results of the reduced subsets obtained by the TARFD and FRABC algorithms are close to the classification results before the reduction, indicating that the feature subsets selected by these two algorithms can obtain most of the classification information of the original feature set. Therefore, the subsets of these feature sets are used for cluster sorting in the following, and the relationship between the accuracy rate and the classification precision is comprehensively considered.
The FRABC algorithm obtains the following feature subsets to characterize the intra-pulse information of the radiation source signal: sample entropy, fuzzy entropy and normalized energy entropy features in the entropy feature, that is, F E = [S e , F e , P e ], Zernike moment feature subset F Z = [Z 42 , Z 62 ], and pseudo Zernike moment feature subset F P = [P 21 , P 53 , P 54 ], the bi-spectrum smooth gray level co-occurrence matrix feature subset F G = [f 4 , f 14 ] and the first-order and second-order moments F ϕ = [ϕ 1 , ϕ 2 ] of Hu-invariant moments, of which F ϕ is still used for the experimental comparison later.
After extracting the intra-pulse features by the above method and pre-treating, the feature distribution of each feature subset of the experiment is shown in Figure 3.

III. SIGNAL SORTING BASED ON FEATURE SET A. RADIATION SOURCE SIGNAL PARAMETER SETTING
Assume that the parameters of these 6 radar emitter signals used in the paper are shown in Table 4, and the pulse flows are assumed to be in the same direction. In the intra-pulse modulation mode, the frequency offset of the LFM requires B · τ keeping at 100MHz µs, where B is bandwidth and τ is the pulse width; BPSK and FSK use 13 bit Barker code, QPSK uses 16 bit Frank code, and NLFM bandwidth is 6∼10MHz, using sine frequency modulation.
It can be seen from Table 4 that the 6 radar signals to be sorted have complex and variable features in addition to the inter-pulse parameters with severe overlap.
Let the receiver have an intermediate frequency of 30 MHz, a bandwidth of 20 MHz, and an ADC sampling frequency of 150 MHz. The TOA of the starting pulse of a radiation source with a uniformly distributed random number from 0 to the total interception time T int is simulated. When the total intercept time is T int = 50ms, a typical experiment (SNR = 15dB) produces a total of 476 pulses for the 6 type radar signals in Table 4. The details of the generated pulses and residual pulses are shown in Table 5. It should be pointed out that in the experiment, each pulse contains many intermediate frequency (IF) sampling data. Feature extraction is performed on these sampling data of each pulse. According to different algorithms, feature sets of different dimensions are obtained. For example, for a pulse of Rd1 with a pulse width of 15µs and a sampling frequency of 150 MHz, the number of data obtained after this pulse is sampled is 2250. Entropy feature extraction is performed on these data to obtain a three-dimensional feature vector F E = [S e , F e , P e ]. Further, for the 152 pulses generated by Rd1, the total data generated is 152 × 2250. After entropy feature extraction is performed, the resulting feature matrix row × column is 152 × 3, that is, there are 152 samples in Rd1 in total. Each sample generates three entropy features and composes the entropy feature vector F E . It's all like this for Rd2, Rd3, Rd4, Rd5, and Rd6.

B. EXPERIMENT OF SIGNAL SORTING
The following 5 kinds of feature subsets are used for representing signal sorting, and the MCMSVC algorithm is still used for clustering, and the SE index is used to adjust the clustering parameters. The initial value is set C = 1 during simulation, and the parameter adjustment of q is similar to the step of adjusting with the inter-pulse parameter sorting according to formula q = 1/ max ij g i − g j 2 . In addition, as can be seen from Table 5, the actual pulse numbers of the radiation sources Rd1-Rd6 for sorting are 138, 54, 59, 101, 29 and 51, respectively, and the radiation source signals represented by each feature subset are classified by SE-MSVC. The results are shown in Table 6. The results in the table are the statistical average of 20 sorting results, n i indicating the actual number of pulses of ith radiation source signal, NCSP means number of correct sorting pulses, NMP means number of misselected pulses and TSA means total sorting accuracy.
The following conclusions can be drawn from Figure 3 and Table 6: (1) The distribution of feature subsets shown in Figure 3(b) shows that the three-dimensional entropy feature of Rd1, Rd2 and Rd5 with CW, LFM and NLFM modulation modes is better, so when using entropy feature sorting, the three types of radiation source signals have fewer misselected pulses, especially the radiation source Rd2 is selected correctly, while Rd1 and Rd5 also have only 1 and 4 misselected pulses respectively. Relatively speaking, Rd3, Rd4 and Rd6 misselected more pulses. The result is determined by the modulation mode that the signal itself has. The analysis is corresponding to the items F E listed in Table 6.
(2) When the sorting method based on the bispectral quadratic feature, that is, using the feature subset F P , F Z , F G andF ϕ for sorting, since the bispectral gray image determines that each feature subset exhibits a characteristic distribution with a certain trend, the sorting accuracy rate is lower than the characteristic sorting result with 87.50% correct rate using F E , as shown in Table 6. However, the average sorting accuracy rate of more than 80% is also the same as the sorting result obtained by using the inter-pulse parameters, which means this method is effective. In addition, the results in the table also show the sorting effect of several feature subsets F P , F Z , F G are better than that obtained when the subset F ϕ is used.
(3) Comparing the results obtained by using the inter-pulse parameters in [28], it can be seen that both Rd2 and Rd5 have more misselected pulses. In addition to the corresponding results, Rd5 has more misselected pulses using F Z . And the remaining feature subsets are significantly less misselected pulses than the number of Rd2 and Rd5 caused by the inter-pulse parameter sorting, indicating that the feature subsets obtained by these feature extraction methods effectively compensate for the disadvantage that some signals cannot be sorted efficiently with the inter-pulse parameters. At the same time, this point can be explained by comparing the distribution of the parameters of the inter-pulse parameters in [28]. It can be seen from [28] that the RF-PW feature distribution of the radiation source Rd1 and Rd2, Rd5 and Rd6 overlaps severely, so that the SE-MSVC clustering method with excellent sorting performance can not achieve satisfactory results. Among them, Rd2 and Rd5 have more misselected pulses. In general, the extracted features of the intra-pulse data supplement the new features for sorting except for the conventional parameters. Thus the characteristic representation of the signal is performed from a new view, and a better sorting effect could be obtained.
Although the Hu-invariant moment feature set and subset are not as good as other feature sets in classification and clustering, considering the effectiveness of the feature set in characterizing the bispectral information of the radiation source signal, the feature set is still added to the total feature vector for clustering experiments.
The following considers the comprehensive use of these several intra-pulse features for signal sorting, that is, the SE-MSVC sorting of the interleaved pulse stream is completed by using the feature vector G = [S e , F e , P e , ϕ 1 , ϕ 2 , f 4 , f 14 , Z 42 , Z 62 , P 21 , P 53 , P 54 ]. At this time, the pulse data shown in Table 6 is still taken, and the clustering sorting correctly rate using G is shown in Table 7. The result in the table is the statistical average of 20 sorting results. As can be seen from Table 7, at the same typical signalto-noise ratio, the correct rate of 91.67% is only about 4% higher than the average correct rate obtained by using the sorting parameter F E , Compared with the results obtained by using the feature subsets shown in Table 6, except for the Zernike moment feature subset F Z , the other sorting results are less than the number of Rd5 misselected pulses obtained by using G. The reason for this phenomenon is that there is an interaction relationship between the features included in the vector G, so that the sorting effect of some of the radiation sources is rather reduced.

IV. FEATURE SELECTION AND SIGNAL SORTING BASED ON FEATURE SUBSET
Considering the above factors and the relatively low computational complexity of the TARFD algorithm, the feature set G is selected again by TARFD. Under the typical signal-to-noise VOLUME 8, 2020 ratio, the experiment is performed according to the pulse data shown in Table 4, and the feature selection of the TARFD algorithm is adopted. The process is as follows.
According to the proposed algorithm of TARFD, first makes the decision attribute Q ∈ {1, 2, 3, 4, 5, 6}, for the convenience of the narrative, the candidate feature set is: a 2 , a 3 , a 4 , a 5 , a 6 , a 7 , a 8 , a 9 , a 10 , a 11 , a 12 ] (30) The membership is then calculated from the fuzzy function. Through a large number of experiments, it is found that the characteristic membership degree of each radiation source signal calculated by using trapezoidal function and trigonometric function is more favorable for feature selection. Therefore, the membership function corresponding to the condition attribute is calculated by using the membership function.
Then, according to formula (9), the dependence γ a i (Q) of each feature is calculated, and a i denotes the ith feature in the feature set G, i = 1, 2, . . . , 12. The obtained feature dependence is shown in Table 8. Once the threshold T h is determined, it is needed to judge according to the threshold whether the feature needs to be reduced in advance. In the experiment the T h is set to T h = 0, so the next step is directly performed.
Then calculate the degree of interdependence between features a i and a j according to formula (9) to obtain the interdependency relationship as shown in Table 9.
After obtaining the dependency degree between attributes, it is needed to judge whether the feature is redundant according to the condition: if γ a j (Q) ≤ γ a i (Q) and γ a j ,a i ≤ γ a i ,a j , the redundant feature a j is eliminated, wherein i, j = 1, 2, . . . , m, m = |G| , i = j, |G| represents the dimension of the feature set, and γ a i ,a j represents the interdependency of the ith row and j column in table 9.
After the redundant features are removed, a subset of the feature sets G is obtained. Table 10 lists the different levels of feature subsets obtained during the reduction process, where the last result refers to the reduced subset obtained from the TARFD algorithm to the termination condition.  Thus an optimal subset of the feature set is obtained: The results in Table 10 show that the Hu-invariant moment feature subset belongs to the redundant feature set, while the fuzzy entropy feature F e , the normalized energy entropy feature P e , the correlation feature of the bispectral gray level co-occurrence matrix f 4 , and the Zernike moment feature Z 42 with the repetition degree of 2 and 4 order, and the pseudo Zernike moment feature P 53 with the repetition degree of 3 and 5 order belong to the optimal feature determined by the algorithm.
Using the feature subset G for sorting, the results of 20 sorting statistical averages are shown in Table 11, also n i indicating the actual number of pulses of ith radiation source signal.
The results in Table 11 show that the results obtained by using subsets G are slightly lower than the results obtained by using subsets G, and the average correct rate differs by less than 1%.
Compared with the results in Table 7, the misselected pulses of Rd4 increase by using subsets G , but the misselected pulses of Rd5 are obviously less. This shows that the sorting results of the two feature sets are not much different, indicating that the method of sorting by using subsets G is effective. In addition, considering the feature dimension, the feature subset G only contains 5-dimensional features. Compared with the feature set G containing 12-dimensional feature vectors, the feature dimension is obviously reduced, and the sorting result with little difference from the feature set G is obtained. The result illustrates the effectiveness of the feature selection method proposed in the paper.

V. CONCLUSIONS
This paper studies and verifies the feature extraction and feature selection methods using for radar signal sorting. In the experiment, only the case of intra-pulse noise at typical signal-to-noise ratio (SNR = 15dB) is considered. The results show that with the proposed two-step attribute reduction (TARFD) method based on fuzzy dependence and the fuzzy rough artificial bee colony (FRABC) algorithm, the selected optimal subset of features can effectively sort the radar signals. At the same time, after reducing the dimension of the feature subset efficiently, the sorting result with little difference from the feature set G is obtained, and the correct rate is 90.97%. JILIANG CAI received the Ph.D. degree in electromagnetic field and microwave technology from Air Force Engineering University (AFEU), in 2013. He is currently a Faculty Member with Air Force Engineering University. His major researches include radar signal processing, deep learning, and radar engineering.
BINFENG ZONG received the bachelor's degree in radar engineering from Air Force Engineering University, Xi'an, China, in 2010, and the master's and Ph.D. degrees in electronic science and technology from Air Force Engineering University, Xi'an, China, in 2012 and 2016, respectively. He is currently an Instructor with the Air and Missile Defense College, Air Force Engineering University. His current research interests include aviation electronics, millimeter wave antennas and arrays, and MIMO radar.