Random Fuzzy Clustering Granular Hyperplane Classifier

Granular computing is a methodology for studying human intelligent information processing and offers advantages in knowledge discovery. In this paper, we convert a classification problem in sample space into a classification problem in fuzzy clustering granular space and propose a random fuzzy clustering granular hyperplane classifier (RFCGHC) from the perspective of granular computing. Whereas most classifiers can only process numerical data, RFCGHC can process not only numerical data but also non-numerical data such as information granules. Classic granulation methods are generally serial and have high time complexity; we design a parallel distributed granulation method to improve efficiency. First, a clustering algorithm with an adaptive number of cluster centers is proposed, in which the ratio of the between-cluster standard deviation to the within-cluster standard deviation serves as the evaluation criterion. This method yields the clusters and the optimal number of cluster centers. On this basis, the sample set can be divided into many subsets and each sample can be granulated by these cluster centers. A fuzzy clustering granular space can then be formed, in which fuzzy clustering granules, fuzzy clustering granular vectors, and their operators are defined. To find the optimal hyperplane in the fuzzy clustering granular space for classifying these samples, we design a loss function and evaluate each category with a probability given by the fuzzy clustering granular hyperplane. The loss function is optimized with a genetic algorithm based on fuzzy clustering granules. Experimental results and theoretical analysis show that RFCGHC performs well.


I. INTRODUCTION
Classification is one of the common problems faced by human beings in production and life. Machine learning addresses three important issues: how to obtain reliable knowledge from data and experience, how to build a classifier, and how to assist people in completing pattern recognition and data analysis tasks. As machine learning gradually becomes an essential force driving the advancement of computing and automation, studying the construction of classifiers based on machine learning is of great value. According to their structural characteristics, classifiers can be divided into statistical learning classifiers, deep learning classifiers, etc. A statistical learning classifier starts from probability and statistics, constructs a mathematical model, and uses data to train and adjust the model to obtain a predictor. Deep learning classifiers specifically refer to classifier models that use deep neural networks as the main model. According to the learning approach, classifier construction methods can be grouped into supervised learning, unsupervised learning, and reinforcement learning. When supervised learning is used to construct a classifier model, a ground truth must be determined for a given set of inputs. These input and output data are called a training sample set. The learning system adjusts the model parameters with an optimization method according to the difference between the ground truth (label) and the predicted output. Common classifier models, such as the statistical learning methods represented by support vector machines [1] and decision trees [2] and the deep learning methods represented by convolutional neural networks [3], can be obtained by supervised training. Supervised learning relies on high-quality training samples, and the number of training samples required is positively related to model complexity and the number of parameters. Widely used deep neural network classifiers have complex structures and huge numbers of parameters; training an efficient and accurate deep neural network model often requires tens of thousands or even hundreds of thousands of training samples, which makes the labeling task very heavy. When an unsupervised learning method is used to construct a classifier, the training of the model does not rely on external supervision signals; that is, input-output sample pairs for the current environment are lacking.
The learning system can transfer available knowledge and experience from other models and data, so that unsupervised classification of the current task can be accomplished. Typical unsupervised classifiers include domain adaptation [4], zero-shot learning [5], etc. Reinforcement learning is rarely used to build a single classifier; it is generally used to build an agent with multiple functions such as classification, perception, decision generation, and action output. In reinforcement learning, the external environment only gives an evaluation of the output produced by the learning system rather than real reference data. The learning system reinforces rewarded actions and outputs to adjust its parameters and structure [6], and likewise suppresses punished actions to avoid incorrect outputs and decisions. Reinforcement learning is widely used in scenarios that require real-time response and output, such as the game of Go [7]. Common reinforcement learning methods include Q-learning [8], policy gradient learning [9], etc. In practice, the finite number of available training samples greatly restricts the construction and application of classifiers [10], [11]. From the 1980s to the beginning of this century, most statistical learning classifiers were oriented toward small sample sets. On the one hand, this stemmed from constraints imposed by the development level of sensing technology and computer hardware; on the other hand, data acquisition in specific application fields is often very difficult. In the past ten years, information science and technology have developed rapidly, and massive amounts of complex data from different data sources with high-level semantic relevance have emerged in industry and daily life. In most cases, the data collected by different sensors for different tasks are complex and heterogeneous.
In specific application scenarios, low data quality is an extremely common phenomenon. On the one hand, this low quality lies in the relatively large amount of data paired with an extremely limited number of labels; on the other hand, it lies in the incompleteness or inaccuracy of the labels. With the substantial increase in sample feature dimensions, sample quality is currently one of the main problems that must be faced when constructing classifiers based on deep learning technology, and it is also a hot research issue attracting scholars' attention.
Granular computing, an information processing theory and methodology for studying human intelligence [12], has advantages in knowledge discovery. Its basic idea is to describe and process uncertain information and obtain knowledge [13]. Granular computing uses the three-element model of granule, granular layer, and granular structure to abstract, decompose, synthesize, transform, and analyze real problems. It thereby forms a method of information analysis, processing, and problem solving that resembles human thinking, cognition, and reasoning. The ternary thought of granular computing lays out a formal and systematic research framework from the perspectives of philosophy, methods, and mechanisms, namely structural thinking, structural problem solving, and structural information processing [14]. In a multi-level granular structure description space, granular computing can use inside-out, bottom-up, and top-down methods to decompose complex problems into several small problems, reducing complexity and enabling divide-and-conquer. We can find an approximate solution to a problem between granular structures, thereby obtaining a granular computing method for dealing with uncertain knowledge. As Bo Zhang and Ling Zhang, founders of the quotient space theory of granular computing, pointed out: ''A recognized feature of human intelligence is the ability to observe and analyze the same problem at different granularities. People can not only solve problems in worlds of different granularity, but also quickly jump from one granular world to another, back and forth freely, without difficulty [15].'' It can be seen that how to describe, define, and construct the three main characteristics of human cognition, namely causation, organization, and granulation, remains a difficult problem in current granular computing research.
The thought of granular computing originated from the concept of fuzzy information granulation put forward by Zadeh, the father of fuzzy sets [16]. He believed that human cognition has three main characteristics, namely granulation, organization, and causation [16]. Hobbs proposed the concept of ''granularity'' in 1985 [18]. Pawlak introduced the problem of knowledge granularity in approximation spaces from the rough set angle [19]. After that, Lin proposed the concept of ''granular computing'' based on binary relations [20]. Yager and his colleagues discussed the value of granular computing in intelligent engineering [21]. Yao studied the construction and calculation of granules in granular computing in 2000 [23]. After discussing the basic principles and problems of granular computing, Yao presented a granular computing model based on set theory in 2004 [24]. Four years later, he studied the past, present, and future development of granular computing and proposed the ternary theory of granular computing [12], [25]. In 2009, he explored in depth the connection between granular computing and cognitive science and built a granular computing framework for cognitive concept learning [26]. To integrate concepts and methods from several different fields, Feiyue Wang and his colleagues proposed a computational theoretical framework for a linguistic dynamic system based on computing with words [27]. Guoyin Wang and his colleagues proposed a granular computing model based on tolerance relations [28]. Granular computing models based on a single granularity only seek an approximate solution to a problem from a single angle, so in recent years researchers have shifted the focus of research to multiple granularities.
Compared with a single granularity, multi-granularity integrates the relationships between different single granularities, so it has stronger expressive power, and problem solving based on multi-granularity often has advantages in complexity and performance. Yuhua Qian and his colleagues presented a multi-granulation rough set model, an extension of classic single-granularity rough set theory, in which set approximation is defined by multiple equivalence relations on the universe [29]. When actually solving complex problems, the multi-granularity rough set model can take into account the comprehensive evaluation of multiple granulations. Weizhi Wu and his team proposed a multi-scale rough set model [30]. Feng Zhu et al. studied three different covering rough set models [31] and gave approximate conversion relationships between the models [32]. Feng and his team proposed a hierarchical rough set model [33]. Wei Li et al. studied a set of classification models from the angle of granular computing and achieved good results [34]-[36]. Granular computing has been widely applied, for example in building fault diagnosis models with granular computing and echo state networks [37] and in building neural networks from a granular perspective for classification [38], [39].

II. CONTRIBUTIONS
This paper constructs a random fuzzy clustering granular hyperplane classifier from the perspective of granular computing. The main contributions are as follows: • Aiming at the high time complexity of the granulation process, we propose a parallel distributed fuzzy clustering granulation approach. Under this principle, we design a clustering algorithm with an adaptive number of cluster centers. The algorithm optimizes the number of clusters and the cluster centers using the ratio of the between-cluster standard deviation to the within-cluster standard deviation. On the basis of these optimal clusters, fuzzy granulation can be executed in parallel.
• We define the concepts of fuzzy clustering granule, fuzzy clustering granular vector, and fuzzy clustering granular hyperplane and design the operators between them. In the fuzzy clustering granular space, we can find an optimal fuzzy clustering granular hyperplane to classify samples, whose parameters can be learned by optimizing the loss function with a genetic algorithm. This approach can be used for binary or multi-class classification tasks.

III. PROBLEM DESCRIPTION
Given a classification system, where X = {x_1, x_2, . . . , x_N} is a sample set, A = {a_1, a_2, . . . , a_M} is an attribute set, and Y = {y_1, y_2, . . . , y_L} is a category (label) set with y_i the label corresponding to x_i (x_i ∈ X, y_i ∈ Y), our aim is to construct a classification model, learn the model parameters from the training sample set, and predict the category of a test sample.

IV. THE MAIN ALGORITHM
To solve the classification problem introduced above, the model is designed as follows. First, to reduce the complexity of fuzzy granulation as much as possible and to granulate in parallel, the data are clustered to obtain the cluster centers and the number of clusters. Second, the data are fuzzy granulated by the cluster centers; in this process, fuzzy clustering granules and fuzzy clustering granular vectors are defined, and their operators are designed in the fuzzy clustering granular space. Next, the fuzzy clustering hyperplane is defined and the loss function is designed.
To minimize the loss function, a genetic algorithm is adopted in the fuzzy clustering granular space; in other words, the optimal fuzzy granular vectors can be learned by the genetic algorithm. Finally, given a test sample, its category can be predicted by selecting the category with the maximum probability. The overview is shown in Figure 1.

A. PRINCIPLE OF ADAPTIVE RANDOM CLUSTERING
The classic K-means algorithm requires the number of cluster centers in advance, and this parameter is critical to the clustering effect. To avoid abnormal final results caused by this sensitivity, we propose an adaptive random clustering algorithm. If the standard deviation between clusters is large and the standard deviation within clusters is small, the clustering effect is good. Here, σ_C denotes the standard deviation between clusters and σ_k is the standard deviation within the k-th cluster. We employ σ_C² / Σ_{k=1}^{K} σ_k² as the evaluation of clustering performance: the larger the value, the better the clustering effect. Our aim is to randomly adjust the cluster centers and their number so that this ratio keeps increasing until the maximum number of iterations is reached. In each iteration, we obtain a cluster center set and record it together with its evaluation value in a set. When the maximum number of iterations is reached, a series of cluster center sets and evaluation values has been collected; the cluster center set corresponding to the maximum evaluation value can then be selected, namely the parameters are learned. The detailed procedure is as follows.
Step 1. Delete samples with missing attribute values.
Step 2. Normalize attribute value of the sample.
Step 3. Initialize the maximum number of iterations Max, set the evaluation set (composed of cluster center sets and their evaluation values) to the empty set, EVSet ← ∅, and set the current iteration number t = 1.
Step 4. Randomly generate the number of cluster centers K ∈ (√N, N), and randomly select K samples as cluster centers c_1, c_2, . . . , c_K.
Step 5. Calculate the distance from each sample point to K cluster centers, find the cluster center closest to each sample point, and assign the sample to the closest cluster. Then recalculate the center of each cluster and set it as the new cluster center.
Step 6. Compare the sum of the within-cluster standard deviations with the threshold: if Σ_{k=1}^{K} σ_k ≥ Threshold, go to Step 5; otherwise go to Step 7.
Step 7. Calculate the evaluation value σ_C² / Σ_{k=1}^{K} σ_k² for the current cluster center set.
Step 8. Add the current cluster center set, its size K, and its evaluation value to EVSet.
Step 9. Update the current iteration number t ← t + 1.
Step 10. If the current iteration number t > Max, go to Step 11; otherwise go to Step 4.
Step 11. From EVSet, select the cluster center set (and its size) corresponding to the largest ratio of the between-cluster standard deviation to the sum of the within-cluster standard deviations. This completes the principle of adaptive random clustering.
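The adaptive random clustering procedure above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: Euclidean distance, a K-means-style refinement for Steps 5-6, and the σ_C²/Σσ_k² ratio as the evaluation; all function and variable names are our own.

```python
import numpy as np

def adaptive_random_clustering(X, max_iter=10, threshold_drop=1e-3, seed=None):
    """Steps 3-11: repeatedly draw a random K in (sqrt(N), N), refine the
    centers K-means style, score the result by sigma_C^2 / sum_k sigma_k^2,
    and keep the best-scoring cluster center set."""
    rng = np.random.default_rng(seed)
    N = len(X)
    best_score, best_centers = -np.inf, None
    for _ in range(max_iter):                                  # Steps 4-10
        K = int(rng.integers(int(np.sqrt(N)) + 1, N))          # K in (sqrt(N), N)
        centers = X[rng.choice(N, size=K, replace=False)].astype(float)
        prev = np.inf
        while True:                                            # Steps 5-6
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            within = np.zeros(K)
            for k in range(K):                                 # recenter clusters
                members = X[labels == k]
                if len(members):
                    centers[k] = members.mean(axis=0)
                    within[k] = np.sqrt(((members - centers[k]) ** 2)
                                        .sum(axis=1).mean())
            if prev - within.sum() < threshold_drop:           # spread stabilized
                break
            prev = within.sum()
        sigma_c2 = centers.var(axis=0).sum()                   # between-cluster
        score = sigma_c2 / max((within ** 2).sum(), 1e-12)     # evaluation ratio
        if score > best_score:                                 # Step 11
            best_score, best_centers = score, centers.copy()
    return best_centers, best_score
```

A real implementation would also drop incomplete samples and normalize attribute values first (Steps 1-2) and expose the Step 6 threshold directly rather than as a stopping tolerance.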

B. FUZZY GRANULATION OF DATA
In this section, we introduce how to achieve fuzzy granulation of data with the help of the cluster center set. The fuzzy granulation can be executed in a parallel distributed manner, so the time complexity is greatly reduced compared with the classical approach.
Given a cluster center set C = {c_1, c_2, . . . , c_K}, a sample set X, and an attribute set A, the similarity between ∀x_i ∈ X and ∀c_j ∈ C on an attribute a ∈ A can be defined as

s_a(c_j, x_i) = 1 − |f(a, c_j) − f(a, x_i)|,  (1)

where f(a, x) denotes the normalized value of sample x on attribute a. Thus, a fuzzy clustering granule induced by x_i on a can be defined as

G_a(x_i) = s_a(c_1, x_i)−c_1 + s_a(c_2, x_i)−c_2 + · · · + s_a(c_K, x_i)−c_K,

where ''−'' denotes a delimiter and ''+'' represents the union of elements. In other words, G_a(x_i) is the neighborhood set of x_i in the cluster center set. The cardinality of the fuzzy clustering granule G_a(x_i) can be calculated by

|G_a(x_i)| = Σ_{k=1}^{K} s_a(c_k, x_i).

For ∀x, x′ ∈ X, operators on the fuzzy clustering granules induced by x and x′ can be defined; in particular, their distance is

d_p(G_a(x), G_a(x′)) = (Σ_{k=1}^{K} |s_a(c_k, x) − s_a(c_k, x′)|^p)^{1/p},

where p is a metric parameter satisfying p > 0. For ∀B ⊆ A with B = {a_1, a_2, . . . , a_|B|}, |B| ≤ |A|, the fuzzy clustering granular vector induced by x on B can be defined as

Ḡ_B(x) = G_{a_1}(x)−a_1 + G_{a_2}(x)−a_2 + · · · + G_{a_|B|}(x)−a_|B|,  (9)

where, as above, ''+'' means union and ''−'' is a separator. Its modulus can be expressed by

|Ḡ_B(x)| = Σ_{a∈B} |G_a(x)|.

The operators of fuzzy clustering granular vectors are given below. Let Ḡ_B(x) and Ḡ_B(x′) be the fuzzy clustering granular vectors induced by x and x′ on an attribute subset B. The similarity of the two fuzzy clustering granular vectors can be defined as

S(Ḡ_B(x), Ḡ_B(x′)) = (1 / (|B| · |C|)) Σ_{a∈B} Σ_{k=1}^{K} (1 − |s_a(c_k, x) − s_a(c_k, x′)|).

From the above information granulation process, it can be seen that the obtained information granules are all based on the cluster center set and are fuzzy. The granular space composed of these fuzzy clustering granules is called the fuzzy clustering granular space.

Theorem 1: For ∀x_i, x_j ∈ X, the similarity of fuzzy clustering granular vectors satisfies 0 ≤ S(Ḡ_B(x_i), Ḡ_B(x_j)) ≤ 1.

Proof: As defined in equation (1), 0 ≤ s_a(c_k, x_i) ≤ 1 and 0 ≤ s_a(c_k, x_j) ≤ 1, so each term 1 − |s_a(c_k, x_i) − s_a(c_k, x_j)| lies in [0, 1]. Summing over a ∈ B and k = 1, . . . , K, the total lies between 0 and |B| · |C|. Dividing both sides by |B| · |C|, we obtain 0 ≤ S(Ḡ_B(x_i), Ḡ_B(x_j)) ≤ 1.

Theorem 2: (Monotonicity) For ∀x ∈ X, let attribute subsets B, E satisfy B ⊆ E ⊆ A.
Let Ḡ_B(x) and Ḡ_E(x) be the fuzzy clustering granular vectors induced by x on B and E, respectively. Then |Ḡ_B(x)| ≤ |Ḡ_E(x)|.
Proof: By the definition of the fuzzy clustering granular vector (see equation (9)) and B ⊆ E ⊆ A, for every a ∈ B we have a ∈ E, and |B| ≤ |E|. Namely, if G_a(x) ∈ Ḡ_B(x), then G_a(x) ∈ Ḡ_E(x). Therefore |Ḡ_B(x)| ≤ |Ḡ_E(x)| is established. This completes the proof.
Below we give an example of calculating the metric of fuzzy clustering granular vectors. Example 1: As demonstrated in Table 1, given a sample set X = {x_1, x_2, x_3, x_4}, an attribute set A = {a_1, a_2, a_3}, a label set Y = {1, 0}, a cluster center set C = {c_1, c_2}, and metric parameter p = 2, fuzzy clustering granulation proceeds as follows. From the definition of a fuzzy clustering granule, the granules G_a(x_1) are obtained from the similarities in Table 1, and the fuzzy clustering granular vector Ḡ_A(x_1) induced by x_1 on A is formed; the vector Ḡ_A(x_2) induced by x_2 is formed in the same way. Finally, under metric parameter p = 2, the similarity between the fuzzy clustering granular vectors induced by x_1 and x_2 can be calculated.
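The granulation and vector similarity described in this section can be sketched as follows. This is a sketch, not the authors' code; it assumes the 1 − |difference| similarity on attribute values already normalized to [0, 1], and the |B|·|C|-normalized vector similarity, under which Theorem 1's bound 0 ≤ S ≤ 1 holds by construction.

```python
import numpy as np

def similarity(centers, x):
    """s_a(c_k, x) = 1 - |f(a, c_k) - f(a, x)| for every center and attribute.
    Returns a |C| x |A| matrix of memberships (inputs assumed in [0, 1])."""
    return 1.0 - np.abs(centers - x)

def granular_vector(centers, x, attrs):
    """Fuzzy clustering granular vector of x on attribute subset B:
    one fuzzy clustering granule (a column of memberships) per attribute."""
    return similarity(centers, x)[:, attrs]          # |C| x |B|

def vector_similarity(gv1, gv2):
    """Similarity of two granular vectors, normalized by |B| * |C| so that
    the result lies in [0, 1] (Theorem 1)."""
    C, B = gv1.shape
    return float((1.0 - np.abs(gv1 - gv2)).sum() / (B * C))
```

Note that the similarity of a granular vector with itself is exactly 1, and restricting `attrs` to a subset B ⊆ E can only shrink the modulus, mirroring Theorem 2.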

C. CONSTRUCTING FUZZY CLUSTERING GRANULAR HYPERPLANE
We first consider a binary classification problem and then generalize it to multi-class classification. For a binary classification problem, we can set the label set Y = {0, 1}. Given a sample x and the cluster center set, a fuzzy clustering granule G_a(x) can be constructed, and its corresponding label y satisfies y ∈ {0, 1}. The binary classification problem in sample space is thus equivalent to a binary classification problem in fuzzy clustering granular space: if the fuzzy clustering granules can be accurately distinguished, the binary classification problem in the original sample space is solved simultaneously. Below we give the definition of the fuzzy clustering granular hyperplane.
Definition 1: Suppose X is a sample set, A = {a_1, a_2, . . . , a_M} is an attribute set, and Y = {0, 1} is a label set. Then, in the fuzzy clustering granular space, we can construct a series of fuzzy clustering granular hyperplanes determined by W̄ and Q̄, where W̄, Q̄ are fuzzy clustering granular vector parameters and W̄, Q̄ ∈ R^M. For ∀x ∈ X, y ∈ Y, a rule r_A(x) = [Ḡ_A(x), y], composed of the fuzzy clustering granular vector induced by x on the attribute set A together with its label y, can be generated. Thus a rule set R_A = {r_A(x) | ∀x ∈ X} can be formed.
In the fuzzy clustering granular space, there is an optimal fuzzy clustering granular hyperplane that minimizes the classification error, and our goal is to find it; that is, we need to solve for the parameters W̄ and Q̄. Below we introduce the conditional probability distribution.
To find the optimal solution, a likelihood function L(W̄, Q̄) is constructed from these conditional probabilities over the rule set. By maximizing the likelihood function L(W̄, Q̄), the optimal values of W̄ and Q̄ can be obtained.

D. MULTI-CLASSIFICATION PROBLEM
The model above is a binary classification model; we now consider the multi-class model. Let the discrete random variable y satisfy y ∈ {1, 2, . . . , D}. For convenience of calculation, an indicator function can be introduced: if a sample belongs to class d, this is denoted by a one-hot vector whose d-th component is 1 (T denotes transposition). If Ḡ_A(x) belongs to class d, its probability can be defined in terms of the weights W̄_d, Q̄_d corresponding to class d, with W̄_d, Q̄_d ∈ R^M. Similar to the binary classification problem, the likelihood function can be defined with parameters θ = {(W̄_i, Q̄_i) | i = 1, 2, . . . , D}.
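The per-class probability and the prediction-by-maximum-probability step can be illustrated with a softmax-style sketch. The paper's exact probability form is not reproduced here; we assume a softmax over per-class linear scores of the (flattened) granular vector, and we collapse the granular-vector parameter Q̄_d to a scalar bias b_d for simplicity. All names are our own.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

def class_probabilities(g, W, b):
    """P(y = d | G_A(x)) for d = 1..D: one row of W (and one bias entry)
    per class scores the flattened granular vector g."""
    return softmax(W @ g + b)

def predict(g, W, b):
    """Predicted class: the index d with the maximum probability (1-based)."""
    return int(np.argmax(class_probabilities(g, W, b))) + 1
```

With D = 2 this reduces to the binary logistic case of the previous subsection.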

1) SOLVING PARAMETERS
A genetic algorithm is used to maximize the likelihood function; the algorithm is well suited to searching for a globally optimal solution. That is, our goal is to find the optimal solutions W̄* and Q̄* that maximize L(W̄, Q̄). We concatenate W̄ and Q̄ into one vector and use equation (24) as the fitness function to select solutions in each iteration. When the number of iterations reaches its maximum value, or the standard deviation of a set of fitness values no longer changes appreciably, the algorithm ends. The optimal parameters are those among the candidate solutions that maximize the fitness function.
where Z is the number of solutions. The procedure is as follows.
Step 1. Delete the samples with missing attribute values.
Step 2. Calculate the cluster center set (see Table 1).
Step 3. Perform parallel distributed fuzzy granulation of the samples.
Step 4. Set the number of solutions Z, the current iteration I ← 0, the precision ε, the maximum number of iterations Max, and the consecutive-iteration window t (the algorithm may stop once the maximum fitness value no longer changes appreciably over t consecutive iterations).
Step 5. Randomly generate Z pairs of solutions, which form the initial population NP.
Step 6. Calculate the fitness F(W̄_j, Q̄_j) of each pair of solutions in NP.
Step 7. Transform the solutions into binary code.
Step 8. Given a series of probabilities p_i, i = 1, 2, . . . , Z, if p_i < F(W̄_j, Q̄_j), then the solution (W̄_j, Q̄_j) is selected and duplicated. The process ends when Z pairs of solutions have been selected and duplicated from NP; these solutions form a new population NP_1.
Step 9. According to the number c of solutions participating in crossover, determined by the crossover probability P_c, randomly select c pairs of solutions from NP_1 and pair them for the crossover operation. The original solutions are replaced with the new solutions generated by crossover to obtain population NP_2.
Step 10. According to the number m of mutations, determined by the mutation probability P_m, randomly select m pairs of solutions from NP_2 and mutate them. The original solutions are replaced with the new solutions generated by mutation to obtain population NP_3.
Step 11. Update the solution set NP with NP 3 , namely NP ← NP 3 .
Step 12. Select the maximum value F_I of F(W̄, Q̄) over NP.
Step 13. Calculate the standard deviation of the sequence F_I, F_{I−1}, . . . , F_{I−t} by σ ← sqrt(Var(F_I, F_{I−1}, . . . , F_{I−t})).
Step 14. Update the current iteration I ← I + 1.
Step 15. If I > Max or σ < ε, go to Step 16; otherwise go to Step 6.
Step 16. Return (W̄*, Q̄*) ← arg max of the fitness over the final population.
After parameter learning, the random fuzzy clustering granular vector classification model is obtained. When a test sample is given, it is first fuzzy granulated, and then the trained classification model is used for prediction. The specific algorithm is shown in Table 2.
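The genetic search of Steps 4-16 can be sketched with a small real-coded genetic algorithm. The paper encodes solutions in binary at Step 7; we use real-valued vectors for brevity, the `fitness` callable stands in for the likelihood of equation (24), and all names are hypothetical.

```python
import numpy as np

def genetic_maximize(fitness, dim, pop=40, gens=60, pc=0.8, pm=0.1, seed=0):
    """Roulette-wheel selection (Step 8), one-point crossover (Step 9) and
    Gaussian mutation (Step 10) over a population of candidate parameter
    vectors; returns the best solution found (Step 16)."""
    rng = np.random.default_rng(seed)
    P = rng.uniform(-1.0, 1.0, (pop, dim))            # Step 5: initial solutions
    for _ in range(gens):
        f = np.array([fitness(p) for p in P])
        probs = f - f.min() + 1e-12                   # shift so weights are > 0
        probs /= probs.sum()
        P = P[rng.choice(pop, size=pop, p=probs)]     # Step 8: roulette wheel
        for i in range(0, pop - 1, 2):                # Step 9: crossover
            if rng.random() < pc and dim > 1:
                cut = int(rng.integers(1, dim))
                P[i, cut:], P[i + 1, cut:] = (P[i + 1, cut:].copy(),
                                              P[i, cut:].copy())
        mask = rng.random(P.shape) < pm               # Step 10: mutation
        P[mask] += rng.normal(0.0, 0.1, mask.sum())
        # Steps 12-15 (early stop when the std of recent best fitness
        # values falls below the precision) are omitted in this sketch.
    f = np.array([fitness(p) for p in P])
    return P[f.argmax()]                              # Step 16
```

In the classifier, `dim` would be the length of the concatenated (W̄, Q̄) vector and `fitness` the log-likelihood over the granulated training rules.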

V. EXPERIMENTAL ANALYSIS
In the experiments, we employ six data sets, three of which are from UCI; the others are constructed by adding 1% noise to those three data sets, as demonstrated in Table 3. Ten rounds of cross-validation are adopted: we randomly sample 80% of the data as the training set and the remaining 20% as the test set, perform one validation, and repeat the operation 10 times. The average accuracy and the average execution time are calculated as performance measures. In terms of efficiency, we compare the parallel distributed clustering fuzzy granulation method proposed in this paper, the serial clustering fuzzy granulation method, and the traditional serial granulation method, as shown in Figure 2. In terms of average accuracy, we compare Long Short-Term Memory (LSTM), Support Vector Machines (SVMs), Back Propagation Neural Network (BPNN), and RFCGHC on these data sets.
In granular computing, granulation is an important and time-consuming task. At present, most fuzzy granulation methods are serial, and their efficiency is too low to meet demand. If the tasks can be executed in parallel, efficiency improves considerably. For a computing task to be converted from serial to parallel computing, the prerequisite is that its internal logic be parallelizable. We adopt the method of clustering first and then granulating, which makes fuzzy granulation parallelizable.
The single-node multi-process concurrent mode can enhance the operating efficiency of computing tasks to some extent, but due to the limitation of single-node computing resources, the improvement is limited. The traditional multi-node data-parallel computing model can solve the problem of limited computing resources, but without a distributed file system, large I/O problems occur when sharing data sources, which affects read and write efficiency. MapReduce is usually used for large-scale data-parallel processing tasks: by abstracting a hierarchical computing model, large data sets are split into sub-tasks and assigned to computing nodes. The MapReduce model draws on the built-in functions Map and Reduce of functional programming languages; conceptually, Map performs mapping and Reduce performs reduction. The main idea is to split a job into multiple independently runnable Map tasks, distribute them to multiple processors to generate intermediate results, and then combine them with Reduce tasks to produce the final output. Both Map tasks and Reduce tasks can be highly parallelized. MapReduce is a programming model well suited to distributed parallel computing, and the Hadoop Distributed File System (HDFS) solves the I/O problems well.
The MapReduce calculation process mainly comprises two parts: Map and Reduce. Map receives input in the form <key, value> (see Table 4) and outputs intermediate results in the form <key, value>, where key = x_i and value = <f(a_1, x_i), . . . , f(a_M, x_i), C> (as shown in Table 5). The output of Reduce is detailed in Table 6. The data set is divided into subsets, and each subset corresponds to a map. As shown in Figure 2, the abscissa denotes the number of cluster centers and the ordinate is the average execution time. When the number of samples is N = 10000 and the number of cluster centers is K = 6074, the average time consumed by serial granulation is 100 minutes, the average execution time of serial clustering granulation is 54 minutes, and the average execution time of parallel distributed granulation is 14 minutes; relative to serial granulation, these are reductions of about 46% and 86%, respectively. When N = 50,000 and K = 31,966, serial granulation consumes 2500 minutes and serial clustering granulation takes 1598 minutes, a reduction of 36%; parallel clustering granulation takes 400 minutes, a reduction of 84% compared with serial granulation. When N = 60,000 and K = 32,683, the serial clustering and parallel clustering granulation execution times are reduced by about 46% and 86%, respectively, compared with serial granulation. It can be concluded that as the number of samples increases, the execution efficiency of the serial clustering granulation and parallel clustering granulation methods improves greatly compared with serial granulation.
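The Map/Reduce decomposition described above can be mimicked in plain Python to show why granulation parallelizes cleanly once the cluster center set C is fixed. This is a hypothetical single-process simulation; the actual system runs on Hadoop MapReduce over HDFS, and the record layout follows Tables 4-5 only loosely.

```python
def map_phase(shard, centers):
    """Map: for each <key=x_i, value=attribute values> record in one shard,
    emit <key=x_i, value=granular vector built against the shared centers C>."""
    return [(key, [[1.0 - abs(c[a] - x[a]) for c in centers]
                   for a in range(len(x))])
            for key, x in shard]

def reduce_phase(partials):
    """Reduce: merge the per-shard intermediate results into one
    granulated data set."""
    merged = []
    for part in partials:
        merged.extend(part)
    return merged

# Each shard depends only on its own records plus the shared centers, so the
# map_phase calls could run on different nodes without any coordination.
```

The key property is that `map_phase` has no cross-shard state, which is exactly the parallelizable internal logic obtained by clustering before granulating.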
In the RFCGHC model, the amount of cluster centers is an important parameter that affects the accuracy of classification. We analyzed the relationship between the amount of cluster centers and the accuracy, as illustrated in Figure 3-5.
As demonstrated in Figure 3, on the noise-free Mushroom data set, the accuracies of SVMs, BP, and LSTM reach 0.952, 0.947, and 0.931, respectively, while RFCGHC reaches its highest value of 0.989 when the number of cluster centers reaches 5600, improving the average accuracy by 3.89%, 4.44%, and 6.23%. After adding noise to the data set, with 5600 cluster centers, RFCGHC outperforms SVMs, BP, and LSTM by 7.97%, 8.21%, and 10.17%, respectively. Between the two data sets, the accuracies of SVMs, BP, LSTM, and RFCGHC decrease by 5.15%, 4.86%, 4.94%, and 1.02%, respectively. RFCGHC thus has the smallest drop and is less affected by noise than the other three approaches.
On the noise-free EEG data set, as shown in Figure 4-a, when K = 10500 the average accuracy of RFCGHC reaches its highest value of 0.965, while SVMs, BP, and LSTM achieve 0.934, 0.928, and 0.933; RFCGHC improves on them by 3.32%, 3.99%, and 3.43%, respectively. On the noisy EEG data set, as shown in Figure 4-b, RFCGHC reaches a peak accuracy of 0.955 at K = 10500, exceeding SVMs, BP, and LSTM by 7.42%, 8.03%, and 7.67%, respectively. RFCGHC is thus not sensitive to noise, dropping only 1.04%, while SVMs, BP, and LSTM drop 4.82%, 4.74%, and 4.93%.
The activity recognition data set has significantly more samples than the previous two data sets, reaching 75,128, and the number of categories increases to 4. As shown in Figure 5-a, without noise, when K = 60,000 the average accuracy of RFCGHC reaches its highest value of 0.958, versus 0.921 for SVMs, 0.905 for BP, and 0.951 for LSTM (improvements of 4.02%, 5.86%, and 0.74%, respectively). In contrast to the first two data sets, LSTM performs better than SVMs and BP here, and RFCGHC is slightly better than LSTM. As shown in Figure 5-b, on the noisy activity recognition data set, RFCGHC outperforms SVMs, BP, and LSTM by 7.62%, 8.36%, and 0.96%, respectively. On this data set, RFCGHC and LSTM are less sensitive to noise than SVMs and BP.
From the above analysis, we can see that with appropriate parameters, RFCGHC performs better than LSTM, BP, and SVMs on multiple data sets. We also conclude that the accuracy fluctuates with K rather than increasing or decreasing monotonically. The main reason lies in the cross-validation procedure: the algorithm is evaluated on different training and test sets each time, which introduces a certain degree of randomness but also measures the performance of the algorithm more objectively. In addition, a global optimal solution can be sought by the genetic algorithm when solving for the parameters. The accuracy of the algorithm differs across data sets: in general, data sets with more samples, fewer attributes, and fewer categories yield higher accuracy, while data sets with few samples, many attributes, and many categories yield lower accuracy. Class imbalance in a data set may also decrease performance. On noisy data sets, we found that RFCGHC is more robust than the other three algorithms; the reason is that the fuzzy granulation process embodies a global comparison idea, which overcomes the interference of noise to a certain extent. If K is too small, the classification accuracy may be reduced; if K is too large, noise may be included and accuracy may also decline. We can therefore adjust K to enhance performance.

VI. CONCLUSION
We present a random fuzzy clustering granular hyperplane classifier in this paper. The classifier can be used for binary or multi-class classification problems. In this model, we introduce parallel distributed fuzzy granulation to obtain fuzzy granules and fuzzy granular vectors and convert the classification problem over samples into a classification problem in fuzzy clustering granular space. In the fuzzy clustering granular space, we give definitions of the fuzzy clustering granule, fuzzy clustering granular vector, and fuzzy clustering hyperplane; we also design their operators and a loss function, whose solution is obtained through a genetic algorithm. In the future, we will combine this algorithm with 5G edge computing for Internet of Things applications.
CHAO TANG was born in Hefei, Anhui, China. He received the M.S. degree from Shanxi University, Taiyuan, China, in 2009, and the Ph.D. degree in artificial intelligence from Xiamen University, Xiamen, China, in 2014. He is currently an Associate Professor with the School of Artificial Intelligence and Big Data, Hefei University, China. His research interests include machine learning, computer vision, and human action recognition.
XIAOYU MA is currently pursuing the degree with the School of Computer and Information Engineering, Xiamen University of Technology, China. His research interests include machine learning, natural language processing, and granular computing.
YOUMENG LUO is currently pursuing the degree with the School of Computer and Information Engineering, Xiamen University of Technology, China. His research interests include image processing, artificial intelligence, and granular computing.