CNN-Based Hybrid Optimization for Anomaly Detection of Rudder System

In this study, an automatic test platform for steering gears was established that can test four sets of rudder systems separately. In addition, we propose an anomaly detection method based on deep learning to automate multi-fault classification in steering gear testing. The method combines the particle swarm optimization (PSO) algorithm and the grey wolf optimization (GWO) algorithm to optimize a convolutional neural network (HPSOGWO-CNN). The proposed HPSOGWO-CNN model is constructed in two stages to realize efficient, high-accuracy anomaly detection of the rudder system. In the first stage, the optimal number of search agents of the HPSOGWO algorithm is obtained through 10-fold cross validation, and its performance is compared with that of the GWO and PSO algorithms. The results demonstrate that the HPSOGWO algorithm is an effective technique for automatic selection of hyper-parameters. In the second stage, the HPSOGWO algorithm is used to fine-tune the hyper-parameters of the CNN, yielding a model well matched to anomaly detection of rudder system test parameters. The experimental results show that the accuracy of this method is 99.846%, the precision is 99.748%, the recall is 99.498%, the F-score is 99.618%, and kappa reaches 0.99565. The proposed model outperforms the KNN, SVM, BP, CNN, PSO-CNN, GWO-CNN, MGWO-CNN, WdGWO-CNN, and RW-GWO-CNN models in terms of accuracy, precision, recall, F-score, and kappa. Moreover, it is not affected by imbalanced samples and can achieve accurate classification with small training samples.


I. INTRODUCTION
The rudder system is a servo mechanism that is widely used in the control systems of airplanes, ships, missiles, etc. It receives control signals from the flight control system and drives the deflection of the rudder surface so as to control flight attitude and trajectory. To ensure precise control of the rudder system, it is necessary to analyze its state parameters, static parameters and dynamic parameters during the production process. Initially, testing of the rudder system was done manually. With the rapid development of automated test systems, the parameter test process has become efficient and accurate [1]. However, a large number of rudder system parameter test results are still analyzed manually at present, which is time-consuming and makes it difficult to ensure the accuracy of anomaly detection [2]. In the analysis of the rudder system, we judge the performance based on the test results of a set of parameters. If a parameter value is abnormal, the specific problem needs to be located directly, quickly and automatically. In recent years, machine learning technology has also been applied in anomaly detection [3]-[7].
The associate editor coordinating the review of this manuscript and approving it for publication was Nuno Garcia.
There are few applications of machine learning in the evaluation of steering systems; only four papers have studied the topic, and Table 1 summarizes them.
In Reference [8], the authors used a Support Vector Machine (SVM) to diagnose steering gear faults and realized intelligent data analysis. They focused on developing a new decision-points distribution and weight-assignment-oversampling method. The classification accuracy is 91%, the recall is 96.67%, the F-score is 94.2%, and the TNR is 74%. In Reference [9], an intelligent algorithm was applied to the fault detection and location of the rudder system, and the imbalanced data were effectively processed. The highlight of that article is the use of the adaptive sampling algorithm considering informative instances (ASCIN) to process the dataset, which prevents the loss of important information in the dataset. The classification accuracy is 97.3%, and the TNR is 88.3%. In Reference [2], the authors applied the Shuffled Frog Leaping Algorithm-based Random Forest (SFLA-RF) algorithm to rudder fault diagnosis. The advantages of this method for classification are short running time and high accuracy. The classification accuracy is 99.787%, and the kappa is 0.99486. Reference [10] broke through the technical bottleneck of low parameter testing efficiency; the classification kappa is 0.9954, and the RMSE is 0.0395. None of the four papers analyzed the classification performance for the individual states of the steering gear. In actual engineering, we not only need to ensure that the overall classification accuracy is high, but also that the different states of the steering gear can be accurately distinguished. This article introduces deep learning to this problem for the first time and makes a systematic analysis in this regard, filling the gap in the application of deep learning to the classification of rudder system test data.
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
We also reviewed anomaly detection and fault diagnosis in other areas related to the steering gear experiment, as shown in Table 2. In Reference [11], different fault modes of rolling bearings were reliably identified. In Reference [12], the author used Random Forest (RF) as a classifier to classify complex gearbox faults. Reference [13] showed that SVM has outstanding generalization performance and can obtain high classification accuracy when applied in machine condition monitoring and diagnosis. In Reference [14], a CNN was used to extract features from time-frequency graphs in order to classify sensor faults. In Reference [15], Deep Neural Networks (DNNs) were designed for fault classification, which can overcome the shortcomings of artificial neural networks. Reference [16] achieved high-precision fault diagnosis under changing workloads. In Reference [17], Independent Component Analysis (ICA) was used for feature extraction of induction motor data, and multi-class fault identification based on SVM was completed. Reference [18] applied the learning ability of the Deep Belief Network (DBN) to fault diagnosis of rolling bearings. References [19] and [20] combined the powerful capabilities of neural networks in feature extraction and deep auto-encoders (DAE) in classification.
In summary, the above literature shows that machine learning has been widely used in various fields of fault diagnosis, and that deep learning is particularly effective in mechanical fault diagnosis.
In recent years, Convolutional Neural Network (CNN) and optimization algorithms have been widely used in various fields. The Particle Swarm Optimization (PSO) and the Grey Wolf Optimization (GWO) algorithm have significant effects in optimization. Reference [21] used only pixels and disease labels as the input of CNNs to automatically classify skin lesions, with a level of competence almost equal to dermatologists. In Reference [22], the designed neural network was used to complete binary stress detection and three types of emotion classification. In Reference [23], the contextual deep CNN predicted the corresponding label of each pixel vector to complete hyper-spectral image classification. Reference [24] showed that the classification performance of DNNs can be significantly improved by PSO algorithm, and compared the impact of the swarm size on PSO and neural network architecture on classification performance. In Reference [25], in order to obtain better classification performance, the author adopted PSO algorithm to optimize the parameters in SVM. In Reference [26], the hyper-parameters in the CNN model were optimized by PSO algorithm, which achieved the purpose of improving network performance and made the selection of hyper-parameters automatic. Reference [27] pointed out that some hyper-parameters in CNN had a significant impact, whereas other hyperparameters were not very important. In Reference [28], ImGWO was used for feature selection, and ImCNN was used for network anomaly classification. In Reference [29], an enhanced GWO algorithm was used to optimize the hyperparameters of Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) networks. In Reference [30], the optimal size of system components was obtained by Hybrid PSO and GWO (HPSOGWO) algorithm. Reference [31] showed the superiority of HPSOGWO algorithm in optimizing the path. In Reference [32], the performance of the hybrid optimization algorithm was verified in a practical problem. 
In Reference [33], comparative experiments with various optimization algorithms demonstrated the advantages of the HPSOGWO algorithm. Table 3 summarizes the relevant algorithms.
The survey in Table 3 shows that CNN performs well on classification. According to the cited literature, the PSO algorithm can effectively fine-tune the hyper-parameters of a CNN and converges quickly, but its accuracy needs to be improved, whereas the GWO algorithm achieves better optimization accuracy. In recent years, traditional machine learning algorithms have been applied to rudder system fault diagnosis and can classify abnormal data automatically, but their classification performance for small samples needs to be improved. Inspired by the above research, we use a CNN optimized by the HPSOGWO algorithm to extract features from and classify the test parameters of the rudder system in order to locate faults with high accuracy. Thus, the rudder system parameter analysis can be automated.
The main contributions of this paper are summarized as follows:
1) An automatic test platform for steering gears has been built to realize parameter testing of up to four sets of steering gears at the same time.
2) A convolutional neural network model combining the particle swarm optimization algorithm and the grey wolf optimization algorithm (HPSOGWO-CNN) is proposed for anomaly detection of the rudder system. This model can obtain superior classification performance in the rudder system.
3) The HPSOGWO algorithm is used to solve the problem of difficult selection of CNN hyper-parameters.
4) This article solves the problem of inaccurate classification caused by imbalanced steering gear samples.
The rest of this article is structured as follows. Section 2 introduces the system model. In Section 3, the optimal CNN model for rudder system testing is constructed. In Section 4, the performance of the model is verified by a series of comparative experiments, followed by a summary in Section 5.

II. PROPOSED SYSTEM MODEL
In this study, we propose a new model named HPSOGWO-CNN, which uses the HPSOGWO algorithm to optimize the hyper-parameters of a 1D CNN to obtain the best 1D CNN structure. The optimization algorithm is a hybrid variant that combines PSO and GWO.

A. GREY WOLF OPTIMIZER (GWO)
The Grey Wolf Optimizer (GWO) was originally proposed by Mirjalili et al. [34]. Grey wolves have a very strict social hierarchy. The leaders of the pack are called Alphas (α). Alphas manage the pack and are responsible for making decisions. The second level of the hierarchy is Beta (β). Beta wolves obey the Alphas and help them make decisions. The lowest level is Omega (ω), a group of wolves that are completely obedient to the other wolves. The remaining wolves are called Delta (δ). They obey Alpha and Beta, but dominate Omega.
The GWO optimization process mainly includes the social hierarchy, encircling prey, hunting, attacking prey and searching for prey.

1) SOCIAL HIERARCHY
To design the GWO, a mathematical model of the social hierarchy of wolves is established. The best solution is Alpha. Similarly, the second and third best solutions are called Beta and Delta. The remaining candidate solutions are named Omega. The optimization process is guided by Alpha, Beta and Delta.

2) ENCIRCLING PREY
The encircling behavior of each search agent in the wolf pack is modeled as follows. Following the standard formulation in [34], the position of the grey wolf is updated by (1) and (2):
$$\vec{D} = \left| \vec{C} \cdot \vec{X}_p(t) - \vec{X}(t) \right| \tag{1}$$
$$\vec{X}(t+1) = \vec{X}_p(t) - \vec{A} \cdot \vec{D} \tag{2}$$
where $t$ indicates the current iteration, $\vec{A}$ and $\vec{C}$ are coefficient vectors, $\vec{X}_p$ is the position vector of the prey, and $\vec{X}$ indicates the position vector of a grey wolf. The vectors $\vec{A}$ and $\vec{C}$ are given by:
$$\vec{A} = 2\vec{a} \cdot \vec{r}_1 - \vec{a} \tag{3}$$
$$\vec{C} = 2 \cdot \vec{r}_2 \tag{4}$$
where the components of $\vec{a}$ are linearly decreased from 2 to 0 over the course of iterations and $\vec{r}_1$, $\vec{r}_2$ are random vectors in [0,1].

3) HUNTING
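We take Alpha, Beta and Delta as the three best solutions, and then update the positions of Omega and the other wolves according to the position information of Alpha, Beta and Delta, as shown in Fig. 1.
Fig. 1. Steps to update the location of the grey wolves.
Following the standard formulation in [34], the mathematical model of this behavior can be expressed as follows:
$$\vec{D}_\alpha = \left| \vec{C}_1 \cdot \vec{X}_\alpha - \vec{X} \right| \tag{5}$$
$$\vec{D}_\beta = \left| \vec{C}_2 \cdot \vec{X}_\beta - \vec{X} \right| \tag{6}$$
$$\vec{D}_\delta = \left| \vec{C}_3 \cdot \vec{X}_\delta - \vec{X} \right| \tag{7}$$
$$\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha \tag{8}$$
$$\vec{X}_2 = \vec{X}_\beta - \vec{A}_2 \cdot \vec{D}_\beta \tag{9}$$
$$\vec{X}_3 = \vec{X}_\delta - \vec{A}_3 \cdot \vec{D}_\delta \tag{10}$$
$$\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3} \tag{11}$$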

4) ATTACKING PREY AND SEARCH FOR PREY
According to (3), as $\vec{a}$ decreases, the fluctuation range of $\vec{A}$ also decreases. When $\vec{A}$ is in the interval [-1, 1], the next position of the agent can be anywhere between the wolf and its prey. On the one hand, |A| < 1 forces the wolf to attack its prey. On the other hand, |A| > 1 forces the wolf to move away from its prey in the hope of finding more suitable prey.
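As an illustration of the encircling, hunting and attacking behavior described above, the following is a minimal Python sketch of one GWO position update, assuming the standard formulation of [34]. The function names `linear_a` and `gwo_step` and the list-of-lists data layout are hypothetical choices made for this example and are not taken from the paper.

```python
import random


def linear_a(iteration, max_iterations):
    """Control parameter a, decreasing linearly from 2 to 0 over the iterations."""
    return 2.0 * (1.0 - iteration / max_iterations)


def gwo_step(wolves, alpha, beta, delta, a, rng=random):
    """Move every wolf to the mean of three leader-guided candidate positions."""
    new_wolves = []
    for x in wolves:
        new_x = []
        for d in range(len(x)):
            candidates = []
            for leader in (alpha, beta, delta):
                A = 2.0 * a * rng.random() - a   # A lies in [-a, a)
                C = 2.0 * rng.random()           # C lies in [0, 2)
                D = abs(C * leader[d] - x[d])    # distance term toward this leader
                candidates.append(leader[d] - A * D)
            new_x.append(sum(candidates) / 3.0)  # average of the three candidates
        new_wolves.append(new_x)
    return new_wolves
```

Because A is drawn from [-a, a), shrinking a toward 0 narrows the range of A, so late iterations pull the wolves toward the leaders (exploitation), while early iterations with |A| > 1 can push them away (exploration).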

B. PARTICLE SWARM OPTIMIZATION (PSO)
Particle swarm optimization was first proposed by Kennedy and Eberhart [35]. The algorithm is composed of particles, each of which has only two attributes: velocity and position. The velocity and position of each particle are updated through (12) and (13) until the optimal solution is obtained:
$$v_i^{k+1} = v_i^k + c_1 r_1 \left( P_{best,i}^k - x_i^k \right) + c_2 r_2 \left( g_{best} - x_i^k \right) \tag{12}$$
$$x_i^{k+1} = x_i^k + v_i^{k+1} \tag{13}$$
where $i$ refers to the particle in the swarm, $k$ is the number of iterations, $r_1$ and $r_2$ are random numbers in the range [0,1], and the coefficients $c_1$ and $c_2$ are the optimization parameters. $P_{best}$ represents the best position of the individual, and $g_{best}$ represents the best position of the population.
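The velocity and position update can be sketched in a few lines of Python. This is an illustrative sketch of the classic PSO update without an inertia weight; the function name `pso_step`, the default coefficient values and the choice to draw the random numbers per dimension are assumptions made for the example.

```python
import random


def pso_step(positions, velocities, p_best, g_best, c1=0.5, c2=0.5, rng=random):
    """Apply one PSO velocity and position update to every particle."""
    new_positions, new_velocities = [], []
    for x, v, p in zip(positions, velocities, p_best):
        new_v = []
        for xd, vd, pd, gd in zip(x, v, p, g_best):
            r1, r2 = rng.random(), rng.random()  # random numbers in [0, 1)
            # Pull toward the particle's own best and the swarm's best position.
            new_v.append(vd + c1 * r1 * (pd - xd) + c2 * r2 * (gd - xd))
        new_velocities.append(new_v)
        new_positions.append([xd + vd for xd, vd in zip(x, new_v)])
    return new_positions, new_velocities
```

A particle that already sits on both its personal best and the global best keeps its current velocity, since both attraction terms vanish.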

C. HYBRID PSO-GWO (HPSOGWO)
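Although PSO has some limitations, it offers advantages such as simplicity, robustness, and ease of implementation. Its main disadvantage is that it easily falls into a local minimum [36]. The GWO algorithm has strong convergence performance, few parameters, and is easy to implement. It avoids local trapping and maintains the balance between exploration and exploitation, but it lacks communication between individual positions and group positions. The HPSOGWO algorithm therefore combines the strengths of both PSO and GWO [37]. Following the formulation in [37], the modified governing equations of the hybrid algorithm are as follows:
$$\vec{D}_\alpha = \left| \vec{C}_1 \cdot \vec{X}_\alpha - \omega \cdot \vec{X} \right| \tag{14}$$
$$\vec{D}_\beta = \left| \vec{C}_2 \cdot \vec{X}_\beta - \omega \cdot \vec{X} \right| \tag{15}$$
$$\vec{D}_\delta = \left| \vec{C}_3 \cdot \vec{X}_\delta - \omega \cdot \vec{X} \right| \tag{16}$$
The velocity and position update equations obtained by combining the PSO and GWO algorithms are as follows:
$$v_i^{k+1} = \omega \left( v_i^k + c_1 r_1 (X_1 - x_i^k) + c_2 r_2 (X_2 - x_i^k) + c_3 r_3 (X_3 - x_i^k) \right) \tag{17}$$
$$x_i^{k+1} = x_i^k + v_i^{k+1} \tag{18}$$
where $X_1$, $X_2$ and $X_3$ are the candidate positions computed from Alpha, Beta and Delta in the hunting step. In (17), ω represents the inertia weight parameter.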

D. FRAMEWORK CONSTRUCTION OF 1D CNN MODEL
In this study, a new 1D CNN model is proposed for the anomaly detection of rudder systems. Specifically, the 1D CNN model designed in this paper includes three convolution layers, two pooling layers, one flatten layer, two dropout layers and two dense layers. The structure of 1D CNN is described as follows:

1) INPUT LAYER
The input layer is used to accept preprocessed rudder system test data.

2) CONVOLUTION LAYER
This layer has multiple filters and performs most of the computation. The input data are convolved as they pass through this layer. Generally speaking, convolution generates a new spectrum representation by sliding the kernel with a certain "stride" across the entire spectrum. The output feature map is then passed to the activation function to introduce nonlinearity into the network layer. The activation function used in the convolutional layers designed in this paper is ReLU. The number of filters $n_i$ and the filter size $S_F$ in the convolutional layers are obtained through the HPSOGWO optimization algorithm.

3) POOLING LAYER
The pooling layer is usually used for feature dimensionality reduction, compressing the data and reducing the number of parameters in the training process, thereby reducing overfitting and improving the fault tolerance of the model. There are two main types of pooling layers: maximum pooling and average pooling. This article uses maximum pooling. The HPSOGWO algorithm is also used to determine the pool size $S_P$.
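To make the convolution, activation and pooling operations concrete, the following is a minimal pure-Python sketch for a single-channel 1D signal. The helper names `conv1d`, `relu` and `max_pool1d` are hypothetical, and the sketch assumes no padding, a single filter and no bias term, which is simpler than the layers used in the model.

```python
def conv1d(signal, kernel, stride=1):
    """Slide the kernel across the signal (no padding) and sum the products."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def relu(values):
    """Replace negative activations with zero."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, pool_size=2):
    """Keep the maximum of each non-overlapping window of pool_size values."""
    return [
        max(values[i:i + pool_size])
        for i in range(0, len(values) - pool_size + 1, pool_size)
    ]


features = conv1d([1.0, 3.0, 2.0, 5.0, 4.0], [-1.0, 1.0])  # [2.0, -1.0, 3.0, -1.0]
pooled = max_pool1d(relu(features), pool_size=2)           # [2.0, 3.0]
```

With a filter size of 2 and a pool size of 2, five input values are reduced to two features, which shows how pooling shrinks the representation passed to later layers.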

4) FLATTEN LAYER
The flatten layer is used to ''flatten'' the output of the pooling layer, that is, to convert it into a vector. It is usually used for the transition from the convolution layer or pooling layer to the fully connected layer.

5) DROPOUT LAYER
The dropout layer randomly deactivates hidden units to prevent overfitting. This randomness forces the network to learn redundant representations, so that it does not fit the training samples too closely, thereby increasing the generalization ability of the network.
In the training process, units are randomly dropped with a certain probability, which changes the network structure and is equivalent to training different networks. For the two dropout layers in this experiment, the dropout ratios are set to 0.25 and 0.5, respectively.

6) DENSE LAYER
This layer contains a large number of neurons, which are used to connect neurons in this layer with those in other layers. When the activation function of the dense layer is set to Softmax, the layer can be regarded as a classification layer [38].
The model designed in this paper has two dense layers. The number of units C in the first dense layer is determined by the optimization algorithm. And the second dense layer is used for classification. This experiment needs to divide the test results of the rudder system into 11 categories, so the number of units in this layer is 11.

III. EXPERIMENT VALIDATION

A. RUDDER SYSTEM TESTING EQUIPMENT
This paper establishes an automatic test platform suitable for rudder system testing, which can automatically complete rudder system data collection and processing, performance index testing and other functions. The test equipment is mainly composed of a main control industrial computer, power supply, capture card, signal conditioning circuit, driver, pneumatic steering gear and multi-function data acquisition card. It can test four sets of rudder systems separately, which greatly improves the test efficiency. The system automatically analyzes the rudder system parameters returned to the industrial control computer to meet the requirements of anomaly detection.
It is well known that the rudder must be evaluated before use to ensure that all parameters are in normal condition. If a parameter is abnormal, it needs to be analyzed and adjusted immediately, because it is directly related to the performance of the steering gear. Therefore, improving the accuracy of anomaly detection for the rudder system is essential. Fig. 2 shows a brief flow chart of the rudder system test.

B. DATASET DESCRIPTION
In this study, 19490 historical test data are used. The test items of the pneumatic rudder system included in this dataset are as follows: transient time, overshoot, steady-state errors, hysteresis, band width, etc. These parameters reflect the dynamic and static characteristics of the pneumatic rudder system, and are a comprehensive index for evaluating the performance of the rudder system.
Here, the columns in the dataset are called ''features.'' There are 10 types of features in our work, corresponding to 10 types of errors. The ''labels'' indicate 11 test statuses, consisting of a qualified type and 10 types of faults. Furthermore, the test data of each rudder system consist of 10 characteristic values and a label. The overview of the dataset is shown in Table 4.

C. EXPERIMENTAL 1D CNN ARCHITECTURE

1) EXPERIMENTAL SETUP
In this experiment, we hope to obtain the best network architecture, so the HPSOGWO algorithm is used to fine-tune the hyper-parameters of 1D CNN.
The experimental setup is as follows. The number of samples used in one training step of the neural network is called the batch size. An unsuitable batch size causes the loss function to oscillate and fail to converge, whereas increasing the batch size within a certain range can both shorten the training time of the neural network and improve the training accuracy. Therefore, we set the hyper-parameter batch size = 512. The 1D CNN model is trained using the adaptive moment estimation (Adam) algorithm [39]. In the optimization, the accuracy is used as the fitness function. Each neural network is trained for 100 epochs. The parameters of the HPSOGWO algorithm are set as follows: the maximum number of iterations is 10, c_1 = c_2 = c_3 = 0.5, and ω ∈ [0.5, 1). Moreover, the dataset is partitioned by 10-fold cross validation as follows: first, the data in the dataset D are preprocessed; second, their order is randomly shuffled; then the data are divided into 10 mutually exclusive subsets of similar size. In each fold, the training set T is the union of 9 subsets, and the testing set is the remaining subset. Training and testing are therefore carried out 10 times, and the results of the 10 tests are averaged to obtain the final result. These settings are used to analyze the performance of the HPSOGWO, PSO and GWO optimization algorithms.
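The 10-fold partition described above can be sketched as follows. This is an illustrative Python sketch that works on sample indices only; the function name `k_fold_splits` and the fixed random seed are assumptions made for the example, and preprocessing is omitted.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k mutually exclusive folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)         # randomly shuffle the order
    folds = [indices[i::k] for i in range(k)]    # k subsets of similar size
    for i in range(k):
        test = folds[i]                          # the remaining subset
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# With the 19490 samples of this study, each fold holds 1949 test samples.
splits = list(k_fold_splits(19490, k=10))
```

Each sample appears in exactly one testing set, so averaging the 10 test results uses every sample once for evaluation.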
The process of using HPSOGWO algorithm to optimize CNN can be summarized as follows.
1) Set the grey wolf population Pop = 30, the maximum number of iterations Max_iterations = 10, and the optimization dimension Dim = 6, and set the upper and lower boundaries of the hyper-parameters according to Table 5.
2) Initialize the grey wolf population X_i (i = 1, 2, ..., 6), which holds the values of the six hyper-parameters.
3) Population boundary check: prevent the initialized hyper-parameters from exceeding the upper and lower boundaries.
4) Evaluate the fitness of the population: substitute the initialized hyper-parameters into the CNN and train it to obtain the accuracy of multi-fault classification of the rudder system. The accuracy is the fitness value.
5) Sort the fitness.
6) Sort the population positions according to the fitness, so that each fitness value corresponds to its set of hyper-parameters.
7) Set the best three grey wolves X_α, X_β, X_δ, that is, the three sets of hyper-parameters with the three highest fitness values.
8) While Iterations < 10, if Pop < 30, update the parameters a, A, c and ω, and calculate the velocity and position of the search agents according to (17) and (18).
9) Increase Pop by 1.
10) Go back to Steps 8-9 until Pop ≥ 30.
11) Go back to Steps 3-6. Update the fitness corresponding to X_α as the global optimal solution; X_α is the global optimal position.
12) Increase Iterations by 1.
13) Go back to Steps 8-12 until Iterations ≥ 10. The final highest fitness is the global optimal solution, and the global optimal position gives the optimal hyper-parameters for constructing the network.
14) To obtain a more reliable and stable CNN model, this paper uses 10-fold cross-validation training: the training set is replaced 10 times and the above steps are repeated, finally yielding 10 sets of optimal hyper-parameters.
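The steps above can be sketched as a short Python loop. This is only an illustrative sketch under several assumptions: the fitness is any user-supplied function to maximize (in the paper it is the CNN classification accuracy, which is replaced here by a toy function), the update follows the commonly published HPSOGWO form with C drawn as in standard GWO, and the name `hpsogwo_maximize` and the bounds in the usage example are hypothetical and are not the values of Table 5.

```python
import random


def hpsogwo_maximize(fitness, lower, upper, pop=30, max_iter=10, c=0.5, seed=0):
    """Search for the position with the highest fitness inside [lower, upper]."""
    rng = random.Random(seed)
    dim = len(lower)

    def clip(x):
        # Boundary check: keep every coordinate inside its limits.
        return [min(max(v, lo), hi) for v, lo, hi in zip(x, lower, upper)]

    wolves = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
              for _ in range(pop)]
    velocities = [[0.0] * dim for _ in range(pop)]
    best_x, best_f = None, float("-inf")

    for it in range(max_iter + 1):
        # Evaluate and sort the population by fitness, best first.
        scored = sorted(((fitness(w), w) for w in wolves),
                        key=lambda s: s[0], reverse=True)
        if scored[0][0] > best_f:
            best_f, best_x = scored[0][0], list(scored[0][1])
        if it == max_iter:
            break
        leaders = [s[1] for s in scored[:3]]     # alpha, beta, delta
        a = 2.0 * (1.0 - it / max_iter)          # decreases from 2 toward 0
        for i, x in enumerate(wolves):
            w = 0.5 + rng.random() / 2.0         # inertia weight in [0.5, 1)
            new_v = []
            for d in range(dim):
                pull = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()       # assumption: C as in standard GWO
                    D = abs(C * leader[d] - w * x[d])
                    candidate = leader[d] - A * D
                    pull += c * rng.random() * (candidate - x[d])
                new_v.append(w * (velocities[i][d] + pull))
            velocities[i] = new_v
            wolves[i] = clip([xd + vd for xd, vd in zip(x, new_v)])
    return best_x, best_f


# Toy usage: maximize a concave function whose optimum is at (1, 1).
toy_fitness = lambda x: -sum((v - 1.0) ** 2 for v in x)
best_position, best_value = hpsogwo_maximize(toy_fitness, [-5.0, -5.0], [5.0, 5.0])
```

In the actual experiment each fitness evaluation requires training a CNN for 100 epochs, so the cost is dominated by the roughly Pop × Max_iterations network trainings rather than by the update arithmetic.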

2) IMPACT OF THE SEARCH AGENT SIZES ON HPSOGWO
The influence of the number of search agents in the HPSOGWO algorithm on the results is analyzed experimentally. The following search agent sizes are investigated: SA = {10, 20, 30, 40}, and Table 5 summarizes all hyper-parameters that need to be fine-tuned to obtain the best classification results. The testing set is used to evaluate the classification performance of the 1D CNN optimized by the HPSOGWO algorithm. As mentioned earlier, we perform 10-fold cross validation. Specifically, 10 repeated independent experiments are carried out for each search agent size. Fig. 3 shows the classification accuracy obtained with the HPSOGWO algorithm for all search agent sizes. The figure shows that the classification accuracy is highest when 30 search agents are used.
To verify the consistency of the hyper-parameter quality obtained by the HPSOGWO algorithm in 10 independent experiments, the box plot shown in Fig. 4 is drawn. Table 6 lists the average values of various results. Table 7 lists the accuracy results obtained from 10 independent experiments conducted with different numbers of search agents. Fig. 4 and Table 6 show the testing-set accuracy when the HPSOGWO algorithm is used to optimize the 1D CNN under four different search agent sizes. Interestingly, increasing the number of search agents may lower the minimum accuracy across the ten optimization experiments. In other words, increasing the number of search agents may not effectively improve initial low-quality positions. In addition, Table 7 shows that when the number of search agents is 30, the average accuracy of the 10 experiments reaches 99.790%. Compared with 10, 20 and 40 search agents, the accuracy is increased by 0.047%, 0.026% and 0.006%, respectively.

3) ANALYSIS OF THREE OPTIMIZATION ALGORITHMS
To verify the superiority of the HPSOGWO algorithm over the PSO algorithm and the GWO algorithm in optimizing the 1D CNN hyper-parameters, we conducted the following two experiments. The population size of each of the three algorithms is 30. As shown in Fig. 5 (a), compared with the PSO-CNN algorithm, the HPSOGWO-CNN algorithm achieves higher classification accuracy when optimizing the 1D CNN in 10 independent repeated experiments.
Another advantage of the proposed HPSOGWO-CNN model is that it converges faster than the GWO-CNN model. This is because, compared with the GWO algorithm, the HPSOGWO algorithm adopts the velocity and position update formula of the PSO algorithm, which improves the exploration ability of the GWO algorithm. Fig. 5 (b) shows the number of iterations required in each of the 10 independent repeated experiments for the fitness function value to reach its optimal value for the first time. For example, in experiment 1, when the HPSOGWO algorithm is used to optimize the 1D CNN, the fitness function value reaches the best value after the first iteration is completed; however, with the same training and testing sets, the GWO algorithm reaches the best fitness value only at the 9th iteration.
Overall, these results show that the HPSOGWO algorithm is an effective technique for the automatic selection of hyper-parameters in neural networks, and that its performance is superior to that of either of the other two algorithms alone.

4) PERFORMANCE EVALUATION
Through the above experiments using 10-fold cross validation, ten different optimal solutions were obtained in ten experiments. The 10 best solutions of the HPSOGWO-CNN model are shown in Table 8. In this experiment, we divide the training set and the testing set at a ratio of 8:2. Furthermore, because the initial weights and biases differ, repeated training of the network with the same hyper-parameters produces different training results each time, in other words, different classification performance [27]. Therefore, to obtain a more stable network structure, we trained each of these ten best combinations five times, as shown in Fig. 6. Analysis of the accuracy and kappa coefficient shows that the average performance of model 2 is the best.
The hyper-parameters of model 2 are used to construct the 1D CNN designed in this experiment. As shown in Table 9, the convolution part of the 1D CNN consists of three 1D convolution layers, two 1D pooling layers and one flatten layer. The first, second and fourth layers of the 1D CNN structure are convolution layers, which contain 60, 32 and 64 1D filters, respectively. The filter size of each convolution layer is 2, the stride is 1, and the output uses the ReLU activation function. The third and fifth layers are max pooling layers with a pool size of 2. The sixth layer is the flatten layer. Overall, data are input in the form of 10 × 1 vectors and output as 192 × 1 vectors through these layers. Next, as shown in Fig. 7, all the output data are connected into one vector, which is fed into a dense layer with 351 units, with a dropout ratio of 0.25. The ReLU activation function is used for each unit. The last dense layer is the output layer, with a dropout ratio of 0.5. Using Softmax as the activation function, 11 classification results are obtained. These 11 classes divide the data into the qualified type and 10 types of faults, thus realizing the anomaly detection of the rudder system.
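The layer sizes above can be checked with a short shape and parameter count. The following Python sketch assumes "same" padding in the convolution and pooling layers (so a pooling layer rounds the length up); this padding choice is not stated in the text and is an assumption, chosen because it reproduces the 192 × 1 flattened vector. The helper names are hypothetical.

```python
import math


def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter in a 1D convolution layer."""
    return (kernel * in_channels + 1) * filters


def dense_params(inputs, units):
    """Weights plus one bias per unit in a dense layer."""
    return (inputs + 1) * units


def cnn_summary():
    """Return (flattened length, total trainable parameters) of the described CNN."""
    length, channels, total = 10, 1, 0           # 10 x 1 input vector
    filters = iter((60, 32, 64))
    for layer in ("conv", "conv", "pool", "conv", "pool"):
        if layer == "conv":
            f = next(filters)
            total += conv_params(2, channels, f)  # filter size 2
            channels = f                          # 'same' padding keeps the length
        else:
            length = math.ceil(length / 2)        # pool size 2, 'same' padding
    flat = length * channels
    total += dense_params(flat, 351) + dense_params(351, 11)
    return flat, total


flat_size, total_params = cnn_summary()
```

Under these assumptions the sketch gives a flattened length of 192 and 79827 trainable parameters, which agrees with the total number of parameters reported for the proposed model.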

A. EVALUATION METRICS
The following parameters are used to evaluate the performance of the proposed model: accuracy, precision, recall, F-score and kappa.
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{20}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{21}$$
where TP, TN, FP and FN represent true positive, true negative, false positive and false negative, respectively. TP refers to a normal sample that is classified as normal, and FN refers to a normal sample that is classified as abnormal.
Conversely, an abnormal sample classified as normal is called FP, and an abnormal sample classified as abnormal is called TN.
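The five metrics can be computed from a confusion matrix as in the following Python sketch. It uses the standard textbook definitions of accuracy, per-class precision, recall and F-score, and Cohen's kappa; whether the paper averages the per-class values in the same way is not stated, so the function name `classification_metrics` and the per-class output format are assumptions made for the example.

```python
def classification_metrics(cm):
    """Compute metrics from a square confusion matrix (rows: actual, columns: predicted)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]                          # actual counts
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predicted counts
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precision = [cm[i][i] / col_sums[i] if col_sums[i] else 0.0 for i in range(n)]
    recall = [cm[i][i] / row_sums[i] if row_sums[i] else 0.0 for i in range(n)]
    f_score = [2 * p * r / (p + r) if p + r else 0.0
               for p, r in zip(precision, recall)]
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (accuracy - expected) / (1.0 - expected)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f_score": f_score, "kappa": kappa}


# Toy two-class example (values are illustrative, not from the paper).
result = classification_metrics([[40, 10], [5, 45]])
```

Kappa complements accuracy on imbalanced data because it subtracts the agreement that a classifier would reach by chance given the class frequencies.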

B. COMPARISON OF PERFORMANCE INDEXES OF DIFFERENT MODELS
In order to thoroughly evaluate the proposed model, we compared the proposed HPSOGWO-CNN model with traditional machine learning algorithms, other neural network algorithms, unoptimized CNN, and CNN models optimized by different optimization algorithms. The experimental results of classification performance of each model are the average of 5 repeated experiments. There are various ways to improve GWO algorithm [40], [41]. In order to analyze the advantages of HPSOGWO algorithm in optimizing CNN, this paper compares it with MGWO [42], RW-GWO [43] and WdGWO [44]. Among them, MGWO and RW-GWO both improved the parameter update mechanism, and WdGWO improved the grey wolf individual position update mechanism.
The  Table 10. In order to analyze the complexity of the model, the space complexity of the proposed model is analyzed. Space complexity is mainly affected by the number of parameters in the model. The larger the space complexity, the larger the amount of data needed for training the model.   are the highest. Among them, the total parameters of the proposed model are 79827, which is relatively low.
The classification performance index shown in Table 12 is the average value obtained from 5 repeated experiments. Compared with KNN and SVM, the accuracy of our proposed model is improved by 3.566% and 1.898%, respectively. Recall, F-score and Kappa all perform poorly in the two traditional machine learning algorithms. Compared with the BP model, the accuracy is increased by 0.58%, which indicates that the feature extraction of data through the convolution layer can effectively improve the classification accuracy. The accuracy of the CNN model without optimization is 0.282% lower than that of the CNN model optimized by the HPSOGWO algorithm. In other words, the optimization algorithm is indeed effective in improving the classification performance.
In the experiment, six optimization algorithms of PSO, GWO, MGWO, WdGWO, RW-GWO and HPSOGWO are compared to optimize the classification performance of CNN. The results show that the PSO-CNN model performs the worst among the 5 performance index evaluations of accuracy, precision, recall, F-score and kappa, which are all lower than the CNN model optimized by GWO and improved GWO. Among them, the accuracy of the HPSOGWO-CNN model is 99.846%, the precision is 99.748%, the recall is 99.498%, and the F-score is 99.618%. Kappa is 0.99565, which is the closest to 1 compared with the other 9 models. Among the 4 improved GWO algorithms, the evaluation indicators of HPSOGWO-CNN stand out.
In addition, we report the program execution time of the different models. Specifically, GWO-CNN requires 78.074 s, and all 4 improved GWO algorithms shorten this running time. Among them, HPSOGWO-CNN has the shortest running time, 56.288 s, which is about 27.9% less than that of GWO-CNN. While maintaining high classification quality, the CNN model structure optimized by the HPSOGWO algorithm therefore also reduces execution time.
The non-parametric statistical hypothesis tests comparing the proposed model with the other models are shown in Table 13. The asymptotic significance p1 is obtained using the Mann-Whitney test; it is a probability computed from a normal approximation and is suitable for data with a large sample size. The exact significance p2 is obtained using the Kruskal-Wallis test; it is the probability obtained by an exact test and is suitable for data with a small sample size. The p-value reflects whether the difference between two models is statistically significant: a p-value below 0.05 indicates a significant difference. For every comparison with the proposed model, the p-values of both tests are less than 0.05, indicating that the comparative experiment in this paper is meaningful [45].
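To make the normal approximation behind the asymptotic significance concrete, here is a minimal pure-Python sketch of a two-sided Mann-Whitney U test. The two accuracy lists are hypothetical values invented for illustration, the function names are assumptions, and the sketch omits the tie and continuity corrections that statistical packages usually apply.

```python
import math


def average_ranks(values):
    # Assign 1-based ranks; tied values share the mean of their rank positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_asymptotic(x, y):
    """Return (U, two-sided p-value) using the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mean_u) / sd_u
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value


# Hypothetical accuracies (%) from 5 repeated runs of two models.
model_a = [99.81, 99.84, 99.85, 99.86, 99.87]
model_b = [99.50, 99.55, 99.52, 99.58, 99.56]

u_stat, p_asym = mann_whitney_asymptotic(model_a, model_b)
print(f"U = {u_stat}, asymptotic p = {p_asym:.4f}")
```

With only 5 runs per model the normal approximation is coarse, which is consistent with reporting an exact significance alongside the asymptotic one.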
In summary, the proposed HPSOGWO-CNN model performs best among these 10 models across the evaluated performance indexes. It is an outstanding model for the anomaly detection of the rudder system.

C. COMPARISON OF PERFORMANCE INDEXES OF DIFFERENT CATEGORIES
In our work, the HPSOGWO-CNN model performed well on the overall performance indicators. To further analyze the classification of each category, we present the confusion matrix of the HPSOGWO-CNN model in Table 14, and the accuracy, precision and F-score of each category for each model in Table 15. For each model, the confusion matrix is taken from the best of the five experiments. Class Q in Table 15 indicates that the rudder system is qualified, whereas classes FA to FJ denote single faults (fault A to fault J).
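Per-category indicators of this kind can be read directly off a confusion matrix. The sketch below computes precision, recall and F-score for each class; the 3-class matrix and the function name `per_class_metrics` are hypothetical and do not reproduce Table 14 or Table 15.

```python
def per_class_metrics(cm, labels):
    """Return {label: (precision, recall, f_score)} from a confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(cm)
    results = {}
    for k in range(n_classes):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n_classes)) - tp  # predicted k, truly other
        fn = sum(cm[k]) - tp                               # truly k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0
        results[labels[k]] = (precision, recall, f_score)
    return results


# Hypothetical confusion matrix for three of the classes (illustration only).
example_labels = ["Q", "FA", "FB"]
example_cm = [
    [50, 0, 0],
    [2, 28, 0],
    [0, 1, 19],
]

metrics = per_class_metrics(example_cm, example_labels)
for label, (p, r, f) in metrics.items():
    print(f"{label}: precision={p:.4f} recall={r:.4f} F={f:.4f}")
```

Reporting the indicators per class exposes weak minority classes that an overall average can hide, which is the motivation for the per-category comparison.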
It is worth noting that the KNN and SVM machine learning algorithms are greatly affected by data imbalance and are less effective in classifying FE, FF, FG and FJ. The KNN model in particular has an accuracy below 50% on class FF and thus fails to classify this fault reliably.
When the neural network model is used for classification, it is less affected by data imbalance. Basic neural network models, such as BP and CNN, improve the classification performance of each type of fault to more than 90%.
Next, we compare the classification performance of the models obtained by using PSO and GWO to optimize the CNN. In the analysis of accuracy, precision and F-score of the HPSOGWO-CNN model, the FA, FB, FD, FE, FF, FH and FI categories all reach 100%. Each of these 7 categories has fewer than 200 samples in the test set, indicating that the model can still classify correctly when the sample size is small.

V. CONCLUSION
Aiming at the anomaly detection of the rudder system, an automatic test platform suitable for the rudder system is established, and a new model named HPSOGWO-CNN is proposed for anomaly diagnosis. The performance of a neural network depends directly on its hyper-parameters, and manually designed hyper-parameters cannot be relied on to achieve the best network structure, so this work uses the HPSOGWO algorithm to construct the CNN. The designed CNN is used for feature extraction and classification of the experimental rudder test data. The results of Section 3 show that, in optimizing the hyper-parameters of the CNN, the HPSOGWO algorithm has clear advantages over the PSO and GWO algorithms in accuracy and rate of convergence. In other words, it is a fast and efficient algorithm for the automatic selection of neural network hyper-parameters. The results in Section 4 show that the proposed model achieves 99.846% accuracy, 99.748% precision, 99.498% recall, 99.618% F-score and a Kappa of 0.99565 in the multi-fault classification of the rudder system, and that the classification performance is hardly affected by sample imbalance.
He was awarded a second prize for scientific and technological progress in Shanxi province and also presides over more than ten scientific research projects (including government-sponsored research and school-enterprise cooperation projects), such as the National Fund and the Doctoral Foundation of the Ministry of Education, and has authored or coauthored more than 20 academic papers. His research interests include automated testing and control technology, equipment test detection and system integration, and intelligent instruments.
CHENXIA GUO received the B.S., M.S., and Ph.D. degrees from the North University of China, Taiyuan, China. She is currently an Associate Professor with the School of Instrument and Electronics, North University of China. Her research interests include automated testing and control technology, design and integration of complex electromechanical systems, and vision measurement.
HAO QIN received the B.S. degree in measurement and control technology and instrumentation from the College of Instrumentation and Electronics, North University of China, Taiyuan, China, in 2019, where he is currently pursuing the master's degree in instrumentation science and technology. His research interests include rudder systems, deep learning, and health management.