Features Fusion Extraction and KELM With Modified Grey Wolf Optimizer for Mixture Control Chart Patterns Recognition

Control charts are important diagnostic tools for detecting and identifying quality fluctuations in complex industrial processes. In practical production, increasing attention is being paid to monitoring mixture control chart patterns, which are usually formed by coupling two or more basic control chart patterns. This research presents a hybrid pattern recognition method for mixture control charts. The proposed method consists mainly of feature fusion extraction (FFE) and a kernel extreme learning machine (KELM) tuned by a modified grey wolf optimizer (MGWO). The FFE module takes the original data together with their shape and statistical features as inputs, then uses kernel entropy component analysis to reduce the feature dimension and extract valid features. One significant difficulty with KELM is choosing suitable parameters, namely the penalty parameter and the kernel function parameter. MGWO, which improves the population initialization and the nonlinear convergence factor of the traditional grey wolf optimizer, is used to tune the KELM parameters. The proposed methodology tends to obtain a better recognition rate, less computational time and more stable results in the pattern recognition problem of mixture control charts.


I. INTRODUCTION
With the development of data collection technology, control charts have become the most popular tools in statistical process control and are mainly used to record and monitor quality fluctuations in complex industrial processes. However, it is difficult to judge whether a process is about to go out of control. A normal control chart pattern indicates that the process is under control. Abnormal patterns show that the process is out of control and that there are faults or variations requiring improvement actions [1]. As production processes become more complex and automated, the actual data exhibit coupled patterns, which may combine two or more basic modes. More details on control chart patterns (CCPs) can be found in [2]-[4]; Figure 1 and Figure 2 show the six basic CCPs and four mixture CCPs, respectively.
The associate editor coordinating the review of this manuscript and approving it for publication was Chao Tong.
Adequate recognition of mixture CCPs is regarded as crucial for detecting fluctuations and exploring the specific causes of abnormal patterns. Researchers have developed various rules, such as manual interpretation, zone tests and expert systems, to help quality control engineers recognize abnormal mixture CCPs [5]- [8].
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
Pelegrina et al. [5] proposed a blind source separation method to recognize concurrent CCPs. Lu et al. [6] utilized independent component analysis to obtain efficient features, then recognized the mixture CCPs using a support vector machine.
Addeh et al. [7] described an optimized radial basis function neural network for CCPs. Yang et al. [8] identified mixture CCPs by extreme-point symmetric mode decomposition and an extreme learning machine. These studies offer ways to make mixture control chart pattern recognition more effective; however, the problem has still not been addressed adequately.
For the mixture CCPs problem, data processing plays an important role; its core principle is to transform the original data into feature data by dimensionality reduction [9], [10]. Multivariate statistical process monitoring methods, such as partial least squares and principal component analysis (PCA), and related improved methods such as kernel PCA (KPCA), are the most classical dimensionality reduction methods [11]. However, these methods reduce dimensionality by selecting the top eigenvalues of the process data set, so they describe only the major trends. Kernel entropy component analysis (KECA) is a novel spectral data processing approach that reveals the intrinsic characteristics of the data through information entropy and does not need to select the top eigenvectors of the kernel matrix. Moreover, data transformed by KECA have a distinct angular structure that carries cluster structure information, and the method has been widely used to extract features [12]- [15]. However, data from most practical complex processes are time-varying, large in volume and nonlinear. To address these problems, a feature fusion extraction (FFE) method is proposed. The principal idea of this approach is to use KECA to obtain effective features from the original data together with the shape and statistical features, which decreases the feature dimension and the computational complexity.
Machine learning algorithms, such as artificial neural networks (ANNs) and the support vector machine (SVM), have been increasingly used to classify CCPs in recent years. ANNs have been the most frequently used method, typically applying a multilayer perceptron with backpropagation to recognize CCPs [16]. However, ANNs still have many disadvantages, such as the need for large amounts of training data, poor generalization ability and over-fitting [9]. Recently, SVM has attracted increasing attention because it can obtain remarkable results, but its accuracy largely depends on the choice of the kernel function parameters [17], [18].
The extreme learning machine (ELM) is an extremely efficient approach and has been widely applied in the financial, manufacturing and service industries [19], [20]. The kernel extreme learning machine (KELM) replaces the random hidden-layer matrix of ELM with a kernel matrix. The input data are mapped to a high-dimensional space by kernel functions, which avoids random fluctuations and finds hidden features more quickly and efficiently [21], [22]. The penalty factor and the kernel parameter play an important role in the output of KELM and are therefore optimized in this paper.
Numerous studies have applied biologically inspired methods, such as the genetic algorithm [23], [24] and particle swarm optimization [25], to tune the KELM kernel function parameter. However, these methods suffer from premature convergence and a relatively high percentage of errors. The grey wolf optimizer (GWO), which imitates the hunting behavior of grey wolves, has been widely used to solve practical optimization problems [26]- [29]. In addition, many improved GWO methods have been proposed to enhance its search ability. Cai et al. [30] proposed a new parameter learning strategy based on an improved grey wolf optimization strategy. Zhao et al. [31] proposed a chaos-enhanced grey wolf optimization wrapped ELM for the diagnosis of paraquat-poisoned patients. Emary et al. [32] proposed a novel binary version of the grey wolf optimization to select the optimal feature subset for classification purposes. In this paper, a modified grey wolf optimizer (MGWO), which improves the population initialization and the nonlinear convergence factor, is applied to optimize the kernel function parameters of KELM.
In this paper, we compare against several traditional approaches, covering the choice of features, feature extraction methods and parameter optimization methods, to verify the effectiveness of the proposed FFE_KELM_MGWO approach. Experimental results indicate that the proposed approach can effectively recognize mixture control chart patterns and performs better than the compared methods. The main contributions of this paper can be summarized as follows:
1. A feature extraction approach, FFE, takes the original data together with their shape and statistical features as inputs, then uses KECA to reduce the feature dimension and extract valid features.
2. A new nature-inspired method, MGWO, which improves the population initialization and the nonlinear convergence factor, is applied to optimize the KELM kernel function parameters.
3. The proposed FFE_KELM_MGWO methodology tends to obtain a better pattern recognition rate, less computational time and more stable results on the mixture CCPs problem when compared with several other methods.
The remainder of this paper is organized as follows. Section II describes the concepts needed for the FFE method. Section III elaborates the details of the KELM_MGWO classifier. Section IV introduces the detailed procedure of the FFE_KELM_MGWO methodology. Section V gives the experimental results that verify the performance of the proposed methods. Conclusions and research findings are presented in Section VI.

II. THE PROPOSED FFE FEATURE EXTRACTION SCHEME
The original data from complex production processes mostly exhibit mixture control chart patterns. Because these data are usually large in volume and coupled, extracting suitable features is a challenging task. Shape and statistical features reflect the characteristics of the CCPs and take different values for different pattern types. To avoid the loss of important information, the original data and the shape and statistical features are all used as inputs, and KECA is then applied to reduce the feature dimension and extract valid features.

A. SHAPE FEATURES
Shape features are an efficient way to reduce the amount of data and obtain useful information about mixture CCPs. Five shape features are considered in this work: Slope, N1, N2, APML and APLS. The detailed calculation of these five shape features is given in Table 1. Box plots of the shape features for the different CCP signals are shown in Figure 3.

B. STATISTICAL FEATURES
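In Ref. [33], statistical features were shown to capture information about CCPs efficiently. We adopt them in the proposed method and select the Mean, Standard deviation (SD), Mean-Square, Skewness, Kurtosis, Positive Cusum, Negative Cusum, and Average Autocorrelation as the features. Each statistical feature can be used to differentiate the patterns; their definitions are given in Table 2, and the values of these features for different CCP signals are shown in Figure 4.

C. KERNEL ENTROPY COMPONENT ANALYSIS
The Renyi quadratic entropy is defined as follows:
$$H(p) = -\log \int p^{2}(x)\,dx$$
where p(x) is the probability density function of the data set (x_1, x_2, ..., x_N). Because the logarithm is a monotonic function, it suffices to consider the quantity
$$V(p) = \int p^{2}(x)\,dx$$
A Parzen window density estimator is invoked to estimate V(p):
$$\hat{p}(x) = \frac{1}{N}\sum_{t=1}^{N} k_{\sigma}(x, x_{t})$$
where k_σ(x, x_t) is a Mercer kernel function (Parzen window) and σ is the width parameter. Using the sample mean approximation of the expectation operator, we obtain the following equation:
$$\hat{V}(p) = \frac{1}{N}\sum_{t=1}^{N}\hat{p}(x_{t}) = \frac{1}{N^{2}}\sum_{t=1}^{N}\sum_{t'=1}^{N} k_{\sigma}(x_{t}, x_{t'}) = \frac{1}{N^{2}}\mathbf{1}^{T}K\mathbf{1} \tag{5}$$
where element (t, t') of the (N × N) kernel matrix K equals k_σ(x_t, x_t') and 1 is an (N × 1) vector of ones. The kernel matrix K can be eigendecomposed as
$$K = E D E^{T}$$
where D is a diagonal matrix of the eigenvalues λ_1, ..., λ_N and E is the matrix of the corresponding eigenvectors e_1, e_2, ..., e_N. Eq. (5) can be rewritten as
$$\hat{V}(p) = \frac{1}{N^{2}}\sum_{i=1}^{N}\left(\sqrt{\lambda_{i}}\, e_{i}^{T}\mathbf{1}\right)^{2}$$
so that each term of the sum is the contribution of one eigenvalue and eigenvector pair to the entropy estimate. The transformation may be expressed as
$$\Phi_{eca} = D_{k}^{1/2} E_{k}^{T}$$
where D_k and E_k contain the k eigenvalues and eigenvectors selected by their entropy contribution. Out-of-sample data points represented by Φ' are projected onto U_k, the subspace spanned by the k selected kernel principal axes, yielding
$$\Phi'_{eca} = D_{k}^{-1/2} E_{k}^{T} K'$$
where K' is the kernel matrix between the training points and the out-of-sample points.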
In the dimension reduction step of the KECA algorithm, the core idea is to estimate the contribution of each principal element directly to the Renyi entropy, then determine how much information is retained in the direction of that principal element, and finally map the data onto the kernel principal directions that contribute most to the Renyi entropy [14].
To retain more information from the original data, the entropy contribution rate is adopted to select the principal elements during dimensionality reduction.
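The selection rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the Gaussian form of the Parzen window, the function names `rbf_kernel`, `keca_fit` and `keca_transform`, and the threshold `rate` on the cumulative entropy contribution are all assumptions made for the example.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B (assumed window)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def keca_fit(X, sigma=1.0, rate=0.9):
    """Keep the kernel components that contribute most to the Renyi entropy estimate."""
    K = rbf_kernel(X, X, sigma)
    lam, E = np.linalg.eigh(K)               # K = E diag(lam) E^T
    lam = np.clip(lam, 0.0, None)            # remove tiny negative round-off
    contrib = lam * E.sum(axis=0) ** 2       # (sqrt(lam_i) * e_i^T 1)^2 for each i
    order = np.argsort(contrib)[::-1]        # largest entropy contribution first
    cum = np.cumsum(contrib[order]) / contrib.sum()
    k = min(int(np.searchsorted(cum, rate)) + 1, len(cum))  # smallest k reaching `rate`
    idx = order[:k]
    return {"X": X, "sigma": sigma, "lam": lam[idx], "E": E[:, idx]}

def keca_transform(model, X_new):
    """Project points onto the selected axes: D_k^{-1/2} E_k^T K' (one row per point)."""
    K_new = rbf_kernel(model["X"], X_new, model["sigma"])   # N x N'
    return (model["E"].T @ K_new / np.sqrt(model["lam"])[:, None]).T
```

Note that the components are ranked by entropy contribution rather than by eigenvalue, which is the point where KECA departs from KPCA. Applying `keca_transform` to the training data itself reproduces D_k^{1/2} E_k^T, so the two projection formulas agree.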

III. KELM_MGWO CLASSIFIER METHOD

A. KERNEL EXTREME LEARNING MACHINE (KELM)
The extreme learning machine (ELM) is based on a single hidden layer feedforward neural network (SLFN) and offers strong generalization and fast learning compared with traditional methods [19]. The advantage of ELM is that it randomly generates the thresholds of the hidden-layer neurons and the connection weights between the input and hidden layers. It does not need to adjust these weights of the SLFN, and hence involves less computational complexity and consumes less time.
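The ELM structure is described below:
$$f(x) = \sum_{i=1}^{L}\beta_{i} h_{i}(x) = h(x)\beta \tag{11}$$
where β = [β_1, β_2, ..., β_L]^T is the output weight vector, h(x) = [h_1(x), ..., h_L(x)] is the ELM feature mapping, and N is the number of patterns. The network has an input layer with m inputs and a hidden layer with L neurons, and its initial weights do not need to be tuned. Using the tanh function as the activation function:
$$h_{i}(x) = \tanh(w_{i}\cdot x + b_{i}), \qquad \tanh(z) = \frac{e^{z}-e^{-z}}{e^{z}+e^{-z}}$$
where w_i and b_i are the randomly generated input weights and threshold of the i-th hidden neuron. The hidden layer randomized matrix can be expressed as
$$H = \begin{bmatrix} h(x_{1}) \\ \vdots \\ h(x_{N}) \end{bmatrix} = \begin{bmatrix} h_{1}(x_{1}) & \cdots & h_{L}(x_{1}) \\ \vdots & \ddots & \vdots \\ h_{1}(x_{N}) & \cdots & h_{L}(x_{N}) \end{bmatrix} \tag{13}$$
and the target vector is given by
$$T = [t_{1}^{*}, t_{2}^{*}, \ldots, t_{N}^{*}]^{T}$$
where t* denotes the expected output (sample label). Eq. (11) is expressed in matrix form as
$$H\beta = T$$
To address the overfitting problem and obtain better generalization ability, a constrained optimization problem for β is established in the original ELM. It can be expressed as
$$\min_{\beta,\,\xi}\ \frac{1}{2}\|\beta\|^{2} + \frac{C}{2}\sum_{i=1}^{N}\xi_{i}^{2} \quad \text{s.t.}\quad h(x_{i})\beta = t_{i}^{*} - \xi_{i},\ \ i = 1, \ldots, N$$
where ξ = [ξ_1, ξ_2, ..., ξ_N] is the error vector and C is the regularization parameter. The KKT theorem is used to transform the constrained problem into a dual problem as
$$L_{D} = \frac{1}{2}\|\beta\|^{2} + \frac{C}{2}\sum_{i=1}^{N}\xi_{i}^{2} - \sum_{i=1}^{N}\alpha_{i}\left(h(x_{i})\beta - t_{i}^{*} + \xi_{i}\right)$$
where α_i is the Lagrange multiplier of the i-th constraint. The output weight β is calculated as
$$\beta = H^{T}\left(\frac{I}{C} + HH^{T}\right)^{-1}T$$
The output of the ELM is then
$$f(x) = h(x)\beta = h(x)H^{T}\left(\frac{I}{C} + HH^{T}\right)^{-1}T$$
Compared with traditional ANN methods, the training process of ELM is simple. However, its learning performance, in terms of stability and generalization, is determined by the choice of activation function and the number of neurons in the hidden layer. To deal with this problem, KELM uses a kernel function, based on the Mercer theorem, to obtain better generalization and stability. The kernel matrix of the KELM is expressed as
$$\Omega = HH^{T}, \qquad \Omega_{ij} = h(x_{i})\cdot h(x_{j}) = K(x_{i}, x_{j})$$
Hence, the output function is obtained as
$$f(x) = \left[K(x, x_{1}), \ldots, K(x, x_{N})\right]\left(\frac{I}{C} + \Omega\right)^{-1}T \tag{24}$$
Eq. (24) can be written as follows:
$$f(x) = \sum_{i=1}^{N} a_{i} K(x, x_{i}), \qquad a = \left(\frac{I}{C} + \Omega\right)^{-1}T$$
The performance of KELM thus depends on the penalty parameter C and on the kernel function parameter, which are difficult to choose by hand. To overcome this problem, a modified grey wolf optimizer is applied to obtain the optimal parameter values.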
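The kernel form of the output function in Eq. (24) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the RBF kernel exp(-γ‖x - y‖²), the ±1 one-hot target encoding for multi-class labels, and the class name `KELM` are choices made for the example.

```python
import numpy as np

def rbf(A, B, gamma):
    """Assumed kernel: K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

class KELM:
    """Kernel ELM classifier: f(x) = [K(x, x_1), ..., K(x, x_N)] (I/C + Omega)^-1 T."""

    def __init__(self, C=1.0, gamma=1.0):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes = np.unique(y)
        # Targets: +1 in the column of the true class, -1 elsewhere (assumed encoding)
        T = np.where(y[:, None] == self.classes[None, :], 1.0, -1.0)
        omega = rbf(self.X, self.X, self.gamma)
        n = len(self.X)
        # Solve (I/C + Omega) a = T rather than forming the inverse explicitly
        self.a = np.linalg.solve(np.eye(n) / self.C + omega, T)
        return self

    def predict(self, X):
        scores = rbf(np.asarray(X, dtype=float), self.X, self.gamma) @ self.a
        return self.classes[np.argmax(scores, axis=1)]
```

Training reduces to one linear solve, so each candidate pair (C, γ) proposed by an optimizer costs a single N × N solve. That keeps a population-based search over two parameters affordable for data sets of moderate size.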

B. MODIFIED GREY WOLF OPTIMIZER FOR TWO-DIMENSIONAL KELM PARAMETERS
The effectiveness of KELM is mainly determined by two crucial parameters, C and γ. In this paper, a modified grey wolf optimizer (MGWO) is proposed to optimize this two-dimensional KELM parameter pair. Good point set theory is used to generate the initial population of the grey wolf algorithm, and a nonlinear hyperbolic tangent function is applied as a new convergence factor to balance global search ability and local exploitation ability.

1) GREY WOLF OPTIMIZER
The grey wolf optimizer (GWO) is a global optimization algorithm introduced by Mirjalili in 2014 [26], which imitates the hunting and searching mechanism of grey wolves. GWO assumes a four-level social hierarchy of grey wolves: α, β, δ and ω. The α wolves are considered the leaders, and all other grey wolves follow their instructions. The β wolves help the α wolves in decision making and are the best candidates to become α. The δ wolves dominate the wolves of the last level, the ω wolves. The detailed social hierarchy is shown in Figure 6.
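FIGURE 6. The social hierarchy of grey wolves.
The distances from the α, β and δ wolves, i.e. D_α, D_β and D_δ, to each remaining wolf X are obtained using Eq. (26). X_1, X_2 and X_3 are then obtained as expressed in Eq. (27).
$$\vec{D}_{\alpha} = |\vec{C}_{1}\cdot\vec{X}_{\alpha} - \vec{X}|, \quad \vec{D}_{\beta} = |\vec{C}_{2}\cdot\vec{X}_{\beta} - \vec{X}|, \quad \vec{D}_{\delta} = |\vec{C}_{3}\cdot\vec{X}_{\delta} - \vec{X}| \tag{26}$$
$$\vec{X}_{1} = \vec{X}_{\alpha} - \vec{A}_{1}\cdot\vec{D}_{\alpha}, \quad \vec{X}_{2} = \vec{X}_{\beta} - \vec{A}_{2}\cdot\vec{D}_{\beta}, \quad \vec{X}_{3} = \vec{X}_{\delta} - \vec{A}_{3}\cdot\vec{D}_{\delta} \tag{27}$$
where a, A and C are calculated using Eq. (28), and r_1 and r_2 are random numbers in [0, 1].
$$\vec{A} = 2\vec{a}\cdot\vec{r}_{1} - \vec{a}, \qquad \vec{C} = 2\vec{r}_{2} \tag{28}$$
The vector a is used in calculating A, which controls the search activity in GWO. Each element of a decreases linearly from 2 to 0 over the iterations. C assigns a random weight to the prey, which can make the prey harder for the wolves to find. Finally, the positions X(t + 1) of the other wolves are updated using Eq. (29).
$$\vec{X}(t+1) = \frac{\vec{X}_{1} + \vec{X}_{2} + \vec{X}_{3}}{3} \tag{29}$$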
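The update rules referred to as Eqs. (26) to (29) follow the standard GWO scheme and can be sketched as below. This is an illustrative minimal version for a minimization problem, not the authors' code; the function name `gwo`, the pack size, the iteration count, the seed and the clipping of positions to the search box are assumptions made for the example.

```python
import random

def gwo(objective, bounds, n_wolves=20, t_max=100, seed=0):
    """Minimise `objective` inside the box `bounds` with the standard GWO update."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(v, j):
        return min(max(v, bounds[j][0]), bounds[j][1])

    # Standard GWO: uniformly random initial pack
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    for t in range(t_max):
        # The three best wolves act as alpha, beta and delta for this iteration
        leaders = [w[:] for w in sorted(wolves, key=objective)[:3]]
        a = 2.0 * (1.0 - t / t_max)                   # linear decrease from 2 towards 0
        for w in wolves:
            for j in range(dim):
                guided = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a    # Eq. (28)
                    C = 2.0 * rng.random()            # Eq. (28)
                    D = abs(C * leader[j] - w[j])     # Eq. (26)
                    guided.append(leader[j] - A * D)  # Eq. (27)
                w[j] = clip(sum(guided) / 3.0, j)     # Eq. (29)
    best = min(wolves, key=objective)
    return best, objective(best)
```

For example, `gwo(lambda x: sum(v * v for v in x), [(-5.0, 5.0)] * 2)` searches for the minimum of a two-dimensional sphere function. To tune KELM, the objective would instead be a function of (C, γ), such as one minus the training accuracy.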

2) MODIFIED GREY WOLF OPTIMIZER
To obtain a better solution for the two-dimensional KELM parameters, a modified grey wolf optimizer is proposed, in which good point set theory is used to generate the initial population of the grey wolf algorithm and a nonlinear hyperbolic tangent function is applied as a new convergence factor. A detailed description of the modified grey wolf optimizer is given in this section.

a: IMPROVEMENT OF POPULATION INITIALIZATION
The diversity of the initial population affects the global convergence speed and the solution quality of a swarm intelligence algorithm; better diversity can improve the optimization performance. When the global optimal solution is unknown, the initial population should be spread as widely as possible within the search range, which provides superior global search performance in the early stage of the algorithm [23], [35]. The standard grey wolf optimization algorithm generates the initial population randomly, which makes it difficult to guarantee the diversity of the initial population and in turn weakens the global search ability [24]. To improve search efficiency and ensure the diversity of the initial population, this paper uses good point set theory to generate the initial population. This method distributes the initial population more evenly in the solution space.
According to good point set theory, a good point set in t-dimensional space is constructed as follows:
$$P_{n}(k) = \left\{\left(\{r_{1} k\}, \{r_{2} k\}, \ldots, \{r_{t} k\}\right),\ k = 1, 2, \ldots, n\right\}$$
where n is the number of points, r_i = {2 cos(2πi/p)}, 1 ≤ i ≤ t, and p is the smallest prime number satisfying p ≥ 2t + 3. In this paper, the solution space dimension is t = 2, so p is the smallest prime number greater than or equal to 7, that is, p = 7. Alternatively, r_i = e^i, 1 ≤ i ≤ t, can be used. Here {r_i k} denotes the fractional part of r_i k.
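The construction above can be sketched as follows, using the cosine form of r_i. The helper names `smallest_prime_at_least` and `good_point_set` are hypothetical, and the linear scaling of the unit-cube points into the search range of each parameter is an assumption, since the text does not state how the points are mapped to (C, γ).

```python
import math

def smallest_prime_at_least(m):
    """Smallest prime p with p >= m."""
    p = max(m, 2)
    while any(p % d == 0 for d in range(2, math.isqrt(p) + 1)):
        p += 1
    return p

def good_point_set(n, bounds):
    """Generate n good points in t = len(bounds) dimensions, scaled to `bounds`."""
    t = len(bounds)
    p = smallest_prime_at_least(2 * t + 3)           # p = 7 when t = 2
    r = [2.0 * math.cos(2.0 * math.pi * i / p) for i in range(1, t + 1)]
    points = []
    for k in range(1, n + 1):
        # Python's % returns a value in [0, 1) here, i.e. the fractional part {r_i * k}
        frac = [(ri * k) % 1.0 for ri in r]
        points.append([lo + f * (hi - lo) for f, (lo, hi) in zip(frac, bounds)])
    return points
```

Taking the fractional part of r_i first, as written in the text, gives the same points, because r_i and {r_i} differ by an integer. Unlike a random draw, the resulting population is deterministic for a given n and search range.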

b: IMPROVEMENT OF NONLINEAR CONVERGENCE FACTOR
The convergence factor a in the standard grey wolf algorithm is a very important parameter, which decreases linearly from 2 to 0. It balances global and local search: the large initial value of a guarantees global search ability, while the smaller later value of a strengthens the search near the current optimal solution and ensures local exploitation. In a complex search process, the linear convergence factor cannot fully balance global search ability and local exploitation ability [25]. To improve recognition accuracy and search speed, this paper proposes a new convergence factor update formula based on the hyperbolic tangent function. In the update equation for a, t_max is the maximum number of iterations, set to 100 in this paper; a_initial and a_final are the initial and final values of the convergence factor a, equal to 2 and 0, respectively; λ and k are adjustment parameters, where λ is the deceleration rate and k further adjusts the balance between global search and local exploitation, with λ = −2π and k = π. With these parameter settings, the decreasing curve of the convergence factor a is shown in Figure 7. The improved nonlinear factor balances global search ability and local exploitation ability. It keeps a large value for a long time in the early stage to improve global search efficiency, and keeps a small value for a long time in the later iterations to improve local exploitation and local search accuracy.
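The exact update equation is not reproduced in this text, so the sketch below shows only one hyperbolic tangent form that is consistent with the stated parameters (a_initial = 2, a_final = 0, λ = −2π, k = π, t_max = 100) and with the described behavior. It is an assumed form for illustration and may differ from the authors' formula.

```python
import math

def convergence_factor(t, t_max=100, a_initial=2.0, a_final=0.0,
                       lam=-2.0 * math.pi, k=math.pi):
    """Assumed tanh schedule: stays near a_initial early and near a_final late."""
    # s moves from about 1 (at t = 0) to about 0 (at t = t_max)
    s = 0.5 * (1.0 + math.tanh(lam * t / t_max + k))
    return a_final + (a_initial - a_final) * s
```

Under this assumed form, a stays close to 2 during the first iterations, passes through 1 at the midpoint, and stays close to 0 during the last iterations, whereas the linear schedule spends equal time at every level.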

IV. THE PROPOSED FFE_KELM_MGWO SCHEME
This study integrates FFE, KELM, and MGWO to recognize mixture CCPs effectively and automatically. The proposed scheme is divided into three parts: feature extraction, parameter optimization, and pattern recognition. The general structure and the detailed schematic diagram are shown in Figure 8 and Figure 9.
In the feature extraction part, FFE is utilized to obtain effective features. Because the original process data are usually large in volume and coupled, extracting suitable features is a challenging task. The shape and statistical features differ between CCPs and can be used to distinguish the mixture CCPs, but on their own they do not represent the complete information. To overcome this, the original data and the shape and statistical features are combined as the feature input. This sharply increases the feature dimension, which increases the computational difficulty. The KECA method, which selects the principal elements by the entropy contribution rate, is therefore used to decrease the feature dimension.
In the parameter optimization part, MGWO is employed to optimize two key parameters of the KELM classifier: the penalty parameter (C) and the kernel function parameter (γ). The classification accuracy on the training samples is used as the fitness function:
$$ACC = \frac{k}{N} \tag{32}$$
where N is the total number of training samples, k is the number of correctly identified labels, and ACC is the accuracy of the model trained by the KELM classifier. The accuracy is thus the ratio of correctly identified labels to the number of training labels. The optimal parameter pair (C, γ) of KELM is obtained by the modified grey wolf optimizer.
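The fitness evaluation in Eq. (32) can be sketched as a small wrapper that scores one candidate pair (C, γ). The names `accuracy` and `make_fitness`, and the callable `train_and_predict` that stands in for training a KELM and predicting the training labels, are hypothetical and used only for illustration.

```python
def accuracy(y_true, y_pred):
    """ACC = k / N: share of labels that are identified correctly."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equally long")
    k = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return k / len(y_true)

def make_fitness(train_and_predict, X_train, y_train):
    """Build fitness(C, gamma) from a hypothetical training routine."""
    def fitness(C, gamma):
        # Train on the training set with this candidate, then score on the same set
        y_pred = train_and_predict(C, gamma, X_train, y_train)
        return accuracy(y_train, y_pred)
    return fitness
```

Because the optimizer maximizes this value, a minimizing implementation of GWO would be given `1 - fitness(C, gamma)` as its objective.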

V. PERFORMANCE ANALYSES
The performance of the proposed approach is evaluated in this section. The mixture CCPs are obtained from an automatic pattern generator, using the formulas presented in Table 3 [30]. Six basic patterns are considered: normal (NOR), cyclic (CYC), increasing trend (IT), decreasing trend (DT), upward shift (US) and downward shift (DS). The four mixture CCPs are formed by coupling two or more basic modes. Because upward and downward shifts, and increasing and decreasing trends, follow the same principle, only one of each pair is selected in this study. The increasing trend, downward shift and cyclic patterns are used to simulate the mixture CCPs. Each pattern contains 200 samples, and the observation window is 40 data points. We use 70% of the data for training and the rest for testing. The performance is evaluated by the average recognition accuracy of the mixture CCPs on the testing samples, obtained as the ratio of correctly recognized labels to the number of tested labels. The experiments are programmed in MATLAB [R2016b] and run on a Dell computer with an Intel Core i3-4150 and 4 GB RAM.
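The data setup described above can be illustrated with a small generator. The formulas of Table 3 are not reproduced in this text, so the sketch uses commonly seen forms (noise plus a linear trend, a step shift and a sine term). The magnitudes, the shift position, the cycle period and the example mixture definitions are all hypothetical; only the window length of 40, the 200 samples per pattern and the 70% training share come from the text.

```python
import math
import random

WINDOW = 40           # observation window length, from the text
N_PER_PATTERN = 200   # samples per pattern, from the text

def make_pattern(components, rng, mu=0.0, sigma=1.0):
    """One window: in-control noise plus the requested basic disturbances."""
    shift_at = rng.randint(10, 30)                       # hypothetical shift position
    series = []
    for t in range(1, WINDOW + 1):
        x = mu + rng.gauss(0.0, sigma)                   # normal (NOR) part
        if "IT" in components:
            x += 0.1 * sigma * t                         # hypothetical trend slope
        if "DS" in components and t >= shift_at:
            x -= 2.0 * sigma                             # hypothetical shift magnitude
        if "CYC" in components:
            x += 1.5 * sigma * math.sin(2.0 * math.pi * t / 8.0)  # hypothetical cycle
        series.append(x)
    return series

def make_dataset(pattern_defs, seed=0, train_share=0.7):
    """Return (train, test) lists of (series, label) pairs, split per pattern."""
    rng = random.Random(seed)
    train, test = [], []
    for label, components in pattern_defs.items():
        rows = [(make_pattern(components, rng), label) for _ in range(N_PER_PATTERN)]
        cut = round(train_share * N_PER_PATTERN)         # 140 of 200 for training
        train += rows[:cut]
        test += rows[cut:]
    return train, test
```

With these settings each pattern contributes 140 training windows and 60 testing windows, so a call such as `make_dataset({"IT+CYC": ("IT", "CYC"), "DS+CYC": ("DS", "CYC")})` yields 280 training and 120 testing samples.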

A. PERFORMANCE OF FFE METHOD EVALUATION
Feature extraction plays an important part in the effectiveness and speed of mixture CCP recognition. To assess the performance of the proposed FFE approach, it is compared with two other methods: original data (OD) and shape and statistical features (SSF). The OD method uses the 40-dimensional time-domain sample data as features. SSF uses 8 statistical features and 5 shape features as the features. KELM with the MGWO optimization algorithm is used to identify the mixture CCPs.
In this paper, the performance of the three feature extraction methods is evaluated using the recognition accuracy rate (RAR) and run time. The minimum (Min), mean and maximum (Max) of RAR, the standard deviation, the best parameters (C, γ) and the run time are calculated from 10 independent runs. The RAR results are shown in Table 4.
From Table 4 and Figure 10, it can be seen that the average recognition accuracy of the proposed FFE_KELM model reaches 98.66%, indicating better recognition performance. Comparing the standard deviations of the three methods, that of the FFE feature extraction method is significantly lower, which indicates that the proposed feature extraction approach reaches the global optimal solution more stably. In addition, all three methods finish within a suitable run time, so FFE does not increase the computational burden.

B. PERFORMANCE OF RECOGNIZER IN THE STATISTICAL PROCESS CONTROL METHOD
The previous experiment demonstrated the importance of choosing a suitable feature extraction method. This experiment evaluates the KECA method, which is used to reduce the feature dimensionality for the mixture CCPs classifier, against the traditional statistical process control methods PCA and KPCA. KELM with the MGWO optimization approach is used to recognize the mixture CCPs. The average recognition accuracies and run times of the three statistical process control models are presented in Table 5. Table 5 shows that FFE with KECA achieves a recognition accuracy rate of 99.50%, while FFE with PCA and with KPCA achieve 89.62% and 98.67%, respectively. The proposed KECA statistical process control model therefore has the best recognition accuracy rate. Moreover, compared with OD and SSF, the FFE method obtains a better RAR under each statistical process control method, which again indicates that FFE is valid. Table 5 also shows that the run time of the proposed KECA method is mostly less than that of the other two methods. FFE_KELM_MGWO using the KECA method obtains the best recognition accuracy rate and the least run time.

C. PERFORMANCE OF MGWO OPTIMIZATION METHOD
To illustrate the effectiveness of the proposed MGWO method, four benchmark functions are used for verification: the Sphere, Schwefel 2.26, Ackley, and Rastrigin functions. Figure 11 shows the surfaces of these functions.
MGWO is compared with three other methods: GA, PSO, and GWO. The convergence results on the four benchmark functions are shown in Figure 12. It can be seen that the proposed MGWO method has good performance, fast convergence speed, and superior global search capability.
The performance of MGWO as a parameter optimization algorithm for KELM is then evaluated on mixture CCPs. Two key parameters of KELM, the penalty parameter (C) and the kernel function parameter (γ), must be chosen in the experiment. To investigate the effectiveness of KELM_MGWO, it is again compared with GA, PSO, and GWO. The average recognition accuracies and run times of the four optimization models are given in Table 6. Table 6 shows that all four optimization methods of KELM obtain a relatively good recognition accuracy under FFE. The proposed FFE_KELM_MGWO model reaches 99.50%, which is the better recognition accuracy rate. Figure 13 shows the convergence curves of the four optimization methods and indicates that the proposed method can obtain the optimal parameters.
Across the different parameter optimization methods, the FFE method always obtains the better RAR. Moreover, the run times of the proposed MGWO and of GWO are significantly reduced compared with GA and PSO, which indicates that GWO-based methods can improve computational efficiency.
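For reference, the four benchmark functions can be written in their commonly used textbook forms as below. The dimension and search ranges used in the experiments are not stated in this text, so the definitions are given for inputs of any length, and the function names are chosen for this example.

```python
import math

def sphere(x):
    """Unimodal; global minimum 0 at the origin."""
    return sum(v * v for v in x)

def schwefel_2_26(x):
    """Multimodal; minimum about -418.9829 per dimension near v = 420.9687."""
    return -sum(v * math.sin(math.sqrt(abs(v))) for v in x)

def ackley(x):
    """Multimodal with a narrow global basin; minimum 0 at the origin."""
    n = len(x)
    mean_sq = sum(v * v for v in x) / n
    mean_cos = sum(math.cos(2.0 * math.pi * v) for v in x) / n
    return -20.0 * math.exp(-0.2 * math.sqrt(mean_sq)) - math.exp(mean_cos) + 20.0 + math.e

def rastrigin(x):
    """Highly multimodal on a regular grid; minimum 0 at the origin."""
    return sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) + 10.0 for v in x)
```

Sphere tests pure convergence speed, while the other three have many local minima and therefore probe how well an optimizer avoids premature convergence.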

VI. CONCLUSION
Abnormal control chart patterns reflect quality fluctuations in complex industrial processes. This work addresses the recognition of mixture CCPs, whose characteristics arise from the coupling of two or more basic modes. A fast, intelligent and accurate approach for mixture CCPs is proposed, which consists of two main aspects: feature fusion extraction and parameter optimization of the KELM classifier. The original data, the shape and statistical features, feature fusion and KECA are used to make the features more effective. The modified grey wolf optimizer is then used to obtain the best two-dimensional KELM parameters.
In the experiments, the basic and mixture patterns are simulated using formulas from the literature. The computational results show that the proposed FFE_KELM_MGWO is quite effective in identifying the mixture CCPs. Our main contributions can be summarized in three aspects. First, FFE is successfully applied to combine the original data with the shape and statistical features, and then uses KECA to reduce the feature dimension and extract valid features. Second, the experiments show that the proposed KECA statistical process control method delivers better recognition results than PCA and KPCA. Third, the proposed MGWO algorithm obtains a better combination of parameters when training the KELM classifier, so that both the recognition accuracy rate and the run time are improved.
Currently, the proposed scheme is quite effective in recognizing mixture control charts. A possible enhancement is to analyze the specific failure causes and predict faults a few time steps in advance. Further work will be devoted to promoting the widespread use of the proposed algorithm in actual complex production processes.
He is currently a Professor with the School of Mechanical Engineering and the Technology and Equipment of Rail Transit Operation and Maintenance Key Laboratory of Sichuan Province, Southwest Jiaotong University. His research interests include mechanical intelligent optimization and dynamic simulation, logistics technology and equipment, and intelligent fault diagnosis.