An Enhanced Unsupervised Extreme Learning Machine Based Method for the Nonlinear Fault Detection

Although unsupervised extreme learning machine (UELM) based methods have recently been widely used to diagnose nonlinear process faults, the UELM algorithm is designed only to preserve the local adjacency similarity of the input dataset and does not mine the intra-class variations. Moreover, determining the optimal number of UELM hidden nodes is difficult. To deal with these two problems, a novel enhanced UELM (EUELM) based scheme is developed in this work to detect nonlinear process faults effectively. In the proposed EUELM approach, the UELM algorithm is first improved by incorporating the diversity analysis technique into the original UELM objective function, so that both the intra-class variation and the local adjacency similarity of the input dataset are preserved. Then, to avoid the difficult selection of the optimal number of hidden nodes, the kernel trick is employed in the EUELM approach to capture the strong nonlinearity of the data. Based on the extracted low dimensional features, which carry both diversity and local similarity information, the k-nearest neighbor (KNN) principle is applied to derive a monitoring statistic for fault detection. Finally, the monitoring effectiveness of the proposed EUELM based approach is evaluated and compared on a numerical nonlinear system and the benchmark Tennessee Eastman (TE) process. The monitoring results show that the proposed EUELM based fault detection approach achieves significant improvements over other popular and related approaches.


I. INTRODUCTION
With the increasing demand for safety and reliability in industrial processes, fault detection technology has received more and more attention. Recently, as massive measurements are stored in industrial production processes by advanced computer control systems, data-driven monitoring approaches have become an attractive topic and are gaining increasing interest. Some classical data-driven multivariate statistical approaches, such as partial least squares (PLS) and principal component analysis (PCA) based methods, are broadly utilized to detect faults [1], [2]. To deal with the nonlinearity in real process data, improved versions of the traditional PCA and PLS have been discussed for nonlinear process fault detection, for instance the neural network or kernel trick based PCA [3], [4] and the kernel trick based PLS [5]. To enhance the quality of fault detection by suppressing the effect of measurement errors, Sheriff et al. [6] suggested a novel monitoring scheme that combines multiscale representation based PCA with a moving window generalized likelihood ratio test. Fezai et al. [7] discussed an online reduced kernel PCA based fault detection method to overcome the limitation of conventional kernel PCA in monitoring dynamic systems with a large training dataset. To cope with changes in process parameters, measurement errors and uncertainties over long operation periods, two improved interval reduced kernel PLS models were proposed in [8] to monitor large scale nonlinear uncertain systems. Nevertheless, during the dimensionality reduction procedure, these nonlinear extensions omit the detailed local adjacency similarity structure among neighboring samples, because they focus only on the diversity information (i.e., the intra-class variations) of the samples.

The associate editor coordinating the review of this manuscript and approving it for publication was Donato Impedovo.
As an efficient learning technique, the extreme learning machine (ELM) is receiving a lot of interest in the fields of data-driven fault detection and diagnosis. The ELM is a single hidden layer feedforward neural network whose hidden layer parameters are randomly assigned rather than iteratively tuned [9]. By transforming the original input dataset into a high dimensional space via a nonlinear transformation function, the ELM is an effective approach to deal with the nonlinearity of process data [10]. In order to diagnose nonlinear process faults effectively, Boldt et al. [11] integrated the concept of cascade feature selection into the ELM model to combine different feature selection methods. Luo et al. [12] employed the real-valued gravitational search algorithm to optimize the input parameters of the ELM model, for the purpose of identifying the fault patterns of rolling element bearings. To classify nonlinear mixed data containing numerical and categorical values, Li et al. [13] developed an improved radial basis function based ELM method by fusing the data processing into the ELM classification. However, in the nonlinear conversion procedure of the ELM, the number of hidden nodes is selected empirically [14]. This number is a critical parameter of the ELM model because it greatly affects the ELM's performance. Therefore, cumbersome procedures for choosing the optimal number of hidden nodes should be avoided [15], [16].
The ELM is designed for supervised tasks, which severely restricts its applicability. In practical applications, labeled data are time consuming and expensive to obtain for fully supervised learning, while massive unlabeled data are easy to obtain. To enable the ELM to utilize unlabeled data, several semi-supervised and unsupervised versions have been discussed. With the help of a modified loss function, Luo et al. [17] suggested a new semi-supervised ELM (SELM) to suppress the adverse influence of outliers in labeled and unlabeled datasets. Huang et al. [18] discussed a distributed SELM to handle the shortcomings in time-varying communication networks. By combining the SELM with random vector functional link networks, Peng et al. [19] developed a joint optimization framework based extension of the SELM that uses both labeled and unlabeled samples. To improve the effectiveness in handling non-Gaussian noise, Yang et al. [20] proposed a novel SELM method based on the robust regularized correntropy criterion. By integrating Laplacian regularization to learn the manifold structure of hole image samples, Lei et al. [21] further discussed a modified version of the SELM to classify the superheat degree.
To exploit the low dimensional features of unlabeled data, many unsupervised ELM (UELM) extensions have been proposed. In order to consider the local connectivity during graph learning, Zeng et al. [22] developed an adaptive locality-constrained clustering based UELM for unsupervised learning and clustering. Huang et al. [23] came up with a modified UELM model for the clustering task by combining the manifold framework with the UELM algorithm. To explore the data structure better during clustering, Peng et al. [24] suggested a discriminative UELM based scheme that makes use of the adjacency intrinsic structure and the global discriminative information of the measurements. To perform the clustering task on process data, Chen et al. [25] integrated the UELM with $L_{2,1}$-norm regularization to remove useless hidden nodes. For the purpose of detecting abnormality in video object trajectories, Sekh et al. [26] combined the dynamic ELM with hierarchical temporal memory in an unsupervised way. To tackle the issue of big data, Yara and Mariette [27] suggested three improved UELM algorithms that use a distributed framework to learn clustering models from big data. Previous research has found that the standard UELM algorithm is related to the Laplacian eigenmaps algorithm, because both first construct an affinity matrix and then use a spectral technique to accomplish the embedding or clustering task.
The basic idea of the conventional UELM is to guarantee that the closer two data points are in the input space, the more similar their predictions are in the output space. According to this principle, the conventional UELM model can be viewed as a local adjacency similarity structure analysis technique, because it only explores the inner relationships among neighboring data points in the input space. Therefore, the conventional UELM algorithm pays close attention only to the detailed adjacency intrinsic structure and does not preserve the intra-class diversity or variations in a dataset. However, the intra-class diversity omitted by the UELM model is also critical for fault detection. In addition, to guarantee efficient fault detection performance, the optimal number of UELM hidden nodes needs to be chosen, which is a troublesome and intractable task with the existing parameter selection approaches.
On the basis of the aforementioned analysis, a new monitoring approach using an enhanced UELM (EUELM) model is developed in this paper to detect nonlinear process faults. The proposed EUELM model is constructed by integrating the intra-class diversity analysis into the standard UELM model to maintain the intra-class variations of the original input data. Besides, the kernel trick is introduced into the UELM to cope with the difficult problem of setting the number of hidden nodes. The goal of the EUELM model is to preserve both the intra-class diversity information and the local adjacency similarity structure of the original input data. Based on the kernel trick, the objective function of the EUELM is transformed into a generalized eigenvalue decomposition problem. To monitor the data features extracted by the EUELM model, the k-nearest neighbor (KNN) principle is applied to establish a fault detection statistic from the local neighborhoods of the low dimensional features. The experiments and comparisons on the Tennessee Eastman (TE) process demonstrate the superior fault detection effect of the proposed EUELM based approach.
Our work has three main contributions, which are elaborated as follows.
(1) To extract the intra-class variations and local adjacency similarity structure of the input data, the intra-class diversity analysis is infused with the traditional UELM model, which is beneficial to improve the process monitoring effect.
(2) To avoid the challenging problem of explicitly selecting the optimal number of UELM hidden nodes, the kernel trick is employed to handle the nonlinearity of the input data.
(3) The KNN rule is employed to establish a fault detection statistic based on the derived low dimensional features.
The remaining parts of this paper are organized as follows. In Section 2, the original ELM and the standard UELM are reviewed briefly. The proposed EUELM model is presented in detail in Section 3. Section 4 presents the construction of the monitoring statistic using the KNN rule. Section 5 gives an EUELM model based nonlinear process monitoring strategy. The experiments and comparisons on a numerical nonlinear system and the benchmark TE process are carried out in Section 6. Finally, the conclusion is drawn in Section 7.

II. THE ELM AND UELM ALGORITHMS
The ELM and UELM are closely related to the proposed EUELM algorithm, so they are reviewed first to facilitate the introduction of the EUELM model.

A. THE ELM ALGORITHM
The basic idea of the original ELM [23], [28] is to calculate the output weights by using random feature mapping in matrix operations. Consider N training samples $\{X, Y\} = \{x_i, y_i\}_{i=1}^{N}$, where $x_i \in R^{n \times 1}$ is a data point, $y_i \in R^{n_o \times 1}$ is a binary vector, and $n$ and $n_o$ are respectively the dimensions of the input layer and the output layer. For $y_i$, only the entry corresponding to the category of $x_i$ equals one. The formulation of the ELM is given as
$$\sum_{i=1}^{L} \beta_i G(w_i \cdot x_j + b_i) = o_j, \quad j = 1, 2, \cdots, N \tag{1}$$
where $L$ and $G(w_i \cdot x + b_i)$ represent the number of hidden nodes and the activation function, respectively. $w_i = [w_{i1}, w_{i2}, \cdots, w_{in}]$ indicates the input weight while $\beta_i \in R^{n_o \times 1}$ indicates the output weight. $b_i$ denotes the bias of the i-th hidden node, while $o_j \in R^{n_o \times 1}$ is the output vector. Given the number of hidden nodes $L$, Eq. (1) can be reformulated as
$$H\beta = O \tag{2}$$
where $H \in R^{N \times L}$ is the hidden layer output matrix whose j-th row is $h(x_j) = [G(w_1 \cdot x_j + b_1), \cdots, G(w_L \cdot x_j + b_L)]$, and $\beta$ and $O$ are defined as
$$\beta = [\beta_1, \beta_2, \cdots, \beta_L]^T, \quad O = [o_1, o_2, \cdots, o_N]^T \tag{3}$$
Given the target matrix $Y = [y_1, y_2, \cdots, y_N]^T$, the ELM model is designed to minimize the training error $\|O - Y\|^2$ [9]. In this regard, the ELM model fits the training samples with zero residual. Thus, there exists $\beta$ such that
$$H\beta = Y \tag{4}$$
At the beginning of the learning task in the original ELM, the hidden layer biases $b_i$ and the input weights $w_i$ are drawn randomly from a uniform probability distribution on [−1, 1], for $i = 1, 2, \cdots, L$. After the parameters $w_i$ and $b_i$ are selected, $\beta$ is computed as
$$\beta = H^{\dagger} Y \tag{5}$$
where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$, which is obtained by singular value decomposition or least squares. In order to give the network a good generalization property, the ELM trained network [23] is designed to obtain both the smallest training error and the smallest output weight norm [23], [29]:
$$\text{Minimize } \|H\beta - Y\|^2 \text{ and } \|\beta\|^2 \tag{6}$$
On the basis of Eq. (6), the expression of the ELM model is given as
$$\min_{\beta} \ \frac{1}{2}\|\beta\|^2 + \frac{C}{2}\sum_{i=1}^{N}\|e_i\|^2 \quad \text{s.t. } h(x_i)\beta = y_i^T - e_i^T, \ i = 1, 2, \cdots, N \tag{7}$$
where $e_i$ denotes the error vector of the i-th training sample and $C$ indicates the penalty factor.
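The training procedure described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names `train_elm` and `predict_elm`, the sigmoid activation and the toy XOR-style data are assumptions made here. The `C=None` branch uses the pseudoinverse solution of Eq. (5), while the other branch uses the ridge-type closed form that follows from the regularized problem in Eq. (7).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_elm(X, Y, n_hidden, C=None, seed=0):
    """X: (N, n) inputs, Y: (N, n_o) one-hot targets (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Input weights and biases drawn uniformly from [-1, 1], never trained.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = sigmoid(X @ W + b)                       # hidden layer output, (N, L)
    if C is None:
        beta = np.linalg.pinv(H) @ Y             # beta = H^+ Y, as in Eq. (5)
    else:
        # Closed form of the regularized problem: (H^T H + I/C)^{-1} H^T Y
        beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ Y)
    return W, b, beta

def predict_elm(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# Toy data (assumed for illustration): XOR-style labels, one-hot encoded.
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y_demo = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
W, b, beta = train_elm(X_demo, Y_demo, n_hidden=20)
labels = predict_elm(X_demo, W, b, beta).argmax(axis=1)
```

With more hidden nodes than samples, the pseudoinverse branch interpolates the targets, which mirrors the zero-residual statement of Eq. (4); the penalty factor `C` trades that exact fit for a smaller output weight norm.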

B. THE UELM ALGORITHM
The main objective of the UELM is to guarantee that the conditional probabilities $P(y|x_i)$ and $P(y|x_j)$ of adjacent input samples $x_i$ and $x_j$ are also similar in the output space [18]. To enforce this goal, the following optimization objective is adopted.
$$\min \ \frac{1}{2}\sum_{i,j} w_{ij} \left\| P(y|x_i) - P(y|x_j) \right\|^2 \tag{8}$$
where $w_{ij}$ imposes a large penalty if $P(y|x_i)$ and $P(y|x_j)$ differ greatly. The weight parameter $w_{ij}$ represents the neighborhood relations of different samples in the original input space. The value of $w_{ij}$ is set to be nonzero if $x_i$ is among the k nearest neighbors of $x_j$, or $x_j$ is among the k nearest neighbors of $x_i$.
The nonzero value can be calculated with the kernel function $\exp(-\|x_i - x_j\|^2 / t)$, or simply set to 1. Therefore, a sparse symmetric matrix $W = [w_{ij}] \in R^{N \times N}$ is established to represent the local adjacency similarity structure of the original input dataset. In order to avoid calculating the conditional probability, Eq. (8) is further approximated as
$$\hat{L}_m = \frac{1}{2}\sum_{i,j} w_{ij} \|\hat{y}_i - \hat{y}_j\|^2 = \mathrm{Tr}(\hat{Y}^T L \hat{Y}) \tag{9}$$
where $\hat{y}_i$ and $\hat{y}_j$ are respectively the predictions of the input samples $x_i$ and $x_j$. The symbol $\mathrm{Tr}(\cdot)$ indicates the matrix trace and $\hat{Y}$ denotes the prediction matrix. $L = D - W$ represents the Laplacian matrix, where $D$ is a diagonal matrix with elements $D_{ii} = \sum_{j=1}^{N} w_{ij}$. By incorporating the manifold regularization to utilize unlabeled data, the optimization of the conventional UELM model is built as
$$\min_{\beta} \ \|\beta\|^2 + \lambda \mathrm{Tr}(F^T L F) \quad \text{s.t. } f_i = h(x_i)\beta, \ i = 1, 2, \cdots, N \tag{10}$$
where $\lambda$ indicates the tradeoff parameter and the Laplacian matrix $L$ is estimated from the unlabeled training dataset. $F$ represents the output matrix whose i-th row equals $f_i$.
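As a rough illustration of how the affinity matrix W and the Laplacian L = D − W described above might be built, the following NumPy sketch constructs a symmetric k-nearest-neighbor graph and checks numerically that half the weighted sum of squared output differences equals the trace form Tr(YᵀLY). The helper names, the random data and the symmetrization by an element-wise maximum are assumptions made for this example.

```python
import numpy as np

def knn_affinity(X, k, t=None):
    """Symmetric kNN affinity: w_ij != 0 if x_i is in kNN(x_j) or x_j is in kNN(x_i).
    Weights are 1 when t is None, otherwise exp(-||x_i - x_j||^2 / t)."""
    N = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)   # squared distances
    np.fill_diagonal(sq, np.inf)                                # exclude self-matches
    idx = np.argsort(sq, axis=1)[:, :k]                         # k nearest per row
    rows = np.repeat(np.arange(N), k)
    cols = idx.ravel()
    W = np.zeros((N, N))
    W[rows, cols] = 1.0 if t is None else np.exp(-sq[rows, cols] / t)
    return np.maximum(W, W.T)                                   # "or" rule gives symmetry

def graph_laplacian(W):
    return np.diag(W.sum(axis=1)) - W                           # L = D - W

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(12, 3))     # placeholder input samples
Y_demo = rng.normal(size=(12, 2))     # placeholder predictions
W_demo = knn_affinity(X_demo, k=3)
L_demo = graph_laplacian(W_demo)

# Pairwise form and trace form of the smoothness penalty.
diff = Y_demo[:, None, :] - Y_demo[None, :, :]
pairwise = 0.5 * np.sum(W_demo * np.sum(diff ** 2, axis=2))
trace_form = np.trace(Y_demo.T @ L_demo @ Y_demo)
```

The two quantities `pairwise` and `trace_form` agree up to rounding, which is what allows the UELM objective to be written compactly in terms of the Laplacian.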

III. THE ENHANCED UELM MODEL
According to Eq. (10), the optimization of the standard UELM guarantees that the network outputs $f_i$ and $f_j$ of two neighboring input samples $x_i$ and $x_j$ are also near neighbors. Nevertheless, the standard UELM model places no explicit constraint on distant samples in the original input space. As a result, the significant variance information of the original process data is not taken into account in the UELM model, and faraway input samples tend to be projected to a small adjacent area in the output space. In this sense, the UELM model is a local structure preserving algorithm that is not capable of mining the important global structure information of the original input data. In some cases, it may project all the original input samples to one point in the output space, which makes the derived output weights overfit the training samples. In addition, setting the optimal number of UELM hidden nodes is an intractable and troublesome task. Motivated by the above analysis, an enhanced UELM (EUELM) algorithm is introduced to improve the nonlinear process fault detection effectiveness by modifying the UELM optimization with the intra-class diversity analysis technique. The primary target of the EUELM is to ensure that neighboring input samples are projected to a small adjacent area in the latent space, while faraway input samples remain distinct from each other after projection. Moreover, in order to avoid explicitly setting the optimal number of hidden nodes, the kernel trick is employed in the EUELM model to handle the nonlinearity of the original input data.

A. MODIFY THE UELM BY INTEGRATING THE INTRA-CLASS DIVERSITY INFORMATION
It is widely recognized that the intra-class diversity information (i.e., the intra-class variations) of the original process data [30]-[32] also contributes to more efficient fault detection performance. To preserve the diversity information of the input data points, a diversity graph is first defined. Then, its affinity matrix is constructed by considering the intra-class variations of the data points in the diversity graph. Finally, the diversity information is maintained by maximizing the diversity scatter calculated from the diversity graph.
Given the normalized training data $X = [x_1, x_2, \cdots, x_N]$, the diversity graph [30]-[32] is constructed as $G_d = (X, E, Q)$, where $E$ represents the set of edges connecting different data points and $Q$ denotes the affinity matrix whose elements characterize the diversity of two different data points $x_i$ and $x_j$. From a statistical point of view, the element $q_{ij}$ of the affinity matrix $Q$ is defined by Eq. (11), where $q_{ij}$ measures the contribution of data point $x_i$, associated with data point $x_j$, to the diversity information.
To mine the data's diversity in the network outputs of the UELM, the objective function for preserving the data diversity information is established as
$$J_D(\beta) = \max_{\beta} \ \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \|f_i - f_j\|^2 q_{ij} \tag{12}$$
where $f_i = h(x_i)\beta$ and $f_j = h(x_j)\beta$ respectively indicate the UELM network outputs corresponding to the data points $x_i$ and $x_j$. If two data points $x_i$ and $x_j$ are far apart in the input space while the corresponding network output points $f_i$ and $f_j$ are close to each other, then Eq. (12) incurs a heavy penalty through the weight $q_{ij}$. Therefore, maximizing Eq. (12) is intended to guarantee that if the diversity of the original input samples $x_i$ and $x_j$ is large, the diversity of the corresponding network output points $f_i$ and $f_j$ is also large. From a statistical perspective, the optimization in Eq. (12) enables the UELM model to preserve as much of the input data diversity information as possible when computing the network outputs.
The network outputs $f_i = h(x_i)\beta$ and $f_j = h(x_j)\beta$ are substituted into the diversity preserving optimization defined in Eq. (12), and the objective function is reformulated as follows
$$J_D(\beta) = \max_{\beta} \ \mathrm{Tr}\left(\beta^T H^T (S - Q) H \beta\right) \tag{13}$$
where $\mathrm{Tr}(\cdot)$ indicates the matrix trace, $S$ denotes a diagonal matrix and $S_{ii}$ is calculated as $S_{ii} = \sum_{j=1}^{N} q_{ij}$. Likewise, the objective function of the standard UELM can be written as
$$J_L(\beta) = \min_{\beta} \ \|\beta\|^2 + \lambda \mathrm{Tr}(\beta^T H^T L H \beta) = \min_{\beta} \ \mathrm{Tr}\left(\beta^T (I_L + \lambda H^T L H)\beta\right) \tag{14}$$
where $I_L \in R^{L \times L}$ denotes an identity matrix. As previously mentioned, the UELM model pays close attention only to the local adjacency similarity of the input samples, because it merely maintains the local neighborhood relationships of the data points. However, it is well known that the intra-class diversity depicts the external shape of the input dataset, while the local adjacency similarity retains the internal organization of the input dataset [30], [32]-[34]. Therefore, to further improve the nonlinear process monitoring effect, it is necessary to maintain the maximal data diversity information of faraway samples as well as the adjacency similarity structure of neighboring samples.

VOLUME 9, 2021
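For readers who want the intermediate steps, the following short derivation shows how a weighted pairwise diversity sum turns into a trace expression. It is a sketch under two assumptions that are not stated explicitly in the text: the affinity matrix $Q$ is symmetric, and the pairwise sum carries a factor of one half. Here $F = H\beta$ is the output matrix whose i-th row is $f_i$, and $S$ is diagonal with $S_{ii} = \sum_j q_{ij}$.

```latex
\begin{aligned}
\frac{1}{2}\sum_{i,j} q_{ij}\,\|f_i - f_j\|^2
  &= \frac{1}{2}\sum_{i,j} q_{ij}\left(f_i f_i^T - 2 f_i f_j^T + f_j f_j^T\right) \\
  &= \sum_{i} S_{ii}\, f_i f_i^T - \sum_{i,j} q_{ij}\, f_i f_j^T \\
  &= \mathrm{Tr}\left(F^T S F\right) - \mathrm{Tr}\left(F^T Q F\right) \\
  &= \mathrm{Tr}\left(\beta^T H^T (S - Q) H \beta\right)
\end{aligned}
```

The second line uses the symmetry of $Q$ to merge the two squared terms, and the last line substitutes $F = H\beta$. The same algebra, with $W$ and $D$ in place of $Q$ and $S$, gives the Laplacian trace form used by the standard UELM.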
Motivated by this, the UELM model is modified by integrating the data diversity information into its standard objective function. More specifically, the EUELM optimization is constructed by maximizing the data diversity objective function $J_D(\beta)$ while minimizing the standard UELM objective function $J_L(\beta)$.
To reduce the computational complexity and increase the model stability, we only consider the case of $N < L$ when computing $\beta$ in our work [28]. In this case, $\beta$ has infinitely many solutions. To handle this issue, $\beta$ is restricted to the form
$$\beta = H^T A \tag{16}$$
where $A \in R^{N \times n_o}$ represents the loading matrix. Then, Eq. (15) is further reformulated as Eq. (17). By solving the eigenvalue problem in Eq. (18), the loading matrix $A$ is composed of the eigenvectors $\alpha_1, \alpha_2, \cdots, \alpha_{n_o}$ corresponding to the $n_o$ smallest eigenvalues $\gamma_1, \gamma_2, \cdots, \gamma_{n_o}$.
Notice that $H$ is of full row rank because $N < L$; therefore $HH^T$ is invertible. Eq. (18) can then be further simplified to Eq. (19), where $I_N \in R^{N \times N}$ represents the identity matrix.

B. EMPLOY THE KERNEL TRICK TO THE MODIFIED UELM MODEL
To avoid explicitly setting the optimal number of hidden nodes of the modified UELM, the kernel trick [35], [36] is employed. By using the kernel function $k(x_i, x_j) = \langle h(x_i), h(x_j) \rangle$, the kernel matrix $K$ of the proposed EUELM model is defined as follows
$$K = HH^T, \quad K_{ij} = k(x_i, x_j) = \langle h(x_i), h(x_j) \rangle \tag{20}$$
where $i, j = 1, 2, \cdots, N$. In our work, the Gaussian kernel of Eq. (21) is chosen as the kernel function [35]-[37],
where the kernel parameter $\sigma$ is set beforehand. Then Eq. (19) can be expressed in terms of $K$ as Eq. (22). After solving Eq. (22), the eigenvectors $\alpha_1, \alpha_2, \cdots, \alpha_{n_o}$ corresponding to the $n_o$ smallest eigenvalues are obtained. Finally, the value of $\beta$ is computed as
$$\beta = H^T \tilde{A} \tag{23}$$
where $\tilde{\alpha}_i = \alpha_i / \|HH^T\alpha_i\| = \alpha_i / \|K\alpha_i\|$, $i = 1, 2, \cdots, n_o$, are the normalized eigenvectors and $\tilde{A} = [\tilde{\alpha}_1, \tilde{\alpha}_2, \cdots, \tilde{\alpha}_{n_o}]$. The low dimensional features of the given matrix $X$ are computed as the output matrix $T \in R^{N \times n_o}$ of the EUELM model.
$$T = H\beta = HH^T\tilde{A} = K\tilde{A} \tag{24}$$
Given a test sample $x_t$, the projection vector $t_t$ is obtained as the output vector of the EUELM model.
$$t_t = h(x_t)\beta = h(x_t)H^T\tilde{A} = k_t\tilde{A} \tag{25}$$
where $k_t = h(x_t)H^T \in R^{1 \times N}$ indicates the kernel vector, and $k_{t,i}$ is computed as $k_{t,i} = k(x_t, x_i)$ for $i = 1, 2, \cdots, N$.
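To guarantee $\sum_{i=1}^{N} h(x_i) = 0$, the mean centered kernel matrix $\bar{K}$ needs to be calculated before solving Eq. (22) and Eq. (24).
$$\bar{K} = K - I_K K - K I_K + I_K K I_K \tag{26}$$
where all the elements of the $N \times N$ matrix $I_K$ are equal to $1/N$. Before calculating the vector $t_t$ based on Eq. (25), the mean centered test kernel vector $\bar{k}_t$ also needs to be computed.
$$\bar{k}_t = k_t - I_t K - k_t I_K + I_t K I_K \tag{27}$$
where $I_t = \frac{1}{N}[1, \cdots, 1] \in R^{1 \times N}$.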
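The kernel matrix and its mean centering can be illustrated with a small NumPy sketch. The parameterization of the Gaussian kernel used here, exp(−‖a − b‖²/σ), is an assumption of this example rather than a statement of the paper's exact kernel formula, and the helper names and random data are hypothetical. The centering functions follow the standard kernel centering with an N × N matrix whose entries are all 1/N.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Kernel values between rows of A (M, n) and rows of B (N, n).
    Assumed form: k(a, b) = exp(-||a - b||^2 / sigma)."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / sigma)   # clip tiny negative rounding errors

def center_train_kernel(K):
    """Mean centering of the training kernel matrix."""
    N = K.shape[0]
    I_K = np.full((N, N), 1.0 / N)
    return K - I_K @ K - K @ I_K + I_K @ K @ I_K

def center_test_kernel(k_t, K):
    """Mean centering of test kernel rows k_t (M, N) against training kernel K (N, N)."""
    N = K.shape[0]
    I_t = np.full((k_t.shape[0], N), 1.0 / N)
    I_K = np.full((N, N), 1.0 / N)
    return k_t - I_t @ K - k_t @ I_K + I_t @ K @ I_K

rng = np.random.default_rng(0)
X_train = rng.normal(size=(10, 3))    # placeholder training samples
X_test = rng.normal(size=(4, 3))      # placeholder test samples
K = gaussian_kernel(X_train, X_train, sigma=5.0)
K_bar = center_train_kernel(K)
k_bar = center_test_kernel(gaussian_kernel(X_test, X_train, sigma=5.0), K)
```

Given a loading matrix of normalized eigenvectors, the training features would then be obtained as `K_bar @ A_tilde` and the test features as `k_bar @ A_tilde`; the eigenvalue problem that produces `A_tilde` is not reproduced in this sketch.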

IV. FAULT DETECTION STATISTIC CONSTRUCTION BASED ON THE K-NN PRINCIPLE
After the low dimensional feature information is extracted by the EUELM model, the k-nearest neighbor (KNN) principle [38], [39] is employed to build the fault detection statistic. The basic idea of the KNN based monitoring statistic is that the behavior of a normal data point is similar to the behavior of the training data points, while a fault data point deviates abnormally from the behavior of the training data points [40], [41]. That is to say, the distances from a fault data point to its k nearest neighbors among the training data points are much larger than the corresponding distances for a normal data point. Therefore, in our work, the average distance from a data point to its k nearest neighbors among the training data points is calculated as the fault detection statistic. After the confidence limit of the constructed monitoring statistic is determined, a test data point is considered normal if the average distance to its k nearest training neighbors is smaller than the confidence limit. Otherwise, a fault is detected.
For the given output dataset $T = [t_1, t_2, \cdots, t_N]$ of the EUELM model, the k nearest neighbors of each vector $t_i$ are selected in the output dataset by using the Euclidean distance.
$$d_{ij}^2 = \|t_i - t_j\|^2, \quad j = 1, 2, \cdots, N, \ j \neq i \tag{28}$$
where the distances are sorted in ascending order so that $d_{ij}^2$ indicates the squared Euclidean distance between the i-th vector $t_i$ and its j-th nearest neighbor in the output dataset. The data points with the k smallest Euclidean distances are selected as the k nearest neighbors of the vector $t_i$.
Then, the average squared distance $D_i^2$ is computed as the fault detection statistic.
$$D_i^2 = \frac{1}{k}\sum_{j=1}^{k} d_{ij}^2 \tag{29}$$
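The training statistic described above can be computed as in the following NumPy sketch. The function name `knn_statistic_train` and the tiny one-dimensional example are illustrative choices made here; the point itself is excluded from its own neighbor list, which is an assumption consistent with selecting neighbors among the other training points.

```python
import numpy as np

def knn_statistic_train(T, k):
    """Average squared Euclidean distance from each row t_i of T (N, n_o)
    to its k nearest neighbors among the other rows."""
    sq = np.sum((T[:, None, :] - T[None, :, :]) ** 2, axis=2)   # all pairwise d^2
    np.fill_diagonal(sq, np.inf)            # a point is not its own neighbor
    nearest = np.sort(sq, axis=1)[:, :k]    # k smallest squared distances per row
    return nearest.mean(axis=1)

# Three points on a line at 0, 1 and 3.
T_demo = np.array([[0.0], [1.0], [3.0]])
D2_k1 = knn_statistic_train(T_demo, k=1)    # squared distance to the single nearest neighbor
D2_k2 = knn_statistic_train(T_demo, k=2)    # mean over both other points
```

The full pairwise distance matrix costs O(N²) memory, which is fine for a few hundred or a few thousand training samples; a tree-based neighbor search would be the usual replacement for larger datasets.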
To judge the status of the test data point $x_t$, the kernel density estimation (KDE) technique [42]-[44] is applied to estimate the threshold value $D_\alpha^2$ of the monitoring statistic $D_i^2$ according to the output dataset $T$. For a test data point $x_t$, the k nearest neighbors of its projection vector $t_t$ are also found in the output dataset using the following equation.
$$d_{t,j}^2 = \|t_t - t_j\|^2, \quad j = 1, 2, \cdots, N \tag{30}$$
Similarly, the data points with the k smallest values of $d_{t,j}^2$ are regarded as the k nearest neighbors of the vector $t_t$.
Then, the fault detection statistic $D_t^2$ of the test sample $x_t$ is calculated as
$$D_t^2 = \frac{1}{k}\sum_{j=1}^{k} d_{t,j}^2 \tag{31}$$
where the sum runs over the k nearest neighbors of $t_t$. Finally, the monitoring statistic $D_t^2$ is compared with its corresponding confidence limit $D_\alpha^2$. If $D_t^2 > D_\alpha^2$, the test sample $x_t$ is considered a fault sample; otherwise, the test sample $x_t$ is considered normal.
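A possible way to obtain the confidence limit and apply the decision rule is sketched below. The Gaussian KDE with a normal-reference bandwidth, the bisection search for the quantile and all function names are assumptions of this example; the text only states that KDE is used, not which kernel or bandwidth. The synthetic training statistics are placeholders.

```python
import numpy as np
from math import erf, sqrt

def knn_statistic_test(T_train, T_test, k):
    """Average squared distance from each test row to its k nearest training rows."""
    sq = np.sum((T_test[:, None, :] - T_train[None, :, :]) ** 2, axis=2)
    return np.sort(sq, axis=1)[:, :k].mean(axis=1)

def kde_threshold(d2_train, alpha=0.99, bandwidth=None):
    """Value whose Gaussian-KDE cumulative probability equals alpha."""
    d = np.asarray(d2_train, dtype=float)
    if bandwidth is None:
        # Normal-reference rule of thumb (assumed; requires non-constant data).
        bandwidth = 1.06 * d.std(ddof=1) * d.size ** (-0.2)

    def cdf(x):
        # The KDE CDF is the average of Gaussian CDFs centered at the samples.
        z = (x - d) / (bandwidth * sqrt(2.0))
        return float(np.mean([0.5 * (1.0 + erf(v)) for v in z]))

    lo, hi = d.min() - 10.0 * bandwidth, d.max() + 10.0 * bandwidth
    for _ in range(100):                    # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if cdf(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def is_fault(d2_test, limit):
    return np.asarray(d2_test) > limit

rng = np.random.default_rng(0)
d2_train_demo = 0.1 * rng.chisquare(2, size=500)   # placeholder training statistics
limit_demo = kde_threshold(d2_train_demo, alpha=0.99)
```

A 99% level means that roughly one normal sample in a hundred is expected to exceed the limit, so some false alarms under normal operation are inherent to the choice of confidence level.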

V. FAULT DETECTION STRATEGY BASED ON THE EUELM MODEL
As illustrated in Fig. 1, the EUELM based monitoring approach consists of an off-line modelling phase and an on-line detection phase. During the former phase, the training data are used to build the EUELM model, and the threshold value $D_\alpha^2$ of the fault detection statistic is estimated by applying the KNN principle to quantify the similarity between each sample and the training dataset. During the latter phase, the fault detection statistic $D_t^2$ of the test sample is calculated to judge whether a fault has occurred. The detailed pseudo code of the EUELM based monitoring scheme is summarized in Table 1.

VI. CASE STUDIES
In our work, two case studies are adopted to evaluate the fault detection performance of the proposed EUELM based approach. One case study is a numerical nonlinear system, and the other is the Tennessee Eastman (TE) process, which is a well-known nonlinear process. Performance comparisons with other related methods are further conducted to verify the superior fault detection capability of the EUELM based approach.

A. CASE STUDY ON A NUMERICAL NONLINEAR SYSTEM

1) PROCESS DESCRIPTION
A numerical nonlinear system involving three process variables is first adopted to verify the effect of the EUELM based fault detection approach. The numerical nonlinear system formulated in Eq. (32) is an improved version of the one discussed in [45] and [46].
In Eq. (32), $e_1, e_2, e_3 \sim N(0, 0.01)$ represent the noises, and $t$ is a random variable sampled from [0, 2]. In this numerical nonlinear system, the monitored variables are the output variables $[x_1, x_2, x_3]$. Based on Eq. (32), 300 normal samples are produced to establish the training dataset. A fault is introduced to generate the test dataset of 300 samples. The introduced fault is set up as follows: from the 51st sample until the end of the simulation, variable $x_1$ is increased by adding $0.05 \times (k - 50)$ to its value, where $k$ is the sample index.
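The ramp fault described above can be injected into a dataset as in the sketch below. Because the system equations of Eq. (32) are not reproduced here, the fault-free data are replaced by a placeholder array of zeros; the function name `add_ramp_fault` is hypothetical, and the ramp is read as an offset of 0.05 × (k − 50) added to the fault-free value of the first variable at sample k.

```python
import numpy as np

def add_ramp_fault(X, var=0, start=51, slope=0.05):
    """Add slope * (k - (start - 1)) to column `var` for 1-indexed samples k >= start.
    X: (n_samples, n_vars) fault-free data; a copy is returned."""
    X_fault = X.astype(float)                     # astype returns a new array
    k = np.arange(1, X.shape[0] + 1)              # 1-indexed sample numbers
    ramp = np.where(k >= start, slope * (k - (start - 1)), 0.0)
    X_fault[:, var] += ramp
    return X_fault

# Placeholder for 300 fault-free samples of [x1, x2, x3]; the real data
# would come from simulating the numerical system.
X_normal = np.zeros((300, 3))
X_test = add_ramp_fault(X_normal, var=0, start=51, slope=0.05)
```

Since the offset grows linearly from 0.05 at sample 51, early fault samples are barely distinguishable from noise, which is why detection times differ noticeably between methods on this kind of fault.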

2) COMPARATIVE METHODS AND PARAMETER SETTING
In this study, the monitoring feasibility and effect of the EUELM are compared with those of the traditional UELM and the KPCA. To be fair, for the standard UELM model, the KNN principle is also applied to the output dataset to construct the fault detection statistic. For the EUELM, the Gaussian kernel is chosen as the kernel function [35]-[37]. The kernel parameter σ and the output space dimension $n_o$ are respectively set to 250 and 50 according to the grid search algorithm [47], [48] by seeking the optimal fault detection result on the training dataset. The number of nearest neighbors k of the KNN rule is empirically set to 6, and the tradeoff parameter λ is determined as 0.3 with the help of the grid search. The values of the parameters used in the EUELM based fault detection method are given in Table 2. For the sake of fairness, the Gaussian kernel is also employed in the KPCA, and its kernel parameter is set to 250 as well. For the UELM model, the dimension of the output space $n_o$, the number of nearest neighbors k and the tradeoff parameter λ are also respectively selected as 50, 6 and 0.3. Furthermore, the number of hidden nodes L is set to 1400 and the activation function is the Sigmoid function. For all the three methods, the principal components possessing 95% variance of the training dataset are retained,

3) FAULT DETECTION EFFECT COMPARISON
The fault detection charts of the KPCA, UELM and EUELM for the ramp fault are illustrated in Fig. 2. Fig. 2(a) shows that the KPCA $T^2$ statistic detects the process fault at the 74th sample and its SPE statistic detects the process fault at the 78th sample. Compared with the KPCA, the UELM obtains a better monitoring result in Fig. 2(b), where its $D^2$ statistic gives an alarm at the 65th sample with a fault detection rate of 94.31%. As illustrated in Fig. 2(c), the EUELM $D^2$ statistic warns of the fault at the 56th sample. Therefore, the EUELM is the most sensitive to the ramp fault among these three fault detection approaches. Table 3 lists the fault detection times and fault detection rates of the KPCA, the UELM and the EUELM for the simulated ramp fault. According to Table 3, the EUELM $D^2$ statistic has the highest fault detection rate, i.e., 98.00%. Besides, the fault detection times and fault detection rates of the EUELM, UELM and KPCA for the ramp fault are respectively visualized in Fig. 3 and Fig. 4 for a more intuitive comparison. To summarize, the experiments and comparisons on the numerical nonlinear system demonstrate the superior fault detection performance of the EUELM based approach over the KPCA and UELM based approaches.

B. CASE STUDY ON THE TE PROCESS

1) PROCESS DESCRIPTION
The TE process is a benchmark for comparing various fault detection approaches [49]-[51]. The TE process is built on the basis of the plant-wide industrial chemical operation model given in Fig. 5, which is composed of a reactor, a stripper, a separator, a condenser, and a compressor. According to [51] and [52], 52 important variables are selected as the monitoring variables. A TE simulator can be found at the website http://brahms.scs.uiuc.edu, which provides one normal operating mode and 21 fault patterns. Note that the fault patterns IDV(3), IDV(9), IDV(15) and IDV(19) have already been shown to be difficult to detect by data-driven fault detection approaches, because these fault datasets have no observable changes in the means or the variances [50], [52]. Therefore, excluding these four fault patterns, the remaining seventeen fault patterns given in Table 4 are utilized to verify the monitoring capability of the EUELM based scheme in our work. The TE simulator generates 960 samples for the normal operating mode and for each introduced fault pattern. All seventeen faults are added to the TE process at the 160th sample. More details about the TE process can be found in [52].

2) COMPARATIVE METHODS AND PARAMETER SETTING
In our work, the fault detection performance of the EUELM is also compared with that of the UELM and KPCA. To be fair, the KNN principle is applied to the output dataset of the standard UELM model to establish the fault detection statistic. For the EUELM based scheme, the Gaussian kernel is selected as the kernel function according to references [35], [36] and [37]. The kernel parameter σ and the output space dimension $n_o$ in the EUELM are respectively set to 600 and 40 based on the grid search algorithm [47], [48]. The number of nearest neighbors k of the KNN rule used in the EUELM is empirically chosen to be 4, and the tradeoff parameter λ is determined as 0.1 using the grid search. The values of the parameters used in the EUELM based method for the TE process are given in Table 5. To be fair, the Gaussian kernel is also utilized in the KPCA based method, and its kernel parameter is set to 600 as well. In the UELM based approach, the dimension of the output space $n_o$, the number of nearest neighbors k and the tradeoff parameter λ are also set to 40, 4 and 0.1, respectively. Moreover, the number of hidden nodes L is set to 1000 in the UELM model and the activation function is the Sigmoid function. For all the three monitoring methods, the principal components possessing 95% variance of the training dataset are retained, and the threshold values are determined at the 99% confidence level. A fault alarm is raised if 5 consecutive samples exceed the corresponding threshold value, and the fault detection time is taken as the sample number of the first of them. The fault detection rate is computed as the percentage of detected fault samples among all real fault samples.
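The alarm rule and the two performance indices described above can be written down compactly. The sketch below is an interpretation made for illustration: the function names are hypothetical, sample numbers are 1-indexed as in the text, and the search for the alarm is assumed to start at the sample where the fault is introduced.

```python
import numpy as np

def detection_time(stat, limit, fault_start, run=5):
    """1-indexed number of the first sample in the first block of `run`
    consecutive exceedances at or after `fault_start`; None if no alarm."""
    above = np.asarray(stat) > limit
    for s in range(fault_start - 1, above.size - run + 1):
        if above[s:s + run].all():
            return s + 1
    return None

def fault_detection_rate(stat, limit, fault_start):
    """Percentage of samples from `fault_start` onward that exceed the limit."""
    above = np.asarray(stat)[fault_start - 1:] > limit
    return 100.0 * above.mean()

# Small synthetic example: fault from sample 8, statistic rises at sample 10
# with one dip below the limit at sample 12.
stat_demo = np.zeros(20)
stat_demo[9:] = 2.0
stat_demo[11] = 0.0
t_detect = detection_time(stat_demo, limit=1.0, fault_start=8)
fdr = fault_detection_rate(stat_demo, limit=1.0, fault_start=8)
```

In this toy series the two exceedances at samples 10 and 11 do not trigger an alarm because the run is broken at sample 12; the first full run of five starts at sample 13. The detection rate still counts every individual exceedance after the fault onset, so the two indices can rank methods differently.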

3) FAULT DETECTION EFFECT COMPARISON
The monitoring results for the faults IDV(8), IDV(13) and IDV(17) are used to confirm the superior monitoring effect of the EUELM based method. The monitoring charts of the EUELM, UELM and KPCA based approaches for the fault IDV(8) are shown in Fig. 6. From the fault detection charts of the KPCA shown in Fig. 6(a), we can see that both the T^2 and SPE statistics detect the fault at the 189-th sample. However, both statistics have low fault detection rates, because many fault samples fall below the corresponding threshold value after the 189-th sample. As shown in Fig. 6(b), the D^2 statistic of the UELM gives a better fault detection result than the KPCA: it discovers the fault at the 183-th sample, and far fewer fault samples fall below the confidence limit after the fault is detected. However, the UELM produces false alarms under the normal operating status. Comparing the monitoring results of the three approaches, the EUELM achieves the best fault detection effect, as illustrated in Fig. 6(c). There, the D^2 statistic exceeds its threshold at the 176-th sample with no missed fault samples, which reveals that the EUELM gives the earliest and the most accurate fault detection results. Based on these monitoring results, the EUELM based approach is the most effective of the three fault detection approaches for detecting the fault IDV(8).
The monitoring charts obtained by the three monitoring approaches for the fault IDV(13), which is a slow shift in the reaction kinetics, are plotted in Fig. 7. In Fig. 7(a), the T^2 and SPE statistics of the KPCA raise an alarm for the fault IDV(13) at the 209-th sample and the 215-th sample, respectively. However, these two statistics have low fault detection rates, because many real fault samples are wrongly treated as normal samples without triggering fault alarms. As shown in Fig. 7(b), the UELM D^2 statistic alarms the fault IDV(13) at the 212-th sample. Nevertheless, the fault detection rate of the UELM still needs to be improved, because many fault samples also fall below the threshold value after the 212-th sample. In contrast, the EUELM gives the best fault detection performance in Fig. 7(c), where its D^2 statistic alerts the fault at the 197-th sample with the highest fault detection rate. The monitoring results of the fault IDV(13) illustrate that the EUELM detects this fault much faster and more accurately than the KPCA and the UELM.
Fig. 8 illustrates the fault detection charts of the three methods for the fault IDV(17). From Fig. 8(a), the KPCA has the worst monitoring performance, where its T^2 statistic discovers the fault at the 191-th sample while its SPE statistic warns of the fault at the 189-th sample. Besides, both the T^2 and SPE statistics result in much lower fault detection rates, because more fault samples fluctuate around the confidence limit after the fault is detected. Compared with the KPCA, the UELM brings about a slight improvement in monitoring performance in Fig. 8(b), where its D^2 statistic exceeds the confidence limit at the 188-th sample. In addition, because fewer fault samples are missed, the D^2 statistic attains a slightly higher fault detection rate. In contrast to the results of the KPCA and the UELM, the EUELM D^2 statistic given in Fig. 8(c) reacts the most quickly to the fault IDV(17), because it goes beyond the threshold value after the 182-th sample with no missed fault samples. Besides, the EUELM achieves the highest fault detection rate, i.e., 94.47%. This again demonstrates the superior fault detection effect of the EUELM based method over the UELM and KPCA based methods.
The process monitoring performance of the EUELM, UELM and KPCA based approaches for all the seventeen fault patterns is listed in Table 6 and Table 7. According to Table 6, the UELM has much earlier fault detection times than the KPCA for the faults IDV(1), IDV(2), IDV(5), IDV(6), IDV(8), IDV(10), IDV(13) and IDV(16) ∼ IDV(21). For the rest of the faults apart from the faults IDV(7) and IDV(14), the UELM and KPCA based methods have similar fault detection times. However, the monitoring capability of the UELM is still not satisfactory. In contrast, the EUELM based method gives an improved fault detection performance. To be specific, the EUELM achieves the earliest fault detection times among the three methods for thirteen fault patterns, i.e., all except the faults IDV(10), IDV(16), IDV(18) and IDV(20). According to Table 7, the UELM and the KPCA have similar fault detection rates for the faults IDV(1), IDV(2) and IDV(6). Nevertheless, the UELM shows much higher fault detection rates than the KPCA for the rest of the faults apart from the faults IDV(4), IDV(7), IDV(11) ∼ IDV(13), IDV(14) and IDV(18). Comparing the monitoring results of the three methods further, the EUELM attains the highest fault detection rates for all the seventeen fault patterns. To make a more intuitive comparison, the average fault detection times and the average fault detection rates of the EUELM, UELM and KPCA over the seventeen fault patterns are visualized in Fig. 9 and Fig. 10, respectively. As illustrated in Fig. 9 and Fig. 10, the EUELM achieves the earliest average fault detection time and the highest average fault detection rate among the three methods. In brief, the comprehensive and visualized comparisons in Table 6, Table 7, Fig. 9 and Fig. 10 verify the excellent fault detection effect of the EUELM over the UELM and the KPCA.

VII. CONCLUSION
A novel enhanced UELM based monitoring scheme is proposed in this paper to detect nonlinear process faults. Our work has three main contributions. Firstly, to preserve both the intra-class variations and the local adjacency similarity structure of the input dataset, an improved UELM algorithm is put forward by uniting the intra-class diversity analysis technique with the conventional UELM model. Secondly, to tackle the difficult problem of determining the optimal number of hidden nodes, the improved UELM model is further enhanced by applying the kernel trick to mine the nonlinear characteristics of the data. Thirdly, once the intra-class diversity and local adjacency similarity information of the measurements has been extracted using the proposed EUELM model, the KNN rule is employed to build a fault detection statistic. Through detailed comparisons with the traditional KPCA and the closely related UELM based monitoring methods, the experimental results obtained from a numerical nonlinear system and the benchmark TE process clearly demonstrate the superior nonlinear fault detection effect of the suggested EUELM based scheme in terms of the fault detection time and the fault detection rate.