A Novel Process Monitoring Method Based on Dynamic Related ReliefF-SFA Method

This paper proposes a condition monitoring method based on Dynamic Related ReliefF-SFA (DRRSFA), which addresses the problems of high variable dimensionality and strong autocorrelation in process monitoring. First, the samples are mapped into a new space, and the slow projection components (SPCs) of the samples are calculated. The SPCs not only consider the autocorrelation between the variables but also characterize the inherent properties of the whole system. Second, the feature selection algorithm ReliefF is used to select the more heavily weighted features, called the principal components (PCs), to achieve dimensionality reduction. Then the corresponding statistics and control limits are calculated based on the obtained PCs. Finally, process monitoring with the proposed algorithm is performed on a numerical example and actual production process data, and the results show the effectiveness of the proposed method.


I. INTRODUCTION
With the development of science and technology, the scale and complexity of modern industrial systems are increasing rapidly. A large amount of sensor data generated by industrial processes is stored, and it is of great significance for process monitoring to effectively extract the important information of the system and thereby improve production efficiency and quality.
As an important research branch of data-driven process monitoring methods, multivariate statistical process monitoring (MSPM) has received extensive attention [1]-[4], as it can extract feature information representing complex industrial data. Traditional methods such as Principal Component Analysis (PCA) [5], [6], Partial Least Squares (PLS) [7], [8], Fisher Discriminant Analysis (FDA), and Canonical Variate Analysis (CVA) [9], [10] have been widely studied. Slow Feature Analysis (SFA) [11], a newer feature extraction and dimensionality reduction method, has been researched widely in the fields of process monitoring and fault diagnosis. It aims to extract the slowest-changing components from time-varying signals [12], [13], which characterize the most essential features and the inherent properties of the system. Therefore, it can not only detect deviations from normal operating conditions by monitoring a steady-state distribution but also identify process dynamic anomalies according to the temporal distribution.
In recent years, various improved SFA methods have been proposed and applied to process monitoring and other fields such as blind source separation, human action recognition, and remote sensing. As a relatively new feature extraction technique, SFA has given rise to a variety of monitoring methods developed to address different process characteristics. Zheng and Yan [14] combined SFA with higher-order statistics such as kurtosis to extract more effective slow features that can distinguish normal and faulty status. Yan et al. [15] proposed a method that combines SFA with a mutual-information-based method to extract variables more related to faults, which is superior to traditional fault detection methods. Sun et al. [16] combined a deep network with SFA theory, which highlights the changed information and thus achieves better detection performance.
These algorithms improve the fault detection performance of SFA; however, they cannot solve the problem of high dimensionality and high correlation between data variables. Methods such as dynamic SFA [17], dynamic PCA [18], and dynamic Fisher discriminant analysis (DFDA) [19] consider the cross-correlation and autocorrelation between variables, but Kruger et al. [20] proved that traditional dynamic methods cannot completely eliminate the autocorrelation of process variables. Especially in cases with strong autocorrelation, the features obtained by traditional dynamic methods still exhibit autocorrelation and cross-correlation. In industrial processes most variables are strongly correlated, so it remains difficult to apply these methods to fault diagnosis of real systems with strong autocorrelation.
In this paper, we propose a novel method called Dynamic Related ReliefF-SFA (DRRSFA) to solve the problem of the ''curse of dimensionality and correlation'' [21] mentioned above. As in PCA, we select PCs of the transformed data to realize dimensionality reduction. The transformed data not only extract the slow features of the original data but also take the problem of high correlation into account. The DRRSFA method obtains the contribution weights of the corresponding slow projection components (SPCs), and each SPC whose weight exceeds a set threshold τ is selected as a principal component. This accounts for the correlation between historical and future datasets and improves the capability of the fault detection model. According to the known fault types, fault models are established, and the T^2 and T_e^2 statistics of all faults are calculated. A fault is detected if the real-time data exceeds the control limit.
The remainder of this paper is organized as follows. In Section 2, the SFA and ReliefF algorithms are briefly reviewed. In Section 3, the DRRSFA method is introduced in detail. In Section 4, the process monitoring method based on DRRSFA is proposed. In Section 5, two simulation examples are used to demonstrate the performance of the proposed method: a numerical example and the heat treatment furnace industrial process. Conclusions are given in Section 6.

II. PRELIMINARY KNOWLEDGE
A. THE SFA ALGORITHM
Mathematically, given a temporal input signal [11] x(t) = [x_1(t), ..., x_I(t)]^T, SFA aims at finding a set of feature functions g(x) = [g_1(x), ..., g_J(x)]^T, each component of which is a weighted sum over a set of K nonlinear functions h_k(x), usually with K > max(I, J). Applying h = [h_1, ..., h_K]^T [22] to the input signal yields the nonlinearly expanded signal z(t) := h(x(t)). The output y(t) = [y_1(t) y_2(t) ... y_J(t)]^T with y(t) := g(x(t)) should vary as slowly as possible, so the optimization problem can be expressed as

min Δ(y_j) = <ẏ_j^2>   (4)
s.t. <y_j> = 0, <y_j^2> = 1, ∀ i < j: <y_i y_j> = 0   (5)

where <·> and ẏ represent the temporal mean and the time derivative of y, respectively. For linear SFA, each y_j(t) is a linear combination of the input data, y_j(t) = w_j^T x(t), where w_j = [w_j1, ..., w_jK]^T is the weight vector. The optimization in Equation (4) chooses the weights such that Δ(y_j) = <ẏ_j^2> = w_j^T <ż ż^T> w_j is minimal. Given that x(t) is the original input data and R [15] is the covariance matrix of x(t), the data are first whitened,

z = Q x,  with <z z^T> = I,

where Q is the whitening matrix. Applying PCA to the matrix <ż ż^T> gives

<ż ż^T> = P Ω P^T

where P = [p_1, p_2, ..., p_J] is the eigenvector matrix, a set of orthonormal weight vectors, and Ω is the diagonal eigenvalue matrix. The output signal then satisfies y(t) = P^T z(t). For fault monitoring on a new sample vector x, the T^2 and T_e^2 statistics of the principal and residual spaces are calculated as

T^2 = y^T y   (12)
T_e^2 = y_e^T y_e   (13)

where J and e are the variable dimensions of the principal and residual spaces, respectively.
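As a concrete illustration of the linear SFA procedure described above (whitening followed by an eigendecomposition of the derivative covariance), the following Python sketch is one possible implementation; the function and variable names are our own, not from the paper.

```python
import numpy as np

def linear_sfa(X, n_slow):
    """Linear SFA sketch: whiten the data, then take the eigenvectors of the
    whitened derivative covariance with the SMALLEST eigenvalues as the
    slowest-varying directions (Equations (4)-(5))."""
    Xc = X - X.mean(axis=0)                       # samples in rows
    R = np.cov(Xc, rowvar=False)                  # covariance of x(t)
    d, E = np.linalg.eigh(R)
    Q = E @ np.diag(d ** -0.5) @ E.T              # whitening matrix
    Z = Xc @ Q                                    # whitened: <z z^T> = I
    B = np.cov(np.diff(Z, axis=0), rowvar=False)  # <z_dot z_dot^T>
    _, P = np.linalg.eigh(B)                      # eigenvalues ascending
    P = P[:, :n_slow]                             # slowest directions first
    return Z @ P, Q @ P                           # slow features y(t), weights
```

The first-order difference `np.diff` stands in for the time derivative ż, a common discrete-time choice.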

B. THE RELIEFF ALGORITHM
The Relief algorithm [23], [24] was initially limited to two-class classification, while the later ReliefF algorithm [25] extends it to multi-class problems [26]. ReliefF assigns different weights to features according to the correlation between each feature and the class: the larger the weight of a feature, the stronger its classification ability.
The ReliefF algorithm estimates the weights of features according to how well their values distinguish between samples that are near each other. For that purpose, ReliefF randomly selects a sample R_i, then searches for k of its nearest neighbors from the same class, called the nearest hits H_j, and also k nearest neighbors from each of the different classes, called the nearest misses M_j(C). It updates the weight estimate W(A) for each feature A depending on its values for R_i, the hits H_j, and the misses M_j(C). The contribution of each class of misses is weighted with the prior probability of that class, p(C) (estimated from the training set). Since the contributions of hits and misses in each step should lie in [0, 1] and be symmetric, the misses' probability weights must sum to 1. As the class of the hits is missing from the sum, each probability weight is divided by the factor 1 − p(class(R_i)), where p(class(R_i)) is the prior probability of R_i's class and 1 − p(class(R_i)) is the sum of the probabilities of the misses' classes. The process is repeated f times.
Selection of k hits and misses is the basic difference to Relief and ensures greater robustness of the algorithm concerning noise. User-defined parameter k controls the locality of the estimates. For most purposes it can be safely set to 10.
The function diff(A, R_1, R_2) calculates the difference between the values of feature A for two samples R_1 and R_2. For nominal features it is defined as

diff(A, R_1, R_2) = 0 if value(A, R_1) = value(A, R_2), and 1 otherwise,

and for numerical features as

diff(A, R_1, R_2) = |value(A, R_1) − value(A, R_2)| / (max(A) − min(A)).

The function diff(·) is also used for calculating the distance between samples when finding the nearest neighbors: the total distance is simply the sum of the distances over all features (Manhattan distance) [27]. The specific steps of the ReliefF algorithm are shown in Table 1 and described as pseudo code.
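The update rule described above can be sketched in Python as follows. This is a minimal version restricted to numerical features, with names of our own choosing; it is an illustration of the technique, not the paper's implementation.

```python
import numpy as np

def relieff(X, y, k=10, f=100, seed=0):
    """ReliefF sketch for numerical features: a feature's weight drops when it
    differs on the nearest hits and rises (prior-weighted) when it differs on
    the nearest misses from each other class."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    span = X.max(axis=0) - X.min(axis=0)        # diff() normalizer, max - min
    span[span == 0] = 1.0
    W = np.zeros(m)
    for _ in range(f):
        i = rng.integers(n)
        d = np.abs(X - X[i]) / span             # per-feature diff to R_i
        dist = d.sum(axis=1)                    # Manhattan distance
        for c in classes:
            idx = np.where(y == c)[0]
            idx = idx[idx != i]                 # exclude R_i itself
            near = idx[np.argsort(dist[idx])[:k]]
            if c == y[i]:                       # nearest hits H_j
                W -= d[near].sum(axis=0) / (f * k)
            else:                               # nearest misses M_j(C)
                W += prior[c] / (1 - prior[y[i]]) * d[near].sum(axis=0) / (f * k)
    return W
```

On data where one feature separates the classes and another is noise, the informative feature receives the larger weight.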

III. DYNAMIC RELATED RELIEFF SLOW FEATURE ANALYSIS
In this section, a novel dynamic related ReliefF-SFA (DRRSFA) method is proposed in detail. The method mainly consists of two parts. One is the data projection transformation, which solves the problem of high correlation between variables, and extracts the slow features to get the essential characteristics of the sample. The data obtained in the new space not only considers the auto-correlation of the data, but also represents the inherent attributes of the whole system.
The second part is the PCs selection. Because a time delay [28] is introduced in the first part, the dimensionality of the original space increases several times. We adopt the ReliefF algorithm to select the slow features sensitive to fault identification, establish the corresponding principal and residual subspaces [29], and achieve dimensionality reduction.
The ReliefF-SFA and SFA algorithms are selected for comparison with the proposed method, and the sensitivity of the three methods is summarized in detail. The traditional SFA obtains slow features that represent the inherent characteristics through a data projection; however, it cannot reduce the dimensionality. The ReliefF-SFA method improves on SFA: ReliefF is embedded into SFA to obtain the weight of each feature, and the more heavily weighted variables (those whose weights exceed τ) are selected as the PCs, which realizes dimensionality reduction.
Compared with these two methods, the proposed DRRSFA not only considers the inherent features and the autocorrelation of the variables, but also selects the PCs to reduce redundancy. It first performs a data projection transformation, each row of the result being called an SPC. Then ReliefF is used to obtain the weights of the SPCs, and those whose weights exceed τ are selected as PCs. The specific steps are described in Sections A and B.

A. DATA PROJECTION TRANSFORMATION ANALYSIS
In this section, we realize the projection transformation of the data. The specific steps of the mapping algorithm are as follows:
(1) Select samples X* ∈ R^(m×n), where m and n denote the number of measured variables and the number of samples, respectively. Then standardize X* to obtain X:

X = Σ^(−1) (X* − X̄)

where X̄ ∈ R^m is the row mean vector of X* and the diagonal matrix Σ ∈ R^(m×m) holds the standard deviation of each row of X*.
(2) Let x(k) be the sample vector at time k, and represent the past and future information of X as x_p(k) and x_f(k), respectively:

x_p(k) = [x(k−1)^T, x(k−2)^T, ..., x(k−q)^T]^T ∈ R^(mq)   (19)
x_f(k) = [x(k)^T, x(k+1)^T, ..., x(k+q−1)^T]^T ∈ R^(mq)   (20)

where q represents the time delay, which is determined by the order of the system, and mq represents the number of transformed variables.
(3) Normalize x p (k) and x f (k) to be x p (k) and x f (k): whichx p andx f are as follows: each row of which is the mean of the correspondinḡ x p andx f . (4) Construct the past and future Hankel matrices of x p (k) and x f (k): (26) each of which contains N columns, indicating the measured variables are obtained by N times observations of the sample and the vector of the n th observation should satisfy x(n) = x(2q + N − 1).Take k = q + 1, X p and X f can be rewritten as: (5) Estimate the covariance and cross-covariance of the Hankle matrices of X p and X f according the followings: Introduce H = −1/2 ff fp −1/2 pp and make singular value decomposition to get H = U V T .
(6) Obtain the dynamic correlated matrix S. In actual online testing, future observations of the process variables are unknown, so only past measurements can be used to calculate the dynamic correlated variables; the dynamic correlated matrix S is therefore obtained from the past data as

S = V^T Σ_pp^(−1/2) X_p ∈ R^(mq×N)

where mq and N denote the number of transformed variables and the number of samples of the transformed data.
(7) Whitening. Denote by R the covariance matrix of S and perform a singular value decomposition on R to construct the whitening matrix Q; the input data S then undergo the whitening process

Z = Q S

(8) Slow feature extraction. Apply principal component analysis to <Ż Ż^T>; the eigenvectors corresponding to the first J minimum eigenvalues are the required normalized weight vectors, collected in P = [p_1, p_2, ..., p_J]. The extracted slow features (SFs) of S are calculated as

SFs = P^T Z   (35)

According to Equation (35), the original dataset X ∈ R^(m×n) is transformed, and each row of the transformed data is called a slow projection component (SPC).
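Steps (2)-(6) above can be sketched as follows. The Hankel indexing and the exact scaling conventions here are our assumptions based on the text, and the function name is our own.

```python
import numpy as np

def dynamic_correlated_matrix(X, q):
    """Sketch of steps (2)-(6): stack past/future vectors into Hankel matrices,
    whiten the cross-covariance (CVA-style SVD), and project the whitened past
    data to obtain the dynamic correlated matrix S."""
    m, n = X.shape                        # rows = variables, columns = samples
    N = n - 2 * q + 1                     # number of usable Hankel columns
    # Past Hankel matrix: block j holds x(k-j) for k = q+1, ..., q+N.
    Xp = np.vstack([X[:, q - j : q - j + N] for j in range(1, q + 1)])
    # Future Hankel matrix: block i holds x(k+i).
    Xf = np.vstack([X[:, q + i : q + i + N] for i in range(q)])
    Xp = Xp - Xp.mean(axis=1, keepdims=True)     # center each row
    Xf = Xf - Xf.mean(axis=1, keepdims=True)
    Spp = Xp @ Xp.T / (N - 1)                    # covariances
    Sff = Xf @ Xf.T / (N - 1)
    Sfp = Xf @ Xp.T / (N - 1)                    # cross-covariance

    def inv_sqrt(M, eps=1e-8):
        w, E = np.linalg.eigh(M)
        return E @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ E.T

    H = inv_sqrt(Sff) @ Sfp @ inv_sqrt(Spp)
    U, sv, Vt = np.linalg.svd(H)
    # Only past data are used, matching the online situation in step (6).
    return Vt @ inv_sqrt(Spp) @ Xp               # S, shape (mq, N)
```

By construction the rows of S are whitened, which is what the subsequent slow feature extraction step expects.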

B. THE PCs SELECTION OF DRRSFA METHOD
In this section, the ReliefF algorithm is adopted to obtain the contribution weights of the corresponding slow projection components (SPCs), and each SPC whose weight exceeds the set threshold is selected as a principal component. The PCs selection procedure is shown in Figure 1. When selecting PCs, all fault data are used for modeling in order to prevent a few sensitive SPCs, which may generate very large or very small weights, from distorting the fault detection results. We select n_1 samples from the normal data set as normal training samples to form D_N ∈ R^(n_1×m), and then select n_2 samples from each of the different fault data sets as fault training samples to form D_Fi ∈ R^(n_2×m), i = 1, 2, ..., q. The specific steps of the weight calculation in the DRRSFA method are as follows:
(1)-(2) Apply the projection transformation of Section III-A to the training data; each resulting transformed variable is called a slow projection component (SPC).
(3) Use the ReliefF algorithm to obtain the contribution weight of each SPC.
According to Equation (16), the contribution weight W(A_s) used to select PCs is updated as

W(A_s) := W(A_s) − Σ_{j=1..k} diff(A_s, R_i, H_j)/(f·k) + Σ_{C ≠ class(R_i)} [p(C)/(1 − p(class(R_i)))] Σ_{j=1..k} diff(A_s, R_i, M_j(C))/(f·k)

where s = 1, 2, ..., mq indexes the different SPCs, H_j denotes the k nearest neighbors from the normal class D_N, and M_j(C) denotes the k nearest neighbors from each of the fault classes D_F.
(4) Obtain the contribution weight matrix W of all SPCs and arrange the weights in descending order; the SPCs whose weights exceed the threshold τ are selected as the principal components. Here, we use the average of all mq weights as the threshold τ.
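Step (4) can be sketched as follows (the function name is ours):

```python
import numpy as np

def select_pcs(weights):
    """Sketch of step (4): keep the SPCs whose weight exceeds the threshold
    tau, taken here as the mean of all mq weights, as in the text. Returns
    the selected indices in descending weight order."""
    weights = np.asarray(weights, dtype=float)
    tau = weights.mean()                         # threshold = average weight
    order = np.argsort(weights)[::-1]            # descending order
    selected = [int(i) for i in order if weights[i] > tau]
    return selected, tau
```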

IV. PROCESS MONITORING BASED ON DRRSFA METHOD
In the proposed DRRSFA fault detection method, the dynamic correlated variables are calculated by taking into account the autocorrelation characteristics of the data variables, and then the slow features of these variables are extracted. In this way, the original data are mapped into a new space through the projection transformation.
To enhance the effectiveness of fault detection, we use the ReliefF algorithm in the proposed method to acquire the weight matrix which is used to select the principal components instead of the traditional CPV criterion.
Finally, the corresponding statistics are established to monitor the operation status of the air-cushion furnace. The schematic of the DRRSFA fault detection method is outlined in Figure 2. Offline modeling:
(1) Collect normal and fault training data and standardize them.
(2) Similar to steps (2)-(8) in Section III-A, obtain the corresponding transformed variables, called the slow projection components (SPCs).
(3) Select PCs using the ReliefF algorithm described in Section III-B.
(4) Determine the control limits by Equations (12) and (13).
Online monitoring:
(1) Collect the monitoring data at the i-th moment (i ≥ q), build the online sampling vector according to Equation (19), and obtain the new standardized data y_new.
(2) Perform a projection transformation analysis on the new sample.
(3) Calculate the T^2 and T_e^2 statistics of the new sample by Equations (12) and (13). When T^2 or T_e^2 exceeds its control limit at confidence level 1 − α, y_new is regarded as a fault sample; otherwise, y_new is a normal sample.
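For a single transformed sample, Equations (12)-(13) amount to squared norms of the principal and residual parts. A minimal sketch, assuming the slow features already have unit variance (the function name is ours):

```python
import numpy as np

def monitoring_stats(y, n_pc):
    """T^2 / T_e^2 sketch: split the transformed sample into its principal
    part (first n_pc entries) and residual part, and take squared norms."""
    y = np.asarray(y, dtype=float)
    y_pc, y_e = y[:n_pc], y[n_pc:]
    return float(y_pc @ y_pc), float(y_e @ y_e)  # T^2, T_e^2
```

In practice the control limits for T^2 and T_e^2 at confidence level 1 − α are typically obtained from a chi-squared approximation or a kernel density estimate over the training statistics.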

V. CASE STUDIES
In this section, the proposed DRRSFA fault detection method is applied to a numerical example and actual production process data to evaluate its performance.
The fault detection rate (FDR) performance indicator is defined below; it represents the probability of detecting a fault after the fault has occurred:

FDR = S_F / S_real × 100%

where S_F is the number of fault samples detected and S_real is the number of samples that match the real production conditions.
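The FDR indicator is a simple ratio; as a sketch:

```python
def fdr(s_f, s_real):
    """Fault detection rate: detected fault samples S_F over the number of
    real fault samples S_real, in percent."""
    return 100.0 * s_f / s_real
```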

A. CASE STUDY OF A NUMERICAL EXAMPLE
Consider a general multivariate process [30] represented by

x = A s + e   (25)

where the measurement vector x ∈ R^m has m variables, the coefficient matrix A ∈ R^(m×r) is assumed to have full column rank, s ∈ R^r denotes r independent data sources (r < m) with i.i.d. samples, and e ∈ R^m denotes Gaussian white noise. An actual data model is generated from Equation (25) with a specific 5 × 3 coefficient matrix A, where s_1, s_2, and s_3 are independent Gaussian random sequences with means [0.2, 5.9, 7.5] and unit standard deviation, and e represents white Gaussian noise with zero mean and standard deviations [0.0069, 0.0152, 0.0235, 0.0156, 0.0165]^T. Four kinds of incipient faults are considered:
(1) Sensor constant deviation: x = x* + f;
(2) Sensor gain degradation: x = ηx*;
(3) Sensor accuracy degradation: x = x* + θ;
(4) Sensor accuracy degradation and process fault occurring simultaneously: x = x* + θ and s = s*γ.
With the above model, 1000 training data and 1000 test data are generated, and all faults are introduced at the 501st sample. For convenience, the sensor faults are assumed to occur in x_1, with f = 0.05, η = 0.96, and θ normally distributed with zero mean and standard deviation 0.06. For the process fault, θ is normally distributed with zero mean and standard deviation 0.01, and γ = 0.5.
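The data generation for this example can be sketched as follows. Note that the 5 × 3 coefficient matrix A below is a HYPOTHETICAL stand-in (the paper's numerical entries are not reproduced here); the source means, noise levels, and fault parameters follow the text.

```python
import numpy as np

def generate_case(n=1000, fault=None, fault_at=500, seed=0):
    """Sketch of the numerical example x = A s + e, with faults injected into
    sensor x_1 from sample index fault_at onward. A is hypothetical."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(0.5, 2.0, size=(5, 3))          # hypothetical full-rank A
    mean = np.array([0.2, 5.9, 7.5])
    noise_std = np.array([0.0069, 0.0152, 0.0235, 0.0156, 0.0165])
    s = mean + rng.normal(size=(n, 3))              # unit-std sources
    e = rng.normal(size=(n, 5)) * noise_std         # measurement noise
    x = s @ A.T + e
    if fault == "bias":                             # sensor constant deviation
        x[fault_at:, 0] += 0.05
    elif fault == "gain":                           # sensor gain degradation
        x[fault_at:, 0] *= 0.96
    elif fault == "drift":                          # accuracy degradation
        x[fault_at:, 0] += rng.normal(0, 0.06, size=n - fault_at)
    return x
```

With a fixed seed, the faulty and normal runs differ only in the injected fault, which makes the effect of each fault easy to inspect.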
The SFA, ReliefF-SFA, and DRRSFA algorithms are used for the numerical simulation experiments. In the SFA algorithm, the data dimensionality is not reduced; only a projection transformation is performed. In the ReliefF-SFA algorithm, the weight contribution threshold is set to 0.85, and the number of principal components is 2 (the total number of variables is 5).
In the DRRSFA algorithm, the threshold τ is set to 0.95 and the time delay q is 6; 12 principal components are selected (the number of expanded variables, i.e., SPCs, is 30). In the ReliefF-SFA algorithm, the threshold τ is set to 0.85, so only 2 variables are selected. The SPC weights of the DRRSFA and ReliefF-SFA algorithms are shown as follows. The fault detection results, represented by the FDR, are shown in Table 2, with the maximum FDR marked in bold. It can be seen from Table 2 that the traditional SFA process monitoring model achieves a reasonable FDR in the numerical simulation, but ReliefF-SFA yields a low FDR and cannot be used for process monitoring in this case. The proposed DRRSFA fault detection model, in contrast, achieves a higher and more stable detection rate under all fault conditions.
The traditional SFA extracts the slowest-changing components by a projection transformation, which represents the essential features of the case; hence its FDR is acceptable. When the ReliefF algorithm alone is employed to select principal components, useful information for fault detection is filtered out and only 2 variables are selected. The DRRSFA algorithm models all faults when selecting principal components and weights the SPCs to reduce the influence of redundant SPCs, which enhances the fault detection performance. Figure 4 shows the detection performances of SFA, ReliefF-SFA, and DRRSFA for the numerical example. It can be seen that the FDRs of SFA and ReliefF-SFA are lower than that of the DRRSFA algorithm. Although SFA can detect the faults, its results are not satisfactory for all of them, and Figure 4.b shows an FDR too low to be considered in practical application. In comparison, the DRRSFA algorithm performs well for every fault.

B. CASE STUDY OF THE ACTUAL PRODUCTION PROCESS DATA
Heat treatment furnaces [31] are important equipment in industrial processes.
The data set used in this section comes from the heat treatment furnace equipment of a processing plant. About one month of field sensor data was collected from the production line, with a sampling interval of 100 milliseconds. The entire data set includes process variables such as the inlet temperatures at different locations in the furnace, the current speed of the fan, and the temperatures of the radiant tubes. The parameter information of the heat treatment furnace is shown in Table 3.
According to the process of the heat treatment furnace, we select 75,000 samples from normal working conditions and resample them at 5 s intervals to form a total of 1,500 sets of normal data, of which 1,000 sets are used for modeling and 500 sets for testing. Three kinds of faults are considered, as listed below. For each of the three fault data sets, 50,000 samples are selected and resampled at 5 s intervals to obtain 1,000 sets of data, respectively.
(1) the sensor measurement value shifting fault in the heating zone 1, (2) the sensor measurement value shifting fault in the heating zone 3, (3) the low heating power of the radiant tube in the heating zone 3.
The SFA, ReliefF-SFA, and proposed DRRSFA algorithms are applied to the heat treatment furnace experiments. The fault detection results, represented by the FDR, are shown in Table 4, with the maximum FDR marked in bold.
From Table 4, it is obvious that the fault detection method based on the proposed DRRSFA further increases the FDR in the heat treatment furnace case. Especially for faults 2 and 3, both T^2 and T_e^2 show better performance; only fault 1 is less effective. In short, the DRRSFA algorithm has higher sensitivity for fault detection. In establishing the fault monitoring model based on the DRRSFA method, the dynamic process order determined by the Akaike Information Criterion (AIC) is 6, so the time delay q = 6 is selected for the heat treatment furnace process. The Hankel matrices can be constructed by Equations (27) and (28), where the number of columns N is 989 and the number of rows mq is 216. Through the projection transformation, we obtain 216 slow features, which are fed into the ReliefF algorithm to obtain the PCs. PCs are selected when their weight contributions exceed the set threshold τ = 0.05; the resulting PC selection weights W are shown in Figure 5. Through screening, 35 principal components are assigned to the principal subspace, and the rest are placed in the residual space. Fault 3 is now taken as a case to illustrate. The fault detection rate of the SFA algorithm on the T^2 statistic is 7.56%, and even the better detection rate of the T^2 statistic in ReliefF-SFA is only 8%, which cannot be used in actual fault detection due to its inefficiency. Figure 6 shows the detection performance for fault 3 of the traditional SFA, ReliefF-SFA, and DRRSFA algorithms on the heat treatment furnace case. It can be seen clearly that the detection effect of the proposed algorithm is superior to that of traditional SFA and ReliefF-SFA.
It can be concluded from Figure 6 that the first 50 samples of the real data are under normal working conditions; hence the statistics stay below the control limit. After the 50th sampling point, when fault 3 is introduced, the detection rates of both the SFA and ReliefF-SFA methods are low.
That is, neither the SFA nor the ReliefF-SFA algorithm can detect the occurrence of the fault, and the system is mistakenly judged to be in a normal operating state. The T^2 and T_e^2 statistics of the DRRSFA method achieve detection rates of 96.89% and 98.89%, respectively. Compared with the other two algorithms, the DRRSFA algorithm has a higher fault detection rate when detecting fault 3 and achieves better monitoring results.

VI. CONCLUSION
In this paper, a fault detection method based on DRRSFA is proposed: the traditional SFA is improved in both the projection transformation and the principal component selection. It not only considers the autocorrelation characteristics between variables but also realizes dimensionality reduction. A projection mapping analysis is adopted, which contains two steps: first, dynamic correlation analysis is implemented to eliminate autocorrelation between variables, considering both past and future data; second, the slow features are extracted from the transformed data. Because the autocorrelation of the data is taken into account, the dimensionality of the original data increases several times, so the ReliefF algorithm is used to select the slow features sensitive to fault identification and achieve dimensionality reduction.
In addition, the proposed DRRSFA method is shown to perform well in process monitoring through a numerical simulation and actual heating furnace data. The method is more effective than SFA and ReliefF-SFA.
LIPING ZHANG received the bachelor's degree from the Nanjing Institute of Technology, Nanjing, China, in 2017, and the master's degree from Northeastern University, in 2020. His research interests include machine learning, process monitoring, and fault diagnosis.
YINGHUA YANG received the B.S. degree and the M.S. and Ph.D. degrees in control theory and control engineering from Northeastern University, Shenyang, China, in 1991, 1994, and 2002, respectively. He is currently an Associate Professor with the College of Information Science and Engineering, Northeastern University. His research interests include process monitoring, fault diagnosis, and modeling method based on data driven and its application on smart manufacturing.