A Novel Monitoring Strategy Combining the Advantages of NPE and GMM

Semiconductor manufacturing process data usually have multimodel and multiphase characteristics, which violate the assumptions underlying neighborhood preserving embedding (NPE). To address these limitations of NPE, a novel monitoring strategy combining the advantages of neighborhood preserving embedding and the Gaussian mixture model (NPE-GMM) is proposed. First, the window data are obtained using a predefined window width. Next, the scores of the current window data set are calculated by NPE. After that, the Gaussian components of the scores are determined by GMM. Finally, a quantification index is proposed to monitor the process status. NPE-GMM not only retains more local structure information of the current window data set in the feature subspace, but also reduces the computational complexity of GMM in fault detection. By introducing the new statistic, NPE-GMM can effectively improve the fault detection rate for some multimodel batch processes. The effectiveness of the proposed method is verified on a numerical case and a semiconductor etching process. The simulation results indicate that the proposed method has a higher fault detection rate than traditional methods.


I. INTRODUCTION
With the rapid development of science and technology, industrial production processes are moving towards automation and integration. Batch processes play an important role and have been widely used in the semiconductor, chemical, fermentation, pharmaceutical and other fields [1], [2]. To monitor the status of batch processes effectively, several data-driven methods have been proposed and successfully applied [3]-[5].
As a classical multivariate statistical process control (MSPC) technique, principal component analysis (PCA) has been widely used for dimensionality reduction and process monitoring [6]. Normally, PCA divides the raw space into a principal component subspace (PCS) and a residual subspace (RS). However, the PCS usually preserves only the global Euclidean structure of the process data, and the local structure information between data points is ignored [7], [8].
The associate editor coordinating the review of this manuscript and approving it for publication was Francesco Tedesco.
As a method based on local structure, neighborhood preserving embedding (NPE) is a widely applied linear projection technique [9], [10]. In NPE, the raw space is divided by a linear transformation into two subspaces, denoted the feature subspace (FS) and the residual subspace (RS). Generally, two statistics, Hotelling's $T^2$ and the squared prediction error (SPE), are used to monitor the status of processes in these subspaces, respectively. Effective monitoring with $T^2$ and SPE requires the assumption that the process data approximately follow a multivariate Gaussian distribution. However, the unique characteristics of batch processes, such as nonlinearity, multimodel batch trajectories due to product mix, and multiphase process steps with variable durations, mean that the process data do not follow a Gaussian distribution [11]-[13]. Hence, conventional monitoring methods based on PCA and NPE may yield biased fault detection results [14]. In view of the multiway characteristic of batch process data, some special techniques have been developed for monitoring purposes [15]. For instance, multiway PCA (MPCA), proposed by Nomikos et al., was applied to a semibatch reactor for the production of styrene-butadiene latex [16], and multiway kernel PCA (MKPCA), proposed by Lee et al., was applied to a simulation benchmark of fed-batch penicillin production [17]. It should be noted that these techniques are only suitable for monitoring a single-model process; their fault detection capability degrades when a multimodel process is monitored. To address these issues, the Gaussian mixture model (GMM) was proposed and has been extensively employed in batch processes [18], [19]. Nevertheless, GMM may not efficiently capture the local features of some complex processes [20].
To this end, a novel monitoring strategy combining the advantages of neighborhood preserving embedding and the Gaussian mixture model (NPE-GMM) is proposed. The paper is organized as follows. In Section 2, the conventional NPE, MNPE and GMM are briefly introduced. The proposed monitoring strategy is introduced in Section 3. The result of a numerical case is presented in Section 4. Section 5 illustrates the performance of the NPE-GMM based monitoring method through a simulation of a batch semiconductor etching process. Section 6 gives the conclusions of this paper.
In NPE, the weight matrix W is determined by minimizing the reconstruction error in Eq. (1), where $w_{ij}$ is the weight between x(i) and x(j), and $w_{ij} = 0$ if x(j) is not among the k nearest neighbors of x(i).
Generally, we suppose that $y = Xa$. The vector a is computed by minimizing the cost function in Eq. (2) under appropriate constraints [10]. Letting $M = (I - W)^T (I - W)$, the optimization problem of Eq. (2) can be transformed into the generalized eigenvector problem of Eq. (3). The eigenvalues obtained from Eq. (3) are arranged in ascending order. The eigenvectors corresponding to the first d eigenvalues constitute the projection matrix $A = (a_1, a_2, a_3, \cdots, a_d)$.
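For reference, the standard NPE formulation from the literature, which Eqs. (1)-(3) are assumed to follow, can be written with the row-sample convention $y = Xa$ used above. The sum-to-one constraint, the normalization constraint and the symbol $\lambda$ are assumptions taken from the usual presentation of NPE rather than from this text.

```latex
% Eq. (1): reconstruction weights (assumed standard form)
\min_{W} \sum_{i} \Big\| x(i) - \sum_{j} w_{ij}\, x(j) \Big\|^{2}
\quad \text{s.t.} \quad \sum_{j} w_{ij} = 1

% Eq. (2): projection vector, using y = Xa and M = (I - W)^{T}(I - W)
\min_{a} \sum_{i} \Big( y_{i} - \sum_{j} w_{ij}\, y_{j} \Big)^{2}
= \min_{a} \; a^{T} X^{T} M X a
\quad \text{s.t.} \quad a^{T} X^{T} X a = 1

% Eq. (3): generalized eigenvector problem
X^{T} M X a = \lambda\, X^{T} X a
```

The second equality in Eq. (2) holds because $\sum_i (y_i - \sum_j w_{ij} y_j)^2 = \| (I - W) y \|^2 = y^T M y$, which becomes $a^T X^T M X a$ after substituting $y = Xa$.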
For batch process data $X_{I \times J \times K}$, I, J and K denote the number of batches, time steps, and process variables at each time instance, respectively. The basic idea of MNPE is to apply conventional NPE to a two-way matrix unfolded from the raw three-way array. In this subsection, the array $X_{I \times J \times K}$ is unfolded into a large matrix $X_{I \times JK}$, as shown in Fig.1. For the jth time-slice matrix $X_{I \times jK}$ (j = 1, 2, ..., J), its score $T_{I \times jK}$ is given by Eq. (4), where $A_j$ is the projection matrix of conventional NPE. The MNPE modeling is shown in Fig.2.
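The batch-wise unfolding described above can be illustrated with a minimal sketch. The function names `unfold_batchwise` and `time_slice` are hypothetical, and the time slice is taken here as the K columns belonging to time step j, which is only one reading of the notation $X_{I \times jK}$.

```python
def unfold_batchwise(X):
    """Unfold an I x J x K nested list into an I x (J*K) matrix.

    Row i is batch i; the K variables of time step 1 come first,
    followed by the K variables of time step 2, and so on.
    """
    return [[value for time_step in batch for value in time_step] for batch in X]


def time_slice(X_unfolded, j, K):
    """Return the I x K block of columns belonging to time step j (1-based)."""
    start = (j - 1) * K
    return [row[start:start + K] for row in X_unfolded]


# Toy array: I = 2 batches, J = 3 time steps, K = 2 variables.
X = [
    [[1, 2], [3, 4], [5, 6]],
    [[7, 8], [9, 10], [11, 12]],
]
X_unfolded = unfold_batchwise(X)
slice_2 = time_slice(X_unfolded, j=2, K=2)
```

Each row of `X_unfolded` holds one complete batch, so a projection learned on a time slice acts on all batches at that time step together.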
To monitor the status of a batch process, the $T^2$ statistic is constructed as the measure of variation in the FS [21]. It is defined by Eq. (5), where $x_{i,j}$ represents the jth time instant in the ith batch of X, $A_j$ is calculated by Eq. (3), and $\Sigma_j$ is the diagonal covariance matrix of the matrix $X_{I \times jK}$.

B. GAUSSIAN MIXTURE MODEL
GMM assumes that the data consist of C Gaussian components with different means and variances. The conditional probability $p(x^{(i)} \mid C_j)$ (i = 1, 2, ..., m; j = 1, 2, ..., C) of the sample $x^{(i)}$ under the jth Gaussian component is calculated by Eq. (6), where $\mu_j$ and $\Sigma_j$ are the mean column vector and the covariance matrix of the jth Gaussian component, respectively. The probability that the sample belongs to the jth Gaussian component is then calculated by Eq. (7), where $P_j$ represents the prior probability of the jth Gaussian component. The parameters $\mu_j$, $\Sigma_j$ and $p(x^{(i)} \mid C_j)$ can be obtained iteratively by the EM algorithm. The label c of $x^{(i)}$ is calculated using Eq. (12), where c is the index of the Gaussian component in which the sample $x^{(i)}$ is located.
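The component density, the posterior probability and the label assignment described above can be sketched as follows. This is a minimal illustration that assumes diagonal covariance matrices and already-fitted parameters; the function names and the toy parameter values are hypothetical, and the EM updates themselves are not shown.

```python
import math


def gaussian_pdf_diag(x, mean, var):
    """Density of a Gaussian with diagonal covariance at point x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * math.log(2.0 * math.pi * vi) - 0.5 * (xi - mi) ** 2 / vi
    return math.exp(log_p)


def posterior(x, priors, means, variances):
    """Posterior probability of each component given x (Bayes' rule)."""
    weighted = [P * gaussian_pdf_diag(x, m, v)
                for P, m, v in zip(priors, means, variances)]
    total = sum(weighted)
    return [w / total for w in weighted]


def label(x, priors, means, variances):
    """Index (0-based) of the component with the largest posterior."""
    post = posterior(x, priors, means, variances)
    return max(range(len(post)), key=post.__getitem__)


# Two well-separated components with equal priors (toy values).
priors = [0.5, 0.5]
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
post = posterior([0.2, -0.1], priors, means, variances)
c = label([4.8, 5.3], priors, means, variances)
```

For components that are far apart or high-dimensional, the densities can underflow; working with log-probabilities throughout would be the more robust choice.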

III. A NOVEL MONITORING STRATEGY COMBINING THE ADVANTAGES OF THE NEIGHBORHOOD PRESERVING EMBEDDING AND GAUSSIAN MIXTURE MODEL

A. NPE-GMM
Considering the status of samples in multimodel batch processes, a novel monitoring strategy combining the advantages of neighborhood preserving embedding and the Gaussian mixture model (NPE-GMM) is proposed.
In this subsection, the NPE-GMM strategy is introduced in detail. First, window data are obtained using a predefined window width. According to the window width and the moving step, a new window data set is formed by including the newest data and excluding the oldest [22]. The moving window technique is shown in Fig.3.
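The moving window update can be sketched on one unfolded batch. The helper `moving_windows` and the toy values are hypothetical; the width of 3K columns and the step of K columns are illustrative choices only.

```python
def moving_windows(row, width, step):
    """Yield successive windows of `width` columns, advancing by `step`."""
    for start in range(0, len(row) - width + 1, step):
        yield row[start:start + width]


# One unfolded batch with J = 5 time steps and K = 2 variables per step.
row = [10, 11, 20, 21, 30, 31, 40, 41, 50, 51]
K = 2
windows = list(moving_windows(row, width=3 * K, step=K))
```

With a step of K columns, each new window drops the variables of the oldest time step and appends those of the newest one.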
Second, data preprocessing is an important step that brings the variables of the data X(k) to the same order of magnitude. The data are preprocessed by Eq. (13), where M(k) and $\Sigma(k)$ are the mean column vector and the standard deviation diagonal matrix of X(k), respectively, and $1_I$ is a vector of dimension I × 1. Next, because the moving window technique increases the number of variables dramatically, NPE is used to reduce the dimension of the data by Eq. (14), where $A_k$ is calculated by Eq. (3). Y(k) retains most of the local structure information of X(k). Then, for Y(k), the number of Gaussian components C is obtained by cross-validation [23]. Meanwhile, the parameters of the cth Gaussian component, namely the mean vector $\mu_c$ and the diagonal variance matrix $\Sigma_c$, are determined by Eqs. (7)-(10).
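The preprocessing step can be illustrated with a short sketch of column-wise standardization. It assumes that Eq. (13) is the usual zero-mean, unit-standard-deviation scaling, and that the sample standard deviation (divisor n - 1) is used; both are assumptions, and the function names are hypothetical.

```python
import statistics


def standardize_columns(X):
    """Scale each column of X to zero mean and unit standard deviation.

    Returns the scaled matrix together with the column means and standard
    deviations so that the same scaling can be reused on test data.
    A constant column would give a zero standard deviation and is not handled.
    """
    columns = list(zip(*X))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]  # sample std (n - 1)
    scaled = [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in X]
    return scaled, means, stds


def apply_scaling(x, means, stds):
    """Scale one new sample with parameters taken from the training data."""
    return [(xi - m) / s for xi, m, s in zip(x, means, stds)]


X_train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
X_scaled, means, stds = standardize_columns(X_train)
x_test_scaled = apply_scaling([2.0, 40.0], means, stds)
```

Reusing the training means and standard deviations on test samples keeps the offline and online phases on the same scale.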
In Eq. (15), $Y_i(k)$ belongs to the cth Gaussian component and $Y_i^j(k)$ is the jth neighbor of $Y_i(k)$ within that Gaussian component. Eq. (15) is then rearranged into Eq. (16), which gives the statistic $MD_i^2(k)$. Finally, the control limit $MD_\alpha^2(k)$ is determined by the kernel density estimation (KDE) method [24]. The confidence level α is set to 99% throughout this paper.
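A KDE-based control limit can be sketched as the α-quantile of a Gaussian kernel density estimate fitted to the training statistics. The Gaussian kernel, Silverman's rule-of-thumb bandwidth, the bisection search and the toy values in `md2_train` are all assumptions made for illustration; the text does not specify these details.

```python
import math
import statistics


def kde_cdf(x, samples, h):
    """CDF of a Gaussian kernel density estimate evaluated at x."""
    return sum(0.5 * (1.0 + math.erf((x - s) / (h * math.sqrt(2.0))))
               for s in samples) / len(samples)


def kde_control_limit(samples, alpha=0.99, h=None):
    """Value at which the KDE cumulative probability reaches alpha."""
    if h is None:
        # Silverman's rule of thumb (an assumption; no bandwidth is given).
        h = 1.06 * statistics.stdev(samples) * len(samples) ** (-0.2)
    lo, hi = min(samples) - 10.0 * h, max(samples) + 10.0 * h
    for _ in range(100):  # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, samples, h) < alpha:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical training statistics from normal operation.
md2_train = [0.8, 1.1, 0.9, 1.4, 1.0, 1.2, 0.7, 1.3, 1.05, 0.95]
limit = kde_control_limit(md2_train, alpha=0.99)
```

Because the estimated density has tails beyond the largest training value, the 99% limit from a small sample typically lies above every training statistic.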
The detailed fault detection procedure based on the NPE-GMM method consists of two parts: offline modeling and online detection.
Step 1: Offline modeling
For the training sample $X_{I \times J \times K}$, the offline modeling part consists of seven steps.
1). Obtain $X_{I \times JK}$ by unfolding X.
2). Determine the width and moving step of the moving window.
3). Preprocess the data X(k) by Eq. (13).
4). Calculate the score Y(k) by Eq. (14).
5). Calculate the parameters of the Gaussian components, namely C, $\mu_c$ and $\Sigma_c$.
6). Calculate the statistic $MD_i^2(k)$ by Eq. (16).
7). Determine the control limit $MD_\alpha^2(k)$ for fault detection.
Step 2: Online detection
For the test sample $x_{J \times K}$, the online detection part consists of six steps. The parameters used in the online detection phase are the same as those in the offline modeling phase.
1). Obtain $x_{1 \times JK}$ by unfolding $x_{J \times K}$.
2). Preprocess the data $x_{J \times K}$.
6). Compare $MD^2(k)$ with the control limit $MD_\alpha^2(k)$. If $MD^2(k) < MD_\alpha^2(k)$, $x_{1 \times JK}$ is classified as a normal sample. Otherwise, it is detected as a fault sample.
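The final comparison step can be illustrated with a rough sketch. The statistic used here, the mean squared Mahalanobis distance from a score to its kk nearest neighbors within its assigned Gaussian component with a diagonal variance matrix, is only one plausible reading of the neighbor-based description and should not be taken as the definition in Eq. (16). All function names and numbers are hypothetical.

```python
def mahalanobis_sq_diag(a, b, var):
    """Squared Mahalanobis distance between a and b for a diagonal covariance."""
    return sum((ai - bi) ** 2 / vi for ai, bi, vi in zip(a, b, var))


def neighbor_md2(y, component_scores, var, kk):
    """Mean squared Mahalanobis distance from y to its kk nearest neighbors.

    Hypothetical statistic: the neighbors are searched only among the
    training scores of the Gaussian component that y was assigned to.
    """
    distances = sorted(mahalanobis_sq_diag(y, s, var) for s in component_scores)
    return sum(distances[:kk]) / kk


def is_fault(md2, control_limit):
    """A sample is flagged as faulty when its statistic reaches the limit."""
    return md2 >= control_limit


# Training scores and variances of one component (toy values).
component_scores = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [1.0, 2.0]]
var = [1.0, 4.0]
md2_normal = neighbor_md2([0.5, 1.0], component_scores, var, kk=2)
md2_fault = neighbor_md2([6.0, 1.0], component_scores, var, kk=2)
```

Scaling each direction by the variance of the assigned component is what would place statistics from different components on a comparable level.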

B. NOTICE
(1). When the parameters of the moving window are determined, the moving step is usually chosen as the number of process variables at each time instance, k, and the window width w is usually set to between 3k and 5k. A window that is too short cannot effectively capture the sample status between adjacent moments, whereas a window that is too long responds slowly to process faults.
(2). The dimension of the FS is related to the moving window width, and the dimension of the data increases with the width.
(3). The parameter kk in $MD^2$ should be smaller than the parameter k in NPE, because NPE preserves the local structure of a sample and its k nearest neighbors.
(4). The fault detection rate differs each time GMM is applied, because the results of each run of GMM are slightly different. To address this inconsistency, the fault detection rate averaged over several experiments is taken as the final fault detection rate in this paper.

C. ANALYSIS
To reduce the computational complexity, it is necessary to reduce the data dimension. Meanwhile, characteristics of batch process data such as multimodel and multiphase behavior need to be preserved when the dimensionality is reduced. As a linear dimensionality reduction algorithm, NPE aims to preserve the local neighborhood structure of the data set. Hence, some characteristics of industrial data can be retained in the FS of NPE.
The advantage of GMM is that it can monitor process changes without knowing the abnormal pattern [25]. However, when the data dimension is large, the data can be sparse, which makes it difficult to determine the GMM model [26]. NPE can alleviate these difficulties. Therefore, NPE-GMM is proposed in this paper; it combines the ability of NPE to retain the local structure of the data in the FS with the ability of GMM to classify data effectively.
At the same time, NPE-GMM adopts the Mahalanobis distance as the statistic, which can eliminate the multimodel characteristic of the statistical values. Therefore, compared with some traditional methods, the NPE-GMM method has a higher fault detection rate in industrial processes with multimodel characteristics.

IV. CASE STUDY
In this section, we use an improved dynamic process based on Ref. [27] to test the fault detection performance of the NPE-GMM method. In the model, u is the correlated input and $u(0) = w(0) = [0\ 0]^T$. The normal data comprise 100 batches, and each batch has 1020 sampling instants. The last 1000 samples of each batch are used for analysis because the first 20 samples change greatly. The data vector for modeling consists of $[y^T(k)\ y^T(k-1)\ u^T(k)\ u^T(k-1)]$, because the input u and the output y are measured but x and w are not. v(k) is random noise with zero mean and variance 0.1. The phases are determined by the input w, whose parameter settings are shown in Tab.1. Faults are generated by adding a small change to the mean of w in model 1 after the 501st sample. Four normal batches are randomly selected as the calibration data, and the test batch data are composed of the calibration batch and the fault batch. The plots of the monitored variables are shown in Fig.4, where the blue part represents the normal data and the red line represents the fault data. It can be seen that the fault occurs at the 501st sample. In this section, we apply MPCA, MNPE and NPE-GMM to this case and compare their performance. The parameter settings of the different methods are listed in Tab.2.
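The construction of the lagged data vector from measured outputs and inputs can be sketched as follows. The helper `lagged_vectors` and the toy values are hypothetical; the process model itself is not reproduced here.

```python
def lagged_vectors(y, u):
    """Stack [y(k), y(k-1), u(k), u(k-1)] for k = 1, ..., len(y) - 1.

    y and u are lists of equal length whose entries are the output and
    input vectors (as lists) at each sampling instant.
    """
    return [y[k] + y[k - 1] + u[k] + u[k - 1] for k in range(1, len(y))]


# Three sampling instants of a two-output, two-input process (toy values).
y = [[1.0, 1.5], [2.0, 2.5], [3.0, 3.5]]
u = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
Z = lagged_vectors(y, u)
```

Each row of `Z` has eight entries for this two-input, two-output case, and one sample is lost at the start of each batch because no lagged value is available there.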
Fault detection results for the calibration data set using MPCA are shown in Fig.5, where there is a significant difference in the statistical values between models 1 and 2. The value of the control limit is completely determined by model 2. It can also be seen that the statistical values differ between different stages of the same model, so their control limit centers are different. The fault detection results using MPCA are shown in Fig.6. The fault detection rate is 0, because process data with multimodel characteristics do not follow a Gaussian distribution. The fault detection rate of MNPE is also 0, and the main reason for the poor performance of MNPE is the same as for MPCA.
The results of GMM classification in the FS are shown in Fig.8. The data are divided into two models with different means and covariances. Fault detection results for the calibration data set using NPE-GMM are shown in Fig.9. The statistical values and control limits are basically at the same level, which indicates that the NPE-GMM method eliminates the differences in means and covariances between the two models. Fig.10 gives the results of NPE-GMM online fault detection, which show that NPE-GMM can detect faults in a timely manner. Fig.11 shows the fault detection chromatogram of the test batch, where the black part indicates the fault. NPE-GMM has a higher fault detection rate and a lower false alarm rate than the other methods. Overall, compared with the other methods, NPE-GMM has the best fault detection performance in this case.
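The two performance measures used in this comparison can be computed as in the sketch below. It assumes the usual definitions, namely the fraction of faulty samples that raise an alarm and the fraction of normal samples that raise an alarm; the function name and the numbers are hypothetical.

```python
def detection_rates(values, limits, fault_start):
    """Return (fault detection rate, false alarm rate) for one test batch.

    Samples with index >= fault_start (0-based) are treated as faulty.
    """
    alarms = [v >= l for v, l in zip(values, limits)]
    normal, faulty = alarms[:fault_start], alarms[fault_start:]
    fdr = sum(faulty) / len(faulty) if faulty else 0.0
    far = sum(normal) / len(normal) if normal else 0.0
    return fdr, far


# Six monitored samples; the fault is introduced at the fourth sample.
values = [0.5, 1.2, 0.7, 3.1, 2.8, 0.9]
limits = [1.0] * 6
fdr, far = detection_rates(values, limits, fault_start=3)
```

Passing one limit per sample allows a control limit that changes from window to window.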

V. SEMICONDUCTOR MANUFACTURING PROCESS
With the rapid development of the high-tech industry, semiconductors have received increasing attention as an important part of technology products. Therefore, online monitoring of semiconductor production processes has become a research hotspot. In this section, the data set is derived from a semiconductor etching process at Texas Instruments [11], [28]. The multiphase behavior is the most important characteristic of this process and poses challenges for fault detection [4]. The semiconductor data set was generated by three experiments, so the data have different means and covariances [10]. The process produced 108 normal wafers and 21 fault wafers. Because data are missing for some wafers, this paper uses 107 normal wafers and 20 fault wafers for modeling and testing [28]. According to Ref. [29], a total of 17 variables, shown in Tab.3, are used to monitor the process.
In the data set, different batches have different durations. Therefore, sampling instants 6-85 of the semiconductor data are selected for modeling. The test data consist of 3 normal batches and 20 fault batches. In this section, a total of 104 training batches and 23 test batches are used for modeling and monitoring. The parameter settings determined by cross-validation [30] are as follows: (1) in MPCA, PCs=5; (2) in MNPE, PCs=9, k = 9; (3) in NPE-GMM, w = 3, PCs=20, k = 9, kk=5. Fig.12 shows the results of GMM classification of the training data in the FS. The data are divided into three models with different means and covariances. Therefore, traditional MPCA and MNPE show poor performance in fault detection. Fault detection results for the calibration data set using NPE-GMM are shown in Fig.13; most of the normal test samples are below the control limit, which indicates that the NPE-GMM model has better stability. To illustrate the effectiveness of the proposed method, traditional MPCA and MNPE are also tested. Tab.4 gives the fault detection results of the different methods; NPE-GMM has a higher detection rate than the others for most of the faults. The highest fault detection rate for each test batch is marked in bold. From Ref. [31], we know that faults 10 and 19 belong to experiments 30 and 31, respectively. Overall, NPE-GMM can effectively identify the faults generated in different models.

VI. CONCLUSION
To address the limitations of fault detection in multimodel batch processes, a novel monitoring strategy combining the advantages of neighborhood preserving embedding and the Gaussian mixture model is proposed. NPE-GMM not only effectively eliminates the multimodel characteristic of the data through GMM, but also significantly improves the fault detection rate through the proposed statistic. The effectiveness of NPE-GMM is verified on a numerical case and a semiconductor batch process.
CHENG ZHANG received the B.S. degree in mathematics and applied mathematics from Northeast Forestry University, Heilongjiang, China, in 2003, and the M.S. degree in mathematics and applied mathematics from the Shenyang University of Chemical Technology, Liaoning, China, in 2010. He is currently pursuing the Ph.D. degree with the Department of Control Science and Engineering, Northeastern University, Liaoning.
He is also an Associate Professor with the Shenyang University of Chemical Technology. His research interests include process modeling, monitoring, fault detection, and classification of batch processes.
XUNIAN DAI received the B.S. degree in information and computing science from the Shenyang University of Chemical Technology, Liaoning, China, in 2018, where he is currently pursuing the M.S. degree with the Department of Control Theory and Control Engineering.
His research interests include process data analysis, modeling, and fault diagnosis of industrial processes.
XIAOFANG ZHENG received the B.S. degree in information and computing science from the Shenyang University of Chemical Technology, Liaoning, China, in 2018, where she is currently pursuing the M.S. degree with the Department of Control Theory and Control Engineering.
Her research interests include fault detection and diagnosis of industrial processes. Her research interests include system identification, fault detection and classification, and data-driven complex process fault diagnosis. VOLUME 8, 2020