Soft Sensor Modeling for Unobserved Multimode Nonlinear Processes Based on Modified Kernel Partial Least Squares With Latent Factor Clustering

To cope with soft sensor modeling of unobserved multimode nonlinear processes, this paper proposes a modified kernel partial least squares (KPLS) method that integrates latent factor clustering (LFC), called LFC-KPLS. In the proposed method, the process data are first divided sequentially into several batches and then projected onto a latent space by an explicit nonlinear functional expansion. In the latent space, the partial least squares method is applied to compute the regression coefficients between the input variables and the output variable of each batch. These regression coefficients, called latent factors, describe the functional relationships in the unobserved multimode data. The latent factors are therefore used for mode clustering, so that process data with similar functional relations are grouped into the same mode. For each mode, a nonlinear soft sensor is established based on KPLS. To assign the mode of an online query sample, a mode identification strategy based on Bayesian inference is designed for online soft sensor prediction. Finally, two case studies are used to validate the proposed method.


I. INTRODUCTION
Real-time monitoring and control of quality variables play a vital role in complicated industrial processes [1], [2]. However, some important quality variables, such as the freezing point of diesel oil in refinery fractionators and the concentration of the reaction product in chemical reactors, are often difficult to measure directly with hardware sensors. Even where online quality analyzers are installed, they are expensive and require frequent maintenance. In most cases, these quality variables are obtained only by offline laboratory analysis, which introduces a significant delay (often 4 to 12 hours), so real-time control of the quality variable is impractical. Therefore, the soft sensor technique, which builds a virtual software sensor by mining the mathematical relationship between easy-to-measure process variables and difficult-to-measure quality variables, has been extensively implemented in industrial plants. Existing soft sensor modeling methods can be divided into two categories: model-based and data-based. The former builds the soft sensor on accurate physical and chemical mechanisms, which are often difficult to obtain for complicated processes. The latter mines historical operating data without accurate mechanistic models and has become more popular in recent years because abundant process data are available [3].
The associate editor coordinating the review of this manuscript and approving it for publication was Ming Luo.
Typical data-driven soft sensor modeling methods include partial least squares (PLS), Gaussian mixture regression (GMR), and the extreme learning machine (ELM) [4]–[7]. Among these methods, PLS has gained great attention in the soft sensor field because of its effectiveness. Sharmin et al. [8], Zheng and Funatsu [9], and Zheng and Song [10] discussed successful applications of PLS in different industrial units. However, basic PLS models are intrinsically linear, whereas many real industrial processes are strongly nonlinear. To handle nonlinear soft sensor modeling, a number of nonlinear extensions of PLS have been proposed. Early studies developed quadratic PLS [11] and neural network PLS [12], which model the inner nonlinear relation with a quadratic polynomial and a neural network, respectively. Later, Bang et al. [13] applied a fuzzy inference system to nonlinear PLS modeling. Because Gaussian process regression (GPR) has a powerful nonlinear fitting ability, Liu et al. [14] designed a GPR-PLS method and tested it on a real wastewater treatment process (WWTP).
In recent years, kernel PLS (KPLS) has been developed as an effective nonlinear PLS method [15]. Unlike other nonlinear PLS variants, KPLS avoids explicit nonlinear optimization by means of the kernel trick. Because of its simplicity and effectiveness, KPLS has attracted considerable attention in nonlinear soft sensing. Zhang et al. [16] applied KPLS to an industrial oil refinery and demonstrated its performance advantage over linear PLS. For batch process soft sensor modeling, Wang et al. [17] developed a new multiway KPLS method that uses feature vector selection to reduce the number of kernel vectors and hence the computational load. KPLS modeling relies on sufficient training data; however, training data are often very limited for new industrial processes. To cope with this problem, Chu et al. [18] combined the transfer learning idea with KPLS in an improved joint-Y KPLS (JYKPLS) method, which transfers the rich information of similar old processes to the new process model. To deal with collinearity and enhance prediction performance, Tang et al. [19] built a selective ensemble KPLS (SENKPLS) method, in which a double-layer genetic algorithm optimizes the parameters of the sub-models.
Apart from nonlinearity, multimode operation is another common situation in industrial processes. Because of changes in market demand, process disturbances, catalyst changeover, and so on, the operating mode often changes. In this case, a single global nonlinear soft sensor may not provide the best predictions, so designing multimode nonlinear soft sensors is a problem worth studying in depth. Existing solutions fall into two categories: one builds local models with the just-in-time learning (JITL) strategy, and the other builds multiple models with the divide-and-rule (DAR) strategy. JITL is also called lazy learning because it only stores historical data as the training dataset and needs no offline model training. When an online query sample arrives, JITL constructs a local model from the most relevant samples in the offline training dataset. For soft sensor modeling of multiphase batch processes, Jin et al. [20] developed a JITL KPLS method, in which a hybrid similarity combining sample similarity and phase similarity selects the relevant training samples, and a local KPLS soft sensor is then built for each query sample. To balance modeling accuracy and efficiency, Chen et al. [21] proposed a JITL method with selective updating based on approximate linear dependence (ALD) and applied it to the soft sensing of roller kiln temperature. The DAR method first identifies the process modes with data clustering techniques and then builds multiple local soft sensors corresponding to the different clusters. During online prediction, a new sample is assigned to one mode based on a similarity index such as distance similarity.
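To make the JITL idea concrete, the Python sketch below builds a local model on demand for one query sample. It is a minimal illustration under stated assumptions, not the method of [20] or [21]: relevance is plain Euclidean distance, and the local model is a ridge-regularized linear regression standing in for the local KPLS used in those works. The function name `jitl_predict` and its parameters `k` and `lam` are hypothetical.

```python
import numpy as np

def jitl_predict(x_q, X_hist, y_hist, k=20, lam=1e-6):
    """Just-in-time prediction for a single query sample x_q (shape (m,)).

    Assumptions of this sketch: Euclidean distance as the relevance measure and
    a ridge-regularized linear model (with intercept) as the local model.
    """
    # 1) Rank historical samples by distance to the query and keep the k closest.
    d = np.linalg.norm(X_hist - x_q, axis=1)
    idx = np.argsort(d)[:k]
    # 2) Fit the local model only on those neighbours (built online, then discarded).
    A = np.hstack([X_hist[idx], np.ones((k, 1))])
    coef = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y_hist[idx])
    # 3) Predict the query output with the local model.
    return float(np.append(x_q, 1.0) @ coef)

# Usage: historical data from a nonlinear process, one online query.
X_hist = np.linspace(-2.0, 2.0, 401).reshape(-1, 1)
y_hist = np.sin(X_hist).ravel()
y_q = jitl_predict(np.array([0.5]), X_hist, y_hist)
```

Because a new local model is fitted for every query, the online cost grows with the size of the stored history, which is the computational drawback of JITL discussed in the next paragraph.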
Commonly used data clustering methods include K-means and fuzzy C-means (FCM). Zhao et al. [22] proposed an improved K-means based ensemble KPLS method. Yuan et al. [23] used FCM to obtain local clusters and built a locally weighted PCR model for the query sample. Gholami et al. [24] presented a soft sensor that combines FCM clustering with support vector regression. Wang et al. [25] designed a nonlinear multimode process soft sensor that uses a self-organizing framework to build multimode KPLS models and conditional probability density analysis to identify the sample mode. In summary, both the JITL and the DAR strategies can effectively handle soft sensor modeling of many complicated processes, including nonlinear and/or multimode processes. However, JITL incurs a heavier computational load because a local model is built online for each query. This paper therefore focuses on DAR-based soft sensor modeling.
Although existing DAR-based soft sensor methods have achieved significant success in nonlinear multimode processes, some challenging problems still deserve extensive study. One important problem is soft sensor modeling for unobserved multimode nonlinear processes. Almost all past works focus on observed multimode processes, where the underlying assumption is that the different operating modes can be distinguished by the magnitudes of the measured variables, that is, the soft sensor inputs. The unobserved multimode process, first discussed by Liu [26], is a different kind of multimode process in which operating mode switches cannot be measured directly. For example, in refinery units, when the crude oil type or properties change, the process variables are kept at similar operating points, but the product quality variables exhibit multiple modes. Another example is the chemical reactor: in some reactors, the catalyst activation energy degrades over time, which also produces unobserved multimode data. In these cases, different process modes share similar measured variables, but the inner mechanism between the predictors and the quality-related variables has changed. For unobserved multimode processes, it is difficult to divide the modes with distance-similarity-based clustering methods.
VOLUME 8, 2020
Based on the above discussion, we propose a new soft sensor for unobserved multimode nonlinear processes based on a modified KPLS that integrates latent factor clustering, called LFC-KPLS. The contribution of the proposed method is threefold. First, a soft sensor modeling framework is designed for unobserved multimode nonlinear processes. To the best of our knowledge, this is the first work to discuss soft sensor modeling of unobserved multimode nonlinear processes.
Second, a latent factor clustering method is designed based on the functional expansion technique. Unlike traditional clustering methods, LFC clusters multimode data by the similarity of their nonlinear input-output relationships rather than by sample distance. Third, a mode identification method for the online query sample is proposed, which applies Bayesian inference to compute the posterior probability of each mode.
The remainder of this paper is organized as follows. Section II reviews the preliminaries, namely KPLS and FCM. Section III introduces the proposed methodology. Section IV presents two case studies, a numerical system and a simulated continuous stirred tank reactor. Section V offers some conclusions.

II. PRELIMINARIES
A. KERNEL PARTIAL LEAST SQUARES
Kernel partial least squares (KPLS) combines the kernel technique with PLS to obtain a nonlinear regression model [15]. Given the input matrix $X \in \mathbb{R}^{n\times m}$ and the output vector $y \in \mathbb{R}^{n}$ with $n$ samples, KPLS first maps the original input data $X$ into a feature space $\psi(X)$, in which the input-output relation is assumed to be linear, and then performs linear PLS modeling between $\psi(X)$ and $y$. This gives the PLS regression model
$$ y = \psi(X)\, b + e \qquad (1) $$
where $\psi(\cdot)$ is the assumed nonlinear transformation, $b$ is the regression coefficient vector, $\hat{y} = \psi(X) b$ is the output prediction, and $e$ is the prediction error vector. Because the nonlinear mapping $\psi(\cdot)$ is usually unknown and cannot be expressed explicitly, Eq. (1) cannot be used directly for output prediction. To deal with this problem, the regression coefficient vector is expanded over the mapped training samples as
$$ b = \psi(X)^{T} \beta \qquad (2) $$
Combining Eqs. (1) and (2) leads to a nonlinear PLS model based on the kernel matrix:
$$ y = \psi(X)\psi(X)^{T} \beta + e = K\beta + e \qquad (3) $$
where $K = \psi(X)\psi(X)^{T}$ is the kernel matrix, whose $(i, j)$-th element $k_{ij}$ is defined by
$$ k_{ij} = \psi(x_i)\psi(x_j)^{T} = \ker(x_i, x_j) \qquad (4) $$
where $x_i$ and $x_j$ are the $i$-th and $j$-th samples (rows) of $X$, respectively, and $\ker(\cdot,\cdot)$ denotes the kernel function. The most commonly used kernel is the Gaussian kernel [17]
$$ \ker(x_i, x_j) = \exp\left( -\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}} \right) \qquad (5) $$
where $\sigma$ is the kernel width parameter.
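As an illustration of the kernel matrix and the Gaussian kernel defined above, both $K$ and the kernel vector of a test sample can be computed without ever forming $\psi(\cdot)$. The NumPy sketch below assumes the Gaussian kernel in the form $\exp(-\lVert x_i - x_j \rVert^2/(2\sigma^2))$; the helper name `gaussian_kernel` is hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix with entries exp(-||a_i - b_j||^2 / (2 sigma^2)).

    A: (n, m) array, B: (p, m) array -> returns an (n, p) array.
    """
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    sq = np.maximum(sq, 0.0)  # clip tiny negatives caused by round-off
    return np.exp(-sq / (2.0 * sigma**2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = gaussian_kernel(X, X, sigma=1.0)              # training kernel matrix psi(X) psi(X)^T
x_t = np.array([[1.0, 1.0]])
k_t = gaussian_kernel(X, x_t, sigma=1.0).ravel()  # kernel vector of one test sample
```

Only pairwise distances are needed, which is exactly what the kernel trick buys: the unknown mapping never has to be evaluated.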

Algorithm 1
The Solution Procedure of KPLS Model
1: Given the input matrix $X$, the output vector $y$, and the number $L$ of retained kernel score vectors.
2: Randomly initialize $u$ (usually, $u$ can be set to the output vector $y$).
3: Compute the input score vector $t = Ku$ and normalize it as $t \leftarrow t/\lVert t \rVert$.
4: Obtain the weight coefficient $c = y^{T} t$.
5: Calculate the output score vector $u = yc$.
6: Repeat steps 3 to 5 until convergence.
7: Deflate the kernel matrix and the output vector as $K \leftarrow (I - tt^{T})K(I - tt^{T})$ and $y \leftarrow y - tt^{T}y$.
8: Repeat steps 2 to 7 until $L$ score vectors have been extracted.

The KPLS model can be solved by the classic NIPALS algorithm [15], [16], listed in Algorithm 1. From the extracted scores, the regression coefficient vector $\beta$ is obtained as
$$ \beta = U\left(T^{T} K U\right)^{-1} T^{T} y \qquad (6) $$
where $T = [t_1, \dots, t_L]$ is the input score matrix, $U = [u_1, \dots, u_L]$ is the output score matrix, and $K$ is the original (undeflated) kernel matrix.
For a test input vector $x_t$, the output prediction is given by
$$ \hat{y}_t = k_t^{T} \beta \qquad (7) $$
where $k_t = \psi(X)\psi(x_t)^{T} = [\ker(x_1, x_t), \dots, \ker(x_n, x_t)]^{T}$ is the kernel vector of the test sample.
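A minimal NumPy sketch of Algorithm 1 for a single output follows; it computes the coefficients with the standard KPLS formula $\beta = U(T^T K U)^{-1} T^T y$ (with $K$ the undeflated kernel matrix) and predicts with $\hat{y}_t = k_t^T \beta$. It is an illustration, not the authors' code: kernel centering is omitted (the description above does not mention it), the Gaussian kernel is assumed to be $\exp(-\lVert x_i - x_j \rVert^2/(2\sigma^2))$, and the function names `kpls_fit` and `kpls_predict` are hypothetical.

```python
import numpy as np

def rbf(A, B, sigma):
    """Gaussian kernel matrix, assumed form exp(-||a - b||^2 / (2 sigma^2))."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def kpls_fit(X, y, sigma, L, tol=1e-10, max_iter=100):
    """Single-output KPLS via the NIPALS steps of Algorithm 1; returns beta."""
    y = np.asarray(y, float)
    K0 = rbf(X, X, sigma)                 # undeflated kernel matrix
    K, yk = K0.copy(), y.copy()
    n = len(y)
    T, U = [], []
    for _ in range(L):                    # extract L score vectors
        u = yk.copy()                     # step 2: initialize u with the output
        for _ in range(max_iter):
            t = K @ u                     # step 3: input score ...
            t = t / np.linalg.norm(t)     # ... normalized to unit length
            c = yk @ t                    # step 4: weight coefficient
            u_new = yk * c                # step 5: output score
            done = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if done:                      # step 6: stop at convergence
                break
        T.append(t)
        U.append(u)
        D = np.eye(n) - np.outer(t, t)    # step 7: deflate K and y
        K = D @ K @ D
        yk = yk - t * (t @ yk)
    T, U = np.column_stack(T), np.column_stack(U)
    return U @ np.linalg.solve(T.T @ K0 @ U, T.T @ y)

def kpls_predict(X_train, beta, X_test, sigma):
    """y_hat_t = k_t^T beta, with k_t the kernel vector between test and training samples."""
    return rbf(X_test, X_train, sigma) @ beta

# Usage: fit a smooth nonlinear function and predict at unseen inputs.
X = np.linspace(-1.0, 1.0, 60).reshape(-1, 1)
y = np.sin(3.0 * X).ravel()
beta = kpls_fit(X, y, sigma=0.3, L=6)
X_new = np.linspace(-0.95, 0.95, 25).reshape(-1, 1)
y_new = kpls_predict(X, beta, X_new, sigma=0.3)
```

For a single output the inner loop converges after its second pass, because $t$ is always proportional to $K y_k$; the loop is kept only to mirror steps 3 to 6.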

B. FUZZY C-MEANS CLUSTERING
Fuzzy C-means (FCM) is a well-known data clustering method that has been widely used for unsupervised pattern recognition [23], [24]. It groups the training data into $C$ clusters with varying membership degrees and can be viewed as an improvement of traditional K-means clustering: whereas K-means assigns each data point to exactly one cluster, FCM assigns each data point to all clusters with different membership degrees. FCM has been shown to outperform basic K-means in many cases. Given the sample set $X = \{x_1, x_2, \dots, x_n\}$, where $x_i \in \mathbb{R}^{m}$ is a sample, FCM finds the cluster centroids $o_1, o_2, \dots, o_C$ by solving the optimization problem
$$ \min_{\mu,\, o}\; J = \sum_{i=1}^{n}\sum_{j=1}^{C} \mu_{ij}^{r} \lVert x_i - o_j \rVert^{2} \quad \text{s.t.}\; \sum_{j=1}^{C} \mu_{ij} = 1,\; \mu_{ij} \in [0, 1] \qquad (8) $$
where $\mu_{ij}$ is the membership degree of $x_i$ in the cluster with centroid $o_j$, and $r$ is the fuzziness exponent, usually set to a real number greater than 1.
To solve this optimization problem, the following iterative procedure is applied.

Algorithm 2
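The Solution Procedure of FCM
1: Randomly initialize the cluster centers $o_1, o_2, \dots, o_C$.
2: Update the membership degrees by $\mu_{ij} = 1 \Big/ \sum_{k=1}^{C} \left( \lVert x_i - o_j \rVert / \lVert x_i - o_k \rVert \right)^{2/(r-1)}$.
3: Update the cluster centers by $o_j = \sum_{i=1}^{n} \mu_{ij}^{r} x_i \Big/ \sum_{i=1}^{n} \mu_{ij}^{r}$.
4: Repeat steps 2 and 3 until convergence.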
In this algorithm, the cluster number $C$ is an important parameter that must be specified in advance. With sufficient prior knowledge, it can be determined from experience; otherwise, it can be set by data-driven methods [27].
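A compact NumPy sketch of Algorithm 2 follows. Two choices are assumptions of this sketch rather than details given in the text: Euclidean distance, and a spread-out initialization (the first center is a random sample, each further center is the sample farthest from those already chosen) used instead of purely random centers to avoid degenerate starts. The function name `fcm` is hypothetical.

```python
import numpy as np

def fcm(X, C, r=2.0, max_iter=300, tol=1e-6, seed=0):
    """Fuzzy C-means (Algorithm 2). Returns centers (C, m) and memberships (n, C)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1 (sketch-specific spread-out init): a random first center, then farthest samples.
    idx = [int(rng.integers(n))]
    for _ in range(1, C):
        dmin = np.min(np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2), axis=1)
        idx.append(int(np.argmax(dmin)))
    centers = X[idx].astype(float)
    for _ in range(max_iter):
        # Step 2: mu_ij = 1 / sum_k (d_ij / d_ik)^(2 / (r - 1))
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)                      # guard against zero distances
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (r - 1.0))
        mu = 1.0 / ratio.sum(axis=2)
        # Step 3: o_j = sum_i mu_ij^r x_i / sum_i mu_ij^r
        wts = mu ** r
        new_centers = (wts.T @ X) / wts.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:                               # Step 4: stop at convergence
            break
    return centers, mu

# Usage: two well-separated groups of 2-D samples.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
centers, mu = fcm(X, C=2)
labels = mu.argmax(axis=1)   # crisp labels from the largest membership degree
```

Taking the arg-max of the memberships is one common way to turn the fuzzy partition into crisp clusters when a single local model must be chosen per sample.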

III. THE PROPOSED LFC-KPLS METHOD
As mentioned in the introduction, the data of unobserved multimode processes are distance-indivisible. Developing a soft sensor for such processes involves three problems: (1) How should the soft sensor modeling framework be designed? (2) How can a clustering algorithm recognize the different modes in distance-indivisible data? (3) How can the mode of an online query sample be identified? To address these questions, we propose a latent factor clustering based KPLS (LFC-KPLS) method for soft sensing of unobserved multimode processes. The modeling framework, the training data clustering, and the online mode identification of query samples are detailed next.

A. THE SOFT SENSOR MODELING FRAMEWORK
The overall schematic of the proposed LFC-KPLS method is displayed in Fig. 1. In the offline modeling stage, the multimode KPLS model is built as follows. First, the training data are divided into several batches in time order. Second, the data of each batch are projected into a latent space by an explicit nonlinear function expansion, and latent factors describing the relationship between the inputs and the output are computed. Third, FCM is applied to the latent factors to obtain the different modes, and a KPLS model is developed for each mode as a local soft sensor submodel. In the online application stage, each query sample is collected and its mode is identified by Bayesian inference; the KPLS model of the identified mode then generates the output prediction.

B. LATENT FACTOR CLUSTERING
For unobserved multimode processes, data from different modes have different input-output relationships but may be very close in terms of input sample distance. Therefore, the traditional FCM algorithm, which relies on the distance similarity in Eq. (8), cannot distinguish the data modes correctly, and a new clustering method is needed. The new method should measure data similarity through the input-output relationship, that is, through the regression coefficient $b$ in Eq. (1). However, because $b$ is defined through the assumed, unknown nonlinear mapping $\psi(\cdot)$, it is difficult to compare $b$ directly. The key point is therefore how to handle the nonlinear function $\psi(\cdot)$.
To deal with this problem, this section proposes a new data clustering method called latent factor clustering (LFC). LFC first projects the original data onto a latent space by explicit expanded nonlinear functions, which substitute for the implicit nonlinear function $\psi(\cdot)$. LFC then computes the input-output relationship factor in the latent space, called the latent factor. Finally, FCM is applied to the latent factors to cluster the different data modes. The details are as follows.
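The multimode training dataset $X \in \mathbb{R}^{n\times m}$ is divided in time order into $N$ non-overlapping batches $X_1, X_2, \dots, X_N$ of equal size by a moving window. Each batch is denoted as $X_i \in \mathbb{R}^{w\times m}$, where $w$ is the window length and $n = Nw$. $X_i$ can be written as
$$ X_i = \begin{bmatrix} x_{i,11} & \cdots & x_{i,1m} \\ \vdots & \ddots & \vdots \\ x_{i,w1} & \cdots & x_{i,wm} \end{bmatrix} $$
where $x_{i,jk}$ is the value of the $k$-th input variable in the $j$-th sample of the window $X_i$. Before the nonlinear latent space transformation, the inputs and the output of each batch are normalized to the same range by min-max scaling:
$$ \tilde{x}_{i,jk} = \frac{x_{i,jk} - x_{i,k}^{\min}}{x_{i,k}^{\max} - x_{i,k}^{\min}}, \qquad \tilde{y}_{i,j} = \frac{y_{i,j} - y_{i}^{\min}}{y_{i}^{\max} - y_{i}^{\min}} $$
where $x_{i,k}^{\min}$ and $x_{i,k}^{\max}$ are the minimum and maximum values of the $k$-th input variable (column) of $X_i$, and $y_{i}^{\min}$ and $y_{i}^{\max}$ are the minimum and maximum values of the output variable in the $i$-th batch, respectively.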
The normalized batch is then mapped to the latent space by the explicit expanded nonlinear functions, giving the matrix $G(X_i)$, whose $j$-th row is the expansion of the $j$-th normalized sample of the batch. The normalized output vector of the batch, $\tilde{y}_i$, is modeled as a linear function of $G(X_i)$:
$$ \tilde{y}_i = G(X_i)\, b_i + e_i $$
Solving this regression with the basic PLS algorithm yields the latent factor $b_i$. Repeating the operation for all batches gives the latent factors $b_1, b_2, \dots, b_N$, and applying FCM to these latent factors yields the data clusters.
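The LFC steps above can be sketched in NumPy as follows. The exact expansion functions are not reproduced here, so the sketch assumes a hypothetical trigonometric functional-link expansion $[x, \sin(\pi x), \cos(\pi x)]$ of each normalized input; the number of PLS components (`n_comp`) and all function names are likewise assumptions. Each batch is min-max normalized, expanded, and regressed with a single-output NIPALS PLS, whose coefficient vector serves as that batch's latent factor.

```python
import numpy as np

def expand(Xn):
    """HYPOTHETICAL explicit expansion: [x, sin(pi x), cos(pi x)] for every input column."""
    return np.hstack([Xn, np.sin(np.pi * Xn), np.cos(np.pi * Xn)])

def pls1_coef(G, y, n_comp):
    """Coefficient vector b of a single-output NIPALS PLS fitted on centered data."""
    G = G - G.mean(axis=0)
    y = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = G.T @ y
        w = w / np.linalg.norm(w)             # input weight
        t = G @ w                             # score
        tt = t @ t
        p = G.T @ t / tt                      # input loading
        c = (y @ t) / tt                      # output loading
        G = G - np.outer(t, p)                # deflate inputs
        y = y - c * t                         # deflate output
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)    # b = W (P^T W)^{-1} q

def minmax(A):
    """Min-max scaling to [0, 1] along axis 0 (columns of a matrix, or a vector)."""
    lo, hi = A.min(axis=0), A.max(axis=0)
    return (A - lo) / np.maximum(hi - lo, 1e-12)

def latent_factors(X, y, w, n_comp=2):
    """Latent factor b_i of every consecutive, non-overlapping batch of length w."""
    F = []
    for i in range(len(y) // w):
        Xi, yi = X[i * w:(i + 1) * w], y[i * w:(i + 1) * w]
        F.append(pls1_coef(expand(minmax(Xi)), minmax(yi), n_comp))
    return np.array(F)

# Usage: two modes with the same input distribution but different input-output laws.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.where(np.arange(200) < 100, X[:, 0] ** 2, -X[:, 0] ** 2)
F = latent_factors(X, y, w=20)               # 10 batches -> 10 latent factors of length 3
```

The rows of `F` can then be passed to any FCM routine. In this toy case the inputs of both modes are indistinguishable by distance, yet the two groups of latent factors lie far apart, mainly along the coefficient of the $\sin(\pi x)$ term.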
To sum up, the novelty of LFC lies in two aspects. (1) LFC clusters the data by the input-output relationship factor rather than by the original sample distance. (2) LFC provides a practicable nonlinear transformation using explicit expanded nonlinear functions, which may not approximate the kernel function perfectly but at least offer a viable way to deal with the unknown $\psi(\cdot)$.

C. ONLINE MODE IDENTIFICATION BASED ON BAYESIAN INFERENCE
For a multimode soft sensor, an important question is which mode a new query sample belongs to. In distance-clustering-based multimode soft sensors, the new sample is assigned by spatial similarity, usually in one of two ways. One assigns the new sample to the cluster whose center is nearest; the other applies the K-nearest-neighbor method and determines the cluster from the K nearest training samples. For unobserved multimode processes, however, neither method is feasible, because samples from different modes are not separated in the input space.
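For reference, the two distance-based assignment rules just described can be written in a few lines of NumPy (function names hypothetical). Both look only at the input vector $x$, which is why they cannot separate modes that share the same input region.

```python
import numpy as np

def nearest_center_mode(x, centers):
    """Assign x to the cluster whose center is closest in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(centers - x, axis=1)))

def knn_mode(x, X_train, labels, k=5):
    """Assign x to the majority cluster among its k nearest training samples."""
    idx = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
    return int(np.bincount(labels[idx]).argmax())

# Usage on two clusters that ARE separated in the input space (the observed case).
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
X_train = np.array([[0.1, 0.0], [0.0, 0.2], [-0.1, 0.1],
                    [5.1, 4.9], [4.8, 5.2], [5.0, 5.1]])
labels = np.array([0, 0, 0, 1, 1, 1])
x_new = np.array([4.0, 4.5])
```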
To handle this problem, this section proposes an online mode identification strategy based on Bayesian inference. Assuming that clustering the training dataset yields $C$ modes $\{M_1, M_2, \dots, M_C\}$, the posterior probability of mode $M_j$ given the query sample $x_i$ is
$$ p(M_j \mid x_i) = \frac{p(x_i \mid M_j)\, p(M_j)}{p(x_i)} $$
where $p(x_i)$ is the occurrence probability of $x_i$, computed by
$$ p(x_i) = \sum_{j=1}^{C} p(x_i \mid M_j)\, p(M_j) $$
Here $p(M_j)$ is the prior probability of mode $M_j$, and $p(x_i \mid M_j)$ is the conditional probability of the sample $x_i$ under mode $M_j$. The prior $p(M_j)$ can be estimated from the training data or set from expert experience. The conditional probability $p(x_i \mid M_j)$ is designed as a function of $e_i^{(j)}$, the estimation error of the $j$-th soft sensor model on the sample $x_i$. In principle, this error is computed from the estimated output $f_j(x_i)$ and the real output $y_i$, that is, $e_i^{(j)} = y_i - f_j(x_i)$. In real applications, however, $y_i$ is unknown at the $i$-th sampling instant. Therefore, the model estimation results at the $(i-1)$-th instant are used instead:
$$ e_i^{(j)} = y_{i-1} - f_j(x_{i-1}) $$
This substitution implicitly assumes that two consecutive samples belong to the same mode. Because industrial processes usually run in the same mode for long periods, the assumption is reasonable in practice.
Finally, the mode $M(x_i)$ of the query sample $x_i$ is determined as the mode with the maximum posterior probability $p(M_j \mid x_i)$, that is,
$$ M(x_i) = \arg\max_{M_j,\; 1 \le j \le C} p(M_j \mid x_i) $$
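The Bayesian rule above can be sketched in NumPy as follows. The likelihood form is an explicit assumption of this sketch, not taken from the text: $p(x_i \mid M_j)$ is taken proportional to $\exp(-(e_i^{(j)})^2/(2s^2))$, where the error scale `s` is a hypothetical tuning parameter. The posterior is normalized by $p(x_i)$, the sum of the joint terms, and computed in log space for numerical stability. The function name `identify_mode` is hypothetical.

```python
import numpy as np

def identify_mode(prev_errors, priors, s=1.0):
    """Bayesian mode identification from the previous-instant model errors.

    prev_errors[j] = y_{i-1} - f_j(x_{i-1}) for local model j; priors[j] = p(M_j).
    ASSUMPTION: p(x_i | M_j) is taken proportional to exp(-e^2 / (2 s^2)), with s a
    hypothetical error scale; the likelihood form is not taken from the text.
    """
    e = np.asarray(prev_errors, float)
    log_joint = -0.5 * (e / s) ** 2 + np.log(np.asarray(priors, float))
    log_joint -= log_joint.max()              # stabilize before exponentiating
    joint = np.exp(log_joint)                 # proportional to p(x_i | M_j) p(M_j)
    posterior = joint / joint.sum()           # dividing by p(x_i) = sum of the joints
    return int(np.argmax(posterior)), posterior

# Usage: three local models; model 1 (0-based) fitted the previous sample best.
mode, post = identify_mode([0.9, 0.05, 1.4], priors=[1/3, 1/3, 1/3], s=0.5)
```

With equal priors the decision reduces to picking the model with the smallest previous error; unequal priors shift it toward modes that are known to occur more often.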

D. SOFT SENSING PROCEDURE BASED ON LFC-KPLS
The proposed LFC-KPLS soft sensing procedure for unobserved multimode processes involves two stages: offline modeling and online application. During the offline modeling stage, the LFC-KPLS model is developed based on the training data, while at the online application stage, the query sample is collected and its corresponding output is given based on the developed soft sensor model. The details are listed as follows.
Offline Modeling Stage:
• Gather the training dataset $X$ and standardize it with its mean and variance.
• Perform latent factor clustering on the standardized training data to divide them into $C$ data modes.
• Build a local KPLS soft sensor for each mode.

Online Application Stage:
• Collect the query sample $x_i$ at the $i$-th sampling instant and standardize it with the mean and variance of the training data.
• Identify the mode $M(x_i)$ of $x_i$ by Bayesian inference.
• Project the query sample onto the corresponding KPLS model and obtain the soft sensor output prediction.

IV. CASE STUDY
This section uses two case studies to validate the proposed multimode soft sensor method: a numerical example and a continuous stirred tank reactor (CSTR) system. Prediction performance is evaluated by the root mean squared error (RMSE); a smaller RMSE indicates better prediction performance.
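For reference, $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}(y_k - \hat{y}_k)^2}$ over the $N$ test samples. A minimal NumPy helper (the name `rmse` is hypothetical) is:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between real and predicted outputs."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```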

A. A NUMERICAL SYSTEM
To test the proposed method, a numerical system is designed as follows [16]. Three nonlinearly related input variables are generated as functions of a source variable $t$, where $t$ is a random variable uniformly distributed in $[-1, 1]$ and $e_i$ ($1 \le i \le 3$) is Gaussian noise with zero mean and variance 0.01. Based on these inputs, the output variable is computed under three different modes, where $e_4$ is output noise with the same characteristics as the input noise. For each mode, 500 samples are simulated for the training dataset and another 300 samples for the testing dataset. The input variables of the training data are plotted in Figs. 2a-c, where the first 500 samples (No. 1-500) belong to mode 1, the middle 500 samples (No. 501-1000) to mode 2, and the last 500 samples (No. 1001-1500) to mode 3. The input variables of all modes follow similar distributions. The three-dimensional plot of the input variables in Fig. 2d clearly shows that the inputs are distance-indivisible. Furthermore, Fig. 2e shows the output variable of the training data, which also exhibits no obvious distinction between modes. These input and output characteristics make the numerical system a typical unobserved multimode system, for which building a soft sensor is challenging.
To deal with soft sensor modeling of unobserved multimode systems, this paper proposes the improved KPLS method incorporating LFC. We first validate the effectiveness of LFC by testing whether it can correctly identify the three modes in the training data. For comparison, basic FCM clustering is also used for mode identification. The results are plotted in Fig. 3 and summarized in Table 1. Fig. 3a shows the results of FCM, where many samples are misclassified and the mode identification rate is only 32.68%. For LFC, the training samples are divided into batches; Figs. 3b to 3d give the mode identification results for batch sizes $w$ of 15, 30, and 50, respectively. With $w = 15$, some samples from mode 1 are wrongly recognized as mode 2. With $w = 30$, most samples are correctly identified except for some near the mode switches, giving a mode identification rate of 97.33%. With the larger batch size $w = 50$, all samples are correctly identified, a mode identification rate of 100%. For every batch size tested, LFC outperforms basic FCM. In practice, the batch size is chosen based on user experience.
Next, we analyze the prediction performance of the proposed method. For comparison, soft sensors are also built with the basic KPLS and FCM-KPLS methods. For all methods, the kernel width $\sigma$ and the number of kernel score vectors $L$ are optimized by the differential evolution (DE) algorithm. The prediction charts of the three methods are shown in Figs. 4, 5, and 6, and Table 2 compares their RMSE values. As Fig. 4 shows, the basic KPLS method cannot track changes in the output effectively, giving a large RMSE of 1.2038. FCM-KPLS cannot distinguish the different modes, so its predictions are also unsatisfactory; its RMSE even increases to 1.2088, which shows that an unreasonable mode partition can degrade soft sensor performance. With the proposed method, LFC recognizes the modes correctly, and the soft sensor improves markedly on both KPLS and FCM-KPLS, reducing the RMSE to 0.3173. In summary, by projecting the data into a nonlinear latent space, the latent factor clustering based KPLS method effectively solves the unobserved multimode soft sensor modeling problem.
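The DE settings are not reported, so the following NumPy sketch only illustrates how $\sigma$ and $L$ might be tuned, under explicit assumptions: a classic DE/rand/1/bin loop (written out so that no optimizer library is required), a hold-out validation RMSE as the fitness, $L$ rounded to an integer inside the fitness, and hypothetical bounds $\sigma \in [0.05, 2]$ and $L \in [1, 8]$. The compact single-output KPLS inside assumes a Gaussian kernel $\exp(-\lVert x_i - x_j \rVert^2/(2\sigma^2))$ and uses one NIPALS pass per component, which suffices for a single output. All names are hypothetical.

```python
import numpy as np

def rbf(A, B, sigma):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def kpls_beta(X, y, sigma, L):
    """Single-output KPLS; for one output, one NIPALS pass per component converges."""
    K0 = rbf(X, X, sigma)
    K, yk, n = K0.copy(), y.copy(), len(y)
    T, U = [], []
    for _ in range(L):
        t = K @ yk
        t = t / np.linalg.norm(t)
        u = yk * (yk @ t)
        T.append(t); U.append(u)
        D = np.eye(n) - np.outer(t, t)                        # deflation projector
        K, yk = D @ K @ D, yk - t * (t @ yk)
    T, U = np.column_stack(T), np.column_stack(U)
    return U @ np.linalg.solve(T.T @ K0 @ U, T.T @ y)

def fitness(params, Xtr, ytr, Xva, yva):
    """Validation RMSE for (sigma, L); L is rounded to an integer."""
    sigma, L = params[0], int(round(params[1]))
    try:
        beta = kpls_beta(Xtr, ytr, sigma, L)
        err = np.sqrt(np.mean((rbf(Xva, Xtr, sigma) @ beta - yva) ** 2))
    except np.linalg.LinAlgError:
        return 1e6
    return err if np.isfinite(err) else 1e6                  # penalize failed fits

def de_minimize(f, bounds, pop=12, gens=20, F=0.6, CR=0.9, seed=0):
    """Classic DE/rand/1/bin minimization within box bounds."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, float).T
    P = lo + rng.random((pop, len(lo))) * (hi - lo)
    fit = np.array([f(p) for p in P])
    for _ in range(gens):
        for i in range(pop):
            others = [j for j in range(pop) if j != i]
            a, b, c = P[rng.choice(others, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)        # mutation
            cross = rng.random(len(lo)) < CR                 # binomial crossover
            cross[rng.integers(len(lo))] = True              # keep at least one mutant gene
            trial = np.where(cross, mutant, P[i])
            f_trial = f(trial)
            if f_trial <= fit[i]:                            # greedy selection
                P[i], fit[i] = trial, f_trial
    best = int(np.argmin(fit))
    return P[best], fit[best]

# Usage: tune (sigma, L) on a hold-out split of a toy nonlinear data set.
X = np.linspace(-1.0, 1.0, 60).reshape(-1, 1)
y = np.sin(3.0 * X).ravel()
va = np.arange(60) % 3 == 0                                  # every third sample for validation
obj = lambda p: fitness(p, X[~va], y[~va], X[va], y[va])
best_params, best_rmse = de_minimize(obj, bounds=[(0.05, 2.0), (1.0, 8.0)])
```

Rounding $L$ inside the fitness keeps the search space continuous for DE while still evaluating only valid integer numbers of score vectors.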

B. THE CONTINUOUS STIRRED TANK REACTOR SYSTEM
The continuous stirred tank reactor (CSTR) system [29] is a well-known industrial process, whose diagram is illustrated in Fig. 7. It is nonlinear and multimode because of its complex chemical reaction mechanism and changing process conditions. In the CSTR, reactant A is converted into product B through an irreversible chemical reaction. The outlet concentration of reactant A is a key quality variable and is chosen as the soft sensor output $y$. Eight auxiliary variables, $x_1$ to $x_8$, serve as the soft sensor inputs. A mechanistic simulation generates the system data, which cover two different operating modes. A total of 7200 samples are collected from the process simulator; half are used as the training dataset and the other half as the testing dataset. The training data are first processed by latent factor clustering to identify the process modes. The batch size is set to 72 samples (2% of the 3600 training samples), so 50 batches are obtained in the latent space. KPLS, FCM-KPLS, and LFC-KPLS are each applied to build the CSTR soft sensor, and the 3600 testing samples are then projected onto the three models for performance comparison. Figs. 8 to 10 show the prediction results of the three soft sensors, and Table 4 compares their prediction RMSE. Fig. 8 shows that the KPLS prediction has a clear bias relative to the real output, with an RMSE of $0.5926 \times 10^{-2}$. FCM-KPLS cannot identify the correct data modes and therefore performs similarly to basic KPLS. Because LFC recognizes the data modes effectively, the proposed LFC-KPLS method reduces the RMSE to $0.2407 \times 10^{-2}$, and Fig. 10 shows that its output predictions are very close to the real values.
Overall, the results on the CSTR system show that the proposed method builds a more accurate soft sensor for unobserved multimode data.

V. CONCLUSION
In this paper, a novel soft sensor modeling method called LFC-KPLS is developed for unobserved multimode nonlinear processes. The proposed method designs a modeling framework for unobserved multimode processes, which can be generalized to many other data-driven soft sensor modeling methods. Besides the modeling framework, the two other important aspects of the proposed method are the offline mode clustering of unobserved multimode data and the online mode identification of query samples. Two case studies, a numerical system and the continuous stirred tank reactor (CSTR) system, are used to test the proposed method. The results demonstrate that LFC identifies the data modes more effectively than basic FCM, and that the proposed soft sensor achieves higher prediction accuracy than the traditional KPLS and FCM-KPLS methods. However, some limitations of the proposed method should also be noted. Because the moving window technique is used to divide the data into batches, the method assumes that each process mode lasts for a period of time rather than changing abruptly, and that the offline historical data are plentiful enough for model training. New methods for the case of limited training data should be investigated in future work.