Functional Soft Sensor Based on Spectra Data for Predicting Multiple Quality Variables

In many complex chemical processes, such as the ethanol fermentation process, predicting multiple quality variables from spectral data is challenging because the dimension of each spectrum far exceeds the number of available samples. To address this problem, the dimension of the predictors must be reduced, either by regressing on components or by smoothing with basis functions. Based on functional data analysis, this work introduces a novel wavelet functional partial least squares (WFPLS) method that combines both dimension-reduction approaches. In the proposed method, the high-dimensional spectra can be well fitted by a small number of wavelet basis functions. The WFPLS method does not require the measured data to be linear or to be sampled at regular intervals. It is shown that the proposed WFPLS method can be converted into the traditional PLS method for computation. The advantages of the proposed method over existing prediction methods are demonstrated on a numerical case and an ethanol fermentation experiment.


I. INTRODUCTION
In practice, spectrum technologies, such as mid-infrared and near-infrared (NIR) spectroscopy, are increasingly applied in chemical plants because they are nondestructive and require no sample preparation [1]-[4]. Compared with traditional sensors, spectrum technologies are noninvasive and thus have little effect on the production process. Meanwhile, unlike traditional measured samples that contain discrete variables, spectra have the nature of a distribution [5], [6]. The dimension of each spectrum is very high and much larger than the number of measured samples, which is a typical undersampling problem. Thus, it is meaningful and necessary to establish a more efficient and accurate soft sensor model for spectra data.
In recent years, data-driven modeling methods have been widely applied in many fields because they require little prior knowledge, are easy to operate, and generalize well [7]-[10]. To improve the final product quality and the industrial benefits, it is necessary to establish an accurate online prediction model for practical applications. In past publications, spectral models established by multiple linear regression, principal component regression, partial least squares (PLS), and support vector regression (SVR) were studied and compared [11]-[14]. PLS was found to outperform the other methods, giving the minimum prediction error. With prior process knowledge, some process variables were found to have a great effect on the final product quality and were combined with NIR spectroscopy for model building [5], [15]. However, most existing calibration modeling methods are linear and may not perform well in practical cases with strong nonlinear characteristics. To handle the nonlinearity, kernel-based methods, such as kernel PLS and SVR, have been proposed to improve the prediction performance [16], [17]. However, how to choose proper kernel functions and tune the kernel parameters is still an open issue, which hinders the practical application of kernel-based methods.
Analytically, the measured spectra data have the following characteristics: 1) high dimension: there may be hundreds or thousands of variables in each measured spectrum; 2) a limited number of samples: the number of modeling samples is small compared with the number of variables and is not adequate for establishing an accurate calibration model; 3) nonlinearity: most measured spectral data exhibit strongly nonlinear relationships, so linear methods often fail in practical cases. Based on the above analysis, the dimension of the spectral data should be reduced before modeling. Meanwhile, the nonlinear relationship should be taken into account to obtain more accurate calibration performance [4], [15].
In most published calibration methods, the modeling data are analyzed as discrete points by traditional multivariate statistical methods, without considering the continuity implied among the variables. In measured spectra data, however, the spectrum trajectories often appear as smooth curves. Therefore, functional data analysis (FDA) methods can be applied to spectra data analysis [18]-[21]. FDA deals with the analysis and theory of data that are in the form of functions [19], [20]. FDA methods are suitable for irregular, nonstationary time series analysis and emphasize the smoothness of the variable trajectories [22], [23]. Functional regression and functional PLS models have also found applications in chemical processes [1], [24], [25], where B-spline functions were mostly used as basis functions. However, these basis functions are global functions, which may cause redundant computation in many situations. Suppose that a curve is well approximated except in a small local range. If another global basis function is added, the fit in the well-approximated ranges will be degraded, so more global basis functions must be added to achieve the required accuracy. To handle this problem, multiscale wavelet functions are increasingly used for approximation and show great advantages [26]-[28]. Moreover, a novel active strategy was recently applied to calculate the wavelet basis functions, so that a more concise model and better modeling performance can be obtained.
The associate editor coordinating the review of this manuscript and approving it for publication was Vicente Alarcon-Aquino.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In this paper, a novel wavelet functional partial least squares (WFPLS) method is proposed to build soft sensors from measured spectra. Wavelet functions are applied to fit the trajectory of each sampled spectrum. To obtain a more concise model and better performance, an active strategy is used to determine the number and locations of the basis functions. By setting a proper threshold, the spectral curves can be well approximated by the proposed method even when they are strongly nonlinear. The functional partial least squares model is then built on the determined basis functions. In this way, the number of parameters of the prediction model is clearly reduced, and the nonlinear problem can be handled easily. The rest of this paper is organized as follows. First, the preliminary knowledge about the traditional PLS and the active strategy is introduced in Section 2. Then the proposed WFPLS method, its online application, and some discussions are given in Section 3. For illustration, a numerical case and an experimental study on predicting multiple quality variables of the ethanol fermentation process from spectra are discussed in Section 4. Finally, some conclusions are drawn at the end.

II. PRELIMINARIES
To enhance understanding, the traditional PLS and the active strategy for calculating the basis functions are first reviewed.

A. THE TRADITIONAL PLS
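Suppose the measured spectra are denoted as $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_N]^T \in \mathbb{R}^{N \times M}$, where $\mathbf{x}_i \in \mathbb{R}^{M}$ is the $i$th spectrum, and $N$ and $M$ are the numbers of spectra and of wavelengths in each spectrum, respectively. The quality data are denoted as $\mathbf{Y} \in \mathbb{R}^{N \times J}$, where $J$ is the number of quality variables. The traditional PLS method performs regression on the collinear variables by projecting them onto a lower-dimensional latent space [29]. The PLS model can be built as follows.

$$\max_{\mathbf{w}, \mathbf{v}} \ \langle \mathbf{X}\mathbf{w}, \mathbf{Y}\mathbf{v} \rangle = \mathbf{w}^T \mathbf{X}^T \mathbf{Y} \mathbf{v} \quad \text{s.t.} \ \|\mathbf{w}\| = 1, \ \|\mathbf{v}\| = 1 \tag{1}$$

where $\mathbf{w}$ and $\mathbf{v}$ are the loading vectors with respect to $\mathbf{X}$ and $\mathbf{Y}$, respectively. The above optimization problem can be solved by the Lagrange multiplier technique as follows

$$\mathbf{X}^T \mathbf{Y} \mathbf{Y}^T \mathbf{X} \mathbf{w} = \lambda_1 \mathbf{w}, \qquad \mathbf{Y}^T \mathbf{X} \mathbf{X}^T \mathbf{Y} \mathbf{v} = \lambda_2 \mathbf{v} \tag{2}$$

where $\lambda_1$ and $\lambda_2$ are the eigenvalues, and $\mathbf{w}$ and $\mathbf{v}$ are the corresponding eigenvectors. Then the latent vectors $\mathbf{t}$ and $\mathbf{u}$ can be calculated as

$$\mathbf{t} = \mathbf{X}\mathbf{w}, \qquad \mathbf{u} = \mathbf{Y}\mathbf{v} \tag{3}$$

The modelling data are then updated by removing the contribution of the extracted component, using the regression coefficient between $\mathbf{u}$ and $\mathbf{t}$. In this way, the components can be determined one by one iteratively. More details about the traditional PLS model can be found in [29]-[31].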
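As an illustration of the component-by-component procedure described above, a minimal NumPy sketch is given below. It is not the authors' implementation: the function names `pls_fit` and `pls_predict` are hypothetical, the data are centred, and the deflation and the regression matrix follow the common textbook form, which is an assumption because the source does not show these formulas.

```python
import numpy as np

def pls_fit(X, Y, n_components):
    """Extract PLS components one at a time (textbook deflation assumed)."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xr, Yr = X - x_mean, Y - y_mean           # centred working copies
    W, P, Q = [], [], []
    for _ in range(n_components):
        # The dominant singular pair of X^T Y gives w and v, i.e. the dominant
        # eigenvectors of X^T Y Y^T X and Y^T X X^T Y, respectively.
        U, _, Vt = np.linalg.svd(Xr.T @ Yr, full_matrices=False)
        w, v = U[:, 0], Vt[0]
        t, u = Xr @ w, Yr @ v                 # latent vectors
        tt = t @ t
        p = Xr.T @ t / tt                     # X loading (assumed textbook form)
        b = (u @ t) / tt                      # regression coefficient between u and t
        Xr = Xr - np.outer(t, p)              # deflate X
        Yr = Yr - b * np.outer(t, v)          # deflate Y
        W.append(w)
        P.append(p)
        Q.append(b * v)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q).T
    B = W @ np.linalg.solve(P.T @ W, Q.T)     # regression matrix, M x J
    return B, x_mean, y_mean

def pls_predict(X, B, x_mean, y_mean):
    """Predict the quality variables for new rows of X."""
    return (X - x_mean) @ B + y_mean

# Synthetic check: rank-3 "spectra" with more wavelengths than samples.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 50))
Y = X @ rng.standard_normal((50, 2))
B, xm, ym = pls_fit(X, Y, n_components=3)
Y_hat = pls_predict(X, B, xm, ym)
```

Because the synthetic spectra have rank 3, three components are enough to reproduce the training responses, which makes the sketch easy to sanity-check even though the number of wavelengths exceeds the number of samples.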

B. ACTIVE STRATEGY TO DETERMINE WAVELET BASIS FUNCTIONS
In this work, the DB4 wavelet is used for illustration and application owing to its multi-scale property, orthogonality, and compact support. To obtain a more concise model and better modelling performance, an active learning strategy was recently proposed to determine the number and locations of the wavelet basis functions [4], [20].
The key idea of the active strategy is to improve the fit at the location of the maximum error in each step. For clarity, an illustration is shown in Figure 1. The top curve $\mathbf{e}^2$ represents the mean approximation error vector calculated with the selected basis functions,

$$e_m^2 = \frac{1}{N} \sum_{n=1}^{N} \left( x_{n,m} - \hat{x}_{n,m} \right)^2, \quad m = 1, \ldots, M \tag{5}$$

where $\hat{x}_{n,m}$ is the fitted value of $x_{n,m}$ by the previously selected basis functions, and the error vector is $\mathbf{e}^2 = [e_1^2, \ldots, e_M^2]^T$. The location of the maximum error is identified and marked by the red point. Correspondingly, a first wavelet basis function is added there, and the error values around it are set to zero. Similarly, the location of the maximum error of the updated curve $\tilde{\mathbf{e}}^2$ is then determined and marked by the brown point. The second wavelet basis function is added correspondingly, and the error curve is updated again. These steps are repeated until the approximation error meets the predefined threshold. More details about the active strategy can be found in [4], [20] and are omitted here.
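The greedy loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of [4], [20]: a Gaussian bump stands in for the DB4 wavelet atoms, the coefficients are refitted by least squares after each addition, and the names `bump` and `select_atoms` as well as the default width and threshold are hypothetical.

```python
import numpy as np

def bump(grid, center, width):
    """Localized stand-in atom (Gaussian bump); NOT a DB4 wavelet."""
    return np.exp(-0.5 * ((grid - center) / width) ** 2)

def select_atoms(X, grid, width=0.15, threshold=1e-3, max_atoms=40):
    """Greedily add atoms where the mean squared fitting error is largest."""
    n_points = X.shape[1]
    centers = []                                   # grid indices of selected atoms
    blocked = np.zeros(n_points, dtype=bool)       # neighbourhoods already covered
    half = max(1, int(round(width / (grid[1] - grid[0]))) // 2)
    X_hat = np.zeros_like(X)
    while len(centers) < max_atoms:
        e2 = np.mean((X - X_hat) ** 2, axis=0)     # mean squared error per wavelength
        if e2.max() < threshold:
            break                                  # approximation is good enough
        e2_masked = np.where(blocked, 0.0, e2)     # zero the error around chosen atoms
        if e2_masked.max() <= 0.0:
            break                                  # nothing left to select
        m_star = int(np.argmax(e2_masked))         # location of the maximum error
        centers.append(m_star)
        blocked[max(0, m_star - half): m_star + half + 1] = True
        Phi = np.stack([bump(grid, grid[c], width) for c in centers])  # K x M
        C, *_ = np.linalg.lstsq(Phi.T, X.T, rcond=None)               # K x N
        X_hat = (Phi.T @ C).T
    return centers, X_hat

# Synthetic curves built from three smooth peaks.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 5.0, 200)
peaks = np.stack([bump(grid, c, 0.3) for c in (1.0, 2.5, 4.0)])
X = rng.uniform(0.5, 1.5, size=(20, 3)) @ peaks
centers, X_hat = select_atoms(X, grid)
```

Masking a small neighbourhood around each selected location mirrors the step in which the error values around a newly added basis function are set to zero, and it also prevents the same location from being selected twice.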

III. THE PROPOSED SOFT SENSOR METHOD
In this section, the proposed WFPLS method is detailed and discussed. Then the online application of the proposed WFPLS method is described, and some comments are given for better understanding.

A. WFPLS METHOD
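The functional extension of PLS predicts the quality $y_i$ from the function $s_i(t)$, where $t$ is the function argument. To extend the idea of PLS, the loading vector $\mathbf{w}$ is replaced by a loading function $w(t)$. Since the functions $s_i(t)$ and $w(t)$ are continuous, an integral operation is involved. In this way, each spectrum is approximated by a continuous function, and the matrix $\mathbf{X}$ becomes a function vector $\mathbf{s}(t) = [s_1(t), \ldots, s_N(t)]^T$. Then Eq. (1) can be expressed as follows.

$$\max_{w(t), \mathbf{v}} \ \langle \mathbf{s}(t) w(t), \mathbf{Y}\mathbf{v} \rangle \quad \text{s.t.} \ \|w(t)\| = 1, \ \|\mathbf{v}\| = 1 \tag{6}$$

Note that the other symbols are the same as defined before. For computation, the spectral function $s_i(t)$ can be described as a linear combination of the wavelet basis functions

$$s_i(t) = \sum_{k=1}^{K} c_{i,k} \varphi_k(t) = \mathbf{c}_i^T \boldsymbol{\phi}(t) \tag{7}$$

where $\boldsymbol{\phi}(t) = [\varphi_1(t), \ldots, \varphi_K(t)]^T$ is the vector of $K$ basis functions and $\mathbf{c}_i = [c_{i,1}, \ldots, c_{i,K}]^T$ is the coefficient vector of the $i$th spectrum. The coefficients are estimated by minimizing the least-squares fitting criterion, or in the matrix form

$$\mathrm{SSE}(\mathbf{c}_i) = (\mathbf{x}_i - \boldsymbol{\Phi}^T \mathbf{c}_i)^T (\mathbf{x}_i - \boldsymbol{\Phi}^T \mathbf{c}_i) \tag{8}$$

where $\boldsymbol{\Phi} = \{\varphi_k(t_j)\} \in \mathbb{R}^{K \times M}$ contains the basis functions evaluated at the $M$ sampled wavelengths $t_j$. Then the criterion is minimized by the solution

$$\mathbf{c}_i = (\boldsymbol{\Phi} \boldsymbol{\Phi}^T)^{-1} \boldsymbol{\Phi} \mathbf{x}_i \tag{9}$$

Thus, the function vector $\mathbf{s}(t)$ can be expressed by

$$\mathbf{s}(t) = \mathbf{C} \boldsymbol{\phi}(t) \tag{10}$$

where $\mathbf{C} = [\mathbf{c}_1, \ldots, \mathbf{c}_N]^T \in \mathbb{R}^{N \times K}$ is the coefficient matrix. Similarly, the loading function $w(t)$ can also be described as a linear combination of the basis functions

$$w(t) = \boldsymbol{\phi}^T(t) \mathbf{b} \tag{11}$$

where $\mathbf{b} = [b_1, \ldots, b_K]^T$ is the coefficient vector. Considering the functional expressions in Eq. (10) and Eq. (11), the following equation is easily obtained

$$\langle \mathbf{s}(t) w(t), \mathbf{Y}\mathbf{v} \rangle = \left( \int \mathbf{s}(t) w(t) \, dt \right)^T \mathbf{Y}\mathbf{v} \tag{12}$$

Owing to

$$\int \mathbf{s}(t) w(t) \, dt = \int \mathbf{C} \boldsymbol{\phi}(t) \boldsymbol{\phi}^T(t) \mathbf{b} \, dt = \mathbf{C} \mathbf{J} \mathbf{b} \tag{13}$$

where $\mathbf{J} = \int \boldsymbol{\phi}(t) \boldsymbol{\phi}^T(t) \, dt$, the objective in Eq. (12) becomes $\mathbf{b}^T \mathbf{J}^T \mathbf{C}^T \mathbf{Y} \mathbf{v}$. Similarly, the constraint on the loading function becomes

$$\|w(t)\|^2 = \int w^2(t) \, dt = \mathbf{b}^T \mathbf{J} \mathbf{b} = 1 \tag{14}$$

In this way, the WFPLS model becomes

$$\max_{\mathbf{b}, \mathbf{v}} \ \mathbf{b}^T \mathbf{J}^T \mathbf{C}^T \mathbf{Y} \mathbf{v} \quad \text{s.t.} \ \mathbf{b}^T \mathbf{J} \mathbf{b} = 1, \ \|\mathbf{v}\| = 1 \tag{15}$$

If orthogonal (orthonormal) basis functions are used, $\mathbf{J}$ is an identity matrix of dimension $K \times K$. WFPLS in Eq. (15) is thus equivalent to the traditional PLS, given by

$$\max_{\mathbf{b}, \mathbf{v}} \ \langle \mathbf{C}\mathbf{b}, \mathbf{Y}\mathbf{v} \rangle = \mathbf{b}^T \mathbf{C}^T \mathbf{Y} \mathbf{v} \quad \text{s.t.} \ \|\mathbf{b}\| = 1, \ \|\mathbf{v}\| = 1 \tag{16}$$

Note that Eq. (16) is exactly the traditional PLS in Eq. (1) with the raw data $\mathbf{X}$ replaced by the coefficients $\mathbf{C}$. This conclusion holds only when orthogonal basis functions are used; otherwise, Eq. (16) cannot be obtained. In this way, the traditional NIPALS algorithm can be applied to calculate the model parameters. The predicted quality matrix $\hat{\mathbf{Y}}$ is then computed from the coefficient matrix $\mathbf{C}$ and the NIPALS model parameters, where $\mathbf{p}_j$ is the $j$th loading vector, $\mathbf{r}_i$ is the $i$th regression vector, $\mathbf{I}$ is the identity matrix with the proper dimension, and $A$ is the number of retained latent factors.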
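The reduction from Eq. (15) to Eq. (16) can be checked numerically with the short sketch below. It rests on several assumptions that are not in the source: the basis is a random orthonormal matrix obtained by QR factorization (a stand-in for sampled DB4 functions), the integral defining J is replaced by a discrete sum over the sampling points with unit spacing, and the names `Phi`, `Jmat`, and `J_out` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K, J_out = 20, 64, 8, 3        # samples, wavelengths, basis functions, outputs

# Stand-in orthonormal basis sampled on M points: the rows of Phi are orthonormal.
Qmat, _ = np.linalg.qr(rng.standard_normal((M, K)))
Phi = Qmat.T                          # K x M

# Synthetic spectra close to the span of the basis, and synthetic quality data.
X = rng.standard_normal((N, K)) @ Phi + 0.01 * rng.standard_normal((N, M))
Y = rng.standard_normal((N, J_out))

# Least-squares coefficients c_i = (Phi Phi^T)^{-1} Phi x_i, stacked row-wise.
C = np.linalg.solve(Phi @ Phi.T, Phi @ X.T).T    # N x K

# Discrete analogue of J = integral of phi(t) phi(t)^T dt.
Jmat = Phi @ Phi.T                    # equals the identity for an orthonormal basis

# With J = I, the first pair (b, v) is the dominant singular pair of C^T Y.
U, s, Vt = np.linalg.svd(C.T @ Y, full_matrices=False)
b, v = U[:, 0], Vt[0]
objective = b @ Jmat.T @ C.T @ Y @ v  # value of the WFPLS objective at (b, v)
```

In this setting the optimization involves only the N x K coefficient matrix instead of the N x M raw spectra, which is where the dimension reduction of the functional approach appears in computation.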

B. ONLINE APPLICATION
For the online application, when a new spectrum $\mathbf{x}_{new}$ is available, it should first be approximated by the wavelet basis functions. Based on the trained basis functions, the corresponding function $s_{new}(t)$ is determined by

$$s_{new}(t) = \mathbf{c}_{new}^T \boldsymbol{\phi}(t)$$

where $\mathbf{c}_{new}$ is the approximation coefficient vector. Then the predicted quality vector $\hat{\mathbf{y}}$ is calculated from $\mathbf{c}_{new}$ using the regression parameters defined in Eqs. (18) and (19). Thus, the quality variables can be predicted in real time based on the easily measured NIR spectra data.
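A possible shape of the online step is sketched below. The projection uses the same least-squares formula as in the training stage; the regression matrix `Theta`, the centring vectors, and both function names are hypothetical placeholders, and ordinary least squares on simulated coefficients is used here only as a stand-in for the trained WFPLS regression parameters.

```python
import numpy as np

def project_spectrum(x_new, Phi):
    """Coefficients of a new spectrum: c_new = (Phi Phi^T)^{-1} Phi x_new."""
    return np.linalg.solve(Phi @ Phi.T, Phi @ x_new)

def predict_quality(x_new, Phi, Theta, c_mean, y_mean):
    """Predict the quality vector from the coefficients of a new spectrum."""
    c_new = project_spectrum(x_new, Phi)
    return (c_new - c_mean) @ Theta + y_mean

# Offline stage (stand-in): orthonormal basis and simulated training coefficients.
rng = np.random.default_rng(1)
M, K, J_out, N = 64, 6, 2, 40
Qmat, _ = np.linalg.qr(rng.standard_normal((M, K)))
Phi = Qmat.T                                     # K x M sampled basis
C_train = rng.standard_normal((N, K))            # training coefficients
A_true = rng.standard_normal((K, J_out))         # hidden linear map for the demo
Y_train = C_train @ A_true + 3.0
c_mean, y_mean = C_train.mean(axis=0), Y_train.mean(axis=0)
# Ordinary least squares replaces the PLS regression matrix in this demo.
Theta, *_ = np.linalg.lstsq(C_train - c_mean, Y_train - y_mean, rcond=None)

# Online stage: a new spectrum arrives and is projected before prediction.
c_true = rng.standard_normal(K)
x_new = c_true @ Phi
y_hat = predict_quality(x_new, Phi, Theta, c_mean, y_mean)
```

Only a K-dimensional solve and one matrix-vector product are needed per new spectrum, which is consistent with the real-time use described in the text.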
For better understanding, a brief flow chart of the offline training and online application of the proposed WFPLS is given in Figure 2.

C. COMMENTS AND DISCUSSION
As stated in the proposed method, the high-dimensional spectral data are approximated by wavelet basis functions and modeled directly in the functional space. The proposed WFPLS has the following advantages. The high-dimensional data can be approximated by dozens of basis functions. In some sense, the information and properties of the raw data are inherited by the approximation coefficients, so the approximation step can be regarded as feature extraction or dimension reduction. The high-dimensional and nonlinear problems can thus be well handled by the proposed wavelet functional method, and the quality can then be accurately predicted based on the approximation coefficients. With fewer assumptions on the modeling data, the proposed functional method is easy to use in practice.

IV. ILLUSTRATION EXAMPLES

A. NUMERICAL CASE
The numerical data with continuous inputs are generated as follows

$$s_i(t) = \alpha_1 P_1(t \mid \mu_1, \sigma_1) + \alpha_2 P_2(t \mid \mu_2, \sigma_2) + \alpha_3 P_3(t \mid \mu_3, \sigma_3) \tag{23}$$

where $t$ is the variable uniformly sampled in the range $[0, 5]$, and $P_i(t \mid \mu_i, \sigma_i)$, $i = 1, 2, 3$, are Gaussian density functions with mean $\mu_i$ and variance $\sigma_i$. A mapping function that reflects the nonlinear output response is then defined, and the output data are generated for $i = 1, 2, \ldots, 80$ and $j = 1, 2, \ldots, 5$. The overall simulated data are $\mathbf{X} \in \mathbb{R}^{80 \times 500}$ and $\mathbf{Y} \in \mathbb{R}^{80 \times 5}$, of which 40 samples are used for model training and the other 40 samples for testing. Note that Gaussian noise $N(0, 0.04)$ is involved in the numerical case.
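For readers who wish to reproduce inputs of the same shape, a minimal sketch of Eq. (23) follows. Every numerical setting in it is hypothetical, since the source does not give them here: the means, variances, and weight ranges are illustrative, the noise is assumed to be added to the sampled curves with 0.04 read as a variance, and the first 40 samples are arbitrarily taken as the training set. The output mapping is not reproduced.

```python
import numpy as np

def gaussian_pdf(t, mu, var):
    """Gaussian density with mean mu and variance var."""
    return np.exp(-0.5 * (t - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 500)                 # 500 uniformly sampled points in [0, 5]
n_samples = 80

mus = np.array([1.0, 2.5, 4.0])                # hypothetical means
variances = np.array([0.1, 0.2, 0.1])          # hypothetical variances
alphas = rng.uniform(0.5, 1.5, size=(n_samples, 3))   # hypothetical weights

# s_i(t) = alpha_1 P_1 + alpha_2 P_2 + alpha_3 P_3, evaluated on the grid.
S = sum(alphas[:, [k]] * gaussian_pdf(t, mus[k], variances[k]) for k in range(3))

# Additive Gaussian noise, assuming N(0, 0.04) denotes variance 0.04.
X = S + rng.normal(0.0, np.sqrt(0.04), size=S.shape)

X_train, X_test = X[:40], X[40:]               # assumed split: first 40 for training
```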
In this work, the root mean square error of prediction (RMSEP) and $R^2$ are used to evaluate the prediction performance,

$$\mathrm{RMSEP} = \sqrt{\frac{1}{N_t} \sum_{i=1}^{N_t} (y_i - \hat{y}_i)^2}, \qquad R^2 = 1 - \frac{\sum_{i=1}^{N_t} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N_t} (y_i - \bar{y})^2}$$

where $y_i$, $\hat{y}_i$, and $\bar{y}$ denote the $i$th quality value, its prediction, and the mean quality value, respectively, and $N_t$ is the number of test samples. Using the proposed WFPLS method, a total of 25 wavelet functions are determined for computation, so the dimension of the modeling data in the WFPLS method is clearly reduced compared with the raw data $\mathbf{X} \in \mathbb{R}^{80 \times 500}$ used in the traditional methods. Figure 3 shows the approximation results for the numerical data, where the blue solid line is the simulated curve and the red dash-dot line is the approximation by the proposed WFPLS method. The results demonstrate that the approximation is effective in the presence of noise.
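The two performance indices can be computed as in the sketch below, which assumes their standard definitions (root of the mean squared prediction error, and one minus the ratio of the residual to the total sum of squares). The function names are illustrative, and each quality variable is treated as one column.

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction, computed per quality variable."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))

def r_squared(y_true, y_pred):
    """Coefficient of determination, computed per quality variable."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)              # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Small example with a constant prediction offset of one unit.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([2.0, 3.0, 4.0, 5.0])
err = rmsep(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
```

A perfect prediction gives an RMSEP of 0 and an R² of 1, which is the reference point used when the methods are compared in the tables.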
The proposed WFPLS, SVR, and PLS are used to predict the quality variables for comparison in this case. For SVR, the Gaussian kernel is used to handle the nonlinearity of the simulated data. The parameters of each method are tuned by traditional cross-validation. For brevity, only the prediction results for the third quality variable $Y_3$ are shown, in Figure 4 (a), (b), and (c) for the proposed WFPLS, SVR, and PLS, respectively. The blue lines with circles are the true simulated values, and the red lines with stars are the values predicted by the different methods. For a clear comparison, the predictions in a local region are enlarged in a subplot within each plot. As shown in Figure 4 (a), the values predicted by WFPLS overlap the true values almost exactly. The values predicted by the SVR and PLS methods deviate from the true values to some extent (Figure 4 (b) and (c)). For better comparison, the RMSEP and $R^2$ values of the three methods are shown in Tables 1 and 2. The prediction results indicate that the proposed WFPLS performs much better than SVR and PLS, with the minimum RMSEP values and $R^2$ values close to 1.

B. ETHANOL FERMENTATION PROCESS
In the glycolysis process of Saccharomyces cerevisiae, ethanol and carbon dioxide are produced, with a small amount of energy released. In this process, the glucose concentration, the ethanol concentration, and the biomass are very important parameters. This work is based on a real-time NIR detection system, shown in Figure 5 (a), whose schematic diagram is given in Figure 5 (b). In the experiment, the capacity of the fermenter is 2.5 L. NaOH is used to control the pH value in the fermenter, which is measured by the pH meter. The reactor temperature is measured by the thermocouple PT100 and is controlled to the predefined value by a heating device and cold water. The electric stirring paddle is used to agitate the fermentation broth evenly. During the fermentation process, the NIR spectrum analyzer acquires the absorbance of the fermentation broth at different wavelengths in real time. The measured spectra data are then sent to the monitoring computer. The prediction model for the glucose concentration, ethanol concentration, and biomass can then be established based on the measured data.
Saccharomyces cerevisiae is used as the strain in this experiment. The seeds need to be cultured before the fermentation experiment is conducted. The Fourier NIR analyzer (TALYS-ASP531) produced by ABB and its matching immersion diffuse reflection probe are used to measure the NIR spectra data. For modeling, the wavenumber range is 4,790-12,000 cm^-1, the instrument resolution is 16 cm^-1, the number of spectral scans is 64, and the detector gain is 237.84. Samples, which are centrifuged and diluted, are measured every 0.5 h. Liquid chromatography (water) is used to measure the glucose concentration, gas chromatography (Agilent 6890 Series GC System) is used to measure the ethanol concentration, and the microplate reader (Multiskan Ascent) is used to measure the biomass (OD) in the fermentation process. The measured spectral data are shown in Figure 6. A total of 6 batches of the ethanol fermentation process are conducted. Among them, 5 batches with 138 samples are used for training, and the remaining batch with 26 samples is used for testing. Using the proposed functional method, the raw high-dimensional data can be approximated by only dozens of wavelet basis functions, so 138 samples are statistically sufficient to train the model. The proposed WFPLS, SVR, and PLS are used to predict the quality variables, including the glucose concentration, the ethanol concentration, and the biomass, in the ethanol fermentation process. As in the numerical case, the Gaussian kernel is used in SVR to handle the nonlinearity within the spectra data, and the parameters of each method are tuned by traditional cross-validation. The prediction results of the different methods for the ethanol concentration are given in Figure 7. The blue lines with circles are the true measured values, and the red lines with stars are the predicted values. The values predicted by the proposed WFPLS method are closer to the true values than those predicted by the SVR and PLS methods. The RMSEP and $R^2$ values of the three methods are listed in Tables 3 and 4. It can be seen that the prediction results for the glucose concentration and the ethanol concentration obtained by the proposed WFPLS are much better than those obtained by SVR and PLS, with the minimum RMSEP values and the maximum $R^2$ values.

V. CONCLUSION
A novel WFPLS method is proposed in this paper for quality prediction based on high-dimensional spectral data. In the proposed method, wavelet basis functions are used to fit the spectra data. Thus, the high-dimensional spectral data are modeled as smooth curves and analyzed directly in the functional space. By proper derivation, the WFPLS modeling problem can be converted into the traditional PLS problem because of the orthogonality of the wavelet functions. The online application and a discussion of the method are also given. The numerical case demonstrates the effectiveness and advantages of the proposed WFPLS in comparison with the traditional PLS and SVR methods. The experimental ethanol fermentation process verifies the practical feasibility and effectiveness of the proposed method. In future work, the application of WFPLS under various process conditions will be further studied.