Kantorovich Distance Based Fault Detection Scheme for Non-Linear Processes

Fault detection is necessary for safe operation in modern process plants. The kernel principal component analysis (KPCA) technique has been widely utilized for monitoring non-linear processes because it enhances dimension reduction and fault detection in non-linear space. In this paper, an improved non-linear fault detection strategy based on Kantorovich distance (KD) and kernel principal component analysis is proposed. The KD statistic is based on the optimal mass transport theory where the distance between two distributions is computed with respect to a cost function. The addressed fault detection problem models the data using the KPCA framework and utilizes the ability of the KD statistical indicator to detect faults. The detection stage involves comparing the residuals of training fault-free data and testing faulty data using the KD statistic. Additionally, the reference threshold for the KD statistic is computed using the kernel density estimation (KDE) approach as compared to the previously utilized three-sigma rule approach. The detection performance is illustrated with the help of three benchmark case studies: a continuous stirred tank reactor (CSTR) process, Tennessee Eastman (TE) process and an experimental distillation column process. The performance analysis suggests the superiority of the KPCA-KD fault detection scheme in monitoring various sensor faults. Moreover, comparison with traditional statistical indicators of PCA and KPCA schemes shows that the proposed scheme enhances fault detection and achieves an improved detection rate in monitoring different categories of faults.


I. INTRODUCTION
In modern process plants, ensuring process safety and consistent product quality is a key requirement, and hence a good fault detection (FD) scheme is needed [1]. Advancements in process automation have transformed the chemical engineering industry, enabling efficient conversion of raw materials such as oil, natural gas and minerals into final products of very good quality. The emergence of smart sensor networks and distributed control systems, introduced to cater to continuous production needs, has complicated the dynamics of chemical plants [2]. Owing to these added complexities, hazards such as emission discharges and explosions occur regularly in process plants. Such hazardous events pose serious threats to human health and are also a main cause of environmental pollution. Some of the main causes of such accidents are human error, poor maintenance of the plant, and malfunctioning sensors and actuators in the process. Such mishaps can be kept in check if the process plants are continuously monitored. Fortunately, very good progress has been made in the monitoring of automated processes in the last few decades through efficient fault detection schemes [3], [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Mehrdad Saif.
Fault detection schemes can be divided broadly into two categories: model-based schemes and model-free schemes. A model-based FD scheme requires a precise mathematical model for its functioning, which is demanding owing to the complex nature of chemical plants [5], [6]. Model-free FD schemes are further classified into knowledge-based and historical data-based schemes. Knowledge-based schemes require prior expert knowledge of the system and have been frequently applied in recent years to fault detection problems [7]-[9]. In contrast, historical data-based schemes do not require prior expert knowledge; large historical data sets alone are sufficient for monitoring [10]. The multi-variate statistical process monitoring (MSPM) schemes belong to the family of historical data-driven FD schemes and emphasize monitoring multiple variables simultaneously [11]. Some commonly used MSPM methods include principal component analysis (PCA) [12], [13], partial least squares (PLS) [14], [15], canonical correlation analysis (CCA) [16], slow feature analysis (SFA) [17] and independent component analysis (ICA) [18], [19]. PCA is a linear orthogonal projection strategy that transforms the observations from a larger-dimensional space onto a smaller-dimensional linear subspace by maximizing the variance of the projections. The PCA FD strategy has been established in a variety of FD applications over the last few decades. Several extensions of PCA have been proposed in the literature, including dynamic PCA [20], multi-scale PCA [21], multi-block PCA [22] and recursive PCA [23]. Though very popular in the literature, PCA-based methods adhere to a linear approach that applies only to data sets that are linearly separable, and they do not consider non-linear variations in process data [24].
VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
To overcome the linear limitation of PCA-based methods, a non-linear scheme based on kernel PCA (KPCA) has been utilized in practice because it can effectively capture non-linear features in process data by using kernel functions [25], [26]. Kernel PCA computes the principal components in a high-dimensional feature space by using a non-linear mapping. In the last few years, considerable work has been carried out on different extensions of the conventional KPCA scheme [27]-[30]. In a recent work [31], an automatic artificial neural network-based approach with an augmented hidden layer was proposed along with a correlation analysis-based approach to tackle the data-distribution mismatch issue. The results were found to improve the computational time and enhance the monitoring of abnormalities.
The $T^2$ and squared prediction error (SPE) statistical indicators have been commonly integrated with PCA and KPCA-based modeling techniques. While the $T^2$ indicator captures variations in the latent subspace, the SPE indicator captures variations in the residual subspace of the model. The $T^2$ statistic is defined for samples that follow a Gaussian distribution and it has a good ability to detect additive faults [32]. However, it fails in a few cases since the latent subspace is sensitive to small changes in the process [33]. In contrast, the SPE statistic appears to be superior to the $T^2$ statistic for faults other than additive faults, as observed in fault detection for traction systems in high-speed trains [34]. Also, the SPE statistic is sensitive to the presence of a fault at each sampling instant, which enables improved detection of faults in comparison with the $T^2$ statistic. There is good scope for the use of alternative statistical indicators to strengthen the fault detection task. In recent years, the Kullback-Leibler divergence (KLD) [32] and Hellinger distance (HD) [35] indicators, which measure the distance between two probability distributions, have been proposed for this purpose. The performance of KLD and HD based fault indicators was found to be better than that of conventional $T^2$ and SPE based fault indicators [36], [37]. In recent times, the Kantorovich distance (KD)-based indicator has been utilized in fault detection problems [38].
The KD metric indicates the minimum cost of shifting a mass of data from a source distribution to a destination distribution. The KD metric has been integrated with the PCA modeling framework, and the following advantages were observed: the KD metric provided a smooth transition of faults, detected faults of small magnitude and provided good monitoring for data corrupted with noise [39]. The KD metric has been utilized for the change point detection problem, where it yielded good results with a minimum detection delay [40]. Later, the KD metric was integrated with the ICA modeling framework, which enhanced monitoring results in comparison with the conventional indicators of PCA and ICA based strategies [41]. Though the KD metric, like the KLD and HD based metrics, is focused on finding the distance between two distributions, it involves a segment-by-segment comparison between the two distributions in a moving window of fixed length. This enables the KD metric to capture sensitive details in process data, which enhances the detection of small-magnitude faults. Since the KD statistic has demonstrated improved monitoring results, a fault detection scheme with KPCA as the modeling framework and the KD metric as the statistical index is proposed in this study. The KD statistic is computed between the KPCA residuals generated from normally operating data and the residuals of new online data. In earlier studies on KD-based fault detection, the threshold for the KD statistic was computed using the three-sigma rule. However, the three-sigma rule may not be an efficient way to compute the threshold. Hence, the kernel density estimation (KDE) approach is utilized to compute the threshold in this study.
The remainder of the paper is organized as follows. In Section 2, an introduction to the PCA modeling scheme and the KPCA modeling scheme, along with the two statistical indicators $T^2$ and SPE, is provided. In Section 3, an overview of optimal mass transport, the KD statistic formulation, and threshold computation using KDE is given. Next, the proposed fault detection strategy that integrates the KPCA modeling technique and the KD statistic is presented in Section 4. In Section 5, the performance of the proposed KPCA-KD FD scheme is studied using three case studies: the CSTR process, the Tennessee Eastman process, and the experimental distillation column process. The PCA-$T^2$, PCA-SPE, PCA-KD, KPCA-$T^2$ and KPCA-SPE based FD schemes are compared with the proposed KPCA-KD based FD scheme. Finally, the conclusions are presented in Section 6.

A. PRINCIPAL COMPONENT ANALYSIS
Principal component analysis is a dimensionality reduction technique that transforms variables from a large multi-dimensional space to a lower-dimensional space through latent variables, or principal components (PCs). The raw data $X \in \mathbb{R}^{n \times d}$, having n observations and d variables, are first normalized to give $X_{sc}$. After normalization, singular value decomposition (SVD) is performed on $X_{sc}$ to obtain the scores and loading vectors, which are associated with the covariance of $X_{sc}$ [42]. After computing the optimum PCs, the PCA model is represented as the sum of an approximated matrix $\hat{X}_{sc}$ and a residual matrix $F$. The approximated and residual matrices are evaluated through the $T^2$ and SPE based fault indices, each of which is computed for new data $X_{new}$ [43]. When the values of the $T^2$ and SPE fault indicators exceed the threshold limits $T^2_{\alpha}$ and $Q_{\alpha}$ respectively, a fault is declared.
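For reference, the relations described in this subsection can be sketched in their standard textbook form. The symbols $T$ (scores), $W$ (loadings), $\Lambda$ (diagonal matrix of eigenvalues) and the subscript $p$ (retained PCs) are assumed notation and may differ from that of [42], [43]; $x_{new}$ is taken to be a single scaled observation written as a column vector.

```latex
% Standard PCA monitoring relations (assumed notation, see lead-in).
\begin{align*}
X_{sc} &= T W^{T} \\
\Sigma &= \frac{1}{n-1} X_{sc}^{T} X_{sc} = W \Lambda W^{T} \\
X_{sc} &= \hat{X}_{sc} + F = T_p W_p^{T} + F \\
T^{2} &= x_{new}^{T} W_p \Lambda_p^{-1} W_p^{T} x_{new} \\
SPE &= \left\| \left( I - W_p W_p^{T} \right) x_{new} \right\|^{2}
\end{align*}
```

In this form, $T^2$ measures the scaled distance of a sample inside the retained subspace, while SPE measures the squared norm of what the retained PCs fail to reconstruct.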

B. KERNEL PRINCIPAL COMPONENT ANALYSIS
Conventional PCA is a linear approach and applies only to data sets that are linearly separable. If the data are non-linear and cannot be expressed in a linear space, conventional PCA fails to capture the non-linearities, which eventually affects monitoring performance. The kernel PCA approach aims to project the data set from a lower-dimensional space to a higher-dimensional feature space in which the data set is linearly separable. Consider data $X \in \mathbb{R}^{n \times d}$ having n observations and d variables, $X = [x_1^T, \ldots, x_n^T]$ with $x_i \in \mathbb{R}^d$. The data are transferred to a linear feature space $F$ through a mapping function $\Phi(\cdot)$, and the covariance matrix $\Sigma^F$ of the mapped data is computed in that space. The kernel PCs (KPCs) are calculated by solving the eigenvalue problem $\lambda v = \Sigma^F v$, where $\lambda$ and $v$ denote the eigenvalues and eigenvectors, and $\langle a, b \rangle$ denotes the dot product between a and b. This eigenvalue problem is rewritten in an equivalent form in terms of dot products with the mapped samples, and each eigenvector is expressed through one coefficient $a_i$ per sample ($i = 1, 2, \ldots, n$). Combining equations (7) and (8) for $k = 1, 2, \ldots, n$ and introducing the kernel matrix $K$ reduces equation (9) to an eigenvalue problem in the coefficient vector $\alpha = [a_1, a_2, \ldots, a_n]^T$ [28], [29]. The kernel PCs for new data are then computed from these coefficients and the kernel function. Once the KPCA model is developed, it can be used to detect faults in new data using the $T^2$ and SPE statistical indicators. The variation within the model subspace is monitored using the $T^2$ statistic [25]. The residual subspace of the KPCA model is monitored using the SPE statistical index, which is computed from $\Phi_p(x)$, the representation of the mapped sample using p PCs in the higher-dimensional feature space. When the values of the $T^2$ and SPE fault indicators exceed the threshold limits, a fault is declared [26].
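As a point of reference, the derivation outlined above can be sketched using the standard kernel PCA relations. This is a sketch under assumptions: the mapped data are taken to be mean-centered in the feature space, $t_k$ (the k-th kernel score), $a_i^{k}$ (the i-th coefficient of the k-th eigenvector) and $\Lambda$ (diagonal matrix of the p leading eigenvalues) are assumed symbols, and only the equation numbers that the text itself cites, (7) to (9), are attached.

```latex
% Standard KPCA relations (assumed notation, see lead-in).
\begin{align*}
\Sigma^{F} &= \frac{1}{n} \sum_{j=1}^{n} \Phi(x_j)\, \Phi(x_j)^{T} \\
\lambda \langle \Phi(x_k), v \rangle &= \langle \Phi(x_k), \Sigma^{F} v \rangle \tag{7} \\
v &= \sum_{i=1}^{n} a_i\, \Phi(x_i) \tag{8} \\
\lambda \sum_{i=1}^{n} a_i \langle \Phi(x_k), \Phi(x_i) \rangle
  &= \frac{1}{n} \sum_{i=1}^{n} a_i \Big\langle \Phi(x_k),
     \sum_{j=1}^{n} \Phi(x_j) \langle \Phi(x_j), \Phi(x_i) \rangle \Big\rangle \tag{9} \\
n \lambda\, \alpha &= K \alpha, \qquad K_{ij} = \langle \Phi(x_i), \Phi(x_j) \rangle \\
t_k &= \langle v_k, \Phi(x) \rangle = \sum_{i=1}^{n} a_i^{k} \langle \Phi(x_i), \Phi(x) \rangle \\
T^{2} &= [t_1, \ldots, t_p]\, \Lambda^{-1}\, [t_1, \ldots, t_p]^{T} \\
SPE &= \left\| \Phi(x) - \Phi_p(x) \right\|^{2} = \sum_{k=1}^{n} t_k^{2} - \sum_{k=1}^{p} t_k^{2}
\end{align*}
```

The practical point of the reduction to $n \lambda \alpha = K \alpha$ is that the mapping $\Phi(\cdot)$ never has to be evaluated explicitly; only kernel evaluations between pairs of samples are needed.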

III. KANTOROVICH DISTANCE
The Kantorovich distance is derived from the concept of the optimal mass transport theory. According to this theory, information can be efficiently transferred from a source distribution to a destination distribution with reference to a cost function [44]. Because of their ability to compare signals and data from different distributions precisely, methods based on optimal mass transport theory are gaining prominence in various engineering domains. For any two distributions Q and R with supports X and Y, the optimal mass transport problem proposed by Monge seeks the mapping $f(x)$ that minimizes the total transport cost, where $m(\cdot,\cdot)$ is a cost function and $f$ is restricted to MP, the set of measure-preserving mappings. In the formulation proposed by Monge, a mass could be relocated from one point in the source distribution to one point in the destination distribution. This formulation was improved by Kantorovich, where multiple points from the source distribution can be mapped to multiple points in the destination distribution, and this method was named the Kantorovich distance. This improved one-to-many mapping ultimately provided a more feasible solution than the earlier one-to-one mapping [45]. For two distributions Q and R, the minimum shifting distance required to move data from Q to R is termed the Kantorovich distance. The cost of transportation is very low or close to zero for two similar distributions and large for dissimilar distributions, thus indicating the degree of dissimilarity between the two [40]. Process monitoring problems measure the dissimilarity between non-faulty data and faulty data, and hence the KD indicator has scope for application in the detection of faults. The optimal mass transport with respect to a cost function of order l is termed the Earth mover's distance for the case l = 1; in its definition, $\Gamma(Q, R)$ represents the set of joint distributions and $\gamma$ denotes the optimal coupling [46]. For l = 2, the optimal mass transport is termed the Kantorovich distance and is expressed in terms of $\mu_q$ and $\mu_r$, the means, and $\Sigma_q$ and $\Sigma_r$, the covariance matrices, of two random variables q and r. The computation of the KD metric is based purely on the available data and hence it is not restricted to any particular distribution. This enables it to be applied to any type of distribution, including Gaussian, non-Gaussian or exponential distributions. The KD metric between two distributions is calculated using a segmentation process. Since comparison of individual observations in the source and destination distributions is time-consuming, the data in both distributions are divided into multiple segments. Next, each segment from the source distribution is compared with all the segments in the destination distribution.
The division of data into segments depends on the moving window parameter, which is selected based on the particular application. The computation of the KD metric between distribution Q and distribution R is presented in Algorithm 1.

Algorithm 1 Segmentation Process for KD Computation
1) The observations of Q are divided into l segments $Q_1, Q_2, \ldots, Q_l$ and each segment is made of j data points, that is, $Q_i = \{q_{i1}, q_{i2}, \ldots, q_{ij}\}$.
2) The observations of R are divided into l segments $R_1, R_2, \ldots, R_l$ and each segment is made of j data points, that is, $R_i = \{r_{i1}, r_{i2}, \ldots, r_{ij}\}$.
3) Each segment of Q is compared against each segment of R, which implies that segment $Q_1$ of the source distribution is compared with all segments $R_1$, $R_2$, $R_3$ until $R_l$ of the destination distribution. The first sample of $Q_1$ ($q_{11}$) is compared with the first sample of $R_1$ ($r_{11}$), $R_2$ ($r_{21}$), ..., $R_l$ ($r_{l1}$). The second sample of $Q_1$ ($q_{12}$) is compared with the second sample of $R_1$ ($r_{12}$), $R_2$ ($r_{22}$), ..., $R_l$ ($r_{l2}$), and so on until all segments are covered.
4) The next segment $Q_2$ is compared with all segments $R_1$, $R_2$, $R_3$ until $R_l$ of distribution R. This continues until all the segments of Q have been compared with all the segments of R.
5) The distances between the two distributions are recorded and evaluated using the KD statistic. If Q and R are similar, the KD statistic is small, and if they are dissimilar, the KD statistic is large.
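The segmentation idea can be illustrated with a minimal Python sketch. It rests on two assumptions that go beyond the text: each segment is summarized by a fitted Gaussian, so that the l = 2 distance takes the closed form $\sqrt{\|\mu_q - \mu_r\|^2 + \mathrm{tr}(\Sigma_q + \Sigma_r - 2(\Sigma_q^{1/2} \Sigma_r \Sigma_q^{1/2})^{1/2})}$, which for scalar data reduces to $\sqrt{(\mu_q - \mu_r)^2 + (\sigma_q - \sigma_r)^2}$; and the data are univariate (for example, a single residual signal). The function names are illustrative, and the sketch compares segment summaries rather than pairing individual samples as Algorithm 1 describes.

```python
import math
import statistics


def kd_gaussian_1d(q, r):
    """2-Wasserstein distance between Gaussians fitted to 1-D samples q and r.

    Assumption: each sample set is summarized by its mean and (population)
    standard deviation; this is the scalar case of the mean/covariance form.
    """
    mu_q, mu_r = statistics.fmean(q), statistics.fmean(r)
    sd_q, sd_r = statistics.pstdev(q), statistics.pstdev(r)
    return math.sqrt((mu_q - mu_r) ** 2 + (sd_q - sd_r) ** 2)


def split_segments(data, j):
    """Split data into consecutive segments of j points (incomplete tail dropped)."""
    n_segments = len(data) // j
    return [data[i * j:(i + 1) * j] for i in range(n_segments)]


def pairwise_kd(q, r, j):
    """Return D with D[a][b] = distance between segment a of q and segment b of r."""
    q_segments = split_segments(q, j)
    r_segments = split_segments(r, j)
    return [[kd_gaussian_1d(qs, rs) for rs in r_segments] for qs in q_segments]


if __name__ == "__main__":
    source = [0, 1, 2, 3]
    destination = [10, 11, 12, 13]
    for row in pairwise_kd(source, destination, j=2):
        print(row)
```

How the pairwise distances are aggregated into a single monitoring statistic is not specified in this section, so the sketch stops at the distance matrix.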

A. KD STATISTIC THRESHOLD
In fault detection problems, a statistical indicator is compared with a reference threshold to determine the presence of a fault. In earlier studies on KD-based fault detection, a simple three-sigma rule was used to compute the threshold for the KD indicator [41]. The training data were initially split into two parts, Tr1 and Tr2. Next, a multi-variate model was developed for both data sets and the residuals Ru1 and Ru2 were generated. The KD metric KDA was computed for Ru1 and Ru2. Next, the threshold $\alpha$ was computed using the following expression [39]:
$$\alpha = \mu_{KDA} + 3\sigma_{KDA} \tag{18}$$
where $\mu_{KDA}$ and $\sigma_{KDA}$ represent the mean and standard deviation of the KD metric KDA. However, the three-sigma threshold rule may not be an efficient option in all scenarios. In some process applications, the data could be Gaussian or non-Gaussian in nature, and in such cases a better approach may be required for determining the threshold if the KD statistical indicator is to be applied. In this study, the kernel density estimation (KDE) approach is proposed for computing the threshold of the KD-based fault indicator. KDE refers to the class of data-driven techniques for the non-parametric estimation of density functions. KDE is a powerful tool that can determine the empirical density function from given samples of the population under consideration. The advantage of using the KDE approach in determining the threshold is that it follows the data more closely and gives very little weight to regions of unknown operation [18]. The KDE approach has been used to compute thresholds in many fault detection problems in the literature [47], [48]. For a given random variable y, its PDF $\hat{f}(y)$ can be estimated from n samples $y_i$ as
$$\hat{f}(y) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{y - y_i}{h}\right) \tag{19}$$
where $K$ is the univariate kernel estimator, n is the number of observations and h is the smoothing parameter. The kernel estimator is the sum of the bumps that are placed at the different sample points.
The smoothing parameter needs to be chosen carefully because a small value makes the estimate less smooth and a large value oversmooths it [47], [49]. The training data are initially split into two parts, Te1 and Te2. Next, the KPCA multi-variate model is developed for both data sets, and the residuals Re1 and Re2 are then generated. The KD metric KD is computed between Re1 and Re2. This is followed by the estimation of the density function of the KD statistic using the univariate kernel density estimator presented in equation (19). The point below which 99% of the total area under the estimated density lies is treated as the threshold limit for the KD statistic.
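The thresholding step can be sketched as follows. This is a minimal illustration under stated assumptions: a Gaussian kernel is used for $K$, the smoothing parameter defaults to the normal reference rule $h = 1.06\,\sigma\,n^{-1/5}$, and the 99% point is located by bisection on the cumulative distribution implied by the kernel estimate. None of these choices is specified in the text, the function names are hypothetical, and the samples are assumed not to be all identical.

```python
import math
import statistics


def normal_reference_bandwidth(samples):
    """Smoothing parameter h from the normal reference rule (an assumed choice)."""
    n = len(samples)
    return 1.06 * statistics.stdev(samples) * n ** (-1 / 5)


def kde_cdf(y, samples, h):
    """CDF of a Gaussian-kernel density estimate evaluated at y."""
    total = 0.0
    for s in samples:
        total += 0.5 * (1.0 + math.erf((y - s) / (h * math.sqrt(2.0))))
    return total / len(samples)


def kde_threshold(samples, level=0.99, h=None, tol=1e-9):
    """Return the point below which `level` of the estimated density's area lies."""
    if h is None:
        h = normal_reference_bandwidth(samples)
    lo = min(samples) - 10.0 * h  # CDF is essentially 0 here
    hi = max(samples) + 10.0 * h  # CDF is essentially 1 here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, samples, h) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical KD values computed on fault-free data.
    kd_values = [0.1 * i for i in range(100)]
    print(kde_threshold(kd_values, level=0.99))
```

Working with the cumulative distribution avoids numerical integration of the density; the bisection is valid because the estimated CDF is monotonically increasing.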

IV. FAULT DETECTION USING KPCA-BASED KD STRATEGY
The primary objective of this work is to develop an enhanced fault detection scheme based on the KD metric to improve the detection of sensor faults. The conventional fault indicators of PCA and KPCA based fault detection schemes have a few shortcomings, which make the detection process less efficient. Hence, an alternative fault indicator in the form of the KD index is considered in this study. The residuals of any developed multi-variate model contain important information related to the process. For new data X, the residuals are computed with respect to the reference KPCA model, where $v_p$ denotes the eigenvectors corresponding to the p optimum PCs. If the process has some abnormality, information regarding that abnormality will be reflected in the residuals. The proposed KD-based monitoring compares the residuals of the training fault-free data and the testing data. The residuals of both data sets are evaluated using the KD metric to indicate the presence of an abnormality. The proposed fault detection scheme is presented as a block diagram in Figure 1. The proposed scheme is divided into offline development and online monitoring stages, which are briefly described below:
1) Offline development stage:
• Step-1: The data under normal operating conditions of the plant are recorded and scaled to zero mean and unit variance.
• Step-2: The data are projected onto a higher-dimensional feature space where the kernel PCA model is developed, and the dominant components are selected using the cumulative percent variance (CPV) approach.
• Step-3: The residuals R1 are generated for the scaled data from the KPCA model parameters.
• Step-4: The data are split into two equal units, the KD statistic is computed between the two units, and the estimated density function of the KD statistic is used to determine the threshold $\alpha$.
2) Online monitoring stage:
• Step-1: New online data (possibly faulty) are scaled to zero mean and unit variance.
• Step-2: The residuals R2 are generated for the new online data from the reference KPCA model.
• Step-3: The KD statistic KD is computed between R1 and R2 using a moving window of fixed length, as presented in Algorithm 1.
• Step-4: The online data are free of faults if KD < $\alpha$ and deemed to be faulty if KD > $\alpha$.
The performance of any FD strategy involving the KD statistic depends on the choice of the moving window parameter. The choice of this parameter depends solely on the type of process data and the amount of noise in the data. More details regarding the selection of this parameter can be found in [40].
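The online decision in Step-3 and Step-4 can be sketched for a single residual signal as below. This is only an illustration under assumptions: the distance is the scalar Gaussian form of the l = 2 optimal transport distance, each moving window of the online residual is compared with the whole reference residual, and the names `monitor`, `window` and `alpha` are hypothetical. The residual generation from the KPCA model is not shown.

```python
import math
import statistics


def kd_gaussian_1d(q, r):
    """Scalar Gaussian form of the l = 2 transport distance (assumed summary)."""
    mu_q, mu_r = statistics.fmean(q), statistics.fmean(r)
    sd_q, sd_r = statistics.pstdev(q), statistics.pstdev(r)
    return math.sqrt((mu_q - mu_r) ** 2 + (sd_q - sd_r) ** 2)


def monitor(reference_residual, online_residual, window, alpha):
    """Flag each moving window of the online residual as faulty (True) or not.

    One flag is returned per window position, so the output has
    len(online_residual) - window + 1 entries.
    """
    flags = []
    for end in range(window, len(online_residual) + 1):
        current = online_residual[end - window:end]
        kd = kd_gaussian_1d(reference_residual, current)
        flags.append(kd > alpha)
    return flags


if __name__ == "__main__":
    reference = [-1.0, 1.0] * 50                 # fault-free residual (toy data)
    online = [-1.0, 1.0] * 10 + [4.0, 6.0] * 10  # mean shift in the second half
    flags = monitor(reference, online, window=4, alpha=1.0)
    print(flags[0], flags[-1])
```

A longer window smooths the statistic but delays detection, which is one reason the window length has to be tuned to the noise level of the data.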

V. RESULTS AND DISCUSSION
To confirm the effectiveness of the proposed KPCA-KD fault detection scheme, three applications are considered: the continuous stirred tank reactor process, the Tennessee Eastman process and the experimental distillation column process. This study compares the PCA-$T^2$, PCA-SPE, PCA-KD, KPCA-$T^2$ and KPCA-SPE strategies with the proposed KPCA-KD based strategy. The fault detection rate (FDR) and false alarm rate (FAR) metrics are used for a fair comparison between the different strategies [41].
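The two metrics can be computed as in the following sketch, which assumes their commonly used definitions: FDR is the percentage of faulty samples whose statistic exceeds the threshold, and FAR is the percentage of fault-free samples whose statistic exceeds the threshold. The exact definitions used in [41] are not reproduced in this section, and the function name is illustrative.

```python
def fdr_far(statistic, threshold, is_faulty):
    """Return (FDR, FAR) in percent for a monitored statistic.

    statistic : sequence of indicator values, one per sample
    threshold : alarm limit; a sample raises an alarm if its value exceeds it
    is_faulty : sequence of booleans marking the truly faulty samples
    """
    if len(statistic) != len(is_faulty):
        raise ValueError("statistic and is_faulty must have the same length")
    alarms = [value > threshold for value in statistic]
    n_faulty = sum(1 for f in is_faulty if f)
    n_normal = len(is_faulty) - n_faulty
    detected = sum(1 for a, f in zip(alarms, is_faulty) if a and f)
    false_alarms = sum(1 for a, f in zip(alarms, is_faulty) if a and not f)
    fdr = 100.0 * detected / n_faulty if n_faulty else float("nan")
    far = 100.0 * false_alarms / n_normal if n_normal else float("nan")
    return fdr, far


if __name__ == "__main__":
    values = [0.1, 0.2, 0.9, 0.8, 0.3, 0.95]
    truth = [False, False, True, True, True, False]
    print(fdr_far(values, threshold=0.5, is_faulty=truth))
```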

A. CONTINUOUS STIRRED TANK REACTOR (CSTR)
A non-linear continuous stirred tank reactor (CSTR) is considered to validate the performance of the proposed KPCA-KD FD scheme. The CSTR problem has been used in many fault detection studies over the last few years [29], [50], [51]. The CSTR process involves a non-isothermal, irreversible first-order reaction of the form $A \rightarrow B$, where A is the reactant species and B is the desired product. A schematic of the CSTR process considered in this study is shown in Figure 2. The reaction system is highly exothermic and therefore the fluid in the jacket of the reactor is used to cool the reactor. The feed flow rate and coolant flow rate are regulated to obtain the desired concentration of the product B. The model is used to predict the reactant concentration $C_A$. The reaction mechanism is presented in Equation (21). By writing the component balance and energy balance around the reactor system, the set of model ODEs is obtained, as shown in (22) to (24). The parameters of the model are presented in Table 1.

1) DATA GENERATION
The data are generated for the above reactor example by perturbing the feed flow rate and coolant flow rate from the nominal steady-state condition using a pseudo random binary signal (PRBS) in the frequency range $[0, 0.05\,\omega_N]$, where $\omega_N = \pi/T$ represents the Nyquist frequency. The measurements of the reactor concentration and reactor temperature are assumed to be corrupted with zero-mean Gaussian white noise sequences with standard deviations of 0.02 and 0.5, respectively. The inlet reactant concentration of A ($C_{Ao}$), the inlet temperature of the reactant ($T_o$) and the inlet temperature of the coolant ($T_{cin}$) are treated as unmeasured disturbances, and it is further assumed that their dynamics are governed by stochastic processes driven by $\{w_1(k)\}$, $\{w_2(k)\}$ and $\{w_3(k)\}$, which are white noise sequences with standard deviations of 0.5, 1 and 1, respectively. A total of 1390 observations with 7 variables were obtained which were later split equally to obtain 745 samples of training and testing data respectively. The PCA and KPCA models were developed with four optimum PCs retained using the CPV approach. For the KD computation between two residuals, a moving (sliding) window of size 50 was selected. The monitoring of different sensor faults using the proposed KPCA-KD based approach is presented in the next section.

2) MONITORING RESULTS
This section confirms the ability of the proposed KPCA-KD fault detection approach to successfully monitor the sensor faults in the CSTR process. The [...] [200,300] and [500,600], thereby reducing the detection rate. Among all the methods, the KPCA-KD FD scheme performs very well, with precise detection in the faulty region. The proposed FD scheme ensures that there are no missed detections in the region of the fault and no false alarms in the fault-free region.

5) CASE 3: FAULT 5
The considered fault is a sensor drift or aging fault, which is introduced in the reactor temperature variable between sampling time instant 300 and the end of the testing data. The monitoring results for the PCA-$T^2$, PCA-SPE and PCA-KD based schemes for this fault are shown in Figure 7, while those of the KPCA-$T^2$, KPCA-SPE and KPCA-KD based schemes are shown in Figure 8. The monitoring results for the different fault scenarios, evaluated using FDR and FAR, can be observed in Table 3 and Table 4. It is observed that faults 1 and 3 are of larger magnitude and hence all monitoring methods can easily detect them with good FDR values. For faults of smaller magnitude, such as faults 2 and 4, the traditional statistical indicators fail to detect the fault owing to the limitations discussed in Section 1. In contrast, the proposed KPCA-KD FD scheme performs far better than the other methods, with a higher FDR and a lower FAR, as observed in the result tables. Even for faults 5 and 6, which are sensor drift faults, the performance of the KPCA-KD FD scheme is better than that of the other methods. The KD statistical indicator is computed in a moving window involving segment-by-segment comparison, which enables it to capture sensitive details from the process data and enhance the detection of faults.

B. TENNESSEE EASTMAN PROCESS
The Tennessee Eastman (TE) process has been regarded as a platform for a wide range of process control and fault detection tasks in the last few decades. Most researchers in the multivariate process monitoring domain have validated their newly proposed fault detection strategies on this benchmark process [53]-[55]. The flow diagram of the benchmark TE process is shown in Figure 9. In this process, two products and one by-product are generated from four gaseous reactants. It comprises 22 process measurements (XMEAS 1 to XMEAS 22), 19 composition measurements (XMEAS 23 to XMEAS 41) and 11 manipulated variables (XMV 42 to XMV 52), thus comprising a total of 52 variables. The process flowsheet has 21 fault scenarios (IDV(1) to IDV(21)), which include bias, drift, intermittent and valve-stuck related abnormalities. After normalization of the training data, the PCA and KPCA models were developed with 39 and 34 optimum PCs, respectively, retained using the CPV approach. A moving window of 50 is selected for the PCA-KD and KPCA-KD based strategies. Three fault scenarios, namely IDV(3), IDV(9) and IDV(15), have been excluded in this work since they give very poor FDR values [56]. In all the fault scenarios presented in the TE process data sheet, faults were introduced between sampling time instants 160 and 960 of the testing data.
To confirm the effectiveness of the KPCA-KD based FD scheme, its performance was validated on different fault scenarios of the benchmark TE process. The performance of the proposed method and its contemporary methods in monitoring various faults, measured through the FDR and FAR indices, is tabulated in Tables 5 and 6. The results indicate the superiority of the KPCA-KD method, since it has a good FDR and minimum FAR for all fault scenarios in comparison with the contemporary methods. This is mainly because the KD metric can capture sensitive details in the process data. For large-magnitude faults such as IDV(1), IDV(2), IDV(6), IDV(7), IDV(8), IDV(14), IDV(17) and IDV(18), all the FD methods can easily detect the faults. For fault scenarios IDV(10), IDV(19) and IDV(21), the performance of all FD strategies is reasonably good, and even in these cases the proposed method has an advantage in terms of FDR and FAR. From the monitoring results, the proposed KPCA-KD scheme has a clear advantage over the other methods in three scenarios: IDV(5), IDV(11) and IDV(20). For clarity, the monitoring performance for two fault scenarios is presented in detail. The fault scenarios considered are IDV(5), which is a step anomaly in the condenser cooling water inlet temperature, and IDV(11), which is a random variation fault in the reactor cooling water inlet temperature.
The result plots of the PCA and KPCA based methods for monitoring the IDV(5) fault are presented in Figure 10 and Figure 11 respectively. The PCA-$T^2$, PCA-SPE, PCA-KD, KPCA-$T^2$ and KPCA-SPE based methods detect the fault between samples 160 and 380. After sampling time instant 380, these methods are unable to detect the fault, which eventually results in a poor detection rate. In contrast, the KPCA-KD scheme detects the fault clearly over sampling time instants 160 to 960, thus having a high FDR in comparison with the other FD schemes. It can be observed that the KD indicator exceeds the reference threshold correctly in the faulty region. The performance of the PCA and KPCA based methods in monitoring the IDV(11) fault is presented in Figure 12 and Figure 13 respectively. The PCA-$T^2$, PCA-SPE, PCA-KD, KPCA-$T^2$ and KPCA-SPE based methods detect the fault only partially; their FDR values are low owing to missed detections in the fault region. In contrast, the proposed KPCA-KD FD scheme detects the fault precisely with a high FDR value and thus has a clear advantage over the other methods. Hence, it can be inferred that the KPCA-KD FD scheme provides better detection of faults with superior FDR values and minimum FAR values, thus satisfying the properties of a good fault detection scheme.

C. EXPERIMENTAL DISTILLATION COLUMN PROCESS
An experimental distillation column (DC) process is considered in this section to confirm the effectiveness of the proposed KPCA-KD based FD scheme. A distillation column is an energy-consuming unit operation in the process industry. The experimental distillation column considered in this study is housed in the Department of Chemical Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India. A pictorial view of the bubble cap distillation column can be observed in Figure 14. The construction details of the distillation column can be found in [57].

1) DATA GENERATION
The data are generated by sequentially perturbing the reflux flow and the feed flow. The distillation column was brought to a nominal operating condition and then the feed flow was perturbed with a magnitude of 50, while the reflux flow was maintained constant at its nominal condition. The distillation column was then brought back to the nominal condition, after which the reflux flow was perturbed with a magnitude of 40 while the feed flow rate was kept constant. The resultant change in the product quality (output) was recorded as $x_D$ along with six temperatures to monitor the condition of the column. In the model development stage, along with the reflux flow rate and feed flow, six temperatures with 2048 observations were included in the input matrix. Next, the generated data were split equally into training and testing data groups. To have a fair comparison, 6 optimum PCs are selected for both the PCA and KPCA based fault detection strategies during the model development stage.

2) MONITORING RESULTS
The efficacy of the proposed FD scheme is assessed by its ability to monitor bias, intermittent and drift faults. First, a sustained bias was introduced in temperature variable 2 between sampling time instants 400 and 1024 of the testing data. The results of the PCA and KPCA based fault indicators for monitoring this fault are presented in Figures 15 and 16. The PCA-T² and PCA-SPE based fault [...] the KPCA-KD FD scheme can clearly detect the bias fault with a good detection rate. Hence, it can be inferred that the KPCA-KD scheme outperforms the other schemes with improved detection performance. An intermittent fault was introduced in temperature variable 4 at sampling time instants [250,350] and [750,850] of the testing data and, even in this case, the KPCA-KD scheme was found to have enhanced detection performance.
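The bias and intermittent fault scenarios can be reproduced on a single sensor signal with a short sketch. The fault magnitude (2.0) and the constant temperature signal are hypothetical, since the magnitudes are not given here, and the intervals use 0-based, half-open indexing as a simplifying assumption.

```python
def add_bias(signal, start, end, magnitude):
    """Return a copy of the signal with a constant offset added to
    samples start, ..., end - 1 (0-based, half-open interval)."""
    return [x + magnitude if start <= i < end else x
            for i, x in enumerate(signal)]


def add_intermittent(signal, intervals, magnitude):
    """Apply the same bias on several disjoint intervals."""
    out = list(signal)
    for start, end in intervals:
        out = add_bias(out, start, end, magnitude)
    return out


# Hypothetical fault-free temperature signal with 1024 testing samples.
temperature = [80.0] * 1024

# Sustained bias between sampling instants 400 and 1024.
biased = add_bias(temperature, 400, 1024, 2.0)

# Intermittent fault on the intervals [250, 350] and [750, 850].
intermittent = add_intermittent(temperature, [(250, 350), (750, 850)], 2.0)
```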
Next, a drift fault was introduced in temperature variable 3 between sampling time instants 300 and 1024 of the testing data. The monitoring results of the PCA and KPCA based indicators for this fault are presented in Figure 17 and Figure 18, respectively. From the monitoring plots, it was observed that all monitoring strategies were able to detect the drift fault. The PCA-T² and KPCA-T² based schemes detect the fault with a large detection delay in comparison with the PCA-SPE, PCA-KD, KPCA-SPE and KPCA-KD based FD schemes. The proposed KPCA-KD FD scheme carries a small advantage because it detects the drift fault earlier than the other schemes. The FDR and FAR of the proposed KPCA-KD FD strategy and the other FD strategies are presented in Table 7 and Table 8, respectively. The tables clearly indicate that the KPCA-KD strategy is better equipped for handling sensor faults in the experimental DC process, with high FDR values and low FAR values.
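The three quantities used in this comparison (FDR, FAR and detection delay) can be computed from any monitoring statistic and its threshold. The sketch below assumes the usual definitions: FDR is the percentage of faulty samples that raise an alarm, FAR is the percentage of fault-free samples that raise an alarm, and the delay is the number of samples between fault onset and the first alarm. The function name, the ramp-shaped statistic and the threshold value are illustrative only.

```python
def detection_metrics(statistic, threshold, fault_start, fault_end):
    """Return (FDR %, FAR %, detection delay in samples) for a fault
    active on samples fault_start, ..., fault_end - 1 (0-based)."""
    alarms = [value > threshold for value in statistic]
    faulty = alarms[fault_start:fault_end]
    normal = alarms[:fault_start] + alarms[fault_end:]
    fdr = 100.0 * sum(faulty) / len(faulty)
    far = 100.0 * sum(normal) / len(normal) if normal else 0.0
    # Delay is None when the fault is never detected.
    delay = faulty.index(True) if True in faulty else None
    return fdr, far, delay


# Illustrative statistic: flat before the fault, ramping up after a
# drift starts at sample 300, plus one spurious spike at sample 100.
statistic = [0.0] * 300 + [float(k) for k in range(724)]
statistic[100] = 60.0
fdr, far, delay = detection_metrics(statistic, threshold=50.5,
                                    fault_start=300, fault_end=1024)
```

For a drift fault the delay is often the more informative figure, because every indicator eventually crosses its threshold once the drift has grown large enough.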

VI. CONCLUSION
In this study, a new KPCA-KD based FD scheme was developed for improved detection of sensor faults. The addressed fault detection problem modeled the data using the KPCA framework and utilized the capability of the KD statistical indicator for fault detection. The KD statistical indicator evaluated the distance between the KPCA residuals of normally operating data and the residuals of the new online data. The KDE approach was used to compute the reference threshold for the KD statistical indicator. The efficacy of the proposed KPCA-KD based FD strategy was illustrated using the CSTR process, the benchmark TE process and the experimental distillation column process. The KPCA-KD fault detection scheme was compared against the PCA-T², PCA-SPE, PCA-KD, KPCA-T² and KPCA-SPE based FD schemes. Since the KD indicator was computed in a moving window involving segment-by-segment comparison, sensitive details from the process were captured easily, which enabled the enhanced detection of faults. In the case of the TE process, the proposed FD strategy was able to successfully detect the IDV(5) fault, which is generally not detected clearly by traditional PCA and KPCA based FD schemes.
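The moving-window KD indicator and its KDE-based threshold can be sketched for scalar residuals as follows. This is a simplified illustration and not the implementation used in this study: it assumes one-dimensional residuals, an absolute-difference cost (for which the Kantorovich distance between two equal-sized samples reduces to the mean absolute difference of the sorted values), a Gaussian kernel with a normal-reference bandwidth, and a 99% confidence level. All function names and the synthetic residuals are hypothetical.

```python
import random
from statistics import NormalDist, pstdev


def kantorovich_1d(a, b):
    """Kantorovich (Wasserstein-1) distance between two equal-sized
    1-D samples under the cost |x - y|."""
    if len(a) != len(b):
        raise ValueError("samples must have equal length")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def moving_window_kd(reference, residuals):
    """KD between a reference segment and every window of the same
    length slid over the residual sequence one sample at a time."""
    w = len(reference)
    return [kantorovich_1d(reference, residuals[i:i + w])
            for i in range(len(residuals) - w + 1)]


def kde_threshold(values, confidence=0.99):
    """Threshold at the given quantile of a Gaussian KDE fitted to values."""
    n = len(values)
    h = 1.06 * pstdev(values) * n ** (-0.2)  # normal-reference bandwidth
    if h <= 0:
        raise ValueError("values must not all be identical")
    std_normal = NormalDist()

    def cdf(x):
        return sum(std_normal.cdf((x - v) / h) for v in values) / n

    lo, hi = min(values) - 5 * h, max(values) + 5 * h
    for _ in range(60):  # bisection on the monotone KDE distribution function
        mid = (lo + hi) / 2
        if cdf(mid) < confidence:
            lo = mid
        else:
            hi = mid
    return hi


# Synthetic residuals standing in for KPCA residuals (assumption).
random.seed(1)
reference = [random.gauss(0.0, 1.0) for _ in range(100)]
fault_free = [random.gauss(0.0, 1.0) for _ in range(400)]
faulty = [random.gauss(3.0, 1.0) for _ in range(200)]  # mean shift of 3

kd_fault_free = moving_window_kd(reference, fault_free)
threshold = kde_threshold(kd_fault_free)
kd_faulty = moving_window_kd(reference, faulty)
alarms = [kd > threshold for kd in kd_faulty]
```

Because each KD value summarizes a whole window rather than a single sample, the indicator varies smoothly, which is consistent with the smooth detection profile noted for the KD metric.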
Overall, the KD metric was able to precisely detect small magnitude faults with a smooth detection profile. Hence, it can be concluded that the KPCA-KD scheme is a better choice for formulating the FD strategy when compared to traditional fault indicators. As part of future work, we plan to combine wavelet functions with the kernel PCA scheme to obtain a novel kernel multi-scale PCA based fault detection strategy, in which the multi-scale kernel PCA strategy will be used as the modeling framework and the KD metric as the fault indicator. Since process data come with a large amount of noise, one possible way to reduce the effect of noise is a multi-scale representation of the data using wavelet functions. Additionally, it is also planned to perform fault diagnosis to determine the variable or set of variables responsible for the fault. This can be achieved by appropriately evaluating the residuals from the KPCA model using contribution plots.