Improvement of Kernel Principal Component Analysis-Based Approach for Nonlinear Process Monitoring by Data Set Size Reduction Using Class Interval

Fault detection and diagnosis (FDD) systems play a crucial role in maintaining the proper operation of a monitored process. One of the most widely used data-driven FDD methods is Principal Component Analysis (PCA). Unfortunately, the reliability of PCA drops when the data have nonlinear characteristics, as is the case for industrial processes. Kernel Principal Component Analysis (KPCA) is an alternative technique used to deal with such data sets. For a large data set, however, the execution time and occupied storage space of KPCA increase drastically, and the monitoring performance can also be affected. The Reduced KPCA (RKPCA) was therefore introduced to reduce the size of a given training data set, lowering the execution time and occupied storage space while maintaining the monitoring performance of KPCA for nonlinear systems. Generally, RKPCA reduces the number of samples in the training data set and then builds the KPCA model on the reduced data set. In this paper, the proposed algorithm selects relevant observations from the original data set by using a class interval technique (i.e., a histogram) to retain a set of representative samples from each bin. The proposed algorithm has been tested on a three-tank system pilot plant and on the Ain El Kebira cement rotary kiln process. It successfully maintained homogeneity with the original data set, reduced the execution time and occupied storage space, and led to decent monitoring performance.


I. INTRODUCTION
Industrial plants may suffer from faults at some point during their operation, affecting the production process. Because of this, monitoring systems are implemented to detect those faults and to avoid major casualties, which raises the importance of ensuring their reliability and robustness.
The associate editor coordinating the review of this manuscript and approving it for publication was Xiwang Dong.
PCA is a data-driven FDD technique. It lowers the number of variables in a given data set by mapping the data into a new lower-dimensional space whose variables retain most of the variability of the original data set; the basis of this space is a set of linearly independent vectors [1], [2]. PCA performs this mapping by computing the eigenvalues and eigenvectors of the covariance matrix of the input data set [3]. PCA-based FDD has become attractive for its flexibility and simplicity, derived from the Singular Value Decomposition (SVD) problem [4]. Although PCA has a fine reputation and is widely used, it cannot keep up its performance when the data under study have nonlinear characteristics, as in most industrial processes [5]. Many techniques have been proposed to overcome this obstacle. Dong et al. [6] proposed principal curves PCA, which merges a neural network with the principal curves algorithm. Jia et al. [7] proposed a novel nonlinear PCA based on an input training network and on non-parametric control limits for FDD. Geng et al. [8] proposed an adaptive nonlinear PCA based on an improved input training neural network [4]. Scholkopf et al. [9] proposed KPCA, which executes conventional PCA in a higher-dimensional feature space.
KPCA is a nonlinear extension of the PCA technique. It maps a given data set from the input space to a higher-dimensional feature space, a Hilbert space, using the kernel trick without involving an explicit mapping function, and then executes PCA in this new space [9]. KPCA is widely used for its simplicity compared to other nonlinear PCA techniques because it is based on solving an eigenvalue problem, just as conventional PCA is; furthermore, no optimization is needed to achieve good results [9]. However, KPCA consumes more computational time and occupies more storage space than conventional PCA, and it may lose accuracy if the data set is too large. Thus, KPCA may be disadvantageous when the training data set has a very large number of observations, and may eventually be impractical without expensive computing hardware [4], [9], [10]. Reduced KPCA (RKPCA) was introduced to overcome these drawbacks. RKPCA uses a reduced number of observations that retain as much of the useful information in the original training data set as possible; the KPCA model is then built upon this reduced data set, which decreases the effect of a large data set on the performance of conventional KPCA [11].
To use the conventional PCA, KPCA, and RKPCA algorithms for fault detection and diagnosis, Statistical Process Monitoring (SPM) is required. SPM is based on data-driven models built from measurements of different variables taken under healthy operating conditions of a system; in other words, it is based on monitoring the variations in a given data set. These variations can be categorized as common variations, such as natural variations in a process, and special variations, such as faults [12]. Different statistics are used in SPM as fault detection indices, such as Hotelling's T²-statistic, the Squared Prediction Error (SPE or Q-statistic), and the combined index (ϕ-statistic). The T²-statistic evaluates variations in the principal component subspace, whereas the Q-statistic evaluates variations in the residual subspace [13]. For each SPM data set, these fault detection indices are evaluated using three metrics: False Alarm Rate (FAR), Missed Detection Rate (MDR), and Detection Time Delay (DTD).
The proposed algorithm is based on the RKPCA technique, where the retained observations are selected using class intervals, similar to the histogram of the first principal component score of PCA. A histogram is a type of bar plot that groups data into intervals, usually equally spaced, called bins on the horizontal axis, with the corresponding appearance frequencies on the vertical axis [14]. The idea of the proposed algorithm is to select a number of representative observations from each bin to form the reduced data set, which is then used to build the KPCA model for fault detection. This study aims to form a reduced data matrix that maintains homogeneity with the original data set, reduces the execution time and occupied storage space, and attains excellent FDD results. The reduction part of the proposed RKPCA algorithm is based on a class interval that requires neither optimization nor predefined values. The goal is an RKPCA algorithm that can effectively reduce the number of observations in a large training data set and enhance monitoring performance. In other words, the main contribution of this paper is the development of a reliable and effective method for selecting the most relevant observations using a class interval scheme while preserving homogeneity with the original data set. The method is well defined and duly analyzed theoretically. It improves RKPCA-based fault detection performance and, consequently, significantly reduces both the execution time and the needed storage space. Although the proposed algorithm is effective and easy to use, it is up to the user to select the appropriate reduced data matrix. As a numerical example, the developed algorithm was tested on a tank system data set and compared to both the PCA and KPCA techniques; it was then applied to a real industrial data set obtained from the rotary kiln of the Ain El Kebira cement plant and compared to other algorithms.
This article is organized as follows: Section II introduces some of the related work, Section IV gives a literature review on PCA and KPCA for FD, Section V introduces the proposed algorithm, Section VI gives a brief description of the processes used in this paper, Section VII presents and discusses the results, and Section VIII gives a general conclusion.

II. RELATED WORK
In this section, some of the related works are presented. Most of the existing algorithms follow almost the same idea, which is to reduce a large data set by omitting redundant and/or correlated samples. The algorithms differ in how they determine the samples to be omitted and in the effectiveness of the reduction method. Felipe et al. [15] introduced an RKPCA algorithm based on the k-means clustering algorithm, which needs an appropriate predefined number of clusters in order to give good results. Harkat et al. [16] proposed an RKPCA algorithm that applies PCA as a data reduction step on the transposed matrix of the input data set to select the uncorrelated observations. Since the reduction is based on the PCA algorithm, it may be affected by the size of the covariance matrix, which means its accuracy drops for very large data sets. Moreover, the data matrix reduced with the PCA approach contains observations mapped back from the principal component subspace to the input space, so the new observations are slightly different from the original ones. Lahdhiri et al. [17] presented an algorithm that keeps the observations for which the kernel matrix has full rank; unfortunately, this method retains a very large number of observations in the reduced matrix. Taouali et al. [18] introduced an algorithm that keeps the observations that point to the largest variances with the retained principal components of the kernel matrix of the total observations. The reduction part of this algorithm is highly dependent on the selected principal components, so obtaining the best overall reduced data can take a lot of time, since for large data sets the number of possible principal component choices is large. Bencheikh et al. [4] proposed an algorithm that keeps observations separated by a given Euclidean distance, so the homogeneity between the original and reduced data is not guaranteed. In [19], the authors use an RKPCA algorithm based on a variogram as a reduction technique. Unfortunately, the reduced data matrix is still quite large, nearly 70% of the original data set, because the method deals only with spatial continuity. This approach therefore depends highly on the smallest selected lag, which in some processes is small and less than half of the number of samples, making the resulting data matrix the same as the original one, so that the user obtains no reduction of the data set. More recently, Kaib et al. [20] introduced a reduction technique based on the fractal dimension, which represents the intrinsic dimension of a large training data set and is implicitly defined. Thus, the homogeneity of the reduced data sets must be interpreted cautiously, as the power of the statistical tests available for this purpose might be limited. Furthermore, this method is only applicable to chaotic systems.
The proposed algorithm reduces the size of the training data set using class intervals (a histogram) while maintaining homogeneity between the two data sets, which helps it achieve decent overall performance.

B. MATHEMATICAL NOTATIONS
• Â: Principal Components of A.
• R^d: Space of dimension d.
• ‖v‖: Euclidean norm of the vector v.
• X^T: Transpose of matrix X.

C. PARAMETERS AND HYPERPARAMETERS
• 2δ²: Hyperparameter (or gamma) of the Radial Basis Function.
• σ²: Variance of the input data set.
• N_B: Number of bins in the proposed algorithm.
• ν: Appearance frequency.

PCA is known for its simplicity and low execution time. It decomposes inter-correlated variables into a set of principal component variables [21].
Let X_o (of size n × m) be the data matrix collected from the system under normal operation, with n observations from m sensors. Before proceeding with conventional PCA, this matrix is normalized to zero mean and unit standard deviation to obtain X (n × m). Equation (1) shows how the covariance matrix of X is computed and decomposed using Singular Value Decomposition (SVD).
The columns of P (m × m) are known as loading vectors, and Λ is a diagonal matrix containing the eigenvalues of the covariance matrix in descending order, [λ_1 … λ_m]. The scores matrix, T, is then given by T = XP. There exist different methods for selecting the number of Principal Components (PCs) to be retained. The Cumulative Percentage Variance (CPV) is a widely used method; it selects the number of PCs, l, such that the sum of their variances, expressed as a percentage of the total variance, CPV(l), is greater than or equal to a predefined value CPV_limit [16]. It is given as CPV(l) = 100 × (λ_1 + … + λ_l) / (λ_1 + … + λ_m) ≥ CPV_limit. After selecting the PCs, the loading matrix can be decomposed into P̂ (m × l), which contains the principal loading vectors, and P̃ (m × (m − l)), which contains the residual loading vectors.
In the same way, the scores and eigenvalues matrices are decomposed as in equations (5) and (6), respectively.
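The PCA steps described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `fit_pca`, the default `cpv_limit` of 90, and the 1/(n − 1) covariance normalisation are assumptions, not values taken from the paper.

```python
import numpy as np

def fit_pca(X_o, cpv_limit=90.0):
    """Fit PCA on raw data X_o (n x m) and choose l with the CPV rule.

    Hypothetical helper: name, defaults and the 1/(n-1) factor are assumptions.
    """
    mean = X_o.mean(axis=0)
    std = X_o.std(axis=0, ddof=1)
    X = (X_o - mean) / std                  # zero mean, unit standard deviation
    n, m = X.shape
    cov = X.T @ X / (n - 1)                 # covariance matrix of the scaled data
    # For a symmetric positive semi-definite matrix the SVD returns the
    # eigenvectors (columns of P) and the eigenvalues in descending order.
    P, eigvals, _ = np.linalg.svd(cov)
    cpv = 100.0 * np.cumsum(eigvals) / eigvals.sum()
    # Smallest l such that CPV(l) >= cpv_limit, capped at m.
    l = min(int(np.searchsorted(cpv, cpv_limit)) + 1, m)
    return {
        "mean": mean, "std": std,
        "P": P, "eigvals": eigvals, "l": l,
        "T": X @ P,                         # scores matrix
        "P_hat": P[:, :l],                  # principal loading vectors
        "P_tilde": P[:, l:],                # residual loading vectors
    }
```

Calling `fit_pca(data)["T"][:, 0]` would then give the first principal component score, which is the quantity the class interval reduction works on.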
In the T² control limit of equation (10), F_α(l, n − l) represents the Fisher-Snedecor distribution with l and (n − l) degrees of freedom, and α is the significance level.
In the Q control limit of equation (11), σ²_Q is the variance of the Q-statistic, µ_Q is its mean, and χ² is the chi-square distribution.
In the ϕ control limit of equation (12), g and h can be found in [22].
For each statistic, a fault is reported when its value exceeds its corresponding upper control limit.
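For orientation, one common textbook form of the three indices and of the T² and Q control limits is sketched below. These expressions are assumptions based on standard PCA monitoring practice and are not guaranteed to match the numbered equations of the paper term by term. Here x denotes one normalized sample written as a row vector, \hat{P} the retained loading vectors, and \hat{\Lambda} the diagonal matrix of retained eigenvalues.

```latex
% Assumed standard forms (not copied from the paper)
T^2 = x \hat{P} \hat{\Lambda}^{-1} \hat{P}^{T} x^{T},
\qquad
T^2_{\alpha} = \frac{l\,(n^2 - 1)}{n\,(n - l)}\, F_{\alpha}(l,\, n - l)

Q = \left\lVert x \left( I - \hat{P}\hat{P}^{T} \right) \right\rVert^{2},
\qquad
Q_{\alpha} = g\, \chi^{2}_{h,\alpha},
\quad g = \frac{\sigma_Q^{2}}{2\mu_Q},
\quad h = \frac{2\mu_Q^{2}}{\sigma_Q^{2}}

\varphi = \frac{T^2}{T^2_{\alpha}} + \frac{Q}{Q_{\alpha}}
```

The limit of the combined index is usually also written as a scaled chi-square quantile, with the scale and degrees of freedom taken from the reference cited in the text.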

B. KPCA
The main idea behind the KPCA algorithm is to perform PCA in a feature space, F, using the kernel trick instead of the mapping function φ [23].
Let X (n × m) be the data set normalized to zero mean and unit standard deviation. The mapping function φ takes each x_i ∈ R^m to φ(x_i) ∈ F, where x_i is a row vector from X and the dimension f of F satisfies f ≫ m. Under the assumption that the vectors in the feature space are scaled to zero mean and unit variance, the mapped data can be expressed as in [9], and the covariance matrix in the feature space, F, is computed from them. A property of this feature space is that the dot product φ(x_i)·φ(x_j) can be computed using the kernel trick (kernel function) [9]: φ(x_i)·φ(x_j) = k(x_i, x_j). There are different types of kernel functions, but the most used one is the Radial Basis Function (Gaussian) [19], given as k(x_i, x_j) = exp(−‖x_i − x_j‖² / (2δ²)), where the hyperparameter 2δ² can be found using the formula 2δ² = 2rmσ², such that r is empirically obtained, and m and σ² are the number of variables and the variance of the data set matrix, respectively [23].
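A minimal NumPy sketch of the Gaussian kernel and of the width rule 2δ² = 2rmσ² is given below. The function names `rbf_width` and `rbf_kernel` are hypothetical, and taking σ² as the variance over all entries of the data matrix is an assumption about how "the variance of the data set matrix" is meant.

```python
import numpy as np

def rbf_width(X, r):
    """Return 2*delta^2 = 2 * r * m * sigma^2 (r is chosen empirically)."""
    m = X.shape[1]
    sigma2 = X.var()            # assumption: variance over the whole matrix
    return 2.0 * r * m * sigma2

def rbf_kernel(A, B, two_delta_sq):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    sq = np.maximum(sq, 0.0)    # guard against tiny negative round-off
    return np.exp(-sq / two_delta_sq)
```

With training data `X`, `rbf_kernel(X, X, rbf_width(X, r))` gives the n × n matrix K, which is why both memory and eigen-decomposition cost grow quickly with the number of observations.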
Like conventional PCA, KPCA solves the eigenvalue problem defined in equation (17). Since the mapping itself is not known, the Gram matrix can be computed using the kernel trick as in equation (15). With this trick, the matrix K is defined entry by entry as K_ij = k(x_i, x_j). The retained PCs are selected by the CPV method as shown in equation (3).
For FDD, the monitoring indices T², Q, and ϕ-statistics are used. They are computed as shown in equations (19), (20), and (21), respectively. In the Hotelling T²-statistic of equation (19), P̂_φ contains the principal eigenvectors of K, and the diagonal elements of Λ̂_φ are the principal eigenvalues of K. The upper control limit of this statistic is given by equation (10). The Q-statistic is obtained by equation (20), and its upper control limit is given, as in conventional PCA, by equation (11). For both equations (19) and (20) […]. The ϕ-statistic is computed as in equation (21), and its upper control limit is given by equation (12).
For each of the aforementioned fault detection indices, three monitoring performance metrics are evaluated. The False Alarm Rate (FAR) is the rate of normal samples that exceed the control limit of a given fault detection index under normal operating conditions; it is described by equation (22). The Missed Detection Rate (MDR) is the rate of faulty samples that do not exceed the control limit under faulty operating conditions; it is described by equation (23). The Detection Time Delay (DTD) is the number of samples between the appearance of a fault and its detection; it is described by equation (24).
In these equations, NF is the number of non-faulty samples that exceed the control limit, NOC is the total number of non-faulty samples, FN is the number of faulty samples that did not exceed the control limit, FOC is the total number of faulty samples, t_D is the sample at which a fault is detected, and t_O is the sample at which it occurs.
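The three metrics can be computed from a sequence of index values and a control limit as in the sketch below. Two points are assumptions rather than statements from the paper: the rates are expressed in percent, and detection is taken to be the first faulty sample that exceeds the limit. The function name `monitoring_metrics` is hypothetical.

```python
def monitoring_metrics(stat, limit, fault_start):
    """Return (FAR %, MDR %, DTD in samples) for one monitoring index.

    stat        : sequence of index values (T^2, Q or phi), one per sample
    limit       : upper control limit of that index
    fault_start : position t_O of the first faulty sample
    """
    alarms = [s > limit for s in stat]
    healthy = alarms[:fault_start]          # NOC samples
    faulty = alarms[fault_start:]           # FOC samples
    nf = sum(healthy)                       # NF: false alarms
    fn = len(faulty) - sum(faulty)          # FN: missed detections
    far = 100.0 * nf / len(healthy) if healthy else 0.0
    mdr = 100.0 * fn / len(faulty) if faulty else 0.0
    # DTD = t_D - t_O; None when the fault is never detected.
    dtd = next((i for i, a in enumerate(faulty) if a), None)
    return far, mdr, dtd


values = [0.1, 0.2, 1.5, 0.3, 0.4, 0.2, 1.2, 1.3, 0.9, 1.4]
print(monitoring_metrics(values, limit=1.0, fault_start=4))  # (25.0, 50.0, 2)
```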

V. PROPOSED RKPCA ALGORITHM
In statistics, a histogram is a type of bar graph that displays the appearance frequency of the data within each bin, so it helps to visualize the distribution of the data and its skewness [24]. The width of the bins is controlled by the number of bins used: B_w = (M_V − m_V) / N_B, where B_w is the bin width, M_V is the maximum value in the data set, m_V is the minimum value in the data set, and N_B is the number of bins. The y-axis can represent different types of values, such as appearance frequencies, probabilities, and percentages [24]. The proposed algorithm aims to reduce the size of large data sets while preserving most of their characteristics. It starts by computing the PCA model of the original training data set, using algorithm 1, and then selects the first principal component score, because it carries the most information, as indicated by its having the highest variance. After that, a histogram of this first principal component score is built with a specified number of bins N_B. Let ε be the minimum appearance frequency, ν_i the appearance frequency of the i-th bin, and γ_i the scaled frequency defined in equation (26). All the appearance frequencies are scaled by the minimum appearance frequency ε such that the minimal frequency becomes 1 or 2, according to the frequency in this bin, and every other bin i has γ_i as its frequency. The median is then taken as the representative observation of a bin. The choice of having the same median ensures that the remaining samples are distributed through all the bins. After this, the proposed algorithm selects the corresponding rows in the scores matrix and performs the inverse of the PCA mapping to obtain the reduced data matrix X_r, which has fewer observations than the original data set. Finally, this data matrix is used to build the KPCA model, using algorithm 2, and the model is tested using the faults described in Section VI and the loss functions (28) and (29), which aim to minimize the number of abnormal samples in the FDD system. The proposed algorithm does not require optimization in the size reduction part because only a few values of N_B can lead to ε > 1, and the size of X_r is directly related to ε.
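The reduction step can be illustrated with the following NumPy sketch. It is one possible reading of the description, not the authors' implementation, and it rests on several labeled assumptions: γ_i is taken as ν_i/ε rounded to the nearest integer (at least 1), ε is the smallest non-zero bin count, the γ_i observations kept in a bin are those whose score is closest to the bin median, and rows are picked directly from the original data, which is equivalent to inverting the PCA mapping and the scaling when all components are kept. The function name `class_interval_reduce` is hypothetical.

```python
import numpy as np

def class_interval_reduce(X_o, n_bins):
    """Select representative rows of X_o from each bin of the 1st PC score."""
    X = (X_o - X_o.mean(axis=0)) / X_o.std(axis=0, ddof=1)
    # First principal direction = first right singular vector of scaled data.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    t1 = X @ Vt[0]                                   # first PC score
    # Equal-width bins: B_w = (M_V - m_V) / N_B
    edges = np.linspace(t1.min(), t1.max(), n_bins + 1)
    bin_idx = np.clip(np.digitize(t1, edges[1:-1]), 0, n_bins - 1)
    counts = np.bincount(bin_idx, minlength=n_bins)  # appearance frequencies
    eps = int(counts[counts > 0].min())              # assumed: min non-zero count
    keep = []
    for b in range(n_bins):
        members = np.flatnonzero(bin_idx == b)
        if members.size == 0:
            continue
        gamma = max(1, int(round(counts[b] / eps)))  # assumed scaling rule
        med = np.median(t1[members])
        order = members[np.argsort(np.abs(t1[members] - med), kind="stable")]
        keep.extend(order[:gamma].tolist())          # rows nearest the median
    keep = np.sort(np.array(keep, dtype=int))
    return X_o[keep], keep, eps
```

Under these assumptions, a choice of `n_bins` that leaves a bin with a single observation gives ε = 1 and returns the whole data set, which matches the behaviour described for the first value of N_B; fewer bins raise ε and shrink the reduced matrix.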
For the normalization of the kernel matrix K and of the testing kernel matrix K_t, 1_n is equal to 1/n times an (n × n) matrix of ones; the same goes for 1_t, the only difference being that the matrix of ones is now a (t × n) matrix, where t is the number of testing samples. The first value of N_B is chosen such that ε = 1, which means that the reduced data matrix is the same as the original one. N_B is then decreased to obtain larger values of ε, and the procedure is repeated with smaller values of N_B to obtain matrices with fewer observations. After that, the performance of each resulting reduced data matrix is computed in order to pick the one with the best performance and fewer observations than the original data set, as shown in Tables 1 and 5. The loss function presented in equation (27) aims to reduce the MDR, DTD, and FAR values for a given monitoring index: the lower the value of J_s, the better the monitoring metrics, and vice versa. The number of bins, N_B, is changed and the procedure repeated until a satisfying result is obtained. Algorithm 3 illustrates how the proposed algorithm works.
So that the different indices have the same effect on the loss function, the weighting factors are selected as a_1 = a_2 = a_3 = 1. Furthermore, the algorithm performance is duly considered in assessing the overall monitoring performance.
where Et_r and So_r represent, respectively, the execution time for one testing sample in the testing (online) part and the occupied storage space of the training part of algorithm 2 using the reduced data matrix X_r. Similarly, Et_n and So_n are the same quantities when the original training data are used. The cost function (29) takes into account the overall monitoring performance, the execution time, and the needed storage space all at once. For lower values of J_O, the algorithm is expected to have decent monitoring performance and to require less execution time and less storage space for the model under study. The proposed algorithm can improve the performance of RKPCA because its purpose is to eliminate redundant observations and keep homogeneity between the original and reduced data sets. As mentioned in [9], the size of the Gram matrix K can affect the accuracy of KPCA if it is large.
-Compute the kernel matrix K using the kernel function in equation (16).
-Scale the kernel matrix K using the following equation: K̄ = K − 1_n K − K 1_n + 1_n K 1_n.
-Solve the eigenvalue problem defined in equation (17).
-Calculate the different statistics used for fault detection and their limits.

Testing Part:
-Acquire testing samples.
-Scale these samples using the mean and standard deviation of the original data set.
-Scale the obtained kernel matrix using the following equation: K̄_t = K_t − 1_t K − K_t 1_n + 1_t K 1_n.
-[…] statistics, and report a fault if one of them exceeds its limit.
-Perform the inverse scaling to obtain X_r.
-Use X_r to build the KPCA model using algorithm 2.
-Compute the value of the loss function J_O in equation (29) to evaluate the performance of the model.
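The two kernel scaling steps in the listing above can be sketched as follows, using the standard kernel centering formulas with 1_n equal to 1/n times an (n × n) matrix of ones and 1_t equal to 1/n times a (t × n) matrix of ones. The function names are hypothetical, and the formulas are the usual KPCA centering expressions, assumed here to be the ones intended.

```python
import numpy as np

def center_train_kernel(K):
    """Centre the n x n training kernel matrix in feature space."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n

def center_test_kernel(K_t, K):
    """Centre the t x n testing kernel matrix with the training statistics."""
    t, n = K_t.shape
    one_n = np.full((n, n), 1.0 / n)
    one_t = np.full((t, n), 1.0 / n)
    return K_t - one_t @ K - K_t @ one_n + one_t @ K @ one_n
```

Because both functions build dense n × n matrices, their cost is driven by the number of training observations, which is where a reduced data matrix X_r pays off.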

VI. PROCESS DESCRIPTION
A. THREE TANKS SYSTEM DTS-200
This system corresponds to the plant made by Amira GmbH, which includes three serially connected tanks (cylinders) and two pumps that fill two of the three tanks [25], [26]. Five sensors are used for this application, placed as follows: (i) A level sensor at each of the three tanks; x_1, x_2, and x_3 are the measured variables from these sensors. (ii) A flow meter for each pump; u_1 and u_2 are the measured values. Figure 1 shows the diagram of this system. The data acquisition process is explained in [26]. The data sets of this system are organized as follows: (i) A healthy training data set with a matrix size of 9000 × 5. (ii) A faulty data set with a fault of 10% in the first tank level sensor, with 2400 observations. (iii) A faulty data set with a fault of 10% in the second tank level sensor, with 2400 observations. For further explanation of the plant, please refer to [26] and [27].

B. AIN EL KEBIRA CEMENT PLANT
Cement production is a complex process that starts with mining and then grinding raw materials, including limestone and clay, to a fine powder called raw meal. The raw meal is then heated to a sintering temperature as high as 1450 °C in a cement kiln to break the chemical bonds of the raw materials, which then recombine to form new compounds. The result is called clinker, which is ground to a fine powder in a cement mill and mixed with gypsum to create cement.
The Ain El Kebira cement plant is located near Setif in eastern Algeria. It has a rotary kiln with a 5.4 m shell diameter, an 80 m length, and a 3° incline. The kiln is spun at up to 2.14 rpm by two 560 kW asynchronous motors and produces clinker with a density varying from 1300 kg·m⁻³ to 1450 kg·m⁻³ under normal conditions. Two natural gas burners are used: the main one at the discharge end and the other at the first level of the pre-heater tower, without any tertiary air duct.
Table 2 in [4] lists the different variables of the process, and a schematic diagram of this plant is presented in [28]. The data sets used for this work are as follows: (i) A training data set with 768 observations. (ii) A testing data set with 11000 observations. (iii) A real process fault with 2048 observations. It includes a normal operating conditions part that lasts less than 420 samples; the rest is the faulty part, which consists of multiple faults in various variables that increase gradually [30]. (iv) Ten simulated sensor fault data sets, each lasting 1000 observations; these faults are described in [19].
VOLUME 12, 2024

VII. RESULTS AND DISCUSSION
For the two case studies, the computer used has an Intel(R) Core(TM) i3-3217U CPU @ 1.80 GHz and 4 GB of RAM. The significance level α is set to 1%.
First, algorithm 3 is applied to the three-tank system described in Section VI. Table 1 exhibits some of the reduced matrices obtained using the proposed reduction method and their performance based on the different FDD statistics and indices described in Section IV. It can be seen how the number of bins, N_B, affects the minimum appearance frequency, ε, which determines the size of the reduced data matrix: the higher ε gets, the smaller the reduced data matrix. The first reduced data matrix is the same as the original data set because ε = 1, and its performance is the same as that of conventional KPCA. In this table, the comparison between the different reduced data matrices is based mainly on the cost function described by equation (29). Some of the reduced matrices also perform better than conventional KPCA in terms of the individual indices described by equation (27), as shown in columns 4, 5, and 6, but for this study the data matrix X_r with 1007 observations is chosen because of its overall performance based on the cost function J_O. Figure 2 and Figure 3 show the histograms obtained from the first PC score of the original data set and of the selected reduced data matrix, respectively. As can be seen, both histograms have the same shape and skewness for the same bins. Furthermore, Table 2 illustrates how alike Figures 2 and 3 are by listing the appearance frequencies and relative frequencies of both histograms. The relative frequencies of both histograms are close to each other, which explains why both histograms have the same distribution. Hence, these results show that the proposed reduction method has successfully maintained the same distribution in the direction of the highest variance of the data.
Table 3 compares conventional PCA, conventional KPCA, and the proposed RKPCA algorithm with the chosen reduced data matrix. This comparison is based on the cost function described by equation (28). As anticipated, the PCA algorithm failed to perform as well as the KPCA and RKPCA algorithms because of the nonlinearities. The proposed algorithm managed to perform as well as conventional KPCA because it has kept only the useful observations, which are 11.19% of the total number of observations. Moreover, the proposed algorithm has kept homogeneity for all the reduced matrices presented in Table 1, based on the test presented in [29]. These results demonstrate the potential of the proposed algorithm, which motivates applying it to the cement plant data and comparing it to some of the existing RKPCA algorithms.
Table 4 shows the gain obtained by using the proposed algorithm instead of conventional KPCA in terms of execution time and occupied storage space. The KPCA and RKPCA algorithms usually have O(n³) time complexity and O(n²) storage space complexity, so the difference in execution time and required storage space between the RKPCA and KPCA algorithms results from reducing the number of observations in the large training data set. The time considered in this table is the time required to evaluate one sample in the online part of the KPCA algorithm. The gained time is then given as (1 − (Et_r/Et_n)) × 100 (%). Likewise, the gained storage space is given as (1 − (So_r/So_n)) × 100 (%). The gained storage space for the proposed algorithm is 98.84%.
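The gain formulas can be checked with a few lines of Python. The helper names are hypothetical, and `estimated_storage_gain` is only a rough estimate that assumes storage is dominated by the n × n kernel matrix; measured values such as those reported in the tables depend on everything the model stores and need not coincide with it.

```python
def gain_percent(reduced, original):
    """Gain as (1 - reduced/original) x 100, for time or storage."""
    return (1.0 - reduced / original) * 100.0

def estimated_storage_gain(n_r, n):
    """Rough O(n^2) estimate: kernel matrix of n_r^2 versus n^2 entries."""
    return gain_percent(n_r ** 2, n ** 2)

# Share of observations retained in the two case studies.
print(round(100.0 * 1007 / 9000, 2))   # 11.19
print(round(100.0 * 157 / 768, 2))     # 20.44

# Kernel-matrix-only storage estimates for the same reductions.
print(estimated_storage_gain(1007, 9000))
print(estimated_storage_gain(157, 768))
```

The two estimates come out in the same range as the reported storage gains, which is consistent with the quadratic storage complexity stated in the text.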
Table 5 illustrates some of the reduced matrices obtained by performing algorithm 3 on the cement plant data set using the different faults and the real process fault mentioned in Section VI; all the statistics and metrics introduced in Section IV are used for FD. As seen in Table 5, the number of bins N_B again affects the minimum appearance frequency ε, which determines the reduced data matrix. The cost functions used in this table are defined by equations (27), (28), and (29), but the one used to select the appropriate reduced data matrix is that of equation (29), because it is based on the overall monitoring performance and takes into account the gained time and storage space. For the T² index, the reduced data matrix with 199 samples has the best performance, and for the Q index the reduced data matrix with 263 samples outperforms the other reduced data matrices, whereas for the combined index ϕ two reduced data matrices, with 261 and 157 samples, have the best performance. For the overall monitoring performance based only on J, the data matrices with 389, 261, and 157 samples have the best performance. From these data matrices, the selected one is the one that leads to the best performance based on J_O, which is the data matrix with only 157 samples, or about one-fifth of the total number of observations. The reduction part has thus retained only 20.44% of the total number of observations.
After the selection of the data matrix X_r, Table 6 and Figures 4 and 5 show the distributions of the original and reduced data so that they can be compared. From Figures 4 and 5, it can be noticed that there is a slight difference between the appearance frequencies of the original data set and those of the reduced data set. This is due to the redundant observations in the data set, because the data were collected from a real plant. Both figures nevertheless have the same skewness and nearly the same distribution. On the other hand, Table 6 helps to understand the difference between the appearance frequencies and relative frequencies of the two data sets, and where they are alike and where they are not.
Then the proposed algorithm is compared with different algorithms. The comparison is based on the value of the loss function (29) and on the homogeneity test introduced in [28]. This test is chosen because it gives a non-parametric estimator of a given divergence for the case of a continuous distribution; the function chosen for divergence estimation is the asymmetric Kullback-Leibler divergence. The same faults described in Section VI are used for the comparison between the different algorithms and to evaluate the overall performance.
Based on the results shown in Table 7, the method presented in [20] retains a very small number of observations compared to the original data set, meaning it could be challenging to detect true differences in homogeneity if they exist. So, the homogeneity statistical test is ignored for this method. The proposed algorithm has significantly reduced the number of retained observations compared to some of the algorithms presented in the same table: it has reduced the number of samples to about one-fifth of the total observations. For the Hotelling T²-index, the proposed algorithm has decent monitoring performance compared to the other algorithms. Furthermore, the ones that do better than the proposed algorithm are conventional KPCA and the variogram-based RKPCA, which have more than three times as many samples in the training data set. For the Squared Prediction Error Q-index, the proposed algorithm has the second-best monitoring performance after the k-means-based RKPCA, which requires a predefined number of clusters to reach such performance. For the combined index ϕ, the proposed algorithm has the best monitoring performance along with the variogram-based RKPCA, which again has a larger number of samples. According to the overall monitoring performance given by J, the proposed algorithm has the second-best monitoring performance after the variogram-based RKPCA. Furthermore, it can be clearly seen that the proposed algorithm has the best value of J_O, which means that it strikes a balance between monitoring performance and the number of retained samples. Figure 6 shows how the proposed algorithm performs in the case of the real process fault described in Section VI. Overall, the proposed algorithm has successfully detected the fault through the different monitoring indices, despite a slightly high FAR value for the index Q compared to the other indices. Regarding homogeneity between the original and reduced data sets, the proposed algorithm has kept more variables homogeneous with the original data set than the other algorithms: only one variable is not homogeneous, whereas the other algorithms have at least two non-homogeneous variables. This supports the credibility of the reduced data as a replacement for the original data set, representing the same system without losing its relevant features. This homogeneity leads the proposed algorithm to perform well in comparison with the other RKPCA algorithms.
From Tables 7 and 8, it can be seen that the proposed algorithm has successfully balanced the overall performance, the gain in computation time with respect to the other algorithms, a gained storage space of 95.40%, and homogeneity with the original data set.
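As a rough plausibility check on how a one-fifth sample reduction can translate into a storage gain of roughly 95%, the sketch below assumes that storage is dominated by the n × n kernel matrix. This is an assumption made for illustration, not the definition of gained storage space used in this paper. The function name and the sample counts are hypothetical.

```python
def kernel_storage_gain(n_original: int, n_reduced: int) -> float:
    """Fractional storage saved if only the square kernel matrix is counted.

    Assumption: storage scales with the number of kernel matrix entries,
    i.e. n**2 for n training observations.
    """
    return 1.0 - (n_reduced / n_original) ** 2


# Hypothetical counts: keeping one-fifth of 1000 observations.
print(f"{kernel_storage_gain(1000, 200):.2%}")  # prints 96.00%
```

Under this assumption the gain grows quadratically with the reduction ratio, which is in line with the order of magnitude reported for the cement plant data.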

VIII. CONCLUSION
This paper introduces a new RKPCA algorithm for fault detection. Like the conventional KPCA, the proposed algorithm retains the ability to work with nonlinear data sets while reducing the computation time and storage space needed for its execution. Not only does it improve the execution time and occupied storage space, but it also maintains the original data set's characteristics by preserving the homogeneity of this data set. The basic idea of the proposed algorithm is to select a number of representative observations in each class interval of the first principal score histogram of the original data set; in other words, it selects those observations based on the distribution of the original data set. In this way, a reliable KPCA model is obtained. This model is then compared to the conventional PCA and KPCA algorithms on the three tank system data set. After that, the proposed algorithm is applied to the data set collected from the Ain El Kebira cement plant rotary kiln and compared to other existing algorithms. The overall performance of the proposed algorithm in terms of monitoring, computation time, and storage space was outstanding compared to the other techniques. Furthermore, it has successfully maintained homogeneity with the original data set.
Future research can be extended to other pertinent similarity techniques that could be developed and adapted for size reduction, with the main objective of obtaining better performance in terms of monitoring metrics and computational cost. This topic therefore remains attractive to the big data research community.

Algorithm 1 PCA model
- Scale X_o^{n×m} to zero mean and unit variance.
- Compute the covariance matrix using equation (1).
- Compute the eigenvalues and eigenvectors of the covariance matrix.
- Select the principal components using the CPV method as in equation (3).
- Compute the principal scores matrix T.

Algorithm 2 KPCA for Fault Detection
Training Part:
- Scale X_o^{n×m} to zero mean and unit variance.
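The PCA steps of Algorithm 1 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: equations (1) and (3) are not reproduced in this section, so the sample covariance with an n − 1 denominator and a CPV threshold of 0.95 are assumptions, and the function name `pca_cpv` is hypothetical.

```python
import numpy as np


def pca_cpv(X, cpv_threshold=0.95):
    """PCA with the number of components chosen by cumulative percent variance.

    Assumptions: sample covariance with an (n - 1) denominator and a
    user-chosen CPV threshold (0.95 here is illustrative only).
    """
    X = np.asarray(X, dtype=float)

    # Scale to zero mean and unit variance.
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Xs = (X - mean) / std

    # Covariance matrix of the scaled data.
    cov = Xs.T @ Xs / (Xs.shape[0] - 1)

    # Eigen-decomposition, sorted by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Smallest number of components whose CPV reaches the threshold.
    cpv = np.cumsum(eigvals) / np.sum(eigvals)
    n_pc = min(int(np.searchsorted(cpv, cpv_threshold)) + 1, eigvals.size)

    # Loadings and principal scores matrix T.
    P = eigvecs[:, :n_pc]
    T = Xs @ P
    return T, P, eigvals, mean, std
```

The first column of the returned `T` is the first principal score vector, which is the quantity whose histogram is used by the proposed algorithm.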

Algorithm 3 Proposed RKPCA Algorithm
- Perform Algorithm 1.
- Select the 1st principal score vector.
- Plot the histogram of this vector (with a specified number of bins N_B).
- Pick the minimum appearance frequency ε.
- For i ≤ N_B:
  - Compute the median of the i-th bin.
  - Compute γ_i as given in equation (26).
  - Select a number of observations equal to γ_i with the same median.
- Select the corresponding rows in the scores matrix T.
- Perform the inverse mapping of PCA.
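The selection procedure of Algorithm 3 can be sketched as follows, under several stated assumptions. Equation (26) is not reproduced in this section, so `stand_in_gamma` is a hypothetical placeholder and not the authors' rule. Likewise, "with the same median" is interpreted here as keeping the γ_i observations closest to the bin median, ε is taken as the smallest non-zero bin count, and all principal components are retained so that the inverse PCA mapping is exact. All function and parameter names are illustrative.

```python
import numpy as np


def stand_in_gamma(count, eps, keep_fraction=0.2):
    # Hypothetical placeholder for equation (26): keep a fixed fraction of
    # each bin, but never fewer than eps observations.
    return max(int(eps), int(round(keep_fraction * count)))


def reduce_by_class_interval(X, n_bins=10, gamma_rule=stand_in_gamma):
    X = np.asarray(X, dtype=float)

    # PCA with all components retained (assumption).
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xs = (X - mean) / std
    cov = Xs.T @ Xs / (Xs.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    P = eigvecs[:, np.argsort(eigvals)[::-1]]
    T = Xs @ P

    # Histogram (class intervals) of the first principal score vector.
    t1 = T[:, 0]
    edges = np.histogram_bin_edges(t1, bins=n_bins)
    bin_of = np.digitize(t1, edges[1:-1])  # bin index 0 .. n_bins - 1
    counts = np.bincount(bin_of, minlength=n_bins)

    # Minimum appearance frequency, ignoring empty bins (assumption).
    eps = counts[counts > 0].min()

    keep = []
    for i in range(n_bins):
        members = np.flatnonzero(bin_of == i)
        if members.size == 0:
            continue
        median = np.median(t1[members])
        gamma = min(gamma_rule(members.size, eps), members.size)
        # Keep the gamma observations closest to the bin median.
        nearest = np.argsort(np.abs(t1[members] - median))[:gamma]
        keep.extend(members[nearest].tolist())
    keep = np.sort(np.array(keep))

    # Select the corresponding score rows and map back to the input space.
    X_reduced = (T[keep] @ P.T) * std + mean
    return X_reduced, keep
```

The reduced matrix returned here would then serve as the training set for the KPCA model. Because every bin contributes observations around its own median, the reduced set follows the shape of the original first-score histogram, which is the property the homogeneity test is meant to check.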

FIGURE 2. Appearance frequency of the 1st PC score of the three tank system original data.

FIGURE 3. Appearance frequency of the 1st PC score of the three tank system reduced data.

TABLE 4. Gained execution time (%) for one monitoring via different fault indices and gained storage space (%) for the three tank system.

FIGURE 4. Appearance frequency of the 1st PC score of the cement plant original data.

FIGURE 5. Appearance frequency of the 1st PC score of the cement plant reduced data.

FIGURE 6. Different fault detection indices before and after the occurrence of a real process fault.

TABLE 1. Data set size reduction using the proposed algorithm for the three tank system.

TABLE 5. Reduced data matrix selection of the cement plant.

TABLE 7. Performance metrics comparison of different algorithms.

TABLE 8. Comparison of gained execution time (%) for one-sample monitoring using different fault indices and gained storage space (%) for the different algorithms on the cement plant data.