Fault Detection for Aircraft Turbofan Engine Using a Modified Moving Window KPCA

As a typical data-driven fault detection approach, the moving window kernel principal component analysis (MWKPCA) method has attracted attention for fault detection of turbofan engines considering the presence of component degradation, but the conventional MWKPCA method uses a fixed step to update the KPCA model periodically until anomaly data is detected, this will increase the amount of calculation. To address this weakness, a modified MWKPCA method is proposed based on an adaptive updating mechanism for the KPCA model in this study. To realize the capability of updating KPCA model adaptively, k-means clustering method is utilized to divide a certain amount of newly acquired sampling data and the same amount of oldest data in the current time window into two categories, and calculates the Mahalanobis distance of the two clustering centers. Then the distance is compared with a prescribed threshold to determine whether to update the KPCA model. The proposed method is applied to an illustrative case, and the fault detection results under normal condition and sensor faults condition show that compared with the conventional MWKPCA algorithm, the modified MWKPCA method does not weaken the ability of fault detection and has better performances in terms of computation efficiency than the conventional MWKPCA algorithm with a fixed moving step.


I. INTRODUCTION
The aircraft turbofan engine, an extremely complex aerothermodynamics system operating under harsh environments, various faults will be inevitable happened in a long-time service process [1], [2]. These faults may lead to poor aircraft engine performance or even affect the flight safety. Online and real-time engine fault detection can help improve the safety and reliability by providing accurately and effectively anomaly monitoring.
In recent years, aircraft turbofan engine fault detection has been widely studied, and the main technologies can be categorized into two distinct groups [3], [4], including the model-based approaches [5]- [7] and data-driven approaches [8]- [11]. Model-based approaches provide most solutions to real-time monitoring and control problems [12]. But these approaches require a reasonably high-fidelity mathematical model of turbofan engine, which is unfortunately rarely available. This fact has propelled extensive research activities into data-driven-based fault detection. Many data-driven methods The associate editor coordinating the review of this manuscript and approving it for publication was Noor Zaman . are based on statistical and feature extraction methods [13], neural networks and machine learning [14]- [16], and fuzzy logic [17]- [19]. In these data-driven solutions, multivariate statistical process monitoring (MSPM) has been successfully applied to online process quality monitoring. And with the continuous improvement of aircraft engine data acquisition technology, the dimension and quantity of data acquisition are greatly increased. MSPM can handle high-dimensional, noisy, and highly correlated data and realize data dimensionality reduction and abnormal data detection. MSPM methods such as principal component analysis (PCA) [20]- [22], modified principal component analysis (MPCA) [23], partial least squares (PLS) [24], and independent component analysis (ICA) [25], [26] have been developed and applied for fault detection. Among MSPM methods, PCA is the most widely used technique, which relates to its conceptual simplicity. The commonly used fault detection indices in PCA method are the Hotelling T 2 and the Squared Prediction Error (SPE) [27]. However, PCA is a method of linear projection and performs poorly in dealing with nonlinear processes, like aircraft turbofan engine fault detection. To overcome this limitation, VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ the kernel principal component analysis (KPCA) [28] have been developed. KPCA first projects the input data space onto a feature space by means of nonlinear mapping and then computes the principal components via eigen decomposing the kernel matrix in the feature space. The two main advantage of KPCA are that: (1) It can efficiently extract the principal features using integral operators and nonlinear kernel functions and without involving any complex nonlinear optimization. (2) It can handle a wide range of nonlinearities by incorporating different kernel functions. KPCA has been successfully applied for nonlinear system fault detection. Similar as PCA, the Hotelling T 2 and SPE are also used in fault detection process based on KPCA method.
There are two main challenges of KPCA-based fault detection, firstly, as the training data size increases, the calculation of feature decomposition of the symmetric kernel matrix will be time consuming [29]. Secondly, the KPCA model remains time invariant once it is established based on the training data [28], [29], but most real industrial processes are time varying, and a fixed KPCA model cannot reflect the latest measuring information of system parameters, and false alarms may occur in fault detection process. To address the first challenge, Fazai et al. [29] and Jaffel et al. [30] proposed a reduced kernel principal component analysis (RKPCA) method, and k-means clustering was applied to reduce the number of training samples. To overcome the second limitation, several extensions for KPCA have been proposed as moving window kernel principal component analysis (MWKPCA) [30], [31], variable moving window kernel principal component analysis (VMWKPCA) [32] and adaptive kernel principal component analysis (AKPCA) [33]. The MWKPCA method adds the latest normal sampling data to the current time window and excludes the oldest observation from the current time window. For most industrial processes monitoring, like aircraft turbofan engine, in most cases, they are operated in the stable state. During this period, the measured output parameters contain a lot of redundant information, so it is necessary to study whether the KPCA model needs to be updated periodically. However, the conventional MWKPCA method lacks an effective mechanism for KPCA model updating and usually uses a fixed step to continuously update the KPCA model until anomaly data is detected. Therefore, fault detection based on the conventional MWKPCA method will increase the amount of calculation.
This article combines k-means clustering with conventional MWKPCA, and proposes a modified MWKPCA used for aircraft engine fault detection. K-means clustering method divides a certain amount of newly acquired sampling data and the same amount of oldest data in the time window into two categories, and calculates the Mahalanobis distance of the two clustering centers. Then the distance is compared with a prescribed threshold. If the Mahalanobis distance of the two clustering centers does not exceed the threshold, it means that the newly acquired data does not bring more new information, so it is not necessary to update the KPCA model, otherwise the KPCA model need to be updated. The modified MWKPCA method, on the basis of not affecting the original MWKPCA method fault detection performance, will greatly reduce the calculation amount of the algorithm.
The remainder of this article is organized as follows: In Section. II, the fault detection method based on KPCA and conventional MWKPCA method are introduced. The modified MWKPCA method and the key technologies are introduced in Section. III. In Section. IV, aiming at the normal operating condition and sensor faults condition of an aircraft engine, the conventional MWKPCA and the modified MWKPCA algorithm are simulated and verified, and the fault detection performance and calculation amount of the two algorithms are compared. The conclusion is summarized in Section. V.

II. PREVIOUS WORKS
Before introducing the modified MWKPCA algorithm, we first review the fault detection methods based on KPCA and MWKPCA, respectively.

A. FAULT DETECTION BASED ON KPCA
KPCA maps sample data (X ∈ R m×n ,where m denotes the number of samples and n denotes the dimensions of the original space) from its original space into a higher dimensional feature space ( ∈ R h×h , h n) using a nonlinear mapping function and the principal components are obtained in .
X ∈ R m×n is extended into by using nonlinear function φ(x i ) (i = 1, . . . , m). The covariance matrix C ∈ R h×h in is defined as It is assumed that φ(x i ) ∈ R h is the sample in with zero-mean and unit-variance. The eigenvalue decomposition of C can be solved by where v j ∈ R h×1 (j = 1, . . . , h) and λ j ∈ R 1×1 are the jth eigenvector and eigenvalue, respectively. In space , v j can be linearly expanded by coefficients α Combining equation (2) and (3), we can obtain To avoid obtaining φ(x) and doing eigenvalue decomposing of C directly, the kernel matrix K ∈ R m×m [28] is given by Combining equation (5) and (6) into (4), we can get Kernel function should satisfy Mercers theorem [33], and polynomial, sigmoid, and radial basis kernels are the widely used kernel functions. Before doing eigenvalue decomposition in equation (7), the kernel matrix K should be conventionalized and centralized toK: where 1 m is a square matrix of order m, the elements are all 1/m . The equation (7) can be expressed as Solve the equation (9), then the orthonormal eigenvectors (α 1 , α 2 , . . . , α m ) and the corresponding eigenvalues (λ 1 ≥ λ 2 ≥ . . . ≥ λ m ) can be obtained. The number of principal components d can be chosen by Cumulative Percent Variance (CPV) [34] and it is given by th CPV is a user-defined threshold. For all p = 1, . . . , d, the pth score values t p (x) of the new measurement x ∈ R 1×n can be obtained by projecting φ(x) onto the direction of ν p in space , as shown in the following formula α i p denotes the ith element of α p . The two statistics Hotelling T 2 and SPE are usually used to detect faults in nonlinear system, and they are defined as [35]  where = diag(λ 1 , . . . , λ d ), T 2 β and SPE β denote the confidence limit of statistics Hotelling T 2 and SPE with the confidence level β, respectively. T 2 β and SPE β can be calculated as [28], [36] where F d,m−d,β denotes the critical value of F−distribution with (d, m − d) degrees of freedom and the confidence level β. The a and b are the estimated mean and variance of the SPE determined from training data. χ 2 β (2a 2 /b) means the χ 2 −distribution with 2a 2 /b degrees of freedom and the confidence level β.
The fault detection process based on KPCA are divided into two phases: building a KPCA model under the normal operating condition by off-line training and online real-time monitoring. The details of the phases are described in Table. 1.

B. FAULT DETECTION BASED ON CONVENTIONAL MWKPCA
The conventional MWKPCA algorithm is applied by moving a fixed length data window in real time to update the KPCA model when a certain number of new normal samples are obtained. And the new normal samples are added to the current data window and to keep the length of the data window constant, the same number of oldest samples are removed. We assumed L denotes the window length, and one move step is h.
The key to updating the KPCA model is to update the centralized kernel matrixK. The update process includes two stages: after adding h data samples can be expressed as shown in equation (14), where the upper left corner is the kernel matrix in the original modelK L ∈ R L×L : where wherek 11 =k(x 1 , x 1 ), the other elements in equation (15) can be deduced like that. After removing the first h rows and the first h columns in equation (14), and the updated kernel VOLUME 8, 2020 matrixK L can be obtained as equation (16): After updating the kernel matrix, calculate the principal eigenvalues and eigenvectors ofK L . The control limits of T 2 lim and SPE lim also need to recalculate according to the new data window {x h+1 , x h+2 , . . . , x L+h }. And the next new continuous data samples are monitored using the updated KPCA model. The implementation of this algorithm requires the establishment of an initial KPCA model by training normal data in offline. The KPCA offline training phase is shown in Table. 1. And the details of the online fault detection phase based on the conventional MWKPCA algorithm is described in Table. 2.

III. THE PROPOSED MODIFIED MWKPCA METHOD
The conventional MWKPCA method would update the KPCA model when the newly acquired h samples are normal. It is not necessary to update the KPCA model, if the h oldest samples to be discarded are relatively stable compared to the newly acquired h samples. Due to the lack of an adaptive KPCA model updating mechanism, the conventional MWKPCA method updates KPCA models too frequently. Therefore, a new KPCA model updating mechanism for MWKPCA algorithm needs to be considered.
Suppose the data set { x 1 , x 2 , · · · , x h , · · · , x L } denotes the current data window. If the KPCA model is updated, the data set in the next time window is { x h+1 , x h+2 , · · · , x L , · · · ,x L+h } . To decide whether to update the KPCA model established based on data set { x 1 , x 2 , · · · , x h , · · · , x L } , the similarity of the two data sets {x 1 , x 2 , · · · , x h } and { x L , x L+1 , · · · , x L+h } needs to be compared. Given the number of clusters, k-means clustering method can maximize the distance between clusters and minimize the distance between internal data points. Therefore, we choose k-means clustering method to measure the similarity between the two data sets. Divide the two data sets into two categories using k-means clustering method, and obtain the two cluster center points (x c1 , x c2 ). Because the Mahalanobis distance [35] between two points is independent of the measurement unit of the original data, we calculate the Mahalanobis distance of the two cluster center points to measure the similarity between the two categories, as shown in equation (17).
In equation (17), ω −1 represents the inverse matrix of the diagonal matrix formed by the conventional deviation of the measurement parameters. d threshold is a presetting threshold for Mahalanobis distance between the two cluster center points x c1 and x c2 . If d Ma (x c1 , x c2 ) does not exceeds the threshold d threshold , which means data set { x L , x L+1 , · · · , x L+h } is similar to data set {x 1 , x 2 , · · · , x h }, therefore the current data window and the corresponding KPCA model do not need to be updated. There are two problems need to be solved before implementing the new KPCA model updating mechanism to MWKPCA algorithm: firstly, how to perform effective recursive calculation on (x c1 , x c2 ), secondly, how to preset the threshold d threshold .The following two parts will introduce the solutions to these two problems in detail.

A. K-MEANS CLUSTERING METHOD
K-means is a typical clustering algorithm, which is simple and possesses rapid convergence rate. K-means method partitions the input dataset into k clusters and find out k clustering centers to minimize the distance between each sample point and its nearest clustering center [37]. The general calculation steps of k-means clustering method are as follows: (1) Determining K initial cluster centers randomly. (2) For each data sample to be classified, find the nearest cluster center, and divide it into the data set. (3) Recalculate the clustering center. (4) Calculate the sum of the minimum squared distance between each data point and its nearest cluster center D c . (5) If the D c converges, terminate the clustering process, otherwise return step (2).
Assume that the data set in the current data window is { x i+1 , x i+2 , · · · , x i+L } , after obtaining h normal samples, k-means clustering is used to divide dataset { x i+1 , x i+2 , · · · , x i+h } and dataset { x i+L+1 , x i+L+2 , · · · , x i+L+h } into two categories, and the distance between two clustering centers is calculated. To reduce the amount of calculation in the calculation process of k-means clustering centers, we consider a recursive method as shown in Fig. 1. Firstly, two clustering centers of data set { x i+1 , x i+2 , · · · , x i+h } are calculated using the k-means clustering method as the initial values of the recursive calculation. Then the Mahalanobis distances between the next sample data x j and each clustering center are calculated using the equation (17). Divide x j into the data category with the smaller Mahalanobis distance, and the clustering center of the data category is updated using the equation (18): In equation (18), x ci (i = 1, 2) denotes the updated clustering center, n denotes the number of samples in this data category.
After completing the classification of the h data samples and the recursive calculation of the cluster points, then it is determined whether the Mahalanobis distance between the two clustering centers calculated using equation (17) exceeds the presetting threshold d threshold . If the Mahalanobis distance exceeds d threshold , the KPCA model and clustering data set { x i , x i+1 , · · · , x i+h } need to be updated. And the two k-means cluster centers of the updated data set need to be calculated as the new initial values of the iterative calculation process. Otherwise, if the Mahalanobis distance does not exceed d threshold , then continue to process new data samples based on the current cluster centers of the data set { x i , x i+1 , · · · , x i+h } .

B. DETERMINE THE PRESETTING THRESHOLD
The threshold d threshold is used to determine whether to update the time window data and the corresponding KPCA model. The setting of d threshold should comprehensively consider the impact of engine component performance degradation on sensor measurement parameters and the fault detection performance of the improved MWKPCA algorithm, such as false alarm rate, fault detection rate, fault detection time and other performance indicators. It is assumed that the same baseline model can be used for fault detection if the relative change of the performance of engine components is less than 0.5% . The idea of obtaining the appropriate setting of d threshold is: referring to the degradation law of the turbofan engine, taking 0.5% as performance degradation change step of engine components, calculate the Mahalanobis distance of the output parameters in each adjacent degradation situation and a data set of the Mahalanobis distance is formed, and the minimum value of the Mahalanobis distance is chosen as the d threshold .
In the actual operation of aircraft engines, the performance of their mechanical rotating components will degrade slowly, in order to simulate the normal degradation process, the performance degradation of HPT component is considered. The HPT component operates in a high-temperature and high-pressure environment. Compared with other mechanical rotating components, the performance degradation rate of HPT components is faster. The HPT efficiency coefficient e HPT and flow coefficient f HPT are used to measure the health condition of HPT component. With reference to relevant literature [38], the relationship between the performance degradation level F of HPT and the changes of e HPT and f HPT is as follows: 1 + ( f HPT / e HPT ) 2 (19) f HPT = e HPT * ( f HPT / e HPT ) (20) where f HPT / e HPT ∈ [−1, −0.5], the relationship between the Mahalanobis distance of the output parameters in each adjacent degradation situation and the performance degradation level F of high-pressure turbine components is simulated as shown in Fig. 2.
Combining the simulation results and the d threshold setting idea, the minimum value of the data set {d i Ma } is set as the VOLUME 8, 2020 d threshold , that is:

C. OVERALL STRUCTURE AND ALGORITHM IMPLEMENTATION OF THE MODIFIED MWKPCA ALGORITHM
The flowchart of the modified MWKPCA algorithm is shown in Fig. 3. Firstly, in the offline training phase, the initial KPCA model is trained using the normal samples and the corresponding control limit T 2 lim and SPE lim are calculated. Like all other data-driven fault detection approaches, a normal data set is required for the initial training stage of the proposed method. In the real application of the algorithm, the initial training data can be extracted from the aircraft engine ground test data sets. Then in the online real-time fault detection phase, the real time test data is monitored by the trained KPCA model and the corresponding statistics T 2 and SPE of the test data are calculated and compare with the control limits. If one of the statistics exceeds the corresponding control limit, the test data is considered as abnormal data, and the parameter for counting i needs to recount. If i accumulate to h, whether to update the KPCA model is determined by applying k-means clustering method, which help to calculate the Mahalanobis distance between the two cluster centers x c1 and x c2 . If the Mahalanobis distance does not exceed the threshold limit, continue to use the current KPCA model for online fault detection, otherwise, the current KPCA model and the corresponding control limits need to be updated using the latest obtained normal data.
The specific algorithm implementation steps of the modified MWKPCA fault detection method are shown in Table. 3.

IV. COMPARATIVE STUDY
In this section, the conventional MWKPCA algorithm and the modified MWKPCA algorithm are applied to the fault detection of a nonlinear model of a commercial aircraft turbofan engine. The fault detection performances (false alarm

A. THE DESCRIPTION OF THE AIRCRAFT TURBOFAN ENGINE NONLINEAR MODEL
The architecture of a civil aircraft turbofan engine is shown in Fig. 4. The main components consist of the fan, low pressure compressor (LPC), high pressure turbine (HPC), combustor, high pressure turbine (HPT), low pressure turbine (LPT), and two rotating spools. The station definitions are shown along the bottom of the Fig. 4. The aircraft turbofan engine model is developed in [39]. The sampling period of the engine model is 0.02s. The control inputs to this engine model are the fuel flow (WFM ), variable bleed valve position (VBV) and variable stator vane angle (VSV). The controller of this engine model is designed in [40]. The control law of the engine under steady-state condition is to control the fuel flow to keep the fan speed constant by giving the desired speed  of the fan. Two spool rotation and five gas path sensors are chosen as measurement parameters of the engine, which are used in most commercial turbofan engines [41]. The Gaussian white noise is added to the output of the simulation results of the seven parameters, which ensuring that the simulation approximates the actual engine conditions as closely as possible. The description of the seven parameters and their conventional deviations are listed in Table. 4, the installation position of the sensors can be seen from the indices at bottom of the Fig. 4.

B. SIMULATION RESULTS UNDER NORMAL OPERATING CONDITION
In this part, the cruise operating condition is chosen for this simulation. In this simulation process, the changes of the performance degradation level F of HPT, the fuel flow WFM and the seven sensor measurement parameters are shown in Fig. 5. Under the action of the engine controller, the fan speed remains basically unchanged in this simulation process. The simulation ran for 500s then 25000 samples were collected.
A radial basis kernel function is selected as the kernel function, as shown in equation (22).
where γ = amσ 2 , and m, σ 2 denote the input space and the variance of the data, respectively. a is a constant, which can be determined by consideration of the process to be monitored [28]. The th CPV in equation (10) is set to 0.95, and the confidence limit of the T 2 and SPE index is set to 0.99. The length of the data window and the moving forward step h are set to 200 and 5, respectively. According to the above parameter settings, the conventional MWKPCA algorithm can be implemented. And the d threshold can be determined according to equation (21), then the modified MWKPCA algorithm can be implemented in the same fault detection process as the conventional MWKPCA algorithm. It is worth mentioning that the parameter settings of the proposed method, such as the length of the time window, the fixed moving steps and the kernel function selection can be optimized based on the fault detection performance. Since the purpose of this article has been limited to reducing the computation cost of the conventional MWKPCA method, considering the parameters optimization is beyond the scope of the present study, the parameter settings are kept consistent before and after the algorithm modification. The fault detection results of the conventional MWKPCA and the modified MWKPCA using T 2 statistic and SPE statistic in the case of normal operating condition are presented respectively in Fig. 6 and Fig. 7. The performance of the two algorithms is presented in Table. 5. We notice  that both of the two algorithms meet the requirement of the false alarm rate limit (T FAR ). However, the conventional MWKPCA algorithm updates the KPCA model in fixed steps and the number of the total update times is 4492, the modified MWKPCA algorithm updates the KPCA model adaptively in the fault detection process and the number of the total update times is 76. The eigenvalues and eigenvectors of a kernel matrix of size (L, L) are needed to be calculated when the KPCA model is updating. And the updating process consumes O(L 3 ), which is far great than the calculation cost of the recursive calculation process of cluster centers. Therefore, by comparing the update times of KPCA model in the fault detection process, the modified MWKPCA algorithm performs better than the conventional MWKPCA algorithm in terms of calculation amount.

C. SIMULATION RESULTS UNDER SENSOR FAULTS CONDITION
The fault detection performance of the two algorithms under sensor faults condition is compared in this part. The setting of the flight condition and the component degradation level of HPT are the same as the normal operating condition.   And it is assumed that three sensor faults have been occurred in three different time intervals for the measurements N h , Ps 3 and T 45 . These faults are presented in Table. 6. The change of the performance degradation level F of HPT, the fuel flow WFM and the seven sensor measurement parameters in this simulation process are shown in Fig. 8.
The setting of the kernel function and the related parameters in this fault detection process is as same as the simulation under the normal operation, which is shown in the former part. The monitoring results of the conventional MWKPCA and the modified MWKPCA using T 2 and SPE are presented respectively in Fig. 9 and Fig. 10. From these figures, all the injected faults are detected at time.
The fault detection results of the two algorithms are shown in Table. 7. In Table. 7 the false alarm rate, fault detection time, and the update times of KPCA models are summarized. We notice that the modified MWKPCA algorithm provides a comparable performance in term of false alarm rate and the fault detection time than the conventional MWKPCA  algorithm. However, the two algorithms have significant differences in the update times of KPCA model in the process of fault detection, which means that in the case of sensor fault, the modified MWKPCA can reduce the calculation amount of the algorithm and improve the execution efficiency of the algorithm.

V. CONCLUSION
In this article, a novel fault detection based on the modified MWKPCA is proposed for aircraft turbofan engine. In this method, a new KPCA updating mechanism for the conventional MWKPCA is established based on the k-means clustering and Mahalanobis distance of the clustering centers. The proposed method is verified under normal operating condition and sensor faults condition, and the normal degradation of the components of the aircraft turbofan engine is considered in both of the two simulation cases. Compared with the conventional MWKPCA algorithm, the improved MWKPCA algorithm can realize updating KPCA model adaptively according to the changes of the system output parameters. And the modified MWKPCA algorithm has less KPCA update operations in the fault detection process, therefore, it has better performances in terms of computation cost. A future research direction is to design a real time fault isolation and fault reconstruction strategy based on the proposed modified MWKPCA algorithm.