Towards Smart Condition Monitoring of Rotatory Machines: An Optimized Probabilistic Signal Reconstruction Methodology for Fault Prediction with Multi-source Uncertainties

Faults in the critical components of a large rotatory machine often result in unplanned breakdowns, leading to a significant loss of property and life. Condition monitoring, as a key component in the smart maintenance of industrial equipment, has become a promising tool for providing automatic early alerting of potential damage to critical components, thus reducing potential outages and improving system safety and reliability while lowering maintenance costs. This is still a very challenging topic in various industrial fields because of data imperfection and multivariate correlation, as well as the variation in faults and components in different machines. This paper presents an optimized probabilistic signal reconstruction methodology to address these challenges in the fault prediction of rotatory machines using multivariate vibration signals. Three signal reconstruction methods, that is, Bayesian wavelet multiscale decomposition, probabilistic principal component analysis, and auto-associative kernel regression, were seamlessly integrated to address the noise, high dimensionality, and correlation in the sensed multivariate vibration data for accurate fault prediction. The bandwidth parameter in the auto-associative kernel regression approach was optimized to represent the health status of the rotatory machine. The obtained model was further utilized to predict the responses under unknown conditions. The alerting threshold, based on the squared mean errors of the predicted and measured time series, was automatically adjusted using a rolling window strategy and then employed to predict the possible fault. Finally, the validity, feasibility and generalization of the proposed methodology are illustrated by applying two cases of different rotating machines: centrifugal compressor and gas turbine.

full advantage of emerging digital technologies such as the Internet of Things (IoT), big data, artificial intelligence (AI), and cloud computing to capture equipment behaviors from the numerous operation data collected from a rotating machine. It has been broadly applied to various complex equipment such as gas turbines and steam turbines. In the past decades, a wide spectrum of data-driven predictive analytics methods, such as time series forecasting, machine learning, and artificial neural network models, have been developed to predict faults in machine condition monitoring [3]- [5]. These methods are generally composed of two main steps:1) establishment of a high-fidelity predictive model to produce the system response, and 2) determination of a decision threshold to produce alarms when the system response deviates from the actual vibration measurement to a certain degree. In addition to the existing uncertainties in sensor data, both model establishment and threshold determination contain uncertainties, which would further impact the fault prediction accuracy of a rotatory machine. Therefore, it is of paramount importance to accurately predict faults for machine condition monitoring, considering various uncertainties. Recently, the auto-associative kernel regression (AAKR) method was developed as a similarity-based model (SBM) for condition monitoring and fault alerting in large-scale industrial equipment [6]- [13]. This approach uses multivariate historical data collected under normal conditions to establish a system identification model that represents the health status of a physical system. The obtained model was then used to predict the multivariate responses of the system under unknown conditions. A health index calculated from the difference between the predicted responses and actual measurements was then used to quantitatively assess the status of the system. This approach has demonstrated high accuracy, flexibility, and generalization with broad applications in various types of industrial equipment [6]- [13]. It can be effectively applied not only for non-rotating equipment, such as boilers in a coal plant [9] and gear boxes in a wind turbine [10], but also for rotating machines, such as bearings in engines [11] and blades in compressors or steam turbines [12]. Unlike other parametric regression models in machine learning or nonparametric AI methods, the AAKR approach does not require any prior knowledge of the system or components under investigation. It provides a generalized flexible nonparametric system identification approach for fault prediction in different multivariate scenarios of a complicated physical system. However, this approach has three main limitations. First, it is highly sensitive to the uncertainties existing in sensor data, such that it may produce a number of false alarms and even missing events. Second, the bandwidth parameter in the kernel function of the AAKR must be tuned from historical sensor data. Improper selection of the bandwidth value may result in inaccurate fault predictions. Finally, subjectivity and variation in the general threshold determination play a key role in the fault prediction accuracy of large-scale rotatory machines. Recently, some researchers have developed a hybrid model to partially address the abovementioned issues by combining signal-processing techniques with the AAKR method [7] [9][11] [13]. For instance, Maio et al. [7] combined a correlation analysis and a generic algorithm to classify the signals and then applied AAKR for fault prediction using the classified signals. Yu et al. [9] averaged the prediction results obtained from multiple AAKR models to alleviate the effect of outcome variations on the prediction accuracy. Brandsaeter et al. [11] combined the k-means clustering and AAKR estimation approaches to improve the computational efficiency of the method. Baraldi et al. [13] revised the weights of the AAKR model to reduce the correlations between multiple variables. However, none of the existing methods have comprehensively addressed their sensitivity to uncertainties in data, modeling, and thresholding. It should be noted that the determination of the alarm threshold also plays a critical role in the fault prediction accuracy of the AAKR approach. Several techniques such as the sequential probability ratio test (SPRT) [7] [12], likelihood ratio test [8], and empirical estimation [8] are commonly used to determine the alarm threshold level. These methods use the statistical characteristics of a unit during normal operation as health indicators to monitor the unit operation status in real time. In this study, a discrete wavelet packet transform is first employed to decompose a set of raw vibration signals into multiresolution approximations and details through wavelet functions in the time-frequency domain. Bayesian hypothesis testing was then applied to determine whether noise existed in each decomposed coefficient series. The adept combination of multiresolution wavelet analysis and probabilistic Bayesian assessment [14] can trade-off the under-denoising and overdenoising of raw signals, outperforming the traditional soft or hard thresholding approach in the wavelet fashion. Furthermore, this paper presents an enhanced optimized AAKR (OAKR) method that integrates advanced signal processing, probabilistic principal component analysis, kernel parameter optimization, and adaptive thresholding to address the aforementioned issues of the traditional method for machine fault prediction using multivariate vibration signals for the smart maintenance of rotatory machines. In the following sections, three signal reconstruction methods-Bayesian wavelet denoising, probabilistic principal component analysis, and auto-associative kernel regression (AAKR)-are introduced in the methodology for data preprocessing and modeling. The first two preprocessing methods were employed to remove potential noise from the raw vibration data and reduce the dimensions of the multivariate signals, thus improving fault prediction and diagnosis accuracy. The simplex method is employed to obtain the optimal bandwidth of the kernel function in conventional AAKR, resulting in an optimized similaritybased estimation model. A generalized implementation procedure was developed to automate the application of the proposed methodology for machine fault prediction and diagnosis. Vibration data and events collected from two real-world rotating machines, one centrifugal compressor, and one heavy-duty gas turbine, were employed to demonstrate the effectiveness, feasibility, and generalization of the proposed methodology and procedure. Finally, conclusions are presented.

A. Bayesian wavelet signal reconstruction
Wavelet multiresolution analysis works like a mathematical microscope that is broadly used to exploit details in time-series signals [15] [16]. Unlike the traditional Fourier transform, where only frequency features are extracted from a signal, wavelet analysis decomposes the raw signal into different levels of coefficients in both the time and frequency domains through a series of wavelet functions. In particular, the discrete wavelet packet transform (DWPT) simultaneously splits the obtained approximation and detail coefficients from one level to the next, thus decomposing the raw signals into multiresolution levels of detail for subsequent analysis. Given a time-series vibration signal with N points for a sensor variable in a rotatory machine, x ! (i = 1,2, … , N), the DWPT approach decomposes the signal into a set of scaling coefficients " ( ) and wavelet coefficients w # (k) simultaneously. Note that parameter i represents the time point in this formulation. The time series can be reconstructed by summing the discrete decomposition coefficients, as follows: where j is the number of decomposition levels, k is the coefficient point, and ",' ( ) and ",' ( ) are the k-th scaling and wavelet functions, respectively, at time point of the j-th level. Jiang et al. [14] provided details on the DWPT decomposition and Bayesian thresholding for each decomposed coefficient. The denoising performance was evaluated quantitatively using the signal-to-noise ratio (SNR) and root mean squared error (RMSE) metrics. The SNR metric is calculated by the ratio of the summation of the squared signal to that of the squared noise and then converted into decibels (dB) by taking the logarithm on the basis of 10, given by A smaller SNR value indicates more noise in raw signals. The RMSE metric indicates the difference between the raw signal and denoised signal, expressed as The larger RMSE value indicates the more noise in the raw signals.

B. Probabilistic Principal Component Analysis
Principal component analysis (PCA) is a commonly used nonparametric statistical approach for multivariate data analysis based on eigenvalue decomposition. The yielding eigenvalues and eigenvectors represent the variations in the principal components and weights of the raw variables, respectively. The PCA approach converts high-dimensional correlated multivariate data into low-dimension uncorrelated principal components, thus lowering the dimensionality and extracting the main features from multivariate data. The traditional PCA is a deterministic multivariate analysis approach that cannot address data uncertainties. Tipping and Bishop [17] proposed a probabilistic principal component analysis (PPCA) approach to handle uncertainties in multivariate data. Unlike factor analysis, which assumes varying noise variances, PPCA assumes the same noise variance on the raw signal, which largely simplifies the parameter estimation and signal reconstruction. This study developed a PPCA approach to further alleviate the impact of data uncertainties on the fault prediction accuracy of rotatory machines. Given a d-dimensional multivariate observation matrix = {C ) , C 3 , … , C 4 } where each variable has N data points, that is, , where C 5 denotes the cleansed time series obtained by the Bayesian wavelet thresholding method represented by Eq. (1). Let = { C ),$ , C 3,$ , … , C 4,$ } be the i-th data point for the d-dimension variables, which can be expressed as where $ are the latent random variables representing q(q£d) principal components, which are assumed to be a multivariate Gaussian distribution, i.e., $~( , ) , where the bold number 0 represents a d-dimension vector of zeros, I is the unit matrix, W is the × matrix representing the relationship between $ and $ , is the mean vector of the d-dimension multivariate data, i.e., = , and ℇ $ is the white noise item in the data $ , i.e., ℇ $~( , 3 ), where 3 is the variation of the noise calculated from the raw data. Equation (4) can be expressed in a distribution form as follows, Thus, the conditional distribution of $ on the latent variable $ is obtained by According to the conditional probability formulation, the conditional distribution of $ for data $ can be derived as follows: where = + 3 . As $ is a vector of p-dimensional random variables, its expectation conditional on data $ , 0 ( $ − ), is the desirable PPCA result, denoted by , that is, Note that in (7), the parameters W and 3 must be estimated. This study employed the expectation-maximum optimization approach to estimate the PPCA parameters. The likelihood function ℒ can be expressed as is the variance of the multivariate matrix X and tr is the trace of the matrix. The optimal parameters W and 3 were obtained by maximizing the likelihood function ℒ in (9). Refer to [17] for further details on the parameter estimation of the PPCA approach. Similar to the classic PCA approach, the loadings of the PPCA, denoted by L, are employed to identify the key factors contributing to a principal component, which are defined as the correlations between the original signal and principal components given by The loadings L are computed as the cross-variance matrix between the original signal and the principal components (PCs), that is, the product of the PPCA transfer coefficients W and the variance of the original signal divided by the square root of N-1.

C. Optimized AAKR Approach
The auto-associative kernel regression (AAKR) approach is a similarity-based nonparametric modeling technique that estimates the system response based on the similarity between the measurement and memory vectors. Compared to other nonparametric approaches, such as neural networks, this method can be easily trained with historical measurements, independent of fault modes and equipment types. As a result, AAKR provides a robust multivariate system identification approach for condition monitoring and fault prediction of industrial equipment. Instead of using a multivariate sensor time series, AAKR was trained using the principal components obtained from the PPCA. The memory vector for the model training is stored in matrix of (8), where Θ $," is the i-th vector of the j-th principal variable. For the F -th memory vector, matrix can be expressed as The 1 × monitoring response vector of the machine is expressed as The model prediction can be obtained by averaging each memory vector in the matrix , the weights of which are estimated using healthy data. This study proposes an optimized AAKR approach (OAKR) that consists of the following four steps: First, calculate the commonly used Eurlean distance between the monitoring vector and each memory vector $ to produce an F × 1 distance vector d, given by Second, we obtained the F × 1 weight vector w by fitting the Gaussian kernel function as follows: where h is the bandwidth of the kernel function that determines the smoothing level of the function. A smaller h can display more details in the data but fails to yield a smooth tail of the function, whereas a larger h can produce relatively smoother transactions but lose details in the system identification, resulting in more false alarms in the condition monitoring of rotatory machines. Therefore, the selection of bandwidth h plays a critical role in the fault prediction accuracy in the condition monitoring of industrial equipment. Third, the optimal bandwidth is obtained by automatically minimizing the mean squared error of the model prediction using a set of new data and the Nelder-Mead optimization algorithm, as described subsequently, thus reducing the false alarm rates of equipment condition monitoring. Finally, we estimate the response • using the weight matrix w and monitoring vector $ , as follows: It should be mentioned that this OAKR approach is related to the measured data only, and is independent of the fault and equipment under monitoring. Consequently, this approach provides a powerful tool for monitoring the conditions of industrial equipment.

D. Nelder-Mead Optimization
In this study, the AAKR method was improved by automatically selecting the optimal bandwidth to minimize the model prediction error. The Nelder-Mead (NM) simplex algorithm was proposed to achieve this purpose owing to its simplicity and effectiveness for online implementation [18] [19]. The NM algorithm is a heuristic optimization method proposed by Nelder and Mead (1965) [20] [21], which does not require any mathematical derivative of the objective function but is very effective in dealing with nonlinear and multidimensional problems [19]. The algorithm does not require mathematical derivatives, but a few function values at each iteration, which makes it computationally very efficient and broadly applicable for online implementation. In addition, the algorithm generally converges to a global optimal value within a few iterations. Therefore, it is especially popular for optimization problems, where the derivative of an objective function is hardly obtained explicitly.
According to Lagarias et al. (1998) [18], the NM algorithm requires predefined coefficients for four operations: reflection ( ), expansion ( ), contraction ( ), and shrinking ( ). In this study, the commonly used values = 1, =2, and = = 0.5 are taken, and five steps in one iteration are employed to optimize the bandwidth parameter h. The objective function is the mean square error (MSE) between the prediction output and measurement data, calculated as follows: 16) where N is the number of samples, M is the number of model variables, and F and Š F are the measurement data and model output of the m-th variable, respectively. For brevity, the details of the optimization method are omitted in this study. Refer to Tang et al. [22] for details regarding this algorithm applied to the OAKR method.

1) Mean Square Error (MSE)
The MSE value was employed in this study to quantitatively evaluate the model validity and system health status. In model training, the MSE value obtained using Eq. (16), approaching zero indicates a well-trained model: The trained model is used to predict the response of the system or component under unknown conditions. When the obtained MSE value exceeds a preset threshold, an alarm is generated to alert the large deviation between the measurement data and the model prediction. In this study, the alert threshold was determined using model predictions under healthy conditions.

2) R-square Metric
The R-square is defined as follows ) 3 , and is the mean value of the prediction data. The R-squared value was used to evaluate model-fitting performance. A value approaching one indicates a well-fitted model.

III. Implementation Procedure
The improved OAKR method combining Bayesian wavelet thresholding and probabilistic principal component analysis consists of 14 main steps, which is able to accurately predict the fault and identify the main variables contributing to the fault, as shown in Fig. 1 and briefly described as follows. 1) Sensor data collection. Multivariate sensor time series data collected from a rotatory machine are usually transmitted to local or cloud servers. High-frequency historical data of the vibration variables were extracted from the servers for offline modeling of the enhanced OAKR method for machine fault prediction. 2) Frequency feature extraction. The fast Fourier transform was conducted on the high-frequency vibration data to extract frequency features such as frequency, peak, angle, and magnitude values for subsequent data analysis and modeling.

FIGURE 1. Flowchart of implementing the proposed methodology for anomaly alarming and fault diagnosis of turbomachines
3) Data imputation. The sensor data collected from a rotatory machine usually contain missing values due to sensor faults, data transmission, data storage, etc.. In this study, missing or infinite values for the critical variables were imputed using the forward or backward of the most possible neighbor points. This approach did not change the pattern of the raw data. 4) Outliers removing. Outliers in feature data may exist because of sensor faults, machine shutdowns, or startups. In this study, the feature data points during machine shutdown were removed by evaluating a machine rotating speed that was lower than a small positive value. 5) Data normalization. The different magnitudes and dimensions of various frequency features affect the fault prediction accuracy in the condition monitoring of a rotatory machine. In this study, the multivariate datasets were normalized to those with a mean of zero and variance of one, thus eliminating the magnitude effect of different variables. 6) Bayesian wavelet packet thresholding. The Bayesian discrete wavelet packet transform detailed in Jiang et al. [14] and Tang et al. [22] was employed to threshold the noisy feature data in this study. 7) Denoising evaluation. The denoising effectiveness is evaluated visually by a graphical comparison between the raw and cleansed signals and spectra plots, and quantitatively by using the SNR and RMSE metrics in (2) and (3). The Welch method [23] was employed in this study as a periodogram spectrum analysis approach to graphically evaluate denoising performance in the frequency domain. 8) Dimension reduction. The previously described PPCA approach was employed to extract the main information from the multivariate correlated data while reducing the data dimension in a vertical fashion. In this study, the lower dimension of the variables was obtained by retaining the principal components with 95% variance in the raw data. The principal components were then obtained from the PPCA and used for subsequent OAKR modelling. It should be mentioned that the obtained PPCA coefficients in the model training were saved and used later for model prediction on the new dataset. 9) Similarity modeling. The principal components of the PCA were used to establish the AAKR model weights.
In this study, 85% of the principal component data under a healthy status were used to train the model weights, of which 70% of the data were used to form the memory matrix and obtain the AAKR model to represent the system in a healthy state. 10) Band-width parameter optimization. The remaining 15% of the principal components for modelling were used to obtain the optimal bandwidth of the kernel function using the Nelder-Mead simplex algorithm, as described by Tang et al. [22], thus yielding an OAKR model. 11) Model validation. The remaining 15% of the principal component data at a healthy status was used to test the trained model. Both the MSE and R-square metrics were used to evaluate the model performance. 12) Model prediction. Given a set of time-series vibration data collected from the machine under unknown conditions, the methods described in steps 2) to 7) are applied to clean the data. The same PPCA coefficients obtained from the training data were used to reduce the dimensions of the multivariate data. The yielded principal components were used as inputs to the trained OAKR model to produce the response of the system under healthy conditions. 13) Fault alarming. A rolling window strategy is employed to automatically judge whether the machine has a fault by evaluating the mean square of error (MSE) between the model output and actual principal components in comparison to the preset threshold. Based on the data resolution, the rolling window size may be taken to be one minute, one hour or one day. When the MSE value continues to change in five continuous windows and exceeds the preset threshold, the machine is judged to have an anomaly with a fault, thus yielding an alarm. In this study, the threshold of k√ obtained from the healthy data in model validation was used for fault detection. According to Baraldi et al. [24], k=4 was initiated in the examples presented in this study. Owing to the variation in different devices, the k value may or may not be adjusted in practical applications to obtain the lowest false positive rate. 14) Identification of influencing factors. The PPCA loadings obtained by (10) are further employed to identify the sensor variables that mainly influence faults, which will facilitate the root cause analysis of faults in the smart maintenance of turbomachines.

IV. Illustrative Examples
Sensor data and events collected from two different real-world turbomachines, a centrifugal compressor and gas turbine, were employed in this study to illustrate the effectiveness and feasibility of the proposed methodology and implementation procedure shown in Fig. 1.

1.A. Data
The data used in this example was collected from a centrifugal compressor in a petrochemical factory. Figure 2 shows a schematic of the compressor (Fig 2a) and its actual broken impeller (Fig 2b). The impellers were broken owing to wheel cracks resulting from the damaged filtering net on March 7, 2019. Operating data from December 28, 2018, to April 1, 2019, are used in this example to demonstrate the effectiveness of the proposed methodology.

1.B. Frequency feature extraction
A fast Fourier transform was applied to the wave samples to produce frequency and magnitude values, as shown in Fig. 4. The frequency features including one base frequency of 96. 15 Hz and two times of base frequency (192.3 Hz), as well as the corresponding magnitude and angle values can be readily obtained from the frequency analysis.

1.C. Data preprocessing
For demonstration purposes, 77 feature features obtained from seven sensors in the compressor were used for data preprocessing. Upward and downward data-fill methods were used to process the missing or infinite values in the feature series. The feature data for all variables were normalized using the max-min method to eliminate the amplitude effects of the various variables. As an example, Figure 5 shows the time series trends of the four feature variables for the raw data (Fig.   5a) and the normalized data (Fig. 5b). It can be observed that the magnitude of the original feature data (Fig. 5a) varies significantly, whereas the normalized data (Fig. 5b) display basically in a stable range.

1.D. Bayesian wavelet thresholding
The Bayesian wavelet packet denoising approach described previously was employed to remove noise from feature data. Using a trial-and-error approach, the three-level DWPT decomposition with the Daubechies function of eight mother wavelets was found to be sufficient for removing noise from the signals. For instance, given the feature sequence of variable CC_Xend_one_freq_y, the SNR and RMSE values obtained from (2) and (3) are 20 dB and 0.1, respectively, implying that the signal is contaminated to a certain degree. Figure 6 shows a comparison of the time series (Fig. 6a) and power spectral density (PSD) (Fig. 6b), where the solid and dashed lines represent the raw and denoised feature sequences, respectively. The data series became much smoother after denoising (Fig. 6a). This denoising method effectively removes the high-frequency noisy components via multiscale wavelet decomposition of the raw data while preserving its overall trend, as shown by the dashed dotted lines in Fig. 6a and Fig. 6b.

1.E. PPCA feature extraction
As an example, Figure 7 shows the relationship diagram of four features randomly selected from the 77 features after denoising. It can be observed that three of the four features have a highly linear relationship and are highly correlated with each other. If they are directly used in modeling, model complexity will increase, parameter estimation accuracy will be affected, and calculation cost will increase significantly. Therefore, principal component analysis was applied to all denoised feature data matrices to retain 95% of the original b) a) time This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication. value information, yielding a total of 11 principal components. PPCA with the same dimension reduction was then applied to obtain the weights and low-dimension principal components. Both the individual interpretation variance and the cumulative variance of PPCA are obtained by the ratio of the variance of each principal component to the total one after dimensionality reduction, as shown in Fig. 8. The first principal component represented over 61% of the raw data. The PPCA weights obtained from the training dataset were saved and subsequently applied to unused data for model prediction, whereas the obtained principal components were used to establish the enhanced OAKR model.

1.F. OAKR model establishment and validation
The obtained 11 PPCA principal components (PCs) were used to establish the OAKR model, as summarized in Table II. The three-month feature data at one-hour intervals were divided into four groups: model training, bandwidth optimization, model testing, and prediction. The 850 training PC datasets were used to produce the memory matrix, while 183 PC sets were used to optimize the bandwidth in the kernel density function of the OAKR model using the simplex method described previously. The optimal bandwidth was obtained through 12 simple iterations. Figure 9 shows the MSE curve versus the bandwidth with the optimal value of ℎ T5U = 1.169 and the minimum MSE of (ℎ T5U )= 0.58, obtained using (15). The obtained model was tested using a set of new data and then used for fault prediction. An MSE of 1.587 and a thresholding coefficient of 4 were obtained from the testing data. The threshold of 4√ = 5.038 was then used for fault prediction in this example.

1.G. Fault prediction
The trained model was used to predict the system response of the centrifugal compressor for data from February 15, 2019 to  Table II. Raw data were preprocessed and cleansed using the Bayesian DWPT thresholding approach. The PPCA coefficient matrix was similarly applied to the cleaned data to yield 11 principal components, as previously described. The trained OAKR model was then used to estimate the response of the system based on principal components.  As an example, Figure 10 shows the prediction results of four selected principal components, PC1 (Fig. 10a), PC2 (Fig. 10b), PC4 (Fig. 10c), and PC5 (Fig. 10d), which play key roles in fault detection. The predicted values of the four PCs are observed to deviate greatly from the actual values in the period of time before the occurrence of the fault, which captures the early trend of the fault and provides the theoretical basis for the anomaly alarm in the smart condition monitoring of the turbomachine. The moving window strategy was used to calculate the mean square error (MSE) of the predicted value of each PC and generate an alarm if the MSE value exceeded a predefined threshold. As shown in Fig. 11, the obtained MSE value exceeded the threshold and triggered an alarm on February 19, 2019; thus, 16 days earlier than the actual time of failure (March 7, 2019), providing a powerful tool for scheduling outage maintenance of the machine ahead of its unplanned breakdown owing to significant damage.

1.H. Critical factors identification
To further analyze the variables that directly affect component damage. Figure 12 shows the loadings of the two PCs (PC1 and PC2), which have a large impact on failure, obtained by Eq. (10), indicating the contribution of each variable to the PC. The results showed that the damage was represented by multiple features, as expected. Considering the high correlation between the original features, it is reasonable to assume that the component fault causes fluctuations in multiple principal features, which are identified as the main factors impacting component reliability.

2.A Data
The proposed methodology and procedure are further applied to predict the stage 1 bucket crack in a heavy-duty gas turbine to illustrate its generalization and robustness. Industrial heavyduty gas turbines are commonly used to generate power by converting gas energy into mechanical work. Owing to the frequent start-up and long-term high-speed operation of a gas turbine, various failures are frequently observed in critical subsystems, such as turbines and compressors, some of which cause huge loss of life and properties. This example demonstrates the effectiveness of the proposed methodology for detecting stage 1 bucket cracks using relative monitoring data collected from a real-world heavy-duty gas turbine. The unit was shut down on January 20 th , 2015, owing to bucket failure on the stage 1 turbine wheel. Time historical data of 37 sensors (tags) installed on the unit were extracted from the control system of the machine from January 2013 to January 2015 and used for demonstration purposes. The 21 variables shown in Table IV are employed in this example according to  the inputs of the domain experts and preliminary data analysis.  The sensor tag names are defined in Table IV, with example values for illustration purposes. The sensor data for each tag were extracted at intervals of 5 min for subsequent analysis.  Figure 13 shows the basic information of the gas turbine event used in this example, including the approximate layout of sensors in the turbine (Fig. 13a) shown in Table IV, an illustrative photo of the bucket cracks at the stage 1 wheel (Fig.  13b), the crack locations in the stage 1 bucket (S1B) (Fig. 13c), and the numerical simulation of the temperature of one damage S1B (Fig. 13d). The 21 critical variables defined in Table III are used as examples for damage prediction. Multiple S1B parts were observed to have cracks during this event, as shown in Fig. 13b. A typical S1B specimen was observed to have three different locations with cracks (Fig. 13c). The stress concentration resulting from mechanical-thermal cycling was observed in the numerical simulation, as shown in Fig. 13d. In this study, operation data from Jan 1 to Nov 1, 2013, were used to train the proposed model, while those from Dec 2013 to Dec 2014 were employed to test the trained model.

2.B Data preprocessing
As described in the implementation flow chart of Fig.1, the raw time series data of each sensor are preprocessed, including 1) eliminating shutdown data, 2) filling of null and infinite values to ensure data integrity, 3) data standardization to eliminate the dimensional influence of different variables, and 4) wavelet packet Bayesian threshold denoising to remove uncertain noise in the data. As an example, Figure 14 shows the time series plots of the CTD sensor data without (a) and with the removal of shutdown points (b). It can be clearly observed that the time series in Fig. 14b shows a distinct trend after removing the outliers in comparison to the raw data in Fig. 14a, implying the importance of data preprocessing. For illustration purposes, Figure 15 shows the time series plots of the normalized data for four sensor tags, that is, CTD, CTIM, TTWS1AO1, and TTWS2FO1, as defined in Table III. It is observed that the trend of each variable indicates a large variation, which is rarely used to directly detect the possible damage in the component. However, multivariate time-series data collected from the system may be highly correlated with each other. In particular, various temperature sensors on the turbine wheel space show strong correlation, as indicated in the correlation heatmap in Fig. 16 for the variables listed in Table III,  Similar to Example 1, prior to the PPCA dimension reduction, Bayesian wavelet thresholding is applied in this example to reduce possible noise in each set of time-series data. As an example, the SNR and RMSE of the CTD data, 58.0 and 0.99, were obtained from (2) and (3), respectively, indicating that the sensor data are lightly noisy as the RMSE is relatively small with regard to the averaged CTD values of 782F. Furthermore, Figure 17 shows a graphical comparison of the CTD data in terms of the time series (Fig. 17a) and power spectral density (PSD) plots (Fig. 17b), where the solid and dashed lines represent the raw and denoised signals, respectively. It can be observed that the difference between the raw and denoised data cannot be easily identified from the time series plot (Fig. 17a) but is clearly shown in the frequency graphs at high frequencies (Fig. 17b). This is reasonable because noise is typically contained in the high-frequency domain of the signal. This indicates that Bayesian wavelet packet threshold denoising retains the overall trend of the raw data while reducing high-frequency noise from the raw signal. This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication.  Fig. 18.

FIGURE 18. Explained variance in PPCA (Example 2)
Similarly, the principal component (PC) data obtained from the PPCA dimension reduction are divided into four groups, that is, training, optimization, validation, and prediction sets, as shown in Table Ⅴ. For the sake of computing efficiency, the PC data were resampled from 5 minutes interval to 1 h for subsequent modeling.

2.C Fault prediction and diagnosis
The six PC variables obtained from the PPCA were used as the inputs of the OAKR model. Both the training and optimization datasets shown in Table Ⅴ Figure 19 shows the model prediction results for the six PC variables in comparison with the original ones. It can be clearly observed from Fig. 19 that the predicted values of PC2 (Fig. 19b), PC5 (Fig. 19e), and PC6 (Fig. 19f) deviated from the original values before the failure on Jan 20 th , 2015, which was much larger than the other three. This implies that the failure initiated early in Feb 2014 and can be detected in advance to a large extent using the proposed methodology, and the sensor variables mainly contributing to the three PCs play a key role in the failure. Again, the moving window strategy was used to calculate the mean square error (MSE) of the predicted value of each PC and then generate an averaged MSE metric for the overall condition evaluation of the component over time, as shown in Fig. 20. Using the threshold value of 2.990 obtained from the validation data, an alarm was triggered when the MSE value exceeded the threshold on March 8, 2014, which was more than 10 months earlier than the actual failure time. It has been well recognized that blade cracks develop slowly in practice because of the constraints of buckets in the wheel, where they may exist for over one year of operation in some cases, as explained in the detailed failure analysis (e.g., Zdzislaw et al. [25] and Chang et al. [26]). Once again, this case demonstrates that the proposed methodology provides a powerful tool to trigger an early alert on component cracks for arranging condition-based maintenance in advance. Based on the results shown in Figs. 19 and 20, it can be clearly observed that the fault was mainly attributed to PC2, PC5, and PC6. Figure 21 shows the loadings of the three PCs obtained using Eq. (10), which indicates the contribution of each variable to PCs. This indicates that the faults are primarily represented by six variables: CTD, CTIM, CSGV, DWATT, and two-stage one-wheel space temperature variables. This is consistent with the actual event of a stage-one bucket crack.

V. Concluding remarks
This paper presents a novel probabilistic optimized similaritybased learning (SBL) and signal reconstruction methodology for the accurate fault prediction of rotatory machines by seamlessly integrating two advanced signal reconstruction techniques with the optimized auto-associative kernel regression (OAKR) approach. The Bayesian discrete wavelet packet thresholding is employed as an advanced timefrequency reconstruction approach to cleanse the raw signals by integrating Bayesian hypothesis testing and multiresolution wavelet analysis, which effectively avoids under-denoising due to multiscale signal decomposition or over-denoising due to unbiased judgment of the decomposition coefficients. Probabilistic principal components analysis is utilized to extract the main information from multivariate feature variables, considering data uncertainties, thus lowering the feature dimension and yielding uncorrelated principal components as inputs to improve the accuracy and efficiency of industrial equipment fault prediction. The enhanced methodology with PPCA distinguishes it from the method presented by Tang et al. [22] in two aspects. First, it reduces the uncertainty and dimensions of multivariate sensor data, where the uncorrelated principal components are used as inputs for fault prediction. Second, PPCA is also utilized to identify the critical factors for facilitating the fault diagnosis of rotating machines, thus allowing for fault prediction and diagnosis simultaneously. Instead of manually modifying the bandwidth parameter in the conventional auto-associative kernel regression method, the Nelder-Mead simplex algorithm was developed to automatically optimize the key parameters in the model, thus significantly improving the model prediction accuracy and efficiency. The alarm threshold is determined adaptively using a rolling window strategy based on the difference between the predicted and actual values. A generalized procedure was developed to automate the application of the proposed methodology for fault prediction of rotatory machines under data uncertainties. The effectiveness and feasibility of the proposed probabilistic signal reconstruction methodology and procedure are demonstrated using data and events collected from two realworld large turbomachines, a centrifugal compressor, and a heavy-duty gas turbine. The numerical results indicate that the proposed methodology provides a powerful tool for early warning of machine anomalies, which facilitates real-time condition monitoring of rotatory machines for smart maintenance. The numerical implementations of two different rotating machines have also demonstrated the strong generalization and robustness of the proposed method.
In future research, the proposed optimal probabilistic similarity-based learning method along with multiple advanced signal reconstruction techniques will be investigated with different rotating machines such as wind turbines and steam turbines to validate its generality and robustness in general turbomachines, and integrated into a comprehensive digital twin-oriented software platform for smart monitoring and diagnosis of industrial equipment.