Remaining Useful Life Prediction for Rolling Bearings With a Novel Entropy-Based Health Indicator and Improved Particle Filter Algorithm

Due to the importance of bearings in modern machinery, the prediction of the remaining useful life (RUL) of rolling bearings has been widely studied. When predicting the RUL of rolling bearings in engineering practice, the RUL is usually predicted based on historical data, and as the historical data increases, the prediction results should be more accurate. However, the existing methods usually have the shortcomings of low prediction accuracy, large cumulative error and failure to dynamically give prediction results with the increase of historical data, which are not suitable for engineering practice.To address the above problems, a novel RUL prediction method is proposed. The proposed method consists of 3 parts: First, the multi-scale entropy-based feature – namely “average multi-scale morphological gradient power spectral information entropy (AMMGPSIE)” – from the rolling bearings as the Health indicator (HI) is extracted to ensure all the fault-related information is well-included; Then, the HI is processed with the enhanced Hodrick Prescott trend-filtering with boundary lines (HPTF-BL) to ensure good performance and small fluctuation on the HI; Finally, the deterioration curve is predicted using an LSTM neural network and the improved Particle Filter algorithm that we proposed. The proposed method is validated using the experimental bearing degradation dataset and the casing data of a centrifugal pump bearing from an actual industrial site. Comparing the results with other recent RUL prediction methods, the proposed method achieved state-of-the-art feasibility and effectiveness, conform to the needs of practical application of the project.


I. INTRODUCTION
Rolling bearing is widely used in rotating machinery and is one of the key wearing parts. With the arrival of Industry 4.0 and the thriving development of Prognostics and Health Management (PHM) research, rolling bearing monitoring can be achieved by analyzing the real-time vibration data collected by the intelligent sensors installed on the machines, such that the service life of rolling bearings is prolonged, The associate editor coordinating the review of this manuscript and approving it for publication was Yiqi Liu . and production efficiency greatly increased. One major part of PHM is the prediction of remaining useful life (RUL), accurate RUL predictions can provide crucial information for maintenance decisions and has been a popular research topic in recent years [1], [2].
At present, there are mainly three methods for predicting the RUL: the model-based method, the data-driven method, and the method combining both [3]. For the model-based method, some mathematical models are usually established to describe the deterioration process of the system, with the parameters of these models being updated in real-time based on the observed data, the future states of the system can be determined, thereby predicting the remaining useful life.
Common model-based methods include Kalman Filter, Particle Filter, etc. The prediction method of Particle Filter is proposed based on the Bayesian filtering algorithm, which has been widely used in the field of RUL prediction due to its improvements in the certainty of long-term predictions. Li et al. [4] proposed a prediction method based on the particle filter and used correlation matrix clustering and weighting algorithms to calculate the health indicators. Zhu et al. [5] applied the particle filter algorithm to fatigue crack growth parameter estimation and RUL prediction, acquiring the median value and confidence interval of residual life. Lei et al. [6] proposed a model-based RUL prediction method for rolling bearings with a health indicator construction module and an RUL prediction module: the former constructs a new health indicator named weighted minimum quantization error, and the latter predicts the RUL using particle filtering.
The above research has proven the advantages of Particle filtering in solving long-term uncertainty prediction problems, however, the algorithm remains flawed that after multiple iterations, the particle weights would degrade. This is normally solved by optimizing the resampling steps: Yu et al. [7] and Liu et al. [8] separately used unscented particle filter to predict the exhaust temperature of space engines and the state of lithium battery and achieved better results than the original particle filter algorithm; Xu et al. [9] proposed a particle filter prediction algorithm improved by particle swarm optimization algorithm and accurately predict the remaining useful life of rolling bearings; He et al. [10] proposed to optimize the resampling process of particle filter by using the beetle antennae search algorithm to solve the problem of particle diversity loss, and applied this method to predict the remaining useful life of batteries also obtained good prediction results; Xu et al. [11] proposed an indicator based on the mean square harmonic noise ratio, and then used the regularized particle filter algorithm which improved the resampling process by using the kernel function based on the Euclidean distance, and achieved good results in predicting the RUL of the rolling bearing.
When building the state-space model of rolling bearings in the deterioration process, the most commonly used models are the double exponential model and the Paris model [12]. These two models can describe the fatigue deterioration process of bearings well, but cannot deal with other types of deterioration -which is yet another flaw for the particle filtering algorithm applied in RUL prediction. The datadriven method, however, is to predict the remaining useful life by using statistical analysis or machine learning methods based on obtained data. This type of method can be calibrated and describe the deterioration process of bearings through the obtained data, countering the defects of the particle filter algorithm. Therefore, the data-driven improved particle filter prediction method has become a hot research direction in recent years.
Cui et al. [13] proposed a comprehensive prediction model based on time-varying particle filter, this method can adaptively select the optimal state-space model according to the characteristics of the data, which improves the prediction accuracy for bearing life; Ge et al. [14] proposed a data-driven improved particle filter prediction method, applying a quantum genetic algorithm to solve the particle deterioration problem, and used LSTM neural network to predict the trend of model coefficients achieving good results in predicting the remaining useful life of aero-engines; Wu et al. [15] put forward a prediction method combining bat algorithm optimized particle filter and neural network, which is used to predict the state of lithium ion battery, and concluded that the neural network prediction model with two hidden neurons is better than that with three hidden neurons; Zhang et al. [16] proposed a prediction method using levy flight optimization particle filter, and used LSTM to learn the deterioration state curve of lithium ions, serving as the state-space model of particle filter algorithm, achieving good prediction results; Chen et al. [17] proposed a prediction method combining a new particle filter algorithm with GNN to predict the state of lithium batteries. These scholars have conducted a lot of research on the prediction method combining neural network and improved particle filter, and made some contributions, but these contributions are basically applied to the prediction of lithium-ion battery life or the RUL of aero-engines. Only a few studies have used the method of combining neural network with an improved particle filter to predict the RUL of rolling bearings.
In the prediction process, good health indicators (HIs) with comprehensive performance will improve the accuracy of prediction [18]. In many recent works, there exists a problem that HI fluctuates greatly, and it is impossible to determine an appropriate confidence interval, leading to low prediction accuracy [19]; At present, a 95% or 99% confidence interval is usually used, but it is too subjective to apply to all HI, resulting in poor generalization ability [6]. To solve these problems, Chen et al. [20] proposed a quadratic function-based deep convolutional auto-encoder (QFDCAE) constructed HI, which can automatically extract bearing HI from the original vibration signal without expert knowledge; Zhou et al. proposed a [21] distribution contact ratio metric health indicator (DCRHI) that can well represent the degradation process and obtain a unified failure threshold, combined with the improved gated current unit (GRU), it can more accurately predict the remaining service life of the bearing; Qin et al. [22] also proposed an HI construction method based on degradation trend constrained variable autoencoder (DTC-VAE), which can adaptively generate HIs with obvious degradation trend. These articles have made great contributions to the HI construction part of RUL prediction research, while in this paper, a different and more effective solution is proposed for HI construction and HI processing.
In order to help enterprises and factories to determine the best predictive maintenance time, and to solve the existing problems of HI construction and data-driven RUL prediction, VOLUME 11, 2023 a new method for predicting the RUL of rolling bearings is proposed in this paper. The new method includes three parts: HI construction, HI processing, and RUL prediction. First, a new HI based on multi-scale entropy is constructed, which has good comprehensive performance; Secondly, a HI processing technology is proposed, processed by this technology, the HI can have appropriate confidence intervals, and the influence of the difference between the real value and the filtered value of the signal on the RUL prediction results can be eliminated, thus improving the prediction accuracy; Finally, LSTM neural network and our improved particle filter algorithm are used, according to the dynamic prediction deterioration curve of real-time data, the improved particle filter algorithm solves the problem of particle weight degradation, LSTM neural network is used to update the state space model before each prediction, this state space model describes the degradation process of the current rolling bearing well, and can effectively reduce the cumulative error of prediction.
The main contributions of this paper are summarized as follows.
1) The ''average multiscale morphological gradient power spectral information entropy'' (AMMGPSIE) is proposed as the health indicator extracted from rolling bearing vibration data, this HI has good monotonicity and can describe the deterioration process of rolling bearings well; 2) A new health indicator processing technique called ''Hodrick Prescott trend filter-boundary line'' (HPTF-BL) that calculates the upper and lower deterioration boundary lines and the main deterioration trend of the features is used, it helps select the appropriate confidence intervals and reduce prediction errors; 3) A hunter-prey optimization algorithm improved particle filter with a long-short-term-memory network as the statespace model (HPO-PF-LSTM) was applied for predicting the remaining useful life of rolling bearings, it solves the problems of particle depletion and achieved what the traditional state-space model cannot do well: representing the forms of bearing deterioration.
The rest of this paper is organized as follows: Section II introduces the basic theoretical background of the proposed method; In Section III, the RUL prediction model of rolling bearing is presented; Section IV uses laboratory data and engineering data to verify the effectiveness of the proposed method; Section V compare the proposed method with some of the latest RUL prediction methods to further present its superiority; Section VI concludes this paper.

A. POWER SPECTRUM INFORMATION ENTROPY
The operational reliability and performance deterioration of rotating machinery can be effectively represented with features extracted from the continuously collected vibration signal sequence. For the same rotating machine, the information entropy method can be used to evaluate the health status of the machine with the comparison of healthy state operation measurements. ''Information entropy'', which quantifies the certainty of a certain event, was first proposed by Shannon [23]. In the case of rotating machinery, as the number of operating states increases, the uncertainty increases, so the information entropy value also increases. Therefore, the lower the information entropy for a certain operation, the higher the chance the machine is operating in a healthy state [24].
Assume a certain random signal x t . It can be expressed by the following equation: . . x m denotes the variable that constitutes the signal, with p i representing the chances of occurrence of x i . Then, a distribution vector representing the probability of occurrence of a random variable in the signal can be constructed as P = {p 1 , p 2 , . . . , p m }. The information entropy of the signal x t can be then expressed as: where m i=1 p i = 1 and p i satisfies 0 ≤ p i ≤ 1. The power spectral density (PSD) of the signal is usually used to represent the power of the signal in the frequency domain [25]. Assuming a discrete signal {s (n) , n = 1, 2, 3, . . . , N } where N is the length of the signal in the time domain. {S (w) , w = 1, 2, 3, . . . , W } is the frequency domain representation of the signal, with W as the number of sampling points in the frequency domain of the signal. The power spectrum of the signal can be expressed as: Substituting PSD (w) from Eq. (3) into Eq. (2), the information entropy of the power spectrum of the discrete signal s (n) can be obtained: With the maximum value of the power spectrum information entropy being ln (1/W ), the normalized power spectrum information entropy (NPSIE) can be obtained:

B. TRADITIONAL PARTICLE FILTER ALGORITHM
Particle filtering is developed based on traditional nonlinear filtering methods such as the Kalman filter and the extended Kalman filter. Similar to them, particle filter adopts Bayesian theory and estimates the probability density function of the system state based on system observations. At the same time, the particle filter also applied a sequential importance sampling algorithm. With Bayesian theory and sequential importance sampling algorithm, particle filter has shown obvious advantages in model parameter estimation of non-linear and non-Gaussian systems and has been applied in the field of RUL prediction [26]. Assume that the system is in discrete time series t k = k· t, the state can then be described by the following state transfer function: where x k is the system state in t k , f k is the system state transfer function, θ k is the model parameter vector, and ω k is the system noise.
While the state of the system cannot directly be measured, it is necessary to use the observation of the system to estimate the state of the system according to Bayesian theory. The relationship between the system state and the system observation is as follows: where z k is the system observation at time t k , h k is the observation function of the system, and v k is the measurement noise. First, the prior probability density function at time k is predicted according to the system state probability density function and state transfer function at time k − 1: After the new observation value is acquired, the state probability density function is updated to obtain the posterior probability density function at time k: Sometimes, it is quite difficult to sample directly from the state probability density function p (x k | z 1:k ). Therefore, the sequence importance sampling algorithm uses an importance density function q (x k | | z 1:k ) to approximate the state probability density function p (x k | z 1:k ), describing it using the discrete sampling points of the importance density function and the corresponding weights. Eq. (8) is then transformed into: where the importance weight is updated with the following equation: Select the prior probability density that's easier to realize as the importance density function, Eq. (11) can be simplified as: To update the model parameters, the parameter term θ k is added to the conditional probability of Eq. (13): Therefore, as long as the conditional probability p (z k | x k ,θ k ) is obtained, the weight corresponding to the system state x k and model parameter θ k can be updated continuously using the new observation value z k .

III. CONSTRUCTION METHOD OF ROLLING BEARING RUL PREDICTION MODEL WITH A NOVEL ENTROPY-BASED HEALTH INDICATOR AND IMPROVED PARTICLE FILTER ALGORITHM
The morphological gradient operator (MGO) is defined as the difference between the results of signal f after expansion and corrosion through structural element g: where ⊕ represents the expansion operator and represents the erosion operator. For more details about the operators and the structural element g, please refer to [27]. While the morphological gradient is often used as an edge detection tool in image processing, in signal processing, it can detect and extract the transient information on steady-state signals, so that pulse information can be made apparent and the position and shape of pulses effectively detected [28]. The raw vibration signals of rolling bearings usually contain a huge amount of noise, which will inevitably affect the power spectrum and reduce the accuracy of the power spectrum information entropy analysis results. With the good characteristics of the morphological gradient operator discussed above, the MGO is extended and applied it when analyzing the power spectrum to deal with the noise problem. And on this basis, the feature based on the average multiscale morphological gradient power spectrum information entropy (AMMGPSIE) is proposed.

2) AVERAGE MULTISCALE MORPHOMETRIC GRADIENT POWER SPECTRUM INFORMATION ENTROPY (AMMGPSIE) INDEX CONSTRUCTION
Multi-scale analysis can be applied to the measurement of the complexity of a sequence and reflect the hidden information [29]. Costa et al. [30], [31] proposed the idea of multi-scale entropy (MSE) -obtaining the representation of the sequence at different scales by coarse-graining the sequence. In this paper, based on the power spectrum information entropy and the morphological gradient operator, combined with multiscale analysis, an innovative calculation method for the average multiscale morphological gradient power spectrum information entropy is proposed and used for the feature extraction of rolling bearing performance deterioration. The steps for calculating the average multi-scale morphological VOLUME 11, 2023 gradient power spectral information entropy (AMMGPSIE) are as follows: Step 1: For a time series {s (n) , n= 1, 2, 3, . . . ,N } of length N, the coarse-grained signal is first obtained by calculating the average value of each segment of the sequence: Step 2: Calculate the information entropy of the morphological gradient power spectrum of the coarse-grained signal at each scale factor τ ; a. The frequency domain expression of each coarsegrained signal y where W is the number of sampling points. Then, morphological gradient transformation is applied on the signal group S (w) to acquire X (w): b. The morphological gradient power spectrum of the signal is calculated, and its expression is as follows: c. Combined with information entropy theory, the information entropy of the morphological gradient power spectrum of the signal is calculated, and its expression is as follows: d. The information entropy of the normalized morphological gradient power spectrum of the coarse-grained signal under each scale factor τ is shown in the following equation: Step 3: Steps 1 and 2 are the calculation process of multiscale morphological gradient power spectrum information entropy. The average multiscale morphological gradient power spectral information entropy (AMMGPSIE) used in this paper is calculated by calculating the average of the normalized morphological gradient power spectral information entropy calculated with Eq. (19) at different scales.

3) MULTI-SCALE PARAMETER SCREENING
The performance degradation of rolling bearings is monotonous and irreversible, so monotonicity can be used to judge the performance of the health indicators [32], in addition, robustness and correlation can also be used to evaluate HI performance [18]. First, the smoothing method is used to decompose the HI into average trend and random parts: where HI (t n ) is the value of HI at time t n , HI T (t n ) is the trend value and HI R (t n ) is the random term. The calculation of monotonicity, robustness and correlation is Eq. (21)- (23), as shown at the bottom of the next page. where ε (x) is the unit step function. When x is greater than or equal to 0, ε (x) is 1; when x is less than 0, ε (x) is 0. N is the total number of samples. From the 3 indexes we can acquire a compound index (CI): where ω i > 0 and 3 i=1 ω i = 1. According to [46], When applying the average multi-scale analysis, the AMMGPSIE evaluation indicators obtained by different scale factors are different. Here, PRONOTIA's bearing 1-1 data from IEEE PHM 2012 data challenge [33] is taken as an example: Table 1 shows the best parameter τ = 4. It corresponds to the AMMGPSIE health index with the highest comprehensive performance.
In Table 2, four performances of the proposed AMMGP-SIE are compared with four performances of other five commonly used health indicators designed for RUL prediction, including time domain root mean square (RMS) and four complexity measurements: wavelet information entropy [34], C0 complexity [35], power spectral entropy [36] and highorder differential mathematical morphological gradient spectral entropy [37].
As the comparing results show, the proposed HI: Average multi-scale morphological gradient power spectrum information entropy (AMMGPSIE) has better performance than the other 5 HIs and is suitable for prediction.

B. SIGNAL PREPROCESSING OF HEALTH INDICATOR BASED ON HPTF-BL 1) HP TREND FILTERING
The HP trend filtering method was proposed by Hodrick and Prescott [38] in 1997. It is a commonly used data analysis method in economics. It can decompose data into long-term trend items and short-term fluctuation items. Because it can reduce the impact of data noise, it is often used to extract the trend of various time series. It is now widely used in data prediction, reliability analysis of performance degradation, and other related fields.
The HP trend filter is a high pass filter that uses the least squares loss function and l2 norm to calculate the quadratic difference matrix. It decomposes the time series x t into long-term trend items and fluctuation items with random fluctuation characteristics. The trend term is defined as the solution of the following equation: where i is the serial number of the time series data and n is the sample number of the time series. The first item represents the tracking degree of the trend item to the original sequence, the second item represents the smoothness degree of the trend item with λ as the smoothness parameter that controls the smoothness degree of the trend item. By finding the first-order derivative of the x k i sequence from Eq. (25), the trend sequence x k can be obtained: where G is the coefficient matrix and I is the identity matrix. When λ → 0, the tracking degree of the trend item on the sequence reaches the maximum. When λ → +∞, the degree of smoothness of the trend term sequence reaches the maximum. When λ = 0, the HP filtering method degenerates to the least squares method. Through the above HP filtering method, the time series can be decomposed into periodic terms and fluctuation terms. But at the same time, HP filtering has some defects, that is, the random noise of the vibration signal will gradually increase with time, and the difference between the real value of the signal and HP filtering value will bring errors in RUL prediction.

2) DESCRIBING HPTF-BL
In this paper, the Hodrick Prescott trend-filtering with boundary lines (HPTF-BL) method is proposed, which can well solve the difficulties to determine the appropriate confidence interval for high volatility health indicators, and the influence of random noise on the RUL prediction results, thereby improving the prediction accuracy. After being processed by the HPTF-BL, the local fluctuations of the health indicators are eliminated and a new group of health indicators (also including the original health indicators) is formed. The new indicator group has good monotony and low fluctuation. At the same time, a confidence interval based on the signal itself is formed, which enhances the interpretability of the confidence interval setting in the step of subsequent trend prediction, eliminating the difference between the real value of the signal and the HP filter value, and further improving the accuracy of the RUL prediction results.
Corr (HI ) = N n HI T (t n )t n − N n HI T (t n ) n t n N n HI (t n ) 2 − n HI T (t n ) 2 N n t 2 n − n t n 2 (23) VOLUME 11, 2023 The main steps of HPTF-BL are as follows: Step 1: Window division: divide the obtained known time series x t to obtain multiple windows containing m numbers in each group; Step 2: Sort the data in each window to obtain the maximum and minimum values in the window; Step 3: Replace all data in the window with the maximum value to get the upper boundary x up ; Replace all data in the window with the minimum value to get the lower boundary x low ; Step 4: HP filtering is used to process the upper boundary x and lower boundary x low of the time series x t obtained from Step 3, and get the main trend terms x R , the calculation of x R which is shown in Eq. (26).
The above steps can then construct the health indicator after the process: The data length of the processed health indicators is consistent with that of the original health indicators, but the data dimension becomes 3 dimensional, which is composed of the upper boundary, the lower boundary, and the main trends.

3) OPTIMIZING FEATURE PARAMETERS OF HPTF-BL
When processing the health indicators with the HPTF-BL method, it is necessary to select the parameters λ and m. m represents the amount of data in the window, its value can be at least 3 and at most the same value as the amount of data in the known sequence; The larger the value of m is, the more representative the results are, but some information is ignored -which we specifically don't want; The smaller the value of m, the more accurate the result can reflect the boundary information of the characteristic index, therefore, 3 is selected for m.
λ is an indicator that controls the smoothness of the time series. When processing the characteristic indicators, it is necessary to obtain the main degradation trend. The selection of smoothing indicator λ is often subjective: Choudhary et al. [39] used HP filtering to smooth the characteristics of economic data to analyze economic changes that reflect the real situation of economic development, they proposed that λ should be 100 when analyzing annual data, 1600 for quarterly data and 14400 for monthly data; Ravn and Uhlig [40] put forward a frequency criterion to calculate the corresponding λ value, and set λ = 1600 as the benchmark value when processing quarterly data in advance, referred to as λ b . The smoothing index λ b in processing monthly and annual data can be calculated by the following frequency criteria: where N represents the amount of data processed, N b represents the preset benchmark data amount, and here, N b represents the amount of data processed quarterly. When processing annual data, the smoothing index is calculated as (1600 × 1 4 4 = 6.25). This method has been verified to be feasible in determining the smoothness index λ. In the field of rolling bearing RUL prediction, the sampling interval and sampling time of the data are different, and the amount of data obtained is also different. While a group of full life cycle data is often much larger than the quarterly data in the economic analysis. The λ benchmark in economics cannot be directly used in the field of RUL prediction, Therefore, this paper takes a group of signals simulating bearing performance degradation as an example to set the HP filtering λ value reference applicable to mechanical equipment vibration signals. If the HPTF-BL method is required in other chapters of this paper, the λ value can then be calculated according to the λ value determined in this section through frequency criteria.
It can be seen from the simulation results that the analog signals passing through the HP filtering with different λ all become smoother compared with the original signal since the smoothing degree tends to stay stable between λ= 10 and λ= 20, 15 is selected as the benchmark value for λ when the amount of data is 100. When using the HPTF-BL method to process characteristic indicators in Section IV, λ can be calculated by frequency criterion and the benchmark value.

C. HPO-PF-LSTM-BASED RUL PREDICTION 1) USING THE HUNTER-PREY OPTIMIZER ALGORITHM
Hunter prey optimizer (HPO) algorithm is a new intelligent optimization algorithm proposed in 2022, it finds the optimized parameter by simulating the hunting process of animals and have the advantages of fast convergence and good optimizing ability [41].
First, the locations of the hunter and prey is initialized and the initial population is formed, then the fitness of each solution is calculated using the objective function. Eq. (29) is used to update the location of the hunter: where x (t) is the current hunter position, x (t + 1) is the next iteration hunter position, P pos is the position of the prey, µ is the average value of all positions, C is the balance parameter between the exploration and development strategies, its value decreased from 1 to 0.02 during iteration: where it is the current iteration number and Max is the maximum iteration number. Z is the adaptive parameter: where R 1 and R 3 are random vectors within [0, 1], P represents the index values where R 1 < C, R 2 is a random number within [0, 1], IDX are index values of vector R 1 qualifying (P== 0). The position of prey P pos is calculated, so as to calculate the average value of all positions according to the following formula, and then calculate the distance from the average position: The position with the largest average distance from the position is considered the prey position P pos . Assuming that the best safe position is the best global position, giving the prey a better chance of survival, the hunter may choose another prey and update the prey's position according to the following formula: where x (t) is the current position of the current prey, x (t + 1) is the next iteration position of the same prey, T pos is the global optimal position, Z is the adaptive parameter calculated by Eq. (30), R 4 is a random number within [−1, 1], C is the balance parameter between exploration and development. The cosine function and its input parameters allow the next prey position to be globally optimal at different radii and angles and improve the performance at the development stage.
In order to select the hunter and the prey, Eq. (29) and Eq. (33) are combined. R 5 is a random number between [0, 1], and β is an adjustable parameter, according to [41], the value of β is selected as 0.1. If the value of R 5 is less than β, the searched position is regarded as a hunter, and the next position will be updated with Eq. (29); If the value of R 5 is greater than β, the searched position will be regarded as prey, and the next position will be updated with Eq. (33).

2) PARTICLE FILTERING BASED ON HUNTER-PREY OPTIMIZATION ALGORITHM
The resampling process of the particle filter (PF) is improved using the hunter-prey optimization algorithm (HPO) to overcome the degradation of particle weight after multiple iterations of PF. The prior state of particles in the particle filter is taken as the individual position of the initial population of hunter and prey. The iterative optimization process is used to improve the distribution of particles. The degraded particle set is optimized to a high likelihood value so that most particles can be concentrated near the real state. This solves the problem that the function used for resampling in conventional particle filter algorithms is suboptimal. The implementation of the HPO-PF algorithm is shown in Table 3.

3) PERFORMANCE DETERIORATION DESCRIBING BASED ON LSTM
Once the shortcomings of PF with weight deterioration and particle depletion is overcomed, the next task is to construct a performance deterioration model for the rolling bearing. Because rolling bearings often work in harsh working environments, it is very difficult to establish an accurate deterioration model. In this paper, we involve a powerful deep learning tool: the LSTM neural network model to learn the deterioration model [42].
As shown in Fig. 2, a single LSTM block is composed of a forget gate, an input gate, and an output gate, along with the cell state. The circle represents operations, the arrow represents the transmission of the vector, and the ellipse is the non-linear activation function.
In the forget gate, the output from the previous layer h t−1 and the data at the current moment X t is selected through the Sigmoid activation function, discarding part of the input data: where W f , b f are the weight matrix and bias term of the forget gate and σ is the sigmoid activation function.
In the input gate, the units' state is updated, part of the memory is discarded, and the information is updated according to Eq. (35), with the candidate updated content calculated according to Eq. (36).
where W i , b i are the weight matrix and bias term of the input gate and W C , b C the weight matrix and bias term when updating the information. The information from the previous momentC t−1 is transmitted in the cell state and finally output to the next moment, the state is updated according to Eq. (37) so that the long-term memory and the current memory are combined to form a new one: In the output gate, Eq. (38) is used to determine which states are used as cell outputs and then calculate the outputs through the tanh function h t , as shown in Eq. (39):

4) HEALTH INDICATOR PREDICTING USING HPO-PF-LSTM
LSTM neural network is selected to build the state-space mapping model to avoid the difficulties caused by the deterioration process of rolling bearing and the complex external environment. Since the health indicator used for RUL prediction (AMMGPSIE) itself is already in the [0, 1] interval, when building the LSTM model, no normalization layer is used. The Adam optimizer is applied for optimization [43]. The selection of the hidden layer numbers will be discussed in detail later. Select the weight and deviation of the LSTM neural network as the system state: Suppose that these eight state variables follow the random walk mode [15], The state space model is constructed as follows: where X (t) is the estimated value for the particles, f (·) is the degradation function trained with historical data by the LSTM model. All the noise terms, including w 1 to w 8 and A are all assumed to obey the gaussian distribution.
According to the HPO-PF-LSTM prediction method, the prediction value at the next time can be expressed as: After extrapolating the prediction result to the fault threshold line using Eq.(41), the RUL prediction result can be calculated. Each prediction is to use all the known data to establish the state space model, which can eliminate the cumulative error of prediction. The flow chart of the prediction process is shown in Fig. 3.
In order to verify the HPO-PF-LSTM prediction method, root means square error (RMSE) and fitting effectiveness R 2 are used as indicators to evaluate the prediction performance.
The root means square error (RMSE) is defined as follows: The fitting degree between the predicted results of each model and the residual life curve of the bearing is evaluated by the correlation coefficient R 2 . The value of R 2 is between 0 and 1, the larger the value is, the better the model fits the data. R 2 is defined as follows: where y i is the i-th ground truth,ŷ i is the i-th prediction,ȳ is the average of the time sequence, and K represents the number of samples.
In the HPO-PF-LSTM method, the number of hidden layers in the LSTM network and the particle number of the particle filter algorithm can both impact the prediction results. Taking the Bearing 1-1 data from PRONOSTIA in the IEEE PHM 2012 Data Challenge [33] as an example, since the selection of hidden layers and particle numbers are independent, the impact of different LSTM hidden layers and particle numbers on the prediction results will be discussed respectively. The data with 850 known samples are selected for training. First, the impact of different LSTM hidden layers on the prediction results is explored. Each hidden layer contains 80 hidden units [44], and the number of particles is selected as 300. The prediction results are evaluated using the above two prediction results evaluation indicators. The results are shown in Table 4.
It can be seen from Table 1 that when the number of LSTM hidden layers is gradually increased to 5, the results (combining the 2 evaluation indicators) achieved optimal and won't get any better even if the number increases. Then, the  effect of different particle numbers on the prediction results is explored. The number of LSTM hidden layers is set to the optimal number 5 as obtained. With a similar process, R 2 should be considered first then RMSE to determine the final selection of particle numbers [45]. 500 is selected as the optimal number of particles, according to Table 5.

5) BUILDING THE RUL PREDICTION MODEL
The flow chart of the proposed method for predicting the remaining useful life of the rolling bearing is shown in Fig. 4.
The detailed steps for predicting the RUL of rolling bearings are as follow: Step 1: Deterioration feature extraction: Obtain the realtime original vibration signal of the rolling bearing and calculate the AMMGPSIE of the input original data. The calculation process of AMMGPSIE is discussed in detail in Section III.A.
Step 2: Deterioration start time determination: Use the intelligent learning method of early warning threshold based on beta distribution [46] to judge whether the degradation starts. If it starts, go to Step 3. Otherwise, continue to Step 1.
Step 3: Health indicator processing: The HPTF-BL feature processing method proposed in Section III.B is used to process the calculated AMMGPSIE health indicators.
Step 4: Predicting the remaining useful life: The LSTM neural network is used to train the state space model, the HPO-PF algorithm updates and iterates the parameters, and the bearing deterioration curve is predicted recursively by the algorithm; The obtained curve is compared with the preset failure threshold to obtain the optimistic estimation results, reference estimation results and conservative estimation results of the RUL.

IV. CASE VERIFICATION A. LABORATORY CASE VERIFICATION 1) DATA INTRODUCTION
Bearing 1-1 and bearing 1-3 from PRONOSTIA in the IEEE PHM 2012 Data Challenge is selected for laboratory data validation [33]. The PRONOSTIA platform can perform the full life cycle operation experiment on the rolling bearing as shown in Fig. 5. The vibration signal of the rolling bearing is recorded by the acceleration sensor, which samples once every 10 seconds, with a sampling frequency of 25600 Hz. Therefore, each sample contains 2560 data points, i.e., 0.1 s.

2) DATA VALIDATION
First, the original vibration signals in the x-axis direction of bearing 1-1 and bearing 1-3 are extracted, and the   AMMGPSIE index is calculated. Because the AMMGPSIE index has only a small range of change, with the actual value being within 0.3, the failure threshold is set as AMMGPSIE value when increased by 0.1 [24], [34]. The early warning threshold intelligent learning method [46] based on beta distribution is used to determine the deterioration start time of the bearing, as shown in Fig. 6, with the red dotted line indicating the determined deterioration start time. The deterioration information of bearing 1-1 and bearing 1-3 is shown in Table 6.
Secondly, the HPTF-BL characteristic index processing method proposed in Section III.B of this paper is used to process the sample data after reaching the degradation time to obtain the upper and lower degradation boundaries and main degradation trends; Then, when the number of known   HPO-PF-LSTM algorithm is used to train the known data to show the results of iterative prediction of the failure degradation curve. The prediction results are shown in Fig. 7 and Fig. 8 respectively. It can be seen from the prediction results that as the number of known data increases, the trend predicted by the HPO-PF-LSTM algorithm will become closer to the actual AMMGP-SIE value, and the RUL (T n ) of the rolling bearing in time T n can be calculated: (45) where N p n is the sample number corresponding to the predicted bearing failure time; N s n is the sample number corresponding to the moment when the prediction started; and t s is the sampling time, which is 10 s in this case. Finally, the remaining useful life of the bearing when the known data reach different sample numbers can be calculated according to Eq. (45). The results of RMSE and R 2 between the real health indicators are shown in Table 7 (unit: seconds): It can be seen from Fig. 7 and Table 7 that when the number of known samples of bearing 1-1 reaches 550 and 850, the predicted results delay the actual value by about 1000-2000 seconds. However, with the increasing number of known samples, as shown in Fig. 7, when the number of samples reaches 1150 and 1450, as the overall trend of the calculated health indicators fluctuates downward, the prediction results of the HPO-PF-LSTM algorithm will be dynamically adjusted according to the changes in the health indicator. Although it gradually deviates from the actual value, it is VOLUME 11, 2023 quite common in engineering scenarios. With the increase in the number of known samples, the prediction results will gradually tend to the actual value.
It can be seen from Fig. 8 and Table 7 that when the number of known samples of bearing 1-3 reaches 300, the predicted results delay the actual value by about 2000 seconds. When the number of known samples reaches 400 and 500, the prediction result gradually approaches the actual value, which is only 500 seconds away from the real prediction result. When the number of known samples reaches 600, the difference between the prediction results and the real results is only tens of seconds, and the real RUL value is within the range of optimistic prediction results and conservative prediction results. The optimistic forecast results, reference forecast results, and conservative forecast results can jointly provide decision-making suggestions for predictive maintenance.

B. ENGINEERING CASE VERIFICATION 1) DATA INTRODUCTION
In order to verify the application value of the proposed RUL prediction model in engineering, a group of bearing vibration data collected in the actual production process from the factory is used to verify the practicability of the method. Over the last 20 years, the author's research team has remotely monitored about 3000 pumps from factories, and accumulated about 1500 pump failure cases. In this paper, the refined diesel pump 204-P-203A is used as the research object, and the structure of the pump is shown in Fig. 9. The speed of the diesel pump is 3000r/min, the drive end bearing model is type 6217 deep groove ball bearing, the sampling frequency of the accelerometer is 25600 Hz, and the sampling interval is one hour. A total of 311 data files were obtained from the original acceleration data from 19:00 on May 26, 2021, to 24:00 on June 3, 2021. Each data file consists of 16384 points.

2) DATA VALIDATION
First, the features of the original vibration signal in the x-axis are extracted, and the AMMGPSIE indicator is calculated. The deterioration information is shown in Fig. 10 and Table 8. When the sample number is 150, the deterioration starts, and the prediction starts from the known sample 96. The HPTF-BL feature indicator processing method is used to obtain the upper and lower deterioration boundaries and the main deterioration trend of the known sample data. When the values reach 50, 70, 90, and 110, the HPO-PF-LSTM algorithm is used to train the known data, and the results of the iterative prediction of the deterioration curve are displayed. The prediction results are shown in Fig. 11.
According to Eq. (45), the remaining useful life of the bearing when the known data reach different sample numbers is calculated, and the RMSE and R 2 between the predicted main trend curve and the actual health indicator are calculated. The specific results are shown in Table 9. (Unit: hour): The real site conditions are complex and changeable, and most of the data obtained may be the same as the case data shown in this paper. It can be seen from Fig. 11 and Table 9 that when the sample size of known data reaches 50, 70, 90, and 110 respectively, the optimistic prediction results, reference prediction results, and conservation prediction results can be predicted. These prediction results can jointly provide decision-making suggestions for predictive maintenance.

V. COMPARATIVE ANALYSIS
To further verify the superiority of the remaining life prediction method proposed in this paper, three life prediction methods proposed in recent years are selected for comparison and analysis on the bearing1-1 dataset. These three methods are respectively the bearing RUL prediction method proposed by Ahmad et al. [47]; Wang et al. [48] proposed prediction method combining RVM and index model; Wang et al. [49] proposed prediction method of the improved exponential model. The Bearing1-1 data used for verification above is used for comparison. To ensure fairness, all of the HI uses the AMMGPSIE index proposed in this paper. By calculating the predicted remaining service life, the results can be seen in Fig. 12.
The cumulative Relative Accuracy (CRA) and Convergence [50] are used to measure the prediction accuracy of the model. CRA summarizes the relative accuracy of the prediction results at all times, comprehensively evaluates the accuracy of the prediction methods, quantifies the size of the cumulative error, and intuitively reflects the comparison results of various methods. The closer the CRA is to 1, the smaller the cumulative error and the better the prediction  result. The expression is as follows: where ω n is the weight calculated through ω n = n/ N n=1 n. ActRUL (T n ) is the actual RUL of the rolling bearing in T n . The predicted RUL (T n ) is calculated from Eq. (45).
The convergence degree can calculate the speed at which the predicted RUL converges to the true RUL. The expression is as follows: where C x ,C y represents the center of mass of the area under the prediction error curve, T 1 is the time to start forecasting. The comparison results are shown in Table 10: The comparison results show that the HPO-PF-LSTM prediction method proposed in this paper is superior to all other methods in terms of CRA and convergence, with higher prediction accuracy and smaller cumulative error, therefore having practical engineering application value.

VI. CONCLUSION
Considering the demand for rolling bearing RUL prediction for rolling bearings, this paper proposes a prediction model with a novel entropy-based health indicator and improved particle filter algorithm. The experimental results and comparative analysis show that: (1) A new method for predicting the remaining useful life of rolling bearings is proposed. It consists of three steps: extracting a new type of health indicator, depicting the upper and lower boundaries of health indicators, and predicting the RUL. This method can predict the current remaining useful life of rolling bearings based on historical time-series data to meet the requirements for health management in practical engineering applications.
(2) By using the prediction model proposed in this paper, the optimistic, reference and conservative estimation results of RUL with very high precision can be obtained, the results can help with selecting the best maintenance time in enterprises and factories.
(3) Compared with the three latest prediction methods, the proposed method has higher prediction accuracy. The validation results also show that the method is more robust with more known data and is more suitable for complex engineering scenarios.
In the next step, after obtaining more actual industrial case data, the author will continue to use, improve and verify the proposed method.