A Novel Remaining Useful Life Prediction Method Based on CEEMDAN-IFTC-PSR and Ensemble CNN/BiLSTM Model for Cutting Tool

To accurately predict the remaining useful life (RUL) of cutting tool, a novel RUL prediction method is proposed. Firstly, the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is used to decompose the original cutting tool vibration signals into six intrinsic mode function (IMF) components per sample. Secondly, high-frequency and low-frequency IMF components are selected from the IMF components and fused into high-frequency data and low-frequency data, respectively, using the improved fine-to-coarse reconstruction (IFTC); the high-frequency data and low-frequency data are then reconstructed using phase space reconstruction (PSR). Thirdly, multiple prediction branches are adopted to construct an ensemble RUL prediction model for cutting tool: in each prediction branch, the high-frequency data and low-frequency data are input into bi-directional long short-term memory (BiLSTM) and convolutional neural network (CNN), respectively, to train a RUL prediction model. Finally, a series of experiments are conducted to verify the effectiveness of the proposed RUL prediction method, and the results show that the proposed method obtains a high score of RUL prediction for cutting tool.


I. INTRODUCTION
The cutting tool directly contacts the workpieces in the computerized numerical control (CNC) machining process and is therefore the most important link in determining the machining accuracy and mechanical performance of the workpieces [1], [2]. The most common cutting tool damage is tool wear, which is the performance degradation of the cutting tool under the action of many factors [3]. RUL prediction refers to predicting how much time remains before the failure of the equipment, given that the current and historical status of the mechanical equipment are known [4]. Recently, data-driven RUL prediction methods [5]-[8] based on deep learning have received increasing attention. Although the data-driven methods do not require a priori knowledge, the noise component in the original vibration signals of cutting tool usually has multiple complex features, so it is inefficient to use the original vibration signals directly for RUL prediction. Therefore, time-frequency domain signal processing technology is adopted to extract degradation information of cutting tool.
The associate editor coordinating the review of this manuscript and approving it for publication was Claudio Zunino.
Empirical mode decomposition (EMD) [9] is an adaptive decomposition method based on modal components, which has widely been used to process vibration signals.
Geng et al. [10] successfully realized the RUL prediction of bearings through the IMF components decomposed by EMD method and correlation coefficient analysis. However, EMD cannot deal with modal aliasing caused by envelope estimation error. Ensemble empirical mode decomposition (EEMD) increases the continuity of different scales by adding auxiliary noise to the original signals, which can suppress modal aliasing to a certain extent. Jia et al. [11] studied the RUL prediction of non-stationary and nonlinear vibration data by EEMD method combined with grey relational analysis. However, there is still residual noise in the signals decomposed by EEMD, which would interfere with the subsequent prediction. Variational mode decomposition (VMD) can avoid the phenomenon of mode aliasing and has good anti-noise ability. Liu et al. [12] proposed a novel RUL prediction approach for cutting tool in which the original vibration data are processed by VMD to get different components for feature extraction. However, VMD requires a huge effort to select the control parameters. Compared with EEMD and VMD, the reconstruction error of the decomposed signals can be reduced close to zero using CEEMDAN. Peng et al. [13] established a novel denoising model in which the original vibration signals are decomposed by CEEMDAN. Although CEEMDAN can effectively decompose non-smooth and nonlinear signals, the degradation features in IMF components obtained by CEEMDAN are still not obvious.
The fine-to-coarse reconstruction (FTC) [14] can reconstruct the IMF components obtained from each decomposed sample into the high-frequency IMF components and low-frequency IMF components. The high-frequency IMF components contain more degradation information but more noise, and the low-frequency IMF components contain less degradation information but less noise. The traditional FTC cannot reconstruct the IMF components with complex features and especially high-dimensional nonlinearity. PSR technology has been widely used to process one-dimensional nonlinear time series data [15], which can restore the hidden degradation information, greatly improving the RUL prediction accuracy. Liu et al. [16] investigated the RUL prediction of bearings by extracting sensitive features through PSR and feature matrix approximate diagonalization. Although FTC and PSR can improve the accuracy of RUL prediction to a certain extent, an effective RUL prediction mainly depends on building a good prediction model.
Recently, more and more researchers have focused on RUL prediction based on deep learning. Zhu et al. [17] put forward a RUL prediction method of bearings based on multiscale CNN, and the results show that the method gives a better performance of RUL prediction for bearings. However, CNN cannot effectively process the chaotic time series data with high noise. Compared with CNN, LSTM can capture the long-term dependence in time series data and maintain the continuity of degradation information to a certain extent. BiLSTM is an improved LSTM [18], which can extract features in parallel. Yu et al. [19] proposed a RUL prediction scheme using BiLSTM, which can effectively predict the RUL of various mechanical equipment. In order to overcome the problem that BiLSTM cannot completely extract the sensitive features of fragmented time series data, many researchers began to combine CNN and LSTM or BiLSTM for RUL prediction. Niu et al. [20] successfully implemented the RUL prediction of cutting tool on the original vibration signals through the combination of 1D-CNN with LSTM. Li et al. [21] constructed a RUL prediction model based on time window (TW), CNN, and LSTM (TW-CNN-LSTM) for turbofan engines. Xia et al. [22] took advantage of multiple time windows (MTW), CNN, and BiLSTM to construct a RUL prediction model based on MTW-CNN-BiLSTM for turbofan engines. In the above studies, multiple deep learning models are combined in a serial manner for RUL prediction, and the experimental results verify the effectiveness of this combination. However, how to give full play to the ability of different deep learning models to achieve a better prediction result still needs further research.
In the actual CNC machining process, how to accurately predict the RUL of cutting tool is an urgent problem to be solved. Therefore, a novel remaining useful life prediction method based on CEEMDAN-IFTC-PSR and ensemble CNN/BiLSTM model for cutting tool is proposed, which can effectively predict the RUL of cutting tool.
The main contributions of this paper are as follows.
• A novel RUL prediction method based on CEEMDAN-IFTC-PSR and ensemble CNN/BiLSTM model for cutting tool is proposed to accurately predict the RUL of cutting tool.
• A signal processing method based on CEEMDAN-IFTC-PSR is proposed to effectively eliminate modal aliasing and better extract degradation features. CEEMDAN is used to decompose cutting tool vibration signals to get IMF components, IFTC is adopted to obtain high-frequency IMF components and low-frequency IMF components from IMF components and respectively fuse them into high-frequency data and low-frequency data, and PSR is used to reconstruct high-frequency data and low-frequency data.
• A RUL prediction model based on ensemble CNN/BiLSTM is built to obtain more stable and better prediction result. Multiple prediction branches are adopted to construct an ensemble RUL prediction model, and BiLSTM and CNN respectively use high-frequency data and low-frequency data to train a RUL prediction model in each prediction branch.
• A series of experiments are conducted to verify the effectiveness of the proposed RUL prediction method using the cutting tool vibration signals collected from the actual CNC machining process, and the results show that the proposed method obtains a high score of RUL prediction for cutting tool.
The rest of this paper is organized as follows. The basic theory is introduced in Section II. The proposed RUL prediction method for cutting tool is discussed in Section III. The experimental results and analysis are given in Section IV. The conclusions and future work are presented in Section V.
VOLUME 10, 2022

II. BASIC THEORY

A. OVERVIEW OF CNN
CNN is one of the most representative neural networks in deep learning. A typical 1D-CNN structure consists of an input layer, a convolution layer, a pooling layer, a fully connected (FC) layer, and an output layer, as shown in Fig. 1. The convolution layer takes the convolution kernel as the basic structural unit. In the convolution layer, multiple convolution kernels are used to convolve with the input data, and a series of deep features can be obtained using an activation function after adding bias. A pooling layer is added after the convolution layer, and the computational complexity of CNN is decreased by reducing the output feature map sizes of the convolution layer. The fully connected layer is used to fuse all the features, and each node in the fully connected layer is connected with all nodes in the previous layer.
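The layer sequence described above can be illustrated with a minimal NumPy sketch of one convolution layer, an activation, a pooling layer, and flattening. The kernel values, the kernel size, and the pooling size are illustrative assumptions only and are not taken from the paper.

```python
import numpy as np

def conv1d(x, kernels, bias):
    """Valid 1-D convolution (cross-correlation, as in most deep learning libraries).
    x: (length,), kernels: (n_kernels, k), bias: (n_kernels,)
    Returns feature maps of shape (n_kernels, length - k + 1)."""
    k = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, k)  # (length - k + 1, k)
    return kernels @ windows.T + bias[:, None]

def relu(z):
    """Activation applied after adding the bias."""
    return np.maximum(z, 0.0)

def max_pool1d(feature_maps, size):
    """Non-overlapping max pooling along the time axis."""
    n, length = feature_maps.shape
    usable = (length // size) * size
    return feature_maps[:, :usable].reshape(n, -1, size).max(axis=2)

# Toy input and two hand-picked kernels (assumed values for illustration).
x = np.arange(10, dtype=float)
kernels = np.array([[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]])
bias = np.zeros(2)

features = max_pool1d(relu(conv1d(x, kernels, bias)), size=2)
flat = features.reshape(-1)  # input to a fully connected layer
```

Pooling halves the length of each feature map here, which is the size reduction that lowers the computational cost of the following layers.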

B. OVERVIEW OF BiLSTM
When the traditional recurrent neural network (RNN) is used to analyze long time series data, exploding or vanishing gradients often occur [23]. Compared with the traditional RNN, LSTM adds various adjustment gates and the memory unit, which can realize the effective transmission of information in the long time series data. Specifically, LSTM introduces three adjustment gates, namely input gate, forget gate, and output gate. In addition, LSTM also introduces a memory unit similar to the hidden state function in RNN, so as to record additional information. Fig. 2 presents the typical LSTM structure. In Fig. 2, $x_t$ represents the input data at the current moment, $h_{t-1}$ and $h_t$ represent the output of the hidden layer at the previous moment and the current moment respectively, and $C_{t-1}$ and $C_t$ represent the state of the memory unit at the previous moment and the current moment respectively.
LSTM uses only the past information of time series data. BiLSTM is used to solve this problem. As shown in Fig. 3, the typical BiLSTM structure consists of two LSTM layers with opposite directions. Specifically, the forward LSTM layer is used to extract the past information, and the backward LSTM layer is used to extract the future information.
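To make the two-direction idea concrete, the following NumPy sketch runs a standard LSTM cell forward over a sequence, runs a second cell over the reversed sequence, and concatenates the two hidden states at each time step. The gate ordering inside the stacked weight matrices and all function names are assumptions chosen for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(xs, W, U, b):
    """Run one LSTM layer over xs of shape (T, input_dim).
    W: (4*hidden, input_dim), U: (4*hidden, hidden), b: (4*hidden,)
    Assumed gate order in the stacked weights: input, forget, output, candidate."""
    hidden = U.shape[1]
    h = np.zeros(hidden)  # hidden-layer output h_{t-1}
    c = np.zeros(hidden)  # memory unit state C_{t-1}
    outputs = []
    for x_t in xs:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:hidden])                # input gate
        f = sigmoid(z[hidden:2 * hidden])      # forget gate
        o = sigmoid(z[2 * hidden:3 * hidden])  # output gate
        g = np.tanh(z[3 * hidden:])            # candidate memory
        c = f * c + i * g                      # C_t
        h = o * np.tanh(c)                     # h_t
        outputs.append(h)
    return np.stack(outputs)

def bilstm_forward(xs, fwd_params, bwd_params):
    """Concatenate a forward pass and a time-reversed backward pass."""
    h_fwd = lstm_forward(xs, *fwd_params)
    h_bwd = lstm_forward(xs[::-1], *bwd_params)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)
```

At the first time step, the forward half of the output depends only on the first input, while the backward half has already seen the whole sequence.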

III. PROPOSED RUL PREDICTION METHOD FOR CUTTING TOOL

A. RUL PREDICTION PROCESS FOR CUTTING TOOL
The RUL prediction process for cutting tool is shown in Fig. 4, which is divided into four stages: data preprocessing, signal processing, model training and verification, and RUL prediction. The specific implementation steps are as follows.
Step 1: Preprocess the original cutting tool vibration signals. Firstly, the abnormal maximum values in the original vibration data are removed. Secondly, the data are divided into several samples, and each sample contains the data collected within one second. Finally, the RUL label is set for each sample, and the RUL labels are determined by time stamp, namely every sample is labeled with time stamp.
Step 2: Signal processing via CEEMDAN-IFTC-PSR. Firstly, each sample is decomposed into six IMF components by CEEMDAN. Secondly, all IMF components are reconstructed using IFTC to obtain the high-frequency data and low-frequency data. Finally, the high-frequency data and low-frequency data are reconstructed by PSR to obtain the sample subsets.
Step 3: RUL prediction model training and verification. Firstly, several sample subsets are randomly selected as the training set and the validation set. Secondly, the training set is used to train a RUL prediction model based on ensemble CNN/BiLSTM, and the validation set is used to optimize the model in each training epoch. After several training epochs, a well-trained RUL prediction model is obtained finally.
Step 4: RUL prediction. The test set is input into the RUL prediction model to obtain the RUL prediction result of cutting tool.

The specific process of preprocessing the original cutting tool vibration signals is as follows. Firstly, the PLC data are aligned with the vibration data in the sensor data, and the tool coordinates in the PLC data are combined to draw the tool paths of the three-dimensional machining process. According to the tool path, several abnormal CSV data files are deleted. Fig. 5 shows the tool paths of the three-dimensional machining process. Secondly, all vibration signal files of the CNC tool are merged to remove the maximum values in the dataset. Thirdly, the spindle load signals in the controller data are used to filter out the mute section in the dataset. Fourthly, the remaining data are divided into many samples, and the number of data points in each sample is 25600. Finally, the corresponding label file is generated according to all samples.
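Two of the preprocessing steps above, removing abnormal maximum values and dividing the data into samples of 25600 points, can be sketched as follows. The quantile-based rule for identifying abnormal maxima is a hypothetical choice made for illustration, since the text does not state the exact criterion, and the function names are likewise illustrative.

```python
import numpy as np

SAMPLE_LEN = 25600  # data points per sample, as stated in the text

def remove_peaks(signal, quantile=0.999):
    """Drop points whose magnitude exceeds a high quantile.
    The quantile threshold is an assumed rule, not the paper's criterion."""
    limit = np.quantile(np.abs(signal), quantile)
    return signal[np.abs(signal) <= limit]

def split_into_samples(signal, sample_len=SAMPLE_LEN):
    """Cut a 1-D signal into consecutive samples, discarding the remainder."""
    n = len(signal) // sample_len
    return signal[: n * sample_len].reshape(n, sample_len)

# Synthetic stand-in for a merged vibration signal.
signal = np.random.default_rng(0).normal(size=3 * SAMPLE_LEN + 1000)
clean = remove_peaks(signal)
samples = split_into_samples(clean)
```

Alignment with the PLC data and filtering of the mute section by spindle load are omitted here because they depend on file-level details not given in this section.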

2) PROCESSING FLOW OF CEEMDAN-IFTC-PSR
In order to improve the accuracy of RUL prediction for cutting tool, a signal processing method based on CEEMDAN-IFTC-PSR is proposed, and the processing flow of CEEMDAN-IFTC-PSR is shown in Fig. 6. The proposed signal processing method is described as follows.
Step 1: Obtain IMF components by CEEMDAN. After preprocessing the original cutting tool vibration signals, m samples are obtained, and each sample is decomposed by CEEMDAN to produce six IMF components with more complete preservation of degradation information.
Step 2: Select the high-frequency and low-frequency IMF components using IFTC. The high-frequency IMF components and low-frequency IMF components are selected from m × 6 IMF components, and they are respectively put into the high-frequency IMF components pool and low-frequency IMF components pool.
Step 3: Obtain the high-frequency and low-frequency data using IFTC. The high-frequency IMF components are randomly selected from the high-frequency IMF components pool and they are fused into n high-frequency data. Similarly, the low-frequency IMF components are randomly selected from the low-frequency IMF components pool and they are fused into n low-frequency data.
Step 4: Reconstruct the high-frequency and low-frequency data by PSR. At first, a space vector is constructed from the high-frequency data $Y_{i,1}$, where $\tau$ represents the delay time and $m$ represents the embedding dimension; $\tau$ and $m$ are calculated by the C-C method in PSR. Then, a chaotic feature vector can be constructed as $Y = \{d_{2+(m-1)\tau}, d_{3+(m-1)\tau}, \ldots, d_n, d_{n+1}\}$ using the space vector, and the chaotic feature vector $Y$ is the reconstructed high-frequency data $Z_{i,1}$. Finally, the low-frequency data $Y_{i,2}$ can also be reconstructed to $Z_{i,2}$.
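As an illustration of the PSR step, the sketch below builds the standard time-delay embedding of a one-dimensional series for a given delay time and embedding dimension. It shows the textbook form of phase space reconstruction and is not necessarily identical to the space vector used in the paper; the function name `delay_embed` is hypothetical, and the values of m and τ would in practice come from the C-C method.

```python
import numpy as np

def delay_embed(series, m, tau):
    """Standard delay embedding.
    Row t is [d_t, d_{t+tau}, ..., d_{t+(m-1)*tau}]."""
    series = np.asarray(series, dtype=float)
    n_vectors = len(series) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen m and tau")
    # Broadcast start indices against the delay offsets.
    idx = np.arange(n_vectors)[:, None] + tau * np.arange(m)[None, :]
    return series[idx]

# Toy series d_1..d_10 with assumed parameters m = 3 and tau = 2.
d = np.arange(1, 11)
X = delay_embed(d, m=3, tau=2)
```

Each row of `X` is one point in the reconstructed phase space, so a series of length n yields n − (m − 1)τ points.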

3) THE IMPROVED FINE-TO-COARSE RECONSTRUCTION
When the mechanical characteristics of cutting tool change, the path of vibration energy transfer may change, which will lead to changes in the vibration amplitude and frequency of cutting tool. The energy of the i-th IMF component $E_i$ is calculated by

$$E_i = \sum_{j=1}^{J} x_{ij}^2 \tag{1}$$

where $J$ is the number of data points in an IMF component, and $x_{ij}$ is the amplitude of the j-th data point in the i-th IMF component. The total energy of all the $w$ IMF components $E$ is calculated by

$$E = \sum_{i=1}^{w} E_i \tag{2}$$

The weight of the i-th IMF component $p_i$ is calculated by

$$p_i = \frac{E_i}{E} \tag{3}$$

The $w$ IMF components are fused by

$$Y = \sum_{i=1}^{w} p_i \cdot IMF_i \tag{4}$$

where $IMF_i$ is the i-th IMF component to be fused and its corresponding weight is $p_i$. In order to sufficiently extract high-dimensional and fragmented features from the original vibration signals of cutting tool, the improved fine-to-coarse reconstruction (IFTC) is proposed, as described in Algorithm 1, which includes the following steps.
Algorithm 1 The Improved Fine-to-Coarse Reconstruction
Input: the number of samples m and the number of high-frequency data or low-frequency data n
Output: the high-frequency data $Y_{1,1}, Y_{2,1}, \ldots, Y_{n,1}$ and the low-frequency data $Y_{1,2}, Y_{2,2}, \ldots, Y_{n,2}$
  Calculate the mean value of each IMF component obtained from each of m decomposed samples;
  Use the t-test to identify whether the mean value of each IMF component is close to zero;
  Put each IMF component whose mean value is close to zero into the high-frequency IMF components pool;
  Put each IMF component whose mean value greatly deviates from zero into the low-frequency IMF components pool;
  for i = 1 to n do
    for j = 1 to 2 do
      if j = 1 then
        Randomly select two high-frequency IMF components from the high-frequency IMF components pool;
      else
        Randomly select two low-frequency IMF components from the low-frequency IMF components pool;
      end if
      for k = 1 to 2 do
        Calculate the energy of the k-th selected IMF component $E_k$ by (1);
        Calculate the weight of the k-th selected IMF component $p_k$ by (3);
      end for
      if j = 1 then
        Fuse the two selected high-frequency IMF components into the high-frequency data $Y_{i,1}$ by (4);
      else
        Fuse the two selected low-frequency IMF components into the low-frequency data $Y_{i,2}$ by (4);
      end if
    end for
  end for

Step 1: Select the high-frequency and low-frequency IMF components. At first, the mean value of each IMF component obtained from each of m decomposed samples is calculated. Then, the t-test is used to identify whether the mean value of each IMF component is close to zero. An IMF component whose mean value is close to zero is regarded as a high-frequency IMF component, and an IMF component whose mean value greatly deviates from zero is regarded as a low-frequency IMF component. Fig. 7 shows the mean value of each IMF component obtained from a decomposed sample. It can be seen from Fig. 7 that IMF 1, IMF 2, and IMF 3 can be regarded as the high-frequency IMF components and IMF 4, IMF 5, and IMF 6 can be regarded as the low-frequency IMF components.
Step 2: Establish the high-frequency and low-frequency IMF components pools. All the selected high-frequency IMF components are put into the high-frequency IMF components pool, and all the selected low-frequency IMF components are put into the low-frequency IMF components pool.
Step 3: Obtain the high-frequency data. Firstly, two high-frequency IMF components are randomly selected from the high-frequency IMF components pool. Secondly, the weights of the two selected high-frequency IMF components are calculated. Finally, the two selected high-frequency IMF components are fused into the high-frequency data according to the weights.
Step 4: Obtain the low-frequency data. Firstly, two low-frequency IMF components are randomly selected from the low-frequency IMF components pool. Secondly, the weights of the two selected low-frequency IMF components are calculated. Finally, the two selected low-frequency IMF components are fused into the low-frequency data according to the weights.
Step 5: Repeat Step 3 and Step 4, until n high-frequency data and n low-frequency data are obtained.
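The five steps above can be sketched in NumPy as follows. Several details are assumptions made for this illustration: the energy of a component is taken as the sum of its squared amplitudes, the t-test is a one-sample test against zero with a large-sample critical value of 1.96, and the two components are drawn without replacement. All function names are hypothetical.

```python
import numpy as np

def mean_is_near_zero(imf, critical=1.96):
    """One-sample t statistic of the mean against zero.
    The critical value 1.96 (5% level, large sample) is an assumption."""
    imf = np.asarray(imf, dtype=float)
    t_stat = imf.mean() / (imf.std(ddof=1) / np.sqrt(imf.size))
    return abs(t_stat) < critical

def build_pools(imfs):
    """Steps 1-2: split IMF components into high- and low-frequency pools."""
    high, low = [], []
    for component in imfs:
        (high if mean_is_near_zero(component) else low).append(component)
    return high, low

def fuse(imfs):
    """Energy-weighted fusion of the selected IMF components."""
    stacked = np.asarray(imfs, dtype=float)
    energies = np.sum(stacked ** 2, axis=1)   # energy of each component
    weights = energies / energies.sum()       # weight of each component
    return np.sum(weights[:, None] * stacked, axis=0), weights

def iftc(high_pool, low_pool, n, rng):
    """Steps 3-5: build n high-frequency and n low-frequency data."""
    high, low = [], []
    for _ in range(n):
        for pool, out in ((high_pool, high), (low_pool, low)):
            picks = rng.choice(len(pool), size=2, replace=False)
            fused, _ = fuse([pool[k] for k in picks])
            out.append(fused)
    return np.array(high), np.array(low)
```

Because the weights are proportional to energy, the more energetic of the two selected components dominates the fused data.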

C. PROPOSED RUL PREDICTION MODEL FOR CUTTING TOOL

1) RUL PREDICTION MODEL BASED ON ENSEMBLE CNN/BiLSTM
In order to obtain more stable and better RUL prediction result for cutting tool, a RUL prediction model based on ensemble CNN/BiLSTM is proposed. The main ideas of the proposed model are as follows: 1) multiple prediction branches are adopted to construct an ensemble RUL prediction model; 2) BiLSTM and CNN respectively use high-frequency data and low-frequency data to train a RUL prediction model in each prediction branch. Fig. 8 presents the RUL prediction model based on ensemble CNN/BiLSTM.
The specific implementation steps of the RUL prediction model based on ensemble CNN/BiLSTM are as follows.
Step 1: Obtain the training subsets and validation set. After the cutting tool vibration signals for model training and verification are processed via CEEMDAN-IFTC-PSR, several sample subsets are obtained and randomly divided into the training set and validation set according to the ratio of 8:2. The training set is further randomly divided into k training subsets. Note that the grid-search method is used to select the optimal value of k, and the evaluation index of the grid-search method is calculated by

$$Er_i = \frac{RUL_i - \widehat{RUL}_i}{RUL_i} \times 100\% \tag{5}$$

where $Er_i$ is the percentage error of the i-th cutting tool, and $RUL_i$ and $\widehat{RUL}_i$ represent the actual RUL and predicted RUL of the i-th test set respectively.
Step 2: Construct an ensemble RUL prediction model. During each training epoch, in the i-th prediction branch, at first the high-frequency data and low-frequency data of the i-th training subset are input into BiLSTM and CNN to train a RUL prediction model respectively, and then the validation set is used to optimize the RUL prediction model based on BiLSTM and the RUL prediction model based on CNN, where 1 ≤ i ≤ k. When the maximum number of epochs is reached, the model training and verification will be terminated and the final ensemble RUL prediction model will be obtained.
Step 3: Fuse the RUL prediction results. At first, in each prediction branch, the RUL prediction result obtained with BiLSTM and that obtained with CNN are fused according to the specified weights to obtain a prediction result. Note that the RUL prediction result of the i-th prediction branch can be calculated by

$$RUL_i = w_1 \cdot RUL_i^{BiLSTM} + w_2 \cdot RUL_i^{CNN} \tag{6}$$

where $RUL_i^{BiLSTM}$ and $RUL_i^{CNN}$ represent the RUL prediction results of BiLSTM and CNN in the i-th prediction branch respectively, $w_1$ and $w_2$ are the weights of $RUL_i^{BiLSTM}$ and $RUL_i^{CNN}$ respectively, $0.3 \le w_1 \le 0.7$, $0.3 \le w_2 \le 0.7$, and $w_1 + w_2 = 1$. Note that the grid-search method is used to select the weights $w_1$ and $w_2$, and the evaluation index of the grid-search method is calculated by (5). Then, the mean value of the k RUL prediction results is calculated as the final RUL prediction result.
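The fusion in Step 3 can be sketched as a weighted sum within each branch followed by a mean over the k branches. In the sketch below, the grid of candidate weights and the use of the mean absolute percentage error as the selection criterion are assumptions; the text states only that the grid search is driven by the percentage error.

```python
import numpy as np

def fuse_branch(rul_bilstm, rul_cnn, w1):
    """Weighted fusion inside one branch, with w2 = 1 - w1."""
    return w1 * rul_bilstm + (1.0 - w1) * rul_cnn

def ensemble_predict(bilstm_preds, cnn_preds, w1):
    """bilstm_preds, cnn_preds: arrays of shape (k, n_samples).
    Returns the mean of the k fused branch predictions."""
    fused = fuse_branch(np.asarray(bilstm_preds), np.asarray(cnn_preds), w1)
    return fused.mean(axis=0)

def grid_search_w1(bilstm_preds, cnn_preds, actual,
                   candidates=np.linspace(0.3, 0.7, 5)):
    """Pick w1 in [0.3, 0.7] minimizing the mean absolute percentage error.
    The aggregation over samples is an assumed criterion."""
    actual = np.asarray(actual, dtype=float)

    def mean_abs_pct_error(w1):
        pred = ensemble_predict(bilstm_preds, cnn_preds, w1)
        return np.mean(np.abs((actual - pred) / actual)) * 100.0

    return min(candidates, key=mean_abs_pct_error)

# Toy predictions from k = 2 branches on 2 samples (made-up numbers).
bilstm_preds = [[0.5, 0.2], [0.7, 0.4]]
cnn_preds = [[0.3, 0.4], [0.5, 0.2]]
final = ensemble_predict(bilstm_preds, cnn_preds, w1=0.5)
```

Restricting the search to the interval from 0.3 to 0.7 keeps both models contributing to every branch, so neither prediction can be ignored entirely.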

2) CNN STRUCTURE
The CNN structure designed for RUL prediction of cutting tool is shown in Fig. 9, which can be divided into the following three parts.
The first part mainly consists of four convolution blocks, which are responsible for extracting degradation information. The number of convolution blocks has an important impact on the performance of CNN. In [24], it is found that 1-D CNN with four convolution blocks has the best performance for the time series data. Therefore, four convolution blocks are adopted in the proposed CNN structure. Specifically, each convolution block contains one 1-D convolution layer and one batch normalization (BN) layer, the first and fourth convolution blocks each additionally contain one 1-D max-pooling layer, and the ReLU activation function is adopted in each convolution block. The four convolution layers include 256, 128, 64, and 32 filters, respectively. Each of the two pooling layers performs a 3 × 1 max-pooling operation. The BN layer is used to solve the internal covariate shift problem, thus speeding up convergence.
The second part consists of one flatten layer, which is responsible for flattening the output of the last max-pooling layer into a one-dimensional vector.
The third part mainly consists of three fully connected layers, which is responsible for RUL prediction. The numbers of neurons of the three fully connected layers are set to 512, 256, and 1, respectively. The ReLU activation function is adopted in the first two fully connected layers, and the sigmoid activation function is adopted in the last fully connected layer. The dropout operation is performed after the first and second fully connected layers, which can avoid overfitting.
The detailed parameters setting of CNN structure designed for RUL prediction of cutting tool is listed in Table 1.
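To see how the three parts fit together, the sketch below traces the feature-map shape through the four convolution blocks, the flatten layer, and the three fully connected layers. The filter counts, the pooled blocks, the pooling size, and the fully connected widths follow the text; the kernel size of 3, the stride of 1, the absence of padding, and the input length are assumptions made for illustration, since the actual settings are those listed in Table 1.

```python
def conv_out_len(length, kernel, stride=1):
    """Output length of an unpadded 1-D convolution."""
    return (length - kernel) // stride + 1

def trace_cnn(input_len, kernel=3, pool=3):
    """Return per-block (filters, length) shapes and the dense layer widths."""
    filters = [256, 128, 64, 32]   # from the text
    pooled_blocks = {0, 3}         # first and fourth blocks have max-pooling
    length = input_len
    shapes = []
    for idx, n_filters in enumerate(filters):
        length = conv_out_len(length, kernel)
        if idx in pooled_blocks:
            length = length // pool  # non-overlapping 3 x 1 max-pooling
        shapes.append((n_filters, length))
    flat = shapes[-1][0] * shapes[-1][1]  # flatten layer output size
    return shapes, [flat, 512, 256, 1]

# Assumed input length of 100 points per reconstructed sample.
shapes, dense = trace_cnn(input_len=100)
```

The last dense width of 1 corresponds to the single RUL output produced through the sigmoid activation.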

3) BiLSTM STRUCTURE
The BiLSTM structure designed for RUL prediction of cutting tool is shown in Fig. 10, mainly including the input layer, two BiLSTM layers, two fully connected layers, and the  regression output layer. The parameters of the two BiLSTM layers are initialized by Glorot uniform, and the parameters of the two fully connected layers are initialized by He uniform. The tanh activation function is adopted in the two BiLSTM layers, and the ReLU activation function is adopted in the two fully connected layers. In addition, the dropout layer is added after each BiLSTM layer and each fully connected layer, and the dropout rate is set to 0.5, which can alleviate the problem of over-fitting. The final RUL prediction result is obtained through the regression output layer with the sigmoid activation function.
The parameters setting of BiLSTM structure designed for RUL prediction of cutting tool is listed in Table 2.
The forget gate in LSTM can retain the important information in the time series and discard the secondary information. Theoretically, the more LSTM layers, the better the fitting of the RUL prediction model to the non-stationary and nonlinear time series data. However, too many LSTM layers may lead to over-fitting and a large amount of computational time. Therefore, the typical two-layer BiLSTM including the forward LSTM layer and the backward LSTM layer is adopted. The number of neurons of the LSTM layer has an important impact on the performance of BiLSTM. Too few neurons cannot adequately extract the degradation information from the cutting tool vibration signals, thus the number of neurons of each LSTM in the first BiLSTM layer is set to 300. However, too many neurons will lead to a large number of parameters to be trained in the two fully connected layers, thus the number of neurons of each LSTM in the second BiLSTM layer is set to 150.

IV. EXPERIMENTAL RESULTS AND ANALYSIS

A. EXPERIMENTAL SETUP
In order to verify the effectiveness of the proposed cutting tool RUL prediction method, the cutting tool dataset [25] provided by Foxconn is adopted. The dataset comes from the actual CNC machining process, in which three brand-new tools run the normal processing program and data are collected until the end of cutting tool life. It includes the PLC signals from the control system and the vibration and current signals from the add-on vibration and current sensors. Fig. 11 shows the installation position and direction of a vibration sensor.
The structure of the experimental dataset is shown in Fig. 12. The dataset contains the PLC data and sensor data collected from three cutting tools. The feature fields of the PLC data are as follows: recording time, spindle load, X-axis coordinate, Y-axis coordinate, Z-axis coordinate, and corresponding file name. The feature fields of the sensor data are as follows: X-axis vibration signal, Y-axis vibration signal, Z-axis vibration signal, and current signal. The sampling frequency of the PLC signals is 33 Hz, and the sampling frequency of the vibration signals is 25.6 kHz. In this paper, the vibration signals of tool A are used as the training set, the vibration signals of tool B are used as the validation set, and the vibration signals of tool C are used as the test set.
The working life of both tool A and tool B is 240 minutes, and the working life of tool C is 185 minutes. Tool A and tool B have 48 CSV data files each, and tool C has 37 CSV data files. Only 1-minute sensor data (i.e., X-axis vibration signals, Y-axis vibration signals, Z-axis vibration signals, and the current signals) out of every 5 minutes in each data file are provided due to the large amount of sensor data. When the CSV data file is labeled, the expected value of 5 minutes is used as the label of the whole CSV data file. For example, the RUL label corresponding to the last CSV data file of each tool should be 2.5 minutes. The RUL labels of the CSV data files of these three tools are listed in Table 3.
It is important to reasonably set the parameter values of the delay time τ and the embedding dimension m for PSR. The parameters τ and m of PSR are selected by C-C method [26] for the training set, validation set, and test set. The detailed parameters setting of PSR is listed in Table 4.
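The file-level labeling rule in this subsection, where each CSV data file covers 5 minutes and is labeled with the expected RUL over that interval, can be sketched as follows. The function name is hypothetical, and the labels of files other than the last one are inferred from the stated rule rather than read from Table 3.

```python
def file_rul_labels(n_files, minutes_per_file=5.0):
    """RUL label in minutes for each CSV data file, first file first.
    Each label is the midpoint RUL of the 5-minute interval the file covers."""
    life = n_files * minutes_per_file
    return [life - (i + 0.5) * minutes_per_file for i in range(n_files)]

labels_a = file_rul_labels(48)  # tool A or tool B: 240-minute life
labels_c = file_rul_labels(37)  # tool C: 185-minute life
```

Under this rule the last file of every tool receives the label 2.5 minutes, which matches the example given in the text.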
All experiments are conducted on an 8-core Intel Core i7-9700K CPU at 3.6 GHz, an NVIDIA GeForce RTX 2070 SUPER GPU with 2560 CUDA cores, 64 GB RAM, and CentOS 8.1. The programs in this paper are implemented with Python 3.7.

B. EXPERIMENTAL RESULTS AND ANALYSIS

1) EVALUATION METRICS
The mean absolute error (MAE) and root mean square error (RMSE) are used to evaluate the performance of the proposed RUL prediction method, which are defined as follows:

$$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| R_i - R_i^T \right| \tag{7}$$

and

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( R_i - R_i^T \right)^2} \tag{8}$$

where $N$ is the total number of samples, and $R_i$ and $R_i^T$ represent the predicted RUL value and the corresponding actual RUL value of the i-th sample, respectively. In addition, the score function from the Prognostics and Health Management (PHM) Data Challenge [27] is also used to evaluate the performance of the proposed RUL prediction method, which is calculated by

$$Score = \frac{1}{S} \sum_{i=1}^{S} 100 \cdot A_i \tag{9}$$

where $S$ represents the total number of samples in the test set, and each predicted score is calculated in the percentile system. For example, when $A_i$ is 1, the score is 100. When $A_i$ is 0.51, the score is 51. $A_i$ is the accuracy term of the i-th sample defined by the score function in [27].
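The evaluation metrics can be sketched in NumPy as follows. MAE and RMSE follow their standard definitions. For the accuracy term A_i, the sketch assumes the asymmetric exponential form of the IEEE PHM 2012 Data Challenge, which penalizes late predictions more heavily than early ones; this form is an assumption and should be checked against [27] before reuse.

```python
import numpy as np

def mae(pred, actual):
    """Mean absolute error between predicted and actual RUL."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(actual))))

def rmse(pred, actual):
    """Root mean square error between predicted and actual RUL."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(actual)) ** 2)))

def percentage_error(pred, actual):
    """Percentage error: positive when the prediction is early (too small)."""
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return 100.0 * (actual - pred) / actual

def phm_accuracy(er):
    """Assumed PHM 2012 accuracy term A_i in (0, 1].
    Late predictions (er <= 0) decay faster than early ones (er > 0)."""
    er = np.asarray(er, dtype=float)
    late = np.exp(-np.log(0.5) * er / 5.0)
    early = np.exp(np.log(0.5) * er / 20.0)
    return np.where(er <= 0, late, early)

def score(pred, actual):
    """Mean accuracy term expressed in the percentile system (0 to 100)."""
    return float(100.0 * np.mean(phm_accuracy(percentage_error(pred, actual))))
```

Under this assumed form, a prediction that is 20% early and one that is 5% late both receive an accuracy term of 0.5, that is, a score of 50.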

2) COMPARISON WITH DIFFERENT SIGNAL PROCESSING METHODS
In order to analyze the impact of different signal processing methods on RUL prediction of cutting tool, three different RUL prediction methods are used to train RUL prediction models, respectively. The trained models are used to predict the RUL of cutting tool. As shown in Table 5, the MAE obtained by CEEMDAN-IFTC-PSR-CNN/BiLSTM is 17.97% and 5.24% lower than that obtained by EMD-IFTC-PSR-CNN/BiLSTM and EEMD-IFTC-PSR-CNN/BiLSTM, respectively. The RMSE obtained by CEEMDAN-IFTC-PSR-CNN/BiLSTM is 17.37% and 5.29% lower than that obtained by EMD-IFTC-PSR-CNN/BiLSTM and EEMD-IFTC-PSR-CNN/BiLSTM, respectively. The score obtained by CEEMDAN-IFTC-PSR-CNN/BiLSTM is 1.06× and 0.33× higher than that obtained by EMD-IFTC-PSR-CNN/BiLSTM and EEMD-IFTC-PSR-CNN/BiLSTM, respectively. This is because the reconstruction error of the decomposed signals can be reduced close to zero using CEEMDAN compared with EEMD and EMD. Therefore, the RUL prediction method using CEEMDAN can better adapt to the real-time cutting tool signals with time fragmentation and unclear degradation features.

3) COMPARISON WITH DIFFERENT RUL PREDICTION MODELS BASED ON DEEP LEARNING
In order to evaluate the performance of the proposed cutting tool RUL prediction model, the experiments are conducted with CNN [28], DCNN [29], LSTM [30], BiLSTM [31], and the proposed RUL prediction model. Note that these RUL prediction models use the same signal processing method. The network structure settings of different RUL prediction models are presented in Table 6.
Table 7 shows the performance comparison of five different RUL prediction models based on deep learning. It can be seen from Table 7 that the MAE obtained by the proposed RUL prediction model is 1068.74×, 296.78×, 313.97×, and 103.94× lower than that obtained by CNN, DCNN, LSTM, and BiLSTM, respectively. The RMSE obtained by the proposed RUL prediction model is 520.85×, 371.11×, 483.98×, and 137.91× lower than that obtained by CNN, DCNN, LSTM, and BiLSTM, respectively. It also can be seen from Table 7 that the score obtained by the proposed RUL prediction model is 3.50×, 3.12×, 2.62×, and 2.23× that obtained by CNN, DCNN, LSTM, and BiLSTM, respectively. This is because combining multiple deep learning models through ensemble learning can effectively mine the feature information from cutting tool vibration signals with high fragmentation and chaos, thus the proposed RUL prediction model can achieve satisfactory prediction performance.

4) IMPACT OF DIFFERENT NUMBERS OF PREDICTION BRANCHES ON PREDICTION PERFORMANCE
In order to analyze the impact of different numbers of prediction branches on prediction performance of the proposed RUL prediction method, the comparative experiments are carried out with different numbers of prediction branches. Note that the numbers of prediction branches are 6, 8, 10, and 12, respectively.
As can be seen from Fig. 13, when the number of prediction branches increases from 6 to 10, the MAE and RMSE obtained by the proposed RUL prediction method gradually decrease, and the score gradually increases. For example, when the number of prediction branches is 10, the MAE, RMSE, and score obtained by the proposed RUL prediction method are 0.00649, 0.00787, and 83.46, respectively. However, when the number of prediction branches is 12, the MAE and RMSE are higher than those obtained with 10 prediction branches, and the score is 2.04% lower than that obtained with 10 prediction branches. This is because the chaos and randomness of the cutting tool vibration signals lead to great differences in the prediction performance of different branches. When the number of prediction branches increases further, the performance of a single branch is limited, and thus a RUL prediction model with poorer prediction performance is obtained.

5) COMPARISON WITH DIFFERENT SIGNAL RECONSTRUCTION METHODS
In order to evaluate the effectiveness of the signal reconstruction method used in the proposed model, the comparative experiments are carried out with RUL prediction methods based on different signal reconstruction methods. As can be seen from Fig. 14, the MAE and RMSE obtained by the RUL prediction methods with a signal reconstruction method are lower than those obtained by the RUL prediction method without signal reconstruction, and the scores are higher. For example, the MAE and RMSE obtained by the RUL prediction method with IFTC-PSR decrease by 94.34% and 93.66%, respectively, and the score increases by 73.66% compared with the RUL prediction method without signal reconstruction.
It also can be seen from Fig. 14 that the MAE and RMSE obtained by the RUL prediction method with IFTC-PSR are lower than those obtained by the RUL prediction methods with IFTC or PSR alone, and the score is higher. The score obtained by the RUL prediction method with IFTC-PSR increases by 25.47% and 29.02% compared with the RUL prediction method with IFTC and the RUL prediction method with PSR, respectively. This is because the IFTC can mine the local continuity and integrity features from IMF components, and the one-dimensional chaotic original vibration signals of cutting tool can be mapped into the high-dimensional phase space using PSR. Thus, the RUL prediction method with IFTC-PSR achieves better prediction performance.
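The mapping of a one-dimensional signal into a high-dimensional phase space can be illustrated with the classical time-delay embedding. The sketch below is only an illustration under stated assumptions: the embedding dimension `dim` and the delay `tau` are hypothetical parameters, and how they are selected in the proposed method is not described in this section.

```python
import numpy as np


def phase_space_reconstruct(x, dim, tau):
    """Time-delay embedding of a 1-D series (assumed form of PSR).

    Row i of the result is [x[i], x[i + tau], ..., x[i + (dim - 1) * tau]].
    `dim` and `tau` are illustrative parameters, not values from the paper.
    """
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen dim and tau")
    # Each column is the series shifted by a multiple of the delay.
    columns = [x[j * tau : j * tau + n_vectors] for j in range(dim)]
    return np.stack(columns, axis=1)
```

Applied to a fused high-frequency or low-frequency series of length N, this yields a matrix with N - (dim - 1) * tau rows and dim columns, which is the kind of two-dimensional input a CNN or BiLSTM branch could consume.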

6) IMPACT OF DIFFERENT WEIGHTS ON THE PERFORMANCE OF THE ENSEMBLE CNN/BiLSTM MODEL
In the proposed cutting tool RUL prediction method, the final prediction results are obtained from CNN and BiLSTM. To explore whether the weights of CNN and BiLSTM have an impact on the prediction performance of the proposed RUL prediction method, the experiments are carried out with different weight combinations of CNN and BiLSTM. Table 8 shows the prediction performance with different weights of CNN and BiLSTM. It can be seen from Table 8 that as the weight of CNN gradually increases and the weight of BiLSTM gradually decreases, the MAE and RMSE obtained by the proposed RUL prediction method gradually decrease, and the score gradually increases. When the weight of CNN is 0.7 and the weight of BiLSTM is 0.3, the MAE of the proposed method is 91.56% lower than the average MAE obtained by the other four weight combinations, the RMSE is 92.52% lower than the average RMSE obtained by the other four weight combinations, and the score is 17.95% higher than the average score obtained by the other four weight combinations. The results show that the proposed RUL prediction method can achieve good prediction performance when the weight of CNN is 0.7 and the weight of BiLSTM is 0.3. This is mainly because BiLSTM can learn bidirectional long-term dependencies from high-frequency data, which is conducive to improving the prediction performance.
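The weighting of the two networks can be sketched as a convex combination of their outputs. The following is a minimal sketch, assuming that each prediction branch produces one CNN output and one BiLSTM output per sample and that the branches are combined by a simple average; the averaging step and the function name are assumptions, since the exact fusion rule is not restated in this section.

```python
import numpy as np


def ensemble_predict(cnn_preds, bilstm_preds, w_cnn=0.7, w_bilstm=0.3):
    """Weighted CNN/BiLSTM fusion followed by averaging over branches.

    Both inputs have shape (n_branches, n_samples). The default weights
    are the combination reported as performing well in Table 8; averaging
    over branches is an assumption made for illustration.
    """
    cnn_preds = np.asarray(cnn_preds, dtype=float)
    bilstm_preds = np.asarray(bilstm_preds, dtype=float)
    if cnn_preds.shape != bilstm_preds.shape:
        raise ValueError("CNN and BiLSTM predictions must have the same shape")
    if not np.isclose(w_cnn + w_bilstm, 1.0):
        raise ValueError("weights must sum to 1")
    per_branch = w_cnn * cnn_preds + w_bilstm * bilstm_preds
    return per_branch.mean(axis=0)
```

Sweeping `w_cnn` over a grid of values with `w_bilstm = 1 - w_cnn` and evaluating each result would mirror the kind of comparison summarized in Table 8.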

7) IMPACT OF DIFFERENT RATIOS OF TRAINING SET TO VALIDATION SET ON THE PREDICTION PERFORMANCE
In order to analyze the prediction uncertainty of the proposed cutting tool RUL prediction model, a series of experiments are carried out with different ratios of training set to validation set. The prediction performance obtained with different ratios of training set to validation set is shown in Table 9.
As can be seen from Table 9, the scores obtained with the ratios of 9:1, 8:2, 7:3, 6:4, 5:5, 4:6, and 3:7 are 83.52, 83.46, 81.54, 78.61, 75.86, 68.66, and 56.80, respectively. The results demonstrate that the ratio of training set to validation set has an impact on the RUL prediction performance of the proposed model. The scores obtained with the ratios of 9:1, 8:2, 7:3, 6:4, 5:5 reach over 75. This is mainly because the proposed RUL prediction method based on multiple deep learning models using ensemble learning and dropout operation can avoid over-fitting and enhance the generalization ability, and it can combine the weaker prediction branches and stronger prediction branches to obtain a better RUL prediction result.
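The ratio experiment can be set up with a small helper that partitions the samples according to a given ratio. The sketch below is illustrative only: the random shuffle, the fixed seed, and the function name are assumptions, and the paper does not state in this section how samples are assigned to each set.

```python
import random


def split_by_ratio(samples, train_parts, val_parts, seed=0):
    """Split samples into training and validation sets, e.g. 8:2.

    Shuffling with a fixed seed is an assumption made for reproducibility.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_parts / (train_parts + val_parts))
    return items[:n_train], items[n_train:]
```

Calling `split_by_ratio(samples, 9, 1)` through `split_by_ratio(samples, 3, 7)` in turn, then training and scoring the model on each split, corresponds to the sequence of ratios listed in Table 9.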

8) COMPARISON WITH OTHER RUL PREDICTION METHODS
In order to further verify the effectiveness of the proposed RUL prediction method, it is compared with three traditional prediction methods SVR [32], ELM [33], and BPNN [34], and three latest RUL prediction methods VMD-CNN-LSTM [12], EDWT-CNN-LSTM [20], and MTW-CNN-BiLSTM [22]. The descriptions of different RUL prediction methods are presented in Table 10.
As shown in Table 11, the MAE obtained by the proposed RUL prediction method is 98.84%, 98.87%, 98.94%, 87.95%, 98.22%, and 59.90% lower than that obtained by SVR, ELM, BPNN, VMD-CNN-LSTM, EDWT-CNN-LSTM, and MTW-CNN-BiLSTM, respectively. Fig. 15 shows the comparison of prediction results of different RUL prediction methods. It also can be seen from Fig. 15 that the proposed method has great prediction ability on the actual cutting tool dataset.
The experimental results described above demonstrate that the four RUL prediction methods based on multiple deep learning models (i.e., VMD-CNN-LSTM, EDWT-CNN-LSTM, MTW-CNN-BiLSTM, and the proposed CEEMDAN-IFTC-PSR-Ensemble CNN/BiLSTM) can better adapt to the increasingly complex working condition data than the three traditional RUL prediction methods (i.e., SVR, ELM, and BPNN). This is mainly because they can take full advantage of different deep learning models to dig more deeply into the sensitive features and degradation information of the actual cutting tool dataset. The RUL prediction results of VMD-CNN-LSTM and EDWT-CNN-LSTM, which do not use the ensemble framework, are not as stable and good as those of MTW-CNN-BiLSTM and the proposed CEEMDAN-IFTC-PSR-Ensemble CNN/BiLSTM, which do use the ensemble framework. This is because there is no time continuity between different data files of the experimental dataset. Compared with MTW-CNN-BiLSTM, the proposed CEEMDAN-IFTC-PSR-Ensemble CNN/BiLSTM obtains better prediction results. The reason is that the proposed method can better adapt to the high chaos and discontinuity of the experimental dataset, so it can better extract degradation features.
In general, compared with these existing RUL prediction methods, the proposed method makes the following improvements. I) The CEEMDAN method is used to eliminate the reconstruction error of the decomposed cutting tool vibration signals, the IFTC is proposed to select high-frequency IMF components and low-frequency IMF components, and the sensitive features are extracted from the high-dimensional phase space using PSR. II) A RUL prediction model based on ensemble CNN/BiLSTM is built to obtain more stable and better prediction results. III) The CEEMDAN-IFTC-PSR and ensemble CNN/BiLSTM model is proposed to accurately predict the RUL of cutting tool from the vibration signals collected in the actual CNC machining process.

V. CONCLUSION
In this paper, a novel remaining useful life prediction method based on CEEMDAN-IFTC-PSR and ensemble CNN/BiLSTM model for cutting tool is proposed, and it is evaluated with a cutting tool dataset that comes from the actual CNC machining process. CEEMDAN is used to decompose original vibration signals to get six IMF components, which can eliminate modal aliasing, suppress noise interference, and reduce computational overhead. IFTC is adopted to obtain high-frequency IMF components and low-frequency IMF components from IMF components and respectively fuse them into high-frequency data and low-frequency data, and PSR is used to reconstruct high-frequency data and low-frequency data, which can better extract degradation features. Ten prediction branches are adopted to construct an ensemble RUL prediction model for cutting tool, which can preserve the continuity and completeness of the degradation features from a sample set to obtain more stable prediction results. In each prediction branch, BiLSTM and CNN respectively use high-frequency data and low-frequency data to train a RUL prediction model, which can fully exploit data of different frequencies using different deep learning models to obtain better prediction results. Extensive experiments are carried out to verify the effectiveness of the proposed RUL prediction method. The results show that the score of the proposed method is 8.78%, 17.38%, and 30.28% higher than that of MTW-CNN-BiLSTM, VMD-CNN-LSTM, and EDWT-CNN-LSTM, respectively, and the score of the proposed method reaches up to 83.46 points.
In the actual CNC machining process, a large number of cutting tool vibration signals are generated. Future work will explore how to set RUL labels reasonably to further improve the score of the proposed method. To cope with the huge amount of cutting tool vibration data, a fast RUL prediction method based on a distributed parallel computing platform will also be explored.