A Novel Hybrid Deep Neural Network Prediction Model for Shield Tunneling Machine Thrust

Shield thrust is a critical operational parameter during shield driving and is of vital significance for adjusting operational parameters and ensuring efficient and safe propulsion of the shield tunneling machine. In this paper, a novel hybrid prediction model (CLM) combining an attention mechanism, a convolutional neural network (CNN) and a bi-directional long short-term memory (BiLSTM) network is proposed for shield thrust prediction. Correlation analysis based on the Maximal Information Coefficient (MIC) between the thrust and other parameters is first conducted to select optimal parameters and reduce the input dimension. An attention mechanism is introduced into the CNN to distinguish the importance of different features, with the convolution and pooling layers further extracting dimension features of the data. Then, a BiLSTM network integrating a first attention layer is employed to extract time-varying characteristics of the data, with a second attention layer added to capture important time information. Field data recorded while the shield cut through bridge piles are used to validate the effectiveness and superiority of the proposed method. Results show that the proposed CLM model is general enough to avoid overfitting and performs well at prediction. The predicted values match the monitoring data reasonably well, with a coefficient of determination ( $\text{R}^{2}$ ) of 0.85, a root mean square error (RMSE) of 0.05 and a mean absolute error (MAE) of 0.02. The CLM model can accurately predict the thrust even under complicated construction conditions, which provides a reference for similar industrial applications.


I. INTRODUCTION
With advancing tunnelling techniques and booming demand for underground transportation, subway mileage has increased dramatically worldwide in recent decades and is expected to keep increasing. Dense subway networks and limited construction sites frequently force subway tunnels to cross existing underground structures, such as steel reinforced concrete piles supporting bridges or buildings [1], [2]. Concrete piles, if encountered, used to be pulled out from the ground surface or cut manually through a vertical shaft or from the machine face. Manual pile removal is costly, risky and time consuming [3]. More and more projects start to cut through piles directly using shield machines equipped with strong disc or shell cutters.
The associate editor coordinating the review of this manuscript and approving it for publication was Sunil Karamchandani.
During the cutting process, the change of the shield thrust results from a complicated machine-ground-structure interaction, which is a function of a variety of parameters, including machine parameters (e.g., torque, thrust, chamber pressure, cutter wear, and cutterhead temperature), ground properties (e.g., soil type, strength, and stiffness), structure properties (e.g., diameter, concrete strength, and steel arrangement) and so on [4], [5]. Shield thrust is important because its value directly affects the safety of the superstructure and the shield cutting efficiency. Therefore, precisely predicting the shield load can help engineers adjust operational parameters before shield cutting, which ensures the safety of the superstructure and the shield machine.
Prediction methods for shield thrust mainly include theoretical analysis, numerical simulation, and machine learning. Earlier prediction methods for shield load fall into the first two types. Among theoretical analysis methods, the Colorado School of Mines (CSM) model was proposed first and is widely used in the calculation of shield loads [6]. To study the impact of geological conditions, Wang et al. [7] proposed a new thrust force model for a single geological condition based on the assumption that the shield excavation is very close to equilibrium. Considering the limitations of single geological conditions, Zhang et al. [8] established a theoretical prediction model for shield loads based on the impact of soil-rock interbedded ground, and the proposed model was shown to be effective. Zhou et al. [9] established a theoretical torque prediction model for mixed geological conditions. In addition, operational and structural parameters are also important factors that affect shield loads. Yagiz et al. [10] established a theoretical model that predicts shield loads by polynomial regression on geological conditions and operational parameters. Zhang et al. [11] proposed a prediction model for shield loads through combined analysis of geological, operational and structural parameters. Among numerical simulation methods, Han et al. [12] simulated shield driving using the three-dimensional finite element method, which yields the cutterhead torque variation curve under different geological conditions. Wu and co-workers [13] established a three-dimensional model estimating the average cutterhead torque over a certain distance or time. Faramarzi et al. [14] used the discrete element method to estimate the TBM torque and thrust, and achieved higher prediction accuracy than theoretical analysis.
These theoretical analysis and numerical simulation methods provide certain guidance for shield driving. However, their prediction accuracy is affected by various factors, such as complex geological conditions and cutter wear, which are not considered in theoretical and numerical methods. Besides, the change of the thrust is a complex dynamic process, so it can hardly be figured out through theoretical and numerical analysis alone. As monitoring methods for shield construction have diversified over the years (automation equipment, fiber, and so on), more large-scale, high-dimensional data have become available for shield load analysis. Hence, one pressing issue is to choose an appropriate method to analyze shield loads and extract data features.
In order to effectively explore the variation of shield load and precisely predict the value of shield thrust, this paper proposes a CNN-BiLSTM-Multiattention (CLM) method for shield thrust prediction. Convolutional neural networks (CNNs) perform well at capturing the dimension features of data, bi-directional long short-term memory (BiLSTM) networks can extract the time-varying characteristics of time series, and the attention mechanism can focus on important information to increase the fitting ability of prediction models. The CLM model therefore not only captures the time-varying characteristics of the data but also extracts its dimension features, and it highlights the important dimensions and key time information to improve prediction accuracy.
The study is organized as follows: Section 2 presents the related work on shield thrust prediction. Section 3 introduces the materials. Section 4 explains the proposed algorithm. Section 5 presents the preparatory work, including data preprocessing, experimental environment, model establishment and metrics. Section 6 discusses the results. The conclusion is drawn in Section 7.

II. RELATED WORK
CNNs are similar to ordinary neural networks in that their neurons have learnable weights and biases. As shown in Figure 1, a CNN is mainly composed of convolutional layers, pooling layers and fully connected layers. The input can be regarded as a grayscale image. Each convolutional layer consists of several convolutional units, whose parameters are optimized by the back-propagation algorithm. The purpose of the convolution operation is to extract different features of the input. The first convolutional layer contains several filters that produce feature maps. Every filter is a weight matrix with local connections and shared weights, which convolves the original image into a corresponding feature map that can be considered an image representation extracted by the filter. The first convolutional layer may only extract low-level features such as edges, lines, and corners; further layers iteratively extract more complex features from these low-level features. A nonlinear (activation) layer is usually applied immediately after each convolutional layer to introduce nonlinearity. Moreover, a pooling layer is inserted periodically between successive convolutional layers. It gradually reduces the spatial size of the data volume, thus reducing the number of parameters in the network and mitigating over-fitting. The fully connected (dense) layer maps the final output to a linearly separable space. The convolution formula is expressed as [15]:

$$a_{i,j} = f\left(\sum_{m=0}^{k-1}\sum_{n=0}^{k-1} w_{m,n}\, x_{i+m,\,j+n} + w_{b}\right)$$

where $x_{i,j}$ is the element in row $i$ and column $j$ of the input image, $w_{m,n}$ is the element in row $m$ and column $n$ of the $k \times k$ weight matrix, $w_{b}$ is the filter bias, $f$ is an activation function, and $a_{i,j}$ is the value in row $i$ and column $j$ of the feature map.
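The convolution described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `conv2d_valid`, the choice of ReLU for $f$, a stride of 1, no padding and the toy input are all hypothetical.

```python
def relu(v):
    """Rectified linear unit, used here as the activation f (an assumption)."""
    return max(0.0, v)


def conv2d_valid(x, w, w_b=0.0, f=relu):
    """Slide a k x k filter w over a 2D input x (stride 1, no padding).

    Each output is a[i][j] = f(sum_{m,n} w[m][n] * x[i+m][j+n] + w_b).
    """
    k = len(w)
    rows = len(x) - k + 1
    cols = len(x[0]) - k + 1
    return [
        [
            f(sum(w[m][n] * x[i + m][j + n]
                  for m in range(k) for n in range(k)) + w_b)
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# Toy 3 x 3 "image" and a 2 x 2 filter that responds to diagonal increase.
x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
w = [[-1.0, 0.0],
     [0.0, 1.0]]
feature_map = conv2d_valid(x, w, w_b=0.5)
```

The same filter weights are reused at every position, which is the weight sharing mentioned in the text; the 3 × 3 input shrinks to a 2 × 2 feature map because no padding is applied.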

2) LONG SHORT-TERM MEMORY NETWORK (LSTM)
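The LSTM algorithm [16], [17], [18] is a variant of the RNN that mitigates the vanishing and exploding gradients caused by training RNNs on long sequences. An LSTM cell is mainly composed of three parts: a forget gate, an input gate and an output gate. The forget gate determines which information in the cell is retained or discarded, the input gate determines which part of the input is admitted or blocked, and the output gate determines the output information. The cell structure is shown in Fig. 2. The candidate state is expressed as

$$\tilde{c}_t = \tanh\left(W_{xc}\, x_t + W_{hc}\, h_{t-1} + b_c\right)$$

where $x_t$ is the input at the current moment; $h_{t-1}$ is the output of the previous neural unit; $W_{xc}$ is the weight between the input $x_t$ and the memory unit; $W_{hc}$ is the weight between $h_{t-1}$ and the memory unit; and $b_c$ is the bias vector. The input gate, forget gate, output gate and cell state are calculated as

$$i_t = \sigma\left(W_{xi}\, x_t + W_{hi}\, h_{t-1} + b_i\right)$$

$$f_t = \sigma\left(W_{xf}\, x_t + W_{hf}\, h_{t-1} + b_f\right)$$

$$o_t = \sigma\left(W_{xo}\, x_t + W_{ho}\, h_{t-1} + b_o\right)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

where $\sigma$ is the sigmoid function, $\odot$ denotes element-wise multiplication, the weights and biases of each gate are defined analogously to $W_{xc}$, $W_{hc}$ and $b_c$, and $c_{t-1}$ is the stored value at the previous moment. The output value of the LSTM unit is:

$$h_t = o_t \odot \tanh\left(c_t\right)$$

where $h_t$ is the output value of the neural unit at the current moment.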
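A single LSTM step can be sketched as follows to make the gate interplay concrete. The sketch uses scalars instead of vectors and the standard textbook gate equations; the function name `lstm_step`, the parameter dictionary layout and the numeric values are hypothetical and do not come from the paper.

```python
import math


def sigmoid(v):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x_t, h_prev, c_prev, params):
    """One scalar LSTM step.

    params maps a name ("c", "i", "f", "o") to a tuple (W_x, W_h, b):
    the input weight, the recurrent weight and the bias of that unit.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x_t + w_h * h_prev + b

    c_tilde = math.tanh(pre_activation("c"))   # candidate state
    i_t = sigmoid(pre_activation("i"))         # input gate
    f_t = sigmoid(pre_activation("f"))         # forget gate
    o_t = sigmoid(pre_activation("o"))         # output gate
    c_t = f_t * c_prev + i_t * c_tilde         # new cell state
    h_t = o_t * math.tanh(c_t)                 # new output
    return h_t, c_t


# With all weights and biases at zero every gate is half open (0.5),
# so half of the previous cell state is kept and nothing new is written.
zero_params = {name: (0.0, 0.0, 0.0) for name in ("c", "i", "f", "o")}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
```

The forget gate scales the previous cell state while the input gate scales the candidate state, which is how the cell decides what to retain and what to overwrite.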

3) ATTENTION MECHANISM
The attention mechanism was first applied to the human visual system [19]; it captures the important parts of an enormous amount of information and ignores the unimportant ones. Currently, the attention mechanism is an important component of neural networks and is widely used in areas such as natural language processing and statistical learning. Moreover, it has been widely combined with RNN and LSTM algorithms to solve time series tasks. The attention mechanism can extract important time information and then assign different weights to information at different moments. As shown in Figure 3, its calculation process can be summarized in three stages: (1) calculating the similarity or correlation between Query and Key; (2) normalizing the raw scores from the first stage; (3) computing the weighted sum of Value according to the weight coefficients. The calculation formula is expressed as:

$$\text{Attention}(\text{Query}, \text{Source}) = \sum_{i=1}^{L_x} \text{Similarity}(\text{Query}, \text{Key}_i) \cdot \text{Value}_i$$

where Source is the set of input parameters, $\{x_1, x_2, x_3, \ldots, x_n\}$; $\text{Value}_i$ is the value of each element in Source; $L_x$ is the length of the input; Query is an element of the output; and $\text{Key}_i$ can be regarded as the address of each element in Source.
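The three stages listed above can be sketched as follows. The text does not specify the similarity function or the normalization, so this sketch assumes a dot product and a softmax, which are common choices; the function name `attention` and the toy vectors are hypothetical.

```python
import math


def attention(query, keys, values):
    """Three-stage attention over a list of (key, value) pairs.

    Stage 1: similarity between the query and each key (dot product, assumed).
    Stage 2: normalization of the raw scores (softmax, assumed).
    Stage 3: weighted sum of the values.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]

    peak = max(scores)                      # subtract the max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values))
               for d in range(dim)]
    return weights, context


# Two identical keys receive equal weights, so the context is the mean value.
weights, context = attention(
    query=[1.0, 1.0],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[2.0], [4.0]],
)
```

Because the weights sum to one, the result is a convex combination of the values, with more similar keys contributing more.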

B. INTELLIGENT PREDICTION FOR SHIELD LOAD
In addition to theoretical analysis and numerical simulation, the rapid development of machine learning techniques allows the internal characteristics of vast amounts of monitoring data to be explored and fed back to engineering construction. Deep learning, an important branch of machine learning, has been widely used in engineering to predict the shield load in recent years because of its ability to extract patterns from data. Gao et al. [20] adopted three different recurrent neural network (RNN) models to predict TBM thrust in real time based on in-situ operating data. Zhang et al. [21] found that the LSTM model is better suited to predicting the shield load than the random forest (RF) model. Chen et al. [22] predicted torque and thrust based on an improved LSTM algorithm, making it possible to adjust the TBM tunneling parameters in real time. Qin et al. [23] combined a convolutional neural network (CNN) and long short-term memory (LSTM) to extract implicit features and sequential features for cutterhead torque prediction. Zhou et al. [24] proposed a multi-step shield load and attitude prediction method for shield tunneling machines based on a WCNN-LSTM neural network. Shi et al. [25] proposed a novel hybrid multi-step prediction model for shield machine cutterhead torque. The model, combining variational mode decomposition (VMD), empirical wavelet transform (EWT) and a long short-term memory (LSTM) network, can accurately predict the cutterhead torque of a shield tunneling machine over multiple time steps. Xu et al. [26] successfully predicted the shield thrust using five different statistical and ensemble machine learning methods. It can be seen that deep learning methods have achieved much in load prediction. However, their accuracy and practicability still need improvement. On the one hand, the prediction methods above mainly study the time-varying characteristics of the data and ignore its dimension features. On the other hand, these existing prediction methods do not take into account the influence of different features and time information on the prediction results.

C. CONTRIBUTIONS
The contributions and innovations of this paper are summarized as follows:
(1) We propose a novel hybrid model for precise shield thrust prediction. The proposed CLM model combines the CNN algorithm, the BiLSTM algorithm and the attention mechanism. The operational, geological, structure and tunnel parameters are selected as input, and the output is the thrust at the next time step.
(2) The CLM model integrates the advantages of several algorithms. It not only captures the dimension features and the time-varying characteristics of time series, but also highlights important dimensions and key time information. The CLM model can avoid overfitting on the training data set and has stronger generalization ability.
(3) Compared with existing prediction models, the proposed CLM prediction model has higher prediction accuracy and overcomes the shortcoming of traditional methods, which cannot effectively learn the important dimension features and key time information of shield thrust data.

III. MATERIALS
Using the Suzhou metro line No. 2 project as a testbed, this paper investigates a shield machine cutting steel reinforced concrete piles. The diameter of the piles ranges from 1 m to 1.2 m, as shown in Figs. 4 and 5. The subsurface strata consist of a layer of Miscellaneous Fill (Stratum x) underlain by natural soils which consist of Clay (Stratum y 1 ), Silty Clay (Stratum y 2 ), Silt (Stratum y 3 ), Silty Clay (Stratum z 1 ), Sandy Silt (Stratum z 2 ) and Silty Clay (Stratum z 3 ). The natural soils are generally medium stiff or medium dense. The shield machine is expected to pass through the piles in Silty Clay (Stratum z 1 ) and Sandy Silt (Stratum z 2 ).
According to previous studies [23], [25], the data types recorded during shield driving usually include geological parameters, shield operating parameters and tunnel parameters. Structure parameters should also be taken into account because they directly affect the shield load when the shield machine has to cut through a structure. Therefore, the data for the Suzhou metro line No. 2 project include geological parameters, shield operating parameters, structure parameters and tunnel parameters. Operational parameters are obtained from the monitoring center of the shield machine. Geological, structure and tunnel parameters are obtained from the engineering investigation report. The monitoring period was 10 s. In total, 19000 data sets were sampled while the shield machine cut the first pile on the right line.

IV. THE PROPOSED MULTIATTENTION-CNN-BILSTM PREDICTION MODEL

A. FRAMEWORK OVERVIEW
This paper proposes a hybrid model (CLM) that combines the CNN algorithm, the BiLSTM algorithm and a multi-attention mechanism to predict the thrust. Figure 6 shows that the CLM model consists of two modules, with mixed domain attention introduced into the CNN algorithm and an attention mechanism integrated into the BiLSTM algorithm.

B. MIXED DOMAIN ATTENTION-CNN COMPONENT
The CNN algorithm can extract the dimension features of the data but fails to highlight important features, whereas the attention mechanism can capture important information. Therefore, combining the CNN algorithm with an attention mechanism not only extracts the dimension features of the data but also highlights the important ones.
As shown in the CNN module of Fig. 6, a mixed domain attention network is introduced into the CNN structure. Because the attention mechanism can distinguish the importance of data in different directions, it assigns weights to the channel domain and the spatial domain and then further extracts data features to improve prediction accuracy. The detailed steps of the channel domain attention are as follows. First, matrix $U$ is obtained by a convolution ($F_{tr}$) of the original matrix $X'$ and is then subjected to the maximum pooling operation ($F_{sq}(\cdot)$) to obtain a compressed matrix of size $1 \times 1 \times C$, as shown in Eqn. 9.
In Eqn. 9, $u_c$ is the feature map of the $c$th channel and $u_c(i, j)$ is its element in row $i$ and column $j$; $H$ and $W$ are the feature map height and width, respectively; and $z_c$ is the scalar output for the $c$th channel. Second, the matrix is passed through two fully connected layers ($F_{ex}(\cdot, W)$) to obtain an updated matrix, as shown in Eqn. 10.
In Eqn. 10, $\sigma$ is the tanh activation function; $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$ are the weights of the two fully connected layers; $z$ is the output of the squeeze operator; $\sigma(W_1 z)$ denotes a fully connected layer followed by the activation function; and $S_c$ is the resulting weight matrix. Similarly, $S_w$ and $S_h$ of the spatial attention domain can be obtained. Then, the two updated matrices in the spatial domain are fused by the tanh activation function, and the result is fused with the updated matrix in the channel domain, as shown in Eqn. 11.
In Eqn. 11, $S_c$, $S_w$ and $S_h$ are the weight matrices. The updated matrices in the mixed domain are then combined with the original matrix to obtain a new matrix X, as shown in Eqn. 12. Convolution and pooling layers follow to extract the data features. The fully connected layer converts the feature maps into vectors, which serve as the input of the BiLSTM algorithm.
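The channel-domain branch described above (squeeze by pooling, two fully connected layers, then reweighting) can be sketched as below. This is a rough illustration and not the paper's Eqns. 9 to 12: it assumes global max pooling for the squeeze, tanh after both fully connected layers, no biases, and a simple channel-wise multiplication for recombination. The spatial branches ($S_w$, $S_h$) and the fusion step are omitted, and the name `channel_attention` and all numbers are hypothetical.

```python
import math


def channel_attention(u, w1, w2):
    """Channel attention on a feature tensor u of shape C x H x W (nested lists).

    w1 has shape (C/r) x C and w2 has shape C x (C/r), as in the text.
    Returns the squeezed vector z, the channel weights s and the rescaled tensor.
    """
    # Squeeze: one scalar per channel via global max pooling (assumed).
    z = [max(max(row) for row in u_c) for u_c in u]

    # Excitation: two fully connected layers, each followed by tanh (assumed).
    hidden = [math.tanh(sum(w * z_c for w, z_c in zip(row, z))) for row in w1]
    s = [math.tanh(sum(w * h for w, h in zip(row, hidden))) for row in w2]

    # Reweight every element of channel c by its weight s[c].
    scaled = [[[s_c * v for v in row] for row in u_c]
              for s_c, u_c in zip(s, u)]
    return z, s, scaled


# Toy tensor with C = 2 channels of size 2 x 2 and reduction ratio r = 2.
u = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.0, -1.0], [-2.0, -3.0]]]
w1 = [[0.5, 0.5]]        # (C/r) x C = 1 x 2
w2 = [[1.0], [0.0]]      # C x (C/r) = 2 x 1
z, s, scaled = channel_attention(u, w1, w2)
```

In this toy case the second channel receives weight zero and is suppressed entirely, while the first channel is scaled by a weight between 0 and 1.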

C. ATTENTION-BILSTM COMPONENT
The bi-directional LSTM (BiLSTM) is a variant of the LSTM that better captures forward and backward changes in the time dimension by combining a forward LSTM and a backward LSTM. The output of the BiLSTM unit is formed from $\overrightarrow{h_i}$, the value of the forward LSTM, and $\overleftarrow{h_i}$, the value of the backward LSTM.
Some studies have shown that multi-layer LSTM structures predict more accurately than single-layer structures [27], so multi-layer BiLSTM structures are used in this algorithm to extract the time series characteristics of the data. Since $\overrightarrow{h_i}$ and $\overleftarrow{h_i}$ contain different information about previous moments in the BiLSTM network, they have different effects on $y_i$. Weight assignment in the attention mechanism lets important information exert a larger influence; therefore, the two values are assigned different weights. The output $y_i$ is expressed as:

$$y_i = \alpha\, \overrightarrow{h_i} + \beta\, \overleftarrow{h_i}$$

where $\alpha$ and $\beta$ are the weights of the forward and backward values, respectively. Considering the influence of the output $y_i$ at different times on the final output, an attention layer is introduced to learn the weight of $y_i$ at moment $i$ automatically, and the outputs of the algorithm are finally obtained. In the expression for $Y_i$, $\alpha_i$ is the weight at moment $i$.
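The two weighting steps in this component can be sketched as follows. The sketch assumes scalar hidden values per time step, fixed direction weights, softmax-normalized time weights and a final weighted sum over time; the paper's exact expressions may differ. The function names and all numeric values are hypothetical.

```python
import math


def combine_directions(h_fwd, h_bwd, alpha, beta):
    """Per time step: y_i = alpha * forward value + beta * backward value."""
    return [alpha * f + beta * b for f, b in zip(h_fwd, h_bwd)]


def temporal_attention(y, scores):
    """Turn per-step scores into weights alpha_i (softmax) and sum alpha_i * y_i."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return weights, sum(w * y_i for w, y_i in zip(weights, y))


# Toy forward and backward hidden values for three time steps.
h_fwd = [1.0, 2.0, 3.0]
h_bwd = [3.0, 2.0, 1.0]
y = combine_directions(h_fwd, h_bwd, alpha=0.5, beta=0.5)

# In a trained model the scores would be learned; here they are made up.
weights, final_output = temporal_attention(y, scores=[0.1, 0.2, 0.7])
```

Time steps with larger scores receive larger weights, so the final output is dominated by the moments the attention layer considers most informative.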

D. FLOWCHART OF THE PROPOSED MODEL
To study the prediction performance of the proposed model, this study mainly analyzes the shield machine cutting the first reinforced concrete pile on the upper tunnel. The flow chart of the proposed model is depicted in Fig. 7. First, the parameters are collected, and the input parameter dimensions are selected using correlation analysis. Next, the optimal hyperparameters of the five prediction models are obtained by training. Finally, the proposed model is evaluated and verified through comparison with the other four models, and the thrust predictions on the testing set are obtained.

V. EXPERIMENTS

A. DATASET AND DATA PREPROCESSING
To facilitate the training of the neural networks, we use the max-min method to scale the data into the range [−1, 1], as shown in Eqn. 16:

$$x' = \frac{2\,(x - x_{\min})}{x_{\max} - x_{\min}} - 1 \qquad (16)$$

where $x$ is the raw data; $x_{\max}$ and $x_{\min}$ are the maximum and minimum of the raw data, respectively; and $x'$ is the value after normalization. Through trial and error [28], with the mean square error (MSE) of the model as the evaluation criterion, 80% (15200 sets) of the data are used as the training subset, 15% (2850 sets) as the validation subset and the remaining 5% (950 sets) as the testing data set, as shown in Fig. 8.
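The scaling to [−1, 1] and the 80/15/5 split can be checked with a short sketch. The function names `scale_to_unit_range` and `split_counts` are hypothetical, and how the samples are ordered or shuffled before splitting is not stated in the text, so only the subset sizes are computed here.

```python
def scale_to_unit_range(values):
    """Max-min scaling of raw values into [-1, 1]."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        raise ValueError("all values are identical; scaling is undefined")
    return [2.0 * (x - x_min) / span - 1.0 for x in values]


def split_counts(n_samples, train_pct=80, val_pct=15):
    """Sizes of the training, validation and testing subsets."""
    n_train = n_samples * train_pct // 100
    n_val = n_samples * val_pct // 100
    n_test = n_samples - n_train - n_val
    return n_train, n_val, n_test


scaled = scale_to_unit_range([10.0, 20.0, 30.0])
counts = split_counts(19000)
```

The minimum maps to −1, the maximum to 1, and the 19000 samples split into 15200, 2850 and 950 sets, matching the figures quoted above.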

B. EXPERIMENTAL ENVIRONMENT
The hardware and software of the computer used in the experiments are listed in Table 1.

C. MODEL ESTABLISHMENT AND HYPERPARAMETER
Shield thrust is affected by time and space [29], so a spatial-temporal matrix can be set as the input of the CNN, as shown in Fig. 9. Here $t_n$ is the time of shield tunneling and $S_m$ is the shield driving distance, represented as $H$ and $W$ in Fig. 6; together they build up the spatial domain of the CNN structure. $C_k$ is the feature dimension of shield driving, which constitutes the channel domain of the CNN structure. The spatial-temporal matrix of the shield thrust is given by Eqn. 17, where $a_{ij}$ is the element in row $i$ and column $j$; it represents the shield thrust value when the shield has been driving for $i$ s and the driving distance is $j$ mm. The spatial-temporal matrix is the input layer of the CNN structure: the dimensional features of the data are extracted by the convolution and pooling layers, and the importance of these features is extracted by the mixed domain attention mechanism. After a series of processing steps, such as convolution, pooling, and flattening, a one-dimensional array is obtained, which is the input of the BiLSTM model. The time series characteristics of the data are extracted by the BiLSTM model, and key time node information is extracted by the attention mechanism. Finally, the prediction results are obtained through the fully connected layer and the output layer. Table 2 presents the framework of the CLM model for thrust prediction. The prediction performance of a neural network is affected by many factors, of which the hyperparameters are an important one. However, there is no mature theory for effectively selecting these hyperparameters [30]. The optimal hyperparameters of the proposed algorithm are therefore determined by repeated experiments; the resulting parameters are shown in Table 3.

D. BASELINE MODEL
In order to verify the prediction accuracy of the proposed model, four widely used prediction models, namely Random Forest (RF), Support Vector Regression (SVR), Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM), are introduced for comparison with the CLM algorithm. In each experiment, the model is trained on the training subset. The model performance is analyzed on the validation subset to adjust the hyperparameters, and the predictive accuracy is finally verified on the test data set.

E. METRICS
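The prediction accuracy is measured by the error between the predicted value $\hat{y}_i$ and the measured value $y_i$. The RMSE (root mean square error), MAE (mean absolute error) and $\text{R}^{2}$ (coefficient of determination) are used to evaluate the accuracy of regressions [31]. The evaluation indicators are as follows:

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$\text{R}^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where $n$ is the number of samples and $\bar{y}$ is the mean of the measured values.
VOLUME 10, 2022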
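The three evaluation indicators can be computed as in the sketch below, which uses their standard definitions. The function names and the toy values are hypothetical and are not data from the paper.

```python
import math


def rmse(y, y_hat):
    """Root mean square error between measured y and predicted y_hat."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y, y_hat):
    """Mean absolute error between measured y and predicted y_hat."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Toy example: three exact predictions and one that is off by 2.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
scores = (rmse(measured, predicted),
          mae(measured, predicted),
          r2(measured, predicted))
```

RMSE penalizes the single large error more strongly than MAE does, which is why the two indicators are usually reported together.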

VI. RESULTS AND DISCUSSIONS

A. PARAMETER SELECTION AND CORRELATION ANALYSIS
Many parameters can affect the shield thrust while the shield machine cuts through bridge piles. They fall into four kinds: geological parameters, shield operational parameters, tunnel parameters and structural parameters. Each kind also includes many sub-parameters, so the total number of parameters is very large and it is unrealistic to use all of them as input. This section therefore chooses the key parameters that reflect the thrust. Previous studies [32] indicated that the shield thrust is mainly affected by operational parameters during shield driving, such as cutter temperature, driving speed, shield attitude, penetration depth, grouting amount, grouting pressure and chamber earth pressure. In addition, some geological and tunnel parameters can also affect the shield thrust, such as buried depth ratio, friction angle and cohesion [33]. According to the project characteristics, pile diameter, tensile strength of reinforcing bar and concrete strength are selected as influencing factors. Based on the above analysis, 13 parameters are selected to predict the thrust; Table 4 shows their statistical analysis. Previous studies [34] have shown that a high input dimension can cause overfitting and reduce computational efficiency. Therefore, this section studies the correlation between the input and output parameters using the Maximal Information Coefficient (MIC) [35] to remove redundant input parameters.
Figure 10 shows the correlation coefficient (C) between thrust and each of the 13 parameters: cutter temperature, driving speed, shield attitude, penetration depth, grouting amount, grouting pressure, chamber earth pressure, buried depth ratio, friction angle, cohesion, pile diameter, tensile strength of reinforcing bar and concrete strength. Previous studies [36] have shown that if C > 0.9, the input and output parameters are perfectly correlated; if 0.7 < C < 0.9, they are highly correlated; if 0.4 < C < 0.7, they have low correlation; and if C < 0.4, they are approximately uncorrelated. Therefore, grouting pressure, buried depth ratio, friction angle and cohesion are excluded, and the remaining parameters are selected for thrust prediction.
The remaining parameters can be further reduced, because redundancy among the selected input parameters can also cause overfitting in machine learning. Fig. 11 presents the correlation coefficients between the input features. The correlation coefficient (C) between cutter temperature and chamber earth pressure is 0.94, between shield attitude and chamber earth pressure is 0.98, and between tensile strength of reinforcing bar and chamber earth pressure is 0.82. This indicates a perfect correlation of chamber earth pressure with cutter temperature and with shield attitude, and a high correlation between tensile strength of reinforcing bar and chamber earth pressure. To ensure the independence of the input parameters, some parameters are discarded using random forest-recursive feature elimination (RF-RFE) [37]. The raw data are randomly divided into a training set (80%) and a test set (20%), and RF-RFE is then used for feature optimization. RF-RFE selects different feature sets according to the importance of each feature and then calculates the accuracy of each feature set. Finally, the feature set with fewer features and high classification accuracy is regarded as the optimal feature set. As shown in Fig. 12, cutter temperature, shield attitude and tensile strength of reinforcing bar can be excluded, and the remaining 6 parameters are selected for thrust prediction.
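The correlation bands quoted from [36] can be written as a small helper and applied to the three coefficients read from Fig. 11. The function name `correlation_band` is hypothetical. The source uses strict inequalities and leaves the boundary values 0.9, 0.7 and 0.4 unassigned; this sketch assumes they fall into the lower band.

```python
def correlation_band(c):
    """Map a correlation coefficient C to the qualitative band used in the text."""
    if c > 0.9:
        return "perfect"
    if c > 0.7:
        return "high"
    if c > 0.4:
        return "low"
    return "approximately none"


# Coefficients with chamber earth pressure, as reported for Fig. 11.
pairs = {
    "cutter temperature": 0.94,
    "shield attitude": 0.98,
    "tensile strength of reinforcing bar": 0.82,
}
bands = {name: correlation_band(c) for name, c in pairs.items()}
```

Parameters that land in the "perfect" or "high" band against another input are the redundancy candidates that the RF-RFE step then evaluates.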
Here, $P_1$ to $P_{13}$ represent buried depth ratio, friction angle, cohesion, grouting pressure, grouting amount, cutter temperature, pile diameter, tensile strength of reinforcing bar, concrete strength, chamber earth pressure, driving speed, shield attitude and penetration depth, respectively.
Figure 13 presents the loss values of the thrust on the training and validation subsets. The results show that, as the number of epochs increases, the loss values decrease rapidly and then stabilize, with the final loss values on the training and validation subsets almost the same (0.005). This indicates that the CLM model performs well on the training data set and avoids overfitting to it. In addition, there are few fluctuations in the loss curve on the validation subset, which indicates that the CLM model has good generalization ability. Some fluctuations were observed in the loss curve on the training subset because of noise in the raw data. The loss curves stabilize rapidly after a certain number of iterations, which demonstrates the reliability of the model.

C. COMPARISON WITH EXISTING METHODS
In this section, we compare the CLM algorithm with the Random Forest (RF), Support Vector Regression (SVR), Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) algorithms on the thrust data to demonstrate the effectiveness of the CLM model. To ensure a fair comparison, the optimal hyperparameters of the other algorithms are selected by training, and their prediction results are then compared with those of the CLM algorithm. The optimal parameters of the other algorithms are shown in Table 5.
As shown in Table 6, the MAE, RMSE and $\text{R}^{2}$ are 0.02, 0.05 and 0.85 for the CLM algorithm; 7.53, 6.62 and 0.61 for LSTM; 9.98, 11.63 and 0.50 for RNN; 9.21, 10.68 and 0.52 for SVR; and 10.21, 13.44 and 0.49 for RF, respectively. Hence, the MAE and RMSE of the CLM algorithm are lower than those of the other algorithms, and its $\text{R}^{2}$ is higher. The CLM and LSTM algorithms clearly outperform the RF and SVR algorithms, because they can effectively extract time series features from the data and learn the changing law of the time-varying sequence. The prediction accuracy of the CLM and LSTM algorithms is higher than that of the RNN algorithm, because the RNN algorithm cannot learn long-term dependencies in time series data owing to vanishing and exploding gradients. The prediction accuracy of the CLM algorithm is higher than that of the LSTM algorithm, because the LSTM algorithm cannot learn the important time node information in time series data.
In addition, the prediction accuracy of SVR is higher than that of RF, because SVR is suitable for processing continuous data, while RF is suitable for processing discrete data. The same conclusions can be drawn from Fig. 14, which presents the prediction and measurement curves of the five prediction models on the testing data set. The predictions of the proposed CLM model coincide with the measurements more closely than those of RF, SVR, RNN and LSTM.
Table 7 lists the running time of each algorithm. It is well known that deep learning algorithms require more training time than non-deep learning algorithms. Accordingly, the running times of the CLM, LSTM and RNN algorithms are longer than those of the RF and SVR algorithms. The running time of the CLM algorithm is longer than that of the LSTM and RNN algorithms, because the proposed algorithm adds a CNN structure to extract the data characteristics. Despite having the longest running time, the CLM algorithm gives the best prediction results. In fact, the testing time is more important than the training time, because the model can be trained in advance and the testing time directly reflects how quickly the prediction results can be applied in practice. The testing time of the CLM algorithm is 89.64 s, which is acceptable. Moreover, the running time of the proposed model will be further reduced as computer hardware performance improves.

D. ABLATION EXPERIMENT
The CLM model is mainly composed of two components: a CNN fused with an attention mechanism and a BiLSTM fused with an attention mechanism. To analyze the contribution of each module, ablation experiments are carried out. The algorithms compared are CNN, BiLSTM, CNN + attention, BiLSTM + attention and CLM.
The testing results are shown in Table 8. The MAE and RMSE of the CLM algorithm are lower, and its R2 is higher, than those of the other algorithms. The results show that the prediction accuracy of the CLM algorithm is higher than that of the other four algorithms. To illustrate the prediction accuracy of the proposed model, a comparison chart is drawn in Figure 15. The MAE and RMSE of the CLM model are smaller than those of the CNN-Attention model by 99.3% and 99.4%, respectively, and its R2 is larger by 26.9%. The reason is that the BiLSTM model can recognize long- and short-term dependencies in time series data, and the attention mechanism in the BiLSTM model can capture the key time node information. The MAE and RMSE of the CLM model are smaller than those of the BiLSTM-Attention model by 33.3% and 58.3%, respectively, and its R2 is larger by 10.4%. The reason is that the CNN model can handle multi-dimensional problems with better accuracy and efficiency, and the attention mechanism in the CNN model can capture important features. The MAE and RMSE of the BiLSTM-Attention model are smaller than those of the BiLSTM model, and its R2 is larger. The same conclusion can be drawn for the CNN-Attention and CNN models. This indicates that the attention mechanism improves the prediction accuracy.
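The percentage differences quoted here are presumably relative differences measured against the baseline model; the text does not state the formula, so the following minimal sketch rests on that assumption. The function names and the baseline values are illustrative and are not taken from Table 8; only the CLM values of 0.02 (MAE) and 0.85 (R2) come from the reported results.

```python
def relative_reduction(proposed, baseline):
    """Percentage by which an error metric of the proposed model is
    smaller than that of the baseline (assumed definition)."""
    return (baseline - proposed) / baseline * 100.0

def relative_increase(proposed, baseline):
    """Percentage by which a score of the proposed model is
    larger than that of the baseline (assumed definition)."""
    return (proposed - baseline) / baseline * 100.0

# CLM values from the reported results; baseline values are hypothetical.
clm_mae, baseline_mae = 0.02, 0.03
clm_r2, baseline_r2 = 0.85, 0.77

print(f"MAE reduced by {relative_reduction(clm_mae, baseline_mae):.1f}%")
print(f"R2 increased by {relative_increase(clm_r2, baseline_r2):.1f}%")
```

Because the denominator is the baseline value, reductions in error metrics are bounded by 100%, whereas increases in R2 are not bounded in the same way, so the two kinds of percentage should not be compared directly.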
The prediction performance of the models is evaluated visually by comparing the prediction data with the monitoring data, as shown in Fig. 16. The predicted values of the proposed model are the most consistent with the monitoring data, whereas the predicted values of the CNN model differ considerably from the monitoring data. The prediction performance of the BiLSTM model is better than that of the CNN model, which indicates that the BiLSTM model is better able to handle a large volume of multi-dimensional, multivariate time series data.

E. MULTI-STEP PREDICTION
In some engineering fields, multi-step prediction of time series is more valuable than single-step prediction. Construction work is time-critical, and single-step prediction cannot meet early-warning requirements. Multi-step prediction is therefore more useful for planning construction measures. In this section, the multi-step prediction performance of the proposed model is evaluated. The thrust over the next five time steps is predicted, and the CLM model is compared with the RF, SVR, RNN and LSTM models. MAE and RMSE are used as evaluation indices. Figure 17 shows that the MAE and RMSE of all models increase with the number of time steps, which indicates that the prediction performance of the models deteriorates as the prediction horizon grows. The reason is that each prediction depends on the prediction of the previous step, so the prediction errors accumulate. Nevertheless, the MAE and RMSE of the CLM model are the lowest among the SVR, RF, RNN, LSTM and CLM models. The results show that the prediction performance of the CLM model is the best among all models. Moreover, the MAE and RMSE curves of the CLM model begin to rise after two time steps, which indicates that the prediction performance of the CLM model drops rapidly beyond two time steps. Improving the prediction performance of the CLM model beyond two time steps is therefore left for future study.
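The error accumulation described above is characteristic of recursive multi-step forecasting, in which each prediction is fed back as an input for the next step. The following minimal sketch illustrates the mechanism. The `toy_model` function is a hypothetical stand-in for a trained one-step predictor, and the input series is synthetic; neither represents the CLM model or the project data.

```python
def recursive_forecast(one_step_model, history, horizon):
    """Predict `horizon` steps ahead by feeding each prediction
    back into the input window of the one-step model."""
    window = list(history)
    forecasts = []
    for _ in range(horizon):
        next_value = one_step_model(window)
        forecasts.append(next_value)
        # Slide the window: drop the oldest value, append the prediction.
        window = window[1:] + [next_value]
    return forecasts

# Hypothetical one-step predictor that slightly underestimates the trend.
def toy_model(window):
    return window[-1] + 0.9 * (window[-1] - window[-2])

# Synthetic linear series: the true continuation is 0.3, 0.4, 0.5, 0.6, 0.7.
history = [0.0, 0.1, 0.2]
truth = [0.3, 0.4, 0.5, 0.6, 0.7]

forecasts = recursive_forecast(toy_model, history, horizon=5)
abs_errors = [abs(t - f) for t, f in zip(truth, forecasts)]
print(abs_errors)
```

In this toy setting the absolute error grows at every step, because each slightly low prediction becomes part of the input for the next one. Direct multi-output strategies avoid this feedback, at the cost of training the model to emit all horizons at once.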

VII. CONCLUSION
In this paper, we propose a novel hybrid deep neural network for precise thrust prediction of shield tunneling machines. Correlation analysis based on the Maximal Information Coefficient (MIC) between the candidate parameters and the thrust is conducted for parameter selection and input dimension reduction. The dimensional features of the shield thrust are then extracted by the CNN layer, and important features are captured by a mixed-domain attention mechanism. The time-varying characteristics are extracted by the BiLSTM layer, and an attention mechanism captures the important time node information. To validate the effectiveness, generalization and superiority of the proposed method, experiments have been conducted on real project data, and comparisons are made with existing machine learning models.
The results show that parameter selection is important in thrust prediction with construction data, since each feature contributes differently and a high input dimension can cause overfitting and reduce computational efficiency. The proposed model performs well, avoids overfitting and has good generalization ability. Compared with four existing prediction models, the proposed method shows higher prediction performance in terms of the coefficient of determination (R2), root mean square error (RMSE) and mean absolute error (MAE). Moreover, the contribution of each component of the proposed model was analyzed using ablation experiments. The results demonstrate that the proposed CLM model integrates the advantages of the CNN, the BiLSTM and the attention mechanism, and that it can accurately predict shield thrust during shield driving.
Although the multi-step prediction performance of the proposed CLM model is better than that of the four existing models, its prediction performance drops rapidly after two time steps. Future work will therefore focus on improving the multi-step prediction performance of the CLM model. The proposed model may also be applicable in other fields.

DECLARATION OF COMPETING INTEREST
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.