Landslide Risk Prediction Model Using an Attention-Based Temporal Convolutional Network Connected to a Recurrent Neural Network

Landslide risk assessment is an important component of landslide research. To obtain a landslide assessment indicator, we use the TOPSIS-Entropy method to assess the risk of landslide occurrence; the resulting indicator is easy to obtain directly from sensor data. Applying the TOPSIS-Entropy method to landslide datasets yields the landslide instability margins (LIMs), which reflect the current instability probability of the landslide body. For landslide prediction, deep neural networks are used to predict the corresponding LIMs. Attention mechanism-based (Attn) temporal convolutional networks (TCNs) connected with recurrent neural networks (RNNs) are proposed for landslide risk prediction, namely TCN-Attn-RNN and RNN-Attn-TCN, both of which use an encoder-decoder architecture. The encoder in the first model uses the TCN, and the decoder uses a neural network with an RNN architecture, including long short-term memory (LSTM) networks, gated recurrent units (GRUs), and their derivative algorithms. In the second model, the encoder uses a neural network with an RNN architecture, and the decoder uses a TCN. Combining the TOPSIS-Entropy method with TCN-Attn-RNN and RNN-Attn-TCN, reliable prediction models of landslide risk are proposed. Landslide data were obtained by building a landslide simulation platform. Compared with their counterparts, the proposed models predict landslide instability margins more accurately.


I. INTRODUCTION
Landslide risk assessment is a critical technical tool for forecasting the occurrence of deadly landslides. However, a reliable model for predicting landslide risk still faces challenging issues. First, the identification and modeling of landslide hazard triggering factors are important technical difficulties. Second, landslide early warning is hampered by ambiguity in the temporal and geographical forecasting of landslide risk. Third, the quantitative evaluation of the landslide hazard triggering factors cannot be applied to actual sites and still requires in-depth study. The goal in the landslide research field is to identify suitable assessment metrics and build accurate models to predict landslide risk.
The associate editor coordinating the review of this manuscript and approving it for publication was Sajid Ali.
Multisensor analysis remains difficult. To address communication issues in the Internet of Things (IoT), Joshi et al. presented an approach that uses edge computing to process and analyze landslide data [1]. Li et al. presented a quantitative model of landslide probability built on a rainfall intensity-duration threshold model, and they successfully predicted landslide probability using geographic information system (GIS) technology [2]. Remondo et al. presented a procedure for landslide risk assessment that used spatial data to build a landslide susceptibility model, with quantitative hazard models obtained from past landslide frequency and magnitude [3]. Aiming at the rapid assessment of landslide risk levels, Tan et al. proposed a stacked autoencoder algorithm to address the problems of current assessment methods, including their time consumption; this algorithm saves considerable time in landslide risk assessment and performs better than artificial neural networks (ANNs) and radial basis function (RBF) neural networks [4]. For landslide susceptibility indices, Goyes-Peñafie and Hernandez-Rojas integrated discrete and continuous data in landslide analysis and employed logistic regression (LR) and weights of evidence (WoE), obtaining highly accurate landslide susceptibility results [5]. For landslide sensor data reconstruction, Utomo et al. analyzed abnormally missing landslide data, which may be lost because of sensor failure, external interference, or other environmental factors. They predicted the missing values with a long short-term memory (LSTM) neural network, which performed well even with 90% data loss [6].
VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In displacement prediction, Liu et al. explored algorithms for the prediction of landslide displacements, and the results showed that LSTM and gated recurrent units (GRUs) achieve encouraging results [7]. Rainfall has a significant influence on the incidence of landslides. Srivastava et al. explored how the back propagation neural network (BPNN), support vector regression (SVR), and LSTM can be used to forecast rainfall and predict the occurrence of landslides based on rainfall thresholds [8]. Khaing and Thein used a deep learning system in conjunction with the IoT to successfully forecast rainfall-induced landslides. This system forecasts a univariate time series using the LSTM model as the baseline method [9]. The reliability of landslide susceptibility mapping can also be improved by a machine learning ensemble model. Di Napoli et al. selected and assessed 13 predisposing factors and adopted a new approach based on an ANN, a generalized boosting model, and maximum entropy algorithms. The results showed that the proposed algorithm performed well in terms of validation scores [10]. Dou et al. also modified the landslide assessment approach using ensemble methods and used a support vector machine (SVM) with bagging, boosting, and stacking frameworks [11]. Liang et al. assessed and compared four alternative landslide susceptibility mapping models, and the gradient boosting decision tree (GBDT) model showed the best landslide prediction capability. Combined with cluster analysis, the sampling strategy for non-landslide points can be improved [12]. Ma et al. summarized the machine learning methods for landslide prevention. Their investigation focused on three main directions: image-based landslide detection, landslide susceptibility assessment, and landslide early warning system development. They also discussed the challenges and potential opportunities for machine learning in the area of landslide prevention. They combined a data-driven approach with a knowledge-driven landslide mechanism to explain the results of machine learning in landslide prevention research [13].
Research on landslide displacement prediction is also progressing rapidly. Liao et al. proposed an improved extreme learning machine with grey wolf optimization (GWO). The cumulative landslide displacement was decomposed into trend displacement and periodic displacement. A cubic polynomial model was then used to predict the trend displacement, and after statistical analysis of the displacement data, the proposed model was used to predict the periodic displacement [14]. Ensemble models can also be used in landslide displacement prediction. A new data-driven monitoring and forecasting approach was proposed by Li et al., who developed an autoregressive moving average time series model to analyze the autocorrelation of landslide triggers. In addition, a parametric correlation model was fitted to the predicted displacements, and the link between the trigger variables and the landslide displacement values was explored [15].
Numerous ANN approaches to predicting landslide displacements do not capture the potential nonstationary characteristics of landslide displacements. A discrete wavelet transform (DWT)-extreme learning machine (ELM) model based on chaos theory was proposed by Huang et al. to predict landslide displacement. Using the cumulative displacement time series of landslides as the dataset, the displacement of a landslide was estimated precisely [16]. Displacement occurs under the complex conditions of various impact factors, such as geological conditions and precipitation. By examining the response relationships among landslide deformation, rainfall, reservoir water level, and groundwater level, Cao et al. proposed an ELM and established a landslide displacement prediction model related to control factors [17]. In 2014, Cheng Lian et al. proposed an ensemble learning paradigm based on the EEMD-ELM model (an extreme learning machine integrated with ensemble empirical mode decomposition) to address monitoring data for landslide displacement prediction [18]. In 2015, a BPNN was used to predict slope deformation with daily and antecedent rainfall as input variables, and the model achieved high accuracy [19].
The above scholars have shown that the application of machine learning and deep learning in landslide data analysis is effective, but they still face three major problems. First, the identification and modeling of landslide risk impact elements is an important technical challenge. Second, the uncertainty of the geographical and temporal prediction of landslide risk has an impact on landslide early warning. Third, the quantitative assessment of landslide risk impact elements still requires in-depth research. The focus in the landslide research field is to find suitable assessment indicators and build suitable models to predict landslides.
Recent breakthroughs in time-series forecasting have influenced the applied field. Data-driven models, such as LSTM, can be used to solve prediction problems in nonlinear time series and possess unique advantages [20]. Landslide processes include time-series data, such as rainfall, surface displacement, shallow moisture content, and deep moisture content. We use an encoder-decoder architecture to connect attention (Attn), temporal convolutional networks (TCNs), and recurrent neural networks (RNNs), and propose TCN-Attn-RNN and RNN-Attn-TCN landslide risk prediction models.
The contributions of this paper are as follows. First, we use the TOPSIS-Entropy method to assess landslide risk and obtain the instability margin of landslide risk, which is more practical to obtain directly from sensor data than the stability and safety factors. Second, TCN-Attn-RNN and RNN-Attn-TCN are proposed to solve the prediction problem for instability margins. The encoder-decoder architecture connects the TCN and RNN, and the attention mechanism regulates the weights of the intermediate vectors. Third, this paper integrates the assessment method for landslide instability margins (LIMs) with the landslide risk prediction methods, and solves the modeling problem from the sensor acquisition of landslide hazard information to landslide risk prediction.
Section II introduces the structure of the landslide simulation platform. In section III, the TOPSIS-Entropy method and deep learning models are discussed, and the TCN-Attn-RNN and RNN-Attn-TCN models for landslide risk prediction are proposed. Section IV demonstrates the effectiveness of the proposed models. Finally, the conclusions and discussion of landslide risk models are presented in section V.

II. LANDSLIDE SIMULATION PLATFORM
The landslide platform is constructed to simulate a real landslide occurrence. Figure 1 shows the architecture of the landslide simulation platform, which includes a simulated rainfall system, a sensor measurement system, and a data collection system. The simulated rainfall system comprises a water pump (WP), a simulated rainfall sprinkler head (SRSH), a test soil-carrying box (TSCB), and a hydraulic lifting system (HF). Rainfall measurement (RM), displacement measurement (DM), shallow soil moisture content measurement (SSMCM), and deep soil moisture content measurement (DSMCM) are all part of the sensor measurement system (DSCM). The host computer display (HCD) and database storage (DS) are the two components of the data collection system.
Throughout the landslide simulation, we used sensors to monitor the landslide process factors, including rainfall, shallow soil moisture content, deep soil moisture content, and surface displacement. The magnitude of rainfall is an important factor in rainfall-induced landslides: it directly affects the soil water content and indirectly affects soil stress and surface displacement. The landslide simulation platform uses heavy rainfall to simulate the landslide process. The experiment lasts for several hours and focuses on landslides occurring on 30-degree slopes during intense rainfall. Experimental data are collected every second, and tens of thousands of data records are used for landslide data modeling. Figure 2 shows a physical diagram of the test soil-carrying box at the disaster test site. The collection sensors mainly include a tipping-bucket rain sensor for measuring rainfall, a tensile displacement sensor for measuring surface displacement, and soil moisture sensors for measuring deep and shallow moisture content, as shown in Figure 3.

III. METHODOLOGY
In this section, the TOPSIS-Entropy method used to assess the landslide risk instability margin is introduced first. Then, canonical sequence deep learning models, such as TCN, LSTM, and GRU models, are formulated. In addition, our proposed models for landslide risk prediction are presented, including TCN-Attn-RNN and RNN-Attn-TCN.

A. TOPSIS-ENTROPY METHOD
The technique for order preference by similarity to the ideal solution (TOPSIS) was presented in 1981 and is also known as the superior-inferior solution distance method [21]. The TOPSIS-Entropy approach is an evaluation model constructed using the TOPSIS method in which the entropy weight method is used for weight assignment [22]. To obtain the instability margin, the data correlation among landslide influencing factors is determined using the TOPSIS-Entropy method, and the output of the TOPSIS-Entropy approach is utilized as the landslide instability margin, which effectively utilizes the data from the landslide monitoring sensors and makes it easier to calculate the landslide instability coefficient compared to the traditional method.
The landslide instability margins (LIMs) are calculated using (1)-(6), and the calculation process is shown in Figure 4. The first step is to set the four landslide influencing factors as the initialization matrix and then normalize them; the smaller-is-better (very small), intermediate, and interval indicators in the matrix are converted to larger-is-better (very large) indicators. Second, weights are assigned to the different landslide factors according to the magnitude of their information entropy, which is calculated as follows.
1. Calculate the weight $p_{ij}$ of the $i$th observation for the $j$th item. Here, $x_{ij}$ denotes the observation, $z_{ij}$ denotes the normalized data, and $p_{ij}$ denotes the weight size.
2. Calculate the entropy value $e_j$ of the $j$th item.
3. Calculate the coefficient of variation $g_j$; the larger the variation is, the smaller the entropy value.
4. Calculate the weight of the $j$th item.
5. Calculate the comprehensive evaluation value $S_i$ of the $i$th evaluation object.
Finally, the observed values of each time period are multiplied by the corresponding weights to obtain the landslide instability margin, which characterizes the probability of landslide occurrence. The entropy weighting method is an objective weighting method: the smaller the degree of data variation is (i.e., the smaller the variance is), the less information the data contain and the lower the weight. The TOPSIS-Entropy score therefore directly reflects the magnitude of the probability of a landslide on a mountain.
The block diagram for solving the landslide instability margin based on the TOPSIS-Entropy method proceeds as follows. Source data first go through data preprocessing, including landslide impact factor (LIF) determination and the data normalization process (DNP), and then undergo the TOPSIS calculation process, which comprises the calculation of the weight (W), the entropy value (EV), and the degree of variation (DV) of the data. Finally, the data are weighted, and the score is computed and output as the landslide instability margin (LIM).
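As an illustration of steps 1-5, the following minimal Python sketch computes entropy weights and a weighted score for a small matrix of sensor readings. It is a sketch under stated assumptions rather than the implementation used in this work: min-max normalization is assumed, all four indicators are treated as larger-is-riskier, the score is the plain weighted sum described above, and the function names and sample readings are hypothetical.

```python
import math

def min_max_normalize(x):
    """Scale each column of x (rows = time steps, columns = factors) to [0, 1]."""
    n = len(x[0])
    lo = [min(row[j] for row in x) for j in range(n)]
    hi = [max(row[j] for row in x) for j in range(n)]
    return [
        [(row[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
         for j in range(n)]
        for row in x
    ]

def entropy_weights(z):
    """Entropy weights w_j for a normalized matrix z with at least two rows."""
    m, n = len(z), len(z[0])
    g = []
    for j in range(n):
        col_sum = sum(row[j] for row in z)
        if col_sum == 0:
            g.append(0.0)  # a constant column carries no information
            continue
        p = [row[j] / col_sum for row in z]  # proportion p_ij
        # Entropy e_j, using the convention 0 * ln(0) = 0.
        e = -sum(v * math.log(v) for v in p if v > 0) / math.log(m)
        g.append(max(0.0, 1.0 - e))  # degree of variation g_j
    total = sum(g)
    if total == 0:
        return [1.0 / n] * n  # fall back to equal weights
    return [gj / total for gj in g]

def instability_margins(x):
    """Return (scores S_i, weights w_j) for a raw sensor matrix x."""
    z = min_max_normalize(x)
    w = entropy_weights(z)
    scores = [sum(wj * zij for wj, zij in zip(w, row)) for row in z]
    return scores, w

# Hypothetical readings: rainfall, surface displacement,
# shallow moisture content, deep moisture content.
readings = [
    [0.0, 0.0, 10.0, 5.0],
    [5.0, 0.0, 20.0, 8.0],
    [10.0, 1.0, 30.0, 12.0],
    [15.0, 6.0, 32.0, 13.0],
]
scores, weights = instability_margins(readings)
```

Under these assumptions, a column that varies more across time steps receives a larger weight, and the score of each time step lies between 0 and 1.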

B. DEEP LEARNING MODELS
1) SEQUENCE MODEL
Sequence prediction by RNNs dominates research on time-series prediction in deep learning because RNNs can capture long-term dependencies: they store important information from the past and use it to forecast the sequence at future moments. RNN-based models, such as LSTM [23], GRUs [24], and their derivative algorithms, have shown superior performance in sequence prediction. In recent years, a dual-stage attention-based RNN model (DA-RNN) for nonlinear autoregressive exogenous scenarios has been proposed, and it captures long-term dependencies very well [25]. Canonical convolutional neural networks are generally less suitable for modeling time-series problems because their convolutional kernel size limits their ability to capture long-term dependencies.
Bai et al. successfully used convolutional neural networks in sequence prediction modeling in 2018 [26]. They proposed the temporal convolutional network (TCN), which uses causal convolution as a temporal constraint, dilated convolution to obtain a larger receptive field and capture longer dependencies, and residual connections, with each convolutional layer replaced by a residual block. Compared with that of an RNN, the model structure of a TCN is simpler and more effective. Many scholars have extended TCNs to multivariate time-series prediction [27], [28]. Note that while TCNs solve some of the problems associated with vanishing and exploding gradients in RNNs, they fall short in capturing long-term-dependent information compared with the RNN framework and the transformer, which can capture temporal information of arbitrary length. Some researchers have combined CNN and RNN structures and proposed the convolutional LSTM (ConvLSTM) algorithm [29].
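To make the causal and dilated convolution described above concrete, the following minimal Python sketch applies a single one-dimensional dilated causal filter to a sequence. It is an illustration only: the function name, the zero left-padding, and the sample values are assumptions, and a real TCN layer would use many learned filters and channels.

```python
def dilated_causal_conv1d(x, kernel, dilation):
    """Dilated causal convolution of a 1-D sequence.

    The output at time t depends only on x[t], x[t - dilation],
    x[t - 2 * dilation], ...; positions before the start of the
    sequence are treated as zeros (left padding).
    """
    k = len(kernel)
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - (k - 1 - i) * dilation  # never looks past time t
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y

# Hypothetical example: kernel size 2, dilation 2, so y[t] = x[t - 2] + x[t].
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
output = dilated_causal_conv1d(signal, kernel=[1.0, 1.0], dilation=2)
```

Because no index larger than t is ever read, changing a later input value leaves all earlier outputs unchanged, which is the causality property that prevents leakage of future information.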
The attention mechanism is essentially the computation of a matrix of automatically weighted coefficients corresponding to the sequence. A query and a set of key-value pairs are mapped to an output, where the query, keys, and values are vectors and the output is a weighted sum of the values [30]. Many researchers have explored attention methods for time-series forecasting, such as transformers for time series [31] and Informer [32]. Attention models can be divided into two categories: hard attention and soft attention. Hard attention is a 0-1 problem, where a region is either attended to or not attended to. Soft attention is a continuous distribution problem on [0,1], where scores ranging from 0 to 1 reflect the degree of attention given to each region. In essence, the attention mechanism filters out important information from a large amount of information and gathers this vital information while ignoring the unimportant information [33]. The focusing process is reflected in the attention weight coefficient, and the larger the weight coefficient is, the more important the information.
DA-RNN owes its superior performance to two effective attention mechanisms. The input attention mechanism is added at the input stage to focus the weights on the fundamental aspects. The temporal attention mechanism is applied to the intermediate hidden layer output to extract information about the time-series characteristics before LSTM decoding provides the final output. Our research, inspired by the DA-RNN model, proposes combining the parallelism and flexibility of convolution with the timing scalability of RNNs. By combining the features of the TCN structure and RNN structure and adding the attention mechanism, we propose TCN-Attn-RNN and RNN-Attn-TCN, and we use them in landslide risk prediction to construct a novel deep learning landslide risk prediction model.
Combining an RNN with a TCN while retaining the input attention mechanism and the temporal attention mechanism, as in RNN-Attn-TCN and TCN-Attn-RNN, effectively combines the advantages of RNNs and TCNs and predicts complex sequences more effectively. Applying such a sequence prediction algorithm to complex landslide time-series data is an innovation of our design.
Bai et al. made improvements to the basic TCN structure, such as residual connections and regularization [26]. The residual block is shown in Figure 5. The original one-dimensional causal convolutional layer is replaced by a residual block with two layers that share the same dilation factor, together with a residual connection. The output of these two convolutional layers is added to the input of the residual block and fed into the next residual block. To adjust the width of the residual tensor, a 1 × 1 convolution is added. Because each residual block contains two convolutional layers, it widens the receptive field of the TCN by twice as much as a single basic causal layer. The receptive field size $r$ can then be obtained by (7).
In (7), $k$ denotes the size of the convolution kernel, $b$ denotes the dilation base, and $k \geq b$. The number of residual blocks $n$ is related to the length $l$ of the input tensor and is calculated as shown in (8). The 1 × 1 convolution maintains the same length between the input and output of the residual block [34], and dilated causal convolution guarantees that the output is not influenced by future information [27].
LSTM is a type of sequence model with an RNN architecture. Its core feature is the inclusion of gating structures such as forget gates, which enable information to be passed on, as shown in Figure 6. Even knowledge from a previous time period may be remembered at a later point in time, enabling long-term memory.
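Returning to the TCN receptive field relations referenced as (7) and (8), the following Python sketch shows how the receptive field and the required number of residual blocks can be computed. It assumes a commonly used form for residual blocks with two dilated causal layers each and dilation $b^i$ in the $i$th block, namely $r = 1 + 2(k-1)(b^n-1)/(b-1)$; this form is an assumption and is not quoted from the equations of this paper. The function names and the choice $k = 3$, $b = 2$ are hypothetical.

```python
def receptive_field(k, b, n):
    """Receptive field of n residual blocks (two conv layers per block).

    Assumed form: r = 1 + 2 * (k - 1) * (b**n - 1) / (b - 1),
    which reduces to 1 + 2 * (k - 1) * n when b == 1.
    """
    if k < 2 or b < 1 or n < 0:
        raise ValueError("expected k >= 2, b >= 1, n >= 0")
    if b == 1:
        return 1 + 2 * (k - 1) * n
    # (b**n - 1) is always divisible by (b - 1), so integer division is exact.
    return 1 + 2 * (k - 1) * (b**n - 1) // (b - 1)

def min_residual_blocks(l, k, b):
    """Smallest n whose receptive field covers an input of length l.

    Equivalent to the closed form
    n = ceil(log_b((l - 1) * (b - 1) / (2 * (k - 1)) + 1)) for b > 1,
    but computed by search to avoid floating-point rounding.
    """
    n = 0
    while receptive_field(k, b, n) < l:
        n += 1
    return n

# Input lengths 100 and 1000 match the sliding windows used in section IV;
# kernel size 3 and dilation base 2 are illustrative choices.
blocks_for_100 = min_residual_blocks(100, k=3, b=2)
blocks_for_1000 = min_residual_blocks(1000, k=3, b=2)
```

Under these assumptions, the required depth grows only logarithmically with the input length, which is why a TCN can cover a long input window with few residual blocks.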
The gate equations for LSTM are shown in (9)-(11). The inputs are the cell state $C_{t-1}$ and the hidden state $h_{t-1}$ at the previous moment and the input $x_t$ at moment $t$; the outputs are the cell state $C_t$ and the hidden state $h_t$ at moment $t$. The forget gate decides which of the previous moment's information must be retained, the input gate determines which of the current input information must be retained, and the output gate determines the hidden state passed to the following moment.
The function of the forget gate $f_t$ is to determine what useful information to keep, which is determined by the sigmoid function: an output close to 0 means that more information is discarded, while an output close to 1 means that more information is kept. The input gate is used to update the cell state by passing the hidden state at the previous moment and the current input information into the sigmoid function. Then, the hidden state from the previous moment and the current input information are passed through the tanh function to produce a candidate value, $\tilde{C}_t$, which is multiplied by the sigmoid output of the input gate. The cell state is updated as shown in (12): the cell state of the previous moment is multiplied element-wise by the forget gate output, and the result is added to the element-wise product of the input gate output and $\tilde{C}_t$, which gives the updated cell state $C_t$.
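For reference, the LSTM update described above is commonly written in the following standard form. The weight matrices $W$, the biases $b$, and the concatenation notation $[h_{t-1}, x_t]$ are notational assumptions, and the ordering is not claimed to match the equation numbering of this paper.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \\
\tilde{C}_t &= \tanh\left(W_C [h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
h_t &= o_t \odot \tanh\left(C_t\right)
\end{aligned}
```

Here $\odot$ denotes element-wise multiplication, $f_t$, $i_t$, and $o_t$ are the forget, input, and output gates, and $\tilde{C}_t$ is the candidate cell state.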
GRUs are a type of RNN that contains only two gates: the update gate and the reset gate. Their structure is shown in Figure 6. The inputs are the hidden layer state $h_{t-1}$ at the previous moment and the input $x_t$ at moment $t$, and the output is the hidden layer state $h_t$ at moment $t$.
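One common convention for the GRU update is given below as a reference form. The weight matrices $W$ and biases $b$ are notational assumptions, and some references swap the roles of $z_t$ and $1 - z_t$ in the last line, so this form is not claimed to be identical to the equations of this paper.

```latex
\begin{aligned}
r_t &= \sigma\left(W_r [h_{t-1}, x_t] + b_r\right) \\
z_t &= \sigma\left(W_z [h_{t-1}, x_t] + b_z\right) \\
\tilde{h}_t &= \tanh\left(W_h [r_t \odot h_{t-1}, x_t] + b_h\right) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```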
Here, $r_t$ denotes the reset gate at moment $t$, $z_t$ denotes the update gate at moment $t$, and $h_t$ denotes the hidden layer state at moment $t$. $\sigma$ and tanh denote the activation functions: $\sigma$ is the sigmoid function, which has an output between 0 and 1, and tanh is the hyperbolic tangent function, which has an output between −1 and 1. The reset gate $r_t$, computed using (15), determines how much of the previous moment's information is forgotten when the current input is combined with the previous moment's hidden layer state. The update gate $z_t$, computed using (16), is similar to merging the forget gate and the input gate in LSTM and determines what information is forgotten and what information is added. Compared with LSTM, GRUs have fewer tensor operations and fewer parameters, and in some cases, they are more accurate than LSTM.
Attention mechanisms were first used in machine translation tasks to retain memory over long input sentences. The attention mechanism considers the context vector and all the sequence input information to construct the connection. The weights of the connections for each output element are learned automatically. The process of calculating attention can be viewed as a query over key-value pairs. The first step is to calculate the similarity of the query $Q$ and the key $K$, which can be done with the vector dot product, with the vector cosine similarity, or with an additional neural network, as in (19). The second step is to normalize the similarities to obtain directly usable weights using (20). In the third step, the values are weighted and summed to obtain the attention value using (21).
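The three attention steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the similarity is the dot product, the normalization is a softmax, and the function names and sample vectors are hypothetical rather than taken from this paper.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    """Normalize scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(query, keys, values):
    """Return (attention value, weights) for one query."""
    scores = [dot(query, key) for key in keys]  # step 1: similarity
    weights = softmax(scores)                   # step 2: normalization
    dim = len(values[0])
    context = [                                 # step 3: weighted sum
        sum(w * value[d] for w, value in zip(weights, values))
        for d in range(dim)
    ]
    return context, weights

# Hypothetical example with two identical keys: both values get equal weight.
context, weights = soft_attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[2.0, 0.0], [4.0, 0.0]],
)
```

A key that is more similar to the query receives a larger weight, so its value contributes more to the attention output.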

2) ATTENTION-BASED TCN CONNECTED WITH AN RNN MODEL
In this study, temporal convolutional networks are connected to a recurrent neural network for a landslide risk prediction model.
As shown in Figure 7, both the TCN-Attn-RNN and the RNN-Attn-TCN models proposed in this paper use the encoder-decoder architecture. The input of the TCN-Attn-RNN model is a tensor composed of landslide time-series data, which is passed through the TCN built on the residual block framework. Then, a weight vector is generated through the attention mechanism and multiplied by the output of the TCN, followed by the decoder operation, where the decoder uses an LSTM or a GRU of the RNN architecture.
The RNN-Attn-TCN model is similar. The difference is that the input time-series data tensors, after the input attention mechanism, are encoded by the RNN. Here, RNN refers to the general RNN architecture, which can be an LSTM or a GRU. To use the sequence data more fully, the BiLSTM or BiGRU algorithm can be used. The decoder uses the TCN model. The specific steps for RNN-Attn-TCN are the same as those for TCN-Attn-RNN, so they are not repeated.

IV. EXPERIMENT AND RESULTS
As shown in Figure 8, the training process of the landslide risk prediction model uses a sliding window over the landslide sensor data, and each window is the input to TCN-Attn-RNN or RNN-Attn-TCN. The inputs are rainfall, surface displacement, shallow moisture content, and deep moisture content, and each moment is associated with a landslide instability margin. The purpose of landslide risk prediction is to predict the landslide instability margin at a future moment. Therefore, the output of the prediction model is the landslide instability margin at the next moment.
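The sliding-window construction can be sketched as follows. This is a minimal illustration, assuming that each training sample pairs a window of four-factor sensor readings with the instability margins that immediately follow it; the function name and the parameters `input_len` and `pred_len` are hypothetical.

```python
def make_windows(features, margins, input_len, pred_len):
    """Build (input window, target window) pairs from aligned sequences.

    features: list of per-moment sensor vectors (four factors each).
    margins:  list of landslide instability margins, one per moment.
    """
    samples = []
    last_start = len(features) - input_len - pred_len
    for start in range(last_start + 1):
        split = start + input_len
        window = features[start:split]           # model input
        target = margins[split:split + pred_len]  # margins to predict
        samples.append((window, target))
    return samples

# Hypothetical toy series with 10 moments and 4 factors per moment.
toy_features = [[float(t)] * 4 for t in range(10)]
toy_margins = [float(t) for t in range(10)]
toy_samples = make_windows(toy_features, toy_margins, input_len=4, pred_len=2)
```

Each step of the window moves the input and the target forward by one moment, so a series of length T yields T - input_len - pred_len + 1 samples.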
The landslide dataset is collected through the landslide simulation platform introduced in section II, and the ratio of the training set to the test set is 16:3. The training process uses a supervised learning approach to train the model parameters of the TCN, attention mechanism, and RNN with 200 iterations. Figure 9 shows the four measured sensor signals and the landslide instability margin. As rainfall increases, the shallow and deep moisture contents also continue to increase while the surface displacement does not change; at this time, the landslide instability margin begins to increase. When surface displacement begins, the landslide instability margin changes greatly. After that, the soil moisture content changes very little, while the surface displacement changes drastically, and the landslide instability margin changes accordingly.
For the above data, the input dimension at each moment is 4, and the input and output time lengths are set to 100-10, 100-50, and 1000-100. The code runs on an i5-10500F CPU with a Windows 10 operating system, an NVIDIA GeForce GTX 1650 GPU, and 16 GB of memory.
The overall procedure is as follows:
(1) Obtain the landslide source data, and then normalize the data.
(2) Obtain the LIMs using the TOPSIS-Entropy method, shown in Figure 4.
(3) Build the TCN and RNN, and then use the landslide training datasets to train the networks. Note that the input vectors are the four impact factors: rainfall, surface displacement, shallow moisture content, and deep moisture content, and the output is the LIM.
Input: the input tensor $X_t$.
Step 1. Input attention mechanism: In the soft attention mechanism, the query, keys, and values are all equal to the input tensor $X_t$. The query $Q$, keys $K$, and values $V$ can be obtained by (22)-(24), and the attention coefficient can be obtained by (25). Finally, the output $\tilde{X}_t$ is obtained using (26).
Step 2. Encoder processing: The input attention output $\tilde{X}_t$ and the hidden state $H_{t-1}$ of the TCN model at moment $t-1$ are encoded. The encoding function adopts the TCN model, and the generated hidden state is $H_t = \{h_t^1, h_t^2, \ldots, h_t^n\}$, which can be expressed as shown in (27).
Step 3. Temporal attention mechanism: Our model uses a soft attention mechanism to generate weights $A_t = \{a_t^1, a_t^2, \ldots, a_t^n\}$, which are calculated as shown in (28)-(31).
In (28)-(31), $A_t$ is a weight vector, and the function $f$ can take different functional forms, the details of which are given in (19). In our model, the function $f$ is the dot product. Finally, the weighted intermediate vector $C$ is generated by weighting and summing the values $V_{temporal}$ using (32).
Step 4. Decoder processing: For the intermediate vector $C$, the decoding function adopts the LSTM or GRU model, and the output is $Y_t$, which is the predicted sequence information.
Output: $Y_t$; $H_t$.
We verify the performance of the model by comparing the mean absolute error (MAE), the root mean square error (RMSE), and the mean absolute percentage error (MAPE) of the model output [35]. The three metrics are calculated as shown in (34)-(36), where $y_i$ represents the $i$th ground truth value and $\hat{y}_i$ represents the $i$th model-predicted value.
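The three metrics can be computed as in the following Python sketch, which uses their standard definitions. Expressing MAPE as a percentage is an assumption about the convention used here, and the function names and sample values are hypothetical.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Ground truth values must be nonzero.
    """
    ratios = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * ratios / len(y_true)

# Hypothetical ground truth and predictions.
truth = [1.0, 2.0, 4.0]
predicted = [1.0, 1.0, 5.0]
example_mae = mae(truth, predicted)
example_rmse = rmse(truth, predicted)
example_mape = mape(truth, predicted)
```

All three metrics are zero for a perfect prediction, and smaller values indicate a better model.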
To quantitatively show the performance improvement of our model, we use the metrics of the LSTM model as the benchmark to calculate the metric reduction percentage (MRP) of other deep learning time series prediction models. The calculation formula is shown in (37).
In (37), $M_L$ denotes a metric of the LSTM model, and $M_D$ denotes the same metric for another deep learning model.
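A minimal Python sketch of the MRP follows. It assumes that (37) takes the usual relative-reduction form, $(M_L - M_D)/M_L \times 100\%$, which is consistent with a larger MRP meaning a better model and a negative MRP meaning a decline; the function name and the sample metric values are hypothetical and are not taken from Table 1.

```python
def metric_reduction_percentage(m_lstm, m_other):
    """Relative reduction of a metric with respect to the LSTM benchmark.

    Assumed form: 100 * (M_L - M_D) / M_L. Positive values mean that the
    other model has a smaller (better) error than LSTM.
    """
    if m_lstm == 0:
        raise ValueError("the benchmark metric must be nonzero")
    return 100.0 * (m_lstm - m_other) / m_lstm

# Hypothetical metric values for illustration only.
improved = metric_reduction_percentage(2.0, 1.5)
declined = metric_reduction_percentage(1.0, 1.2)
```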
The prediction model performs better when the metric is smaller. Therefore, with LSTM as the benchmark model, the greater the MRP is, the more the model's performance improves over that of its counterparts. A negative MRP indicates a decline in performance. The metric data in Table 1, i.e., the RMSE, MAE, and MAPE statistics, are obtained by testing various deep learning time-series prediction models with varying input and prediction lengths. Table 2 reports the MRP results for the same models and the same input and prediction lengths. We selected typical deep learning models for landslide risk prediction: because of their outstanding and representative prediction outcomes in deep learning time-series prediction, the LSTM, GRU, TCN, and ConvLSTM models are selected for comparison. In Table 1 and Table 2, when the "input length-prediction length" is set to 100-10, the RMSE and MAPE metrics of TCN-Attn-RNN are lower, decreasing by 76.31% and 30.49%, respectively, compared with those of LSTM. In this setting, the MAE metric of RNN-Attn-TCN is reduced by 87.76%. The RMSE and MAE metrics of TCN-Attn-RNN are 25.00% and 28.04% lower, respectively, than those of the LSTM model when the "input length-prediction length" is set to 100-50. The MAPE of RNN-Attn-TCN, on the other hand, is 1.8016, which is 38.52% lower than the benchmark. For long-term prediction with a sliding window of 1000-100, the MAE and MAPE are lower for TCN-Attn-RNN, with MRP values of 38.46% and 5.12%, respectively. For RNN-Attn-TCN, the RMSE metric is lowered by 68.89%. When comparing models with the same sliding window length, our models outperform the others on the three metrics. When comparing different input and output lengths, the longer the prediction is, the worse the prediction accuracy. In addition, the performance improvement over other models is not as large as in short time-series prediction.
Figure 10 depicts a comparison of the prediction errors of several models for the three sliding window lengths. Our proposed models have smaller errors and are better suited for landslide time-series prediction, as evidenced by a comparison of the errors in the three sliding window situations. Meanwhile, the error volatility is greater for 100-50 because the ratio of prediction length to input length is higher, which limits the model's prediction effect to some extent. For 100-10 and 1000-100, the error volatility is substantially lower. Notably, RNN-Attn-TCN has a marginally worse prediction effect than TCN-Attn-RNN, as seen in Table 2 and Figure 10.

V. CONCLUSION AND DISCUSSION
In this work, we concentrate on the landslide risk prediction problem, namely landslide risk assessment and modeling. First, the LIMs are calculated by the TOPSIS-Entropy method, which takes four landslide impact factors as input and outputs the landslide instability margin. Second, TCN-Attn-RNN and RNN-Attn-TCN are used for landslide risk prediction, and their output is the future time series of the LIM. To assess the models' efficacy, a comparative analysis of a vanilla LSTM, a GRU, a TCN, ConvLSTM, and our models is conducted. Based on the evaluation of the MAE, RMSE, and MAPE, our models outperform their counterparts in terms of the metric reduction percentage (MRP). Given the high accuracy of the models, landslide risk can be predicted by deep learning methods based on large-scale landslide data.
For the landslide risk assessment problem, we employ the TOPSIS-Entropy approach for a comprehensive assessment of four landslide impact factors. This method obtains the final LIM used for assessment directly from sensor data, making it intuitive and convenient. This multisensor integrated assessment of landslides is more comprehensive than the single-factor assessment technique. To tackle the landslide risk prediction problem, deep learning for temporal prediction is introduced. The architecture composed of an attention-based TCN paired with an RNN improves the deep learning model structure and is better able to handle challenging tasks such as landslide data modeling.
Note that our models cannot operate with small amounts of data because they require a substantial amount of data for training. There are also open issues in integrating the model into an actual landslide prediction site, which we will explore in the future. We plan to apply our methods to real-world scenarios and develop a software system that runs our model efficiently to meet online prediction requirements. In addition, we will expand our studies to include new types of landslide sensor data, such as soil pressure and underground displacement, because more landslide impact factors provide more comprehensive landslide hazard information.