Reg-SA–UNet++: A Lightweight Landslide Detection Network Based on Single-Temporal Images Captured Postlandslide

Landslide detection based on remote sensing images is an effective method for rapidly and accurately detecting landslide regions, which can aid in disaster prevention and mitigation. Landslide detection methods based on semantic segmentation can be used to delineate the scope of landslides while detecting their location. Most existing models use multitemporal or geological data to improve accuracy. However, the large amount of data introduces additional parameters, consuming significant computing resources. Therefore, this article proposes Reg-SA–UNet++, a model for landslide detection that uses a single-temporal image captured postlandslide. Reg-SA–UNet++ is based on UNet++ with the following modifications: deep supervised pruning is removed to reduce the number of model parameters and increase detection accuracy; RegNet is employed to replace the convolutional blocks in the encoding process to reduce the number of parameters and improve feature acquisition; and attention modules are added at the connection of the convolutional blocks of each layer to strengthen the model's attention to landslide features. The overall accuracy and F1 score of the Reg-SA–UNet++ model for the constructed landslide dataset (93.37% and 92.41%, respectively) and landslide mapping (97.09% and 96.10%, respectively) verify the effectiveness of the proposed model in detecting landslides from remote sensing images.


I. INTRODUCTION
A landslide is a natural phenomenon caused by various factors, including river erosion, groundwater activity, rain inundation, earthquakes, and artificial slope cutting [1]. Currently, influenced by global climate change, seismic activity, and accelerated urbanization, the number of landslides has gradually increased, and the actual hazards and potential risks have become more prominent. Therefore, timely detection of landslides and determination of their area, scale, and distribution are crucial for disaster management and prevention of secondary disasters [2]. Traditional landslide detection methods are based on field investigation; this process requires significant effort and time. It also poses unanticipated risks to researchers [3]. Remote sensing has several advantages, including fast imaging speed, wide coverage, low cost, and low risk [4]. With the increasing number of remote sensing satellites, landslide detection using remote sensing images has become predominant in disaster prevention and reduction. Computer-based methods have significant advantages in remote sensing image processing. These methods systematically and quantitatively simulate researchers' visual interpretation process in landslide detection. Early methods are based on statistics or shallow machine learning techniques. However, the statistical methods generally have complex classification rules that require extensive research experience [5], [6]. These methods can achieve acceptable landslide detection accuracy; however, the classification rules need to be adjusted manually for different areas. Therefore, this approach is not universal and has a low degree of automation. Landslide detection methods based on shallow machine learning techniques [7], [8] first compute data features through function mapping or artificial feature design, then use discrimination criteria to determine the landslide category.
When landslides occur in simple backgrounds, the classifiers can complete the task efficiently and accurately. However, the results are easily affected by landforms with similar features to landslides because the shallow features cannot effectively describe original data. In 2006, with the advancement of artificial intelligence, the emergence of the deep belief network [9] initiated the research wave of deep learning. Compared with shallow machine learning, the neural networks used in deep learning have deeper structures and more layers of hidden nodes; therefore, they can better utilize a wide range of sample data for training and better exploit the inherent information of the data. To improve image processing performance, neural network models were gradually diversified; deeper and more complex structures were created and lightweight, high-performance models that can be easily deployed were introduced. The landslide detection methods based on deep learning include two main aspects as follows: 1) Methods based on target detection, such as the R-CNN, SSD, and YOLO series, can determine whether there is a landslide in the image. If there is a landslide, the method further determines its location. Ju et al. [10] used Mask R-CNN to detect landslides automatically on the established loess landslide database and achieved a precision of 0.56, a recall of 0.72, and an F1 score of 0.63. This indicates that Mask R-CNN is robust for the automatic detection of loess landslides when using Google Earth imagery as the dataset. It also shows that the method has the potential to provide fast and accurate regional landslide hazard investigation. To improve the accuracy and inference speed, Cheng et al. [11] constructed a model, YOLO-SA, for landslide detection in remote sensing images. The model replaces the corresponding module in YOLOv4 with the group convolution (Gconv) and ghost bottleneck (G-beck) residual modules and adds an attention module. 
This method achieves an inference speed of 42 f/s while increasing the accuracy rate to 94.08%. Wang et al. [12] improved the feature stitching method of YOLOv5 for landslides of different shapes. They added adaptive spatial feature fusion to combine feature information of different scales and further used the convolutional block attention module (CBAM) to mine shallow feature information. The model had a detection rate of 74.01%. 2) Methods based on semantic segmentation, such as the UNet series, DeepLab series, and PSPNet series, separate landslide objects by pixels according to the semantics of the image. As a classic semantic segmentation model, UNet has been adapted by scholars to make it suitable for landslide detection. Liu et al. [13] proposed an automatic model to improve the missing and misplaced landslide detection results of UNet. The proposed model increases the input channel count of UNet from three to six to facilitate the input of geoscience features. A residual learning unit was added to the model to fully extract the sample features in order to deepen the network layer. This model outperforms UNet in terms of feature extraction and classification capabilities, and it solves the problem of gradient disappearance or explosion when the network is deepened; its landslide detection accuracy reaches 91.3%. Using ResNet34 for feature extraction, Wang et al. [14] proposed a landslide detection model based on change detection to avoid confusing landslides and bare soil; it uses pre and postdisaster remote sensing images for landslide detection and can achieve an accuracy of 97.69%. Training degradation may occur when the depth of the network layer increases. To address this issue, Qi et al. [15] proposed ResU-Net, a model for automatic landslide mapping. This model uses ResNet to replace each convolutional module in the UNet encoding process and achieves a landslide detection accuracy of 96%. 
However, it performs poorly in detecting small landslides and is susceptible to interference from landforms whose spectral signatures resemble those of landslides. The authors of [13], [14], and [15] enhanced UNet to meet the needs of landslide detection in remote sensing images. UNet has the advantages of a simple structure, few parameters, high segmentation accuracy, and strong variability, making it one of the most commonly used model frameworks in landslide detection.
Bui et al. [16] also proposed an H-BEMD method for detecting landslide regions. To avoid the influence of lighting conditions, this method detects landslides based on the tonal features of images but cannot avoid interferences, such as landform hue features, because it only uses the hue component to detect landslides. Zhang et al. [17] proposed a deep learning model, SMDRF-Net, based on a semi-supervised multitemporal deep representation fusion network. Deep learning can transform original images into abstract high-level representations. The exact outline of the target becomes more difficult to preserve as the network becomes deeper. To solve this problem, SMDRF-Net incorporates object-level spectral features to exploit deep representations of objects. As a result, it can retain accurate boundaries of landslide objects while reducing noise. The landslide detection method based on target detection can quickly detect the location of a landslide; however, it does not characterize its shape or coverage. Alternatively, the semantic-segmentation-based method can mark the location of the landslide and accurately characterize its contour and coverage. Current research on detecting landslides based on semantic segmentation focuses primarily on three approaches. The first employs pre- and postlandslide remote sensing images to detect landslides based on the change detection principle because landslides produce more noticeable changes than surrounding landforms. The second entails the use of geoscience data, such as the digital elevation model, digital surface model, and digital terrain model, as an aid to enrich landslide features. The third entails the use of only postlandslide remote sensing images. Compared with the third approach, the data used in the first two approaches contain more features. However, some data are challenging to obtain in practical applications, making relevant research difficult.
Concurrently, the detection results can easily fall short of expectations if the various data have a weak time correlation.
Moreover, the consumption of computing resources increases with more data input, making the model more demanding on the hardware. Landslide detection using single-temporal images captured postlandslide requires only a small amount of data, which reduces data and hardware resource requirements. However, this method has few available features and high background noise because it uses a single data type.
Given the aforementioned problems, this article uses UNet++ as the basic framework of the proposed model. The model performs landslide detection based only on single-temporal images captured postlandslide, which makes it highly practical while maintaining good accuracy. Compared with UNet++, the enhancements introduced in this article are as follows: 1) The lightweight network, RegNet [18], is used to replace the convolutional blocks in the UNet++ encoding process, which can effectively reduce the number of parameters and improve the feature capture ability of the model. 2) Shuffle attention [19] is introduced between the encoding and decoding structures of each layer to strengthen the model's understanding of semantic information and make the model prioritize the landslide body, enabling it to effectively reduce the interference caused by the background. 3) The deep supervision-pruning operation of UNet++ is removed to further improve the accuracy and reduce the number of parameters.
Compared with existing relevant articles, the expected benefits of the current article are as follows: 1) The landslide detection accuracy and generalization ability of the model are effectively improved. 2) The model only uses single-temporal images captured postlandslide, reducing model dependence on original data, complex data preprocessing steps, and a large amount of computing resources. 3) The model uses fewer parameters; therefore, the hardware requirements of the model are also low. The rest of this article is organized as follows: The "Dataset" section describes the landslide dataset required to train the model. The "Methodology" section introduces the basic framework of the model, the reconstruction of the model to improve its accuracy, and the loss function used for training. The "Experimental Setup" section describes the experimental parameter settings. The "Evaluation Criteria" section describes the evaluation criteria for the experimental results. The "Results and Discussion" section presents the experimental results and discusses our model. Finally, the "Conclusion" section summarizes the full text.

II. DATASET

A. Construction of Landslide Remote Sensing Image Dataset
Based on the high-precision remote sensing image and interpretation datasets of landslide disasters in Sichuan and its surrounding areas published by Zeng et al. [20], we identified several areas where landslides frequently occur, and 1215 remote sensing images of landslides were collected to construct the dataset. The areas mainly include Qiaojia County and Ludian County in northeastern Yunnan Province, China [11]; Lanzhou and Linxia Hui Autonomous Prefecture in Gansu Province [10]; Wuzhou City in Guangxi Zhuang Autonomous Region [21]; and Jiuzhaigou County of Aba Tibetan and Qiang Autonomous Prefecture in Sichuan Province [22]. All of these areas have experienced landslides of different scales at different times. Therefore, we acquired landslide images in different backgrounds to make the dataset coverage more comprehensive; some representative remote sensing images of landslides are shown in Fig. 1. The open-source image labeling tool, LabelMe, was used to label the graphic objects of the dataset to create semantic segmentation labels. In the labels, the landslide bodies appear white according to landslide detection characteristics and task requirements.

B. Features of Landslide Remote Sensing Images
After a landslide, the surface is commonly covered with bare soil and vegetation. Therefore, these regions have a low vegetation index, high spectral reflectance, and prominent brightness. They are predominantly khaki and grey, which differ from the background color of the landslide. The landslide movement exhibits an obvious directional texture on the remote sensing images, which significantly differs from the texture of the surrounding landforms. Remote sensing images show that the landslide has a rough, corrugated texture perpendicular to the sliding direction; dappled shadows may appear on the landslide images due to the uneven surface. The deep learning model can complete the landslide detection task by learning the above features.
The landslides presented by remote sensing images lack specific shapes and smooth boundaries because of the characteristics of satellite overhead imaging. Therefore, landslide detection is more difficult than regular-shaped target detection. The different locations of landslides result in different background colors, which mainly include green in the background of the forest [ Fig Fig. 1(g)], resulting in varying degrees of shadows. The above features reflect the complexity and uncertainty of landslide remote sensing images. The dataset created in this article contains different landslide scenes, which increases the difficulty of landslide detection, enabling the development of a detection method with stronger generalization performance.

III. METHODOLOGY
UNet++ is a U-shaped symmetrical mesh structure, as shown in Fig. 2, with an encoder on the left and a decoder on the right. The downward blue solid arrows shown in Fig. 2 indicate downsampling during encoding, which includes max pooling and average pooling. The upward green solid arrow indicates the upsampling of the decoding process, which includes transposed convolution and unpooling. The dotted arrow between the encoder and the decoder is a skip connection, which fuses multiple feature maps through a series of nested convolution blocks. Furthermore, the feature map extracted by the last convolution block is analyzed to obtain the segmentation result. The tightly connected structure can fuse multilevel features and fully parse contextual semantic information, improving semantic segmentation accuracy. However, this structure leads to a dramatic increase in the number of parameters and required computational resources. Zhou et al. [23] used the deep supervision-pruning method to solve this problem; that is, when comprehensive indicators, such as segmentation accuracy and speed, meet the task requirements, part of the network structure is pruned. Although this operation can reduce the number of parameters, it also affects accuracy. The multiscale feature maps generated by different convolution blocks in the dense structure have equal weights; however, the semantic information contained in different scale feature maps differs. Therefore, after further downsampling of remote sensing images, small regions and landslide details may disappear, especially in regions where the features of the landslide and the background are similar. Moreover, fusing feature maps with the same weight makes it difficult to distinguish landslide boundaries and small-area landslides. To solve the above problems and make the model more suitable, this article proposes a new model, Reg-SA-UNet++ (RS-UNet). Fig. 3 depicts the model's structure.
As shown in Fig. 3, compared with UNet++, RS-UNet improves the feature extraction module. In the encoding process, RegNet is used as a convolution block, which can reduce model parameters and improve feature extraction capabilities; maximum pooling is used to preserve the texture features of landslides. In the encoding and decoding processes, a shuffle attention module is added to the connection part of each layer to improve the model's semantic parsing ability. To further improve its accuracy and reduce the number of parameters, the deep supervision-pruning operation of UNet++ is removed. RS-UNet inherits the densely connected structure of UNet++ to fuse multiple feature maps, which can enrich feature information. The final result is obtained through softmax classification after the feature map passes through the last convolutional block of the decoding process.

A. RegNet
As a binary classification problem, the single-temporal remote sensing images captured postlandslide contain fewer features. Moreover, remote sensing images often suffer from data imbalance, meaning that some classes in the image occupy larger regions than others. It is frequently observed in the landslide remote sensing images that the landslide is just a small fraction of the image, and the background takes up most of it, making it challenging to extract the landslide features. In the dataset constructed in this article, there are shadows and additional landforms, such as inhabited areas, water systems, and clouds around the landslide, which further interfere with the feature extraction. In the case of limited training samples, complex model structure, and several parameters, it is easy to generate excessive reliance on limited data samples even if a classification model is obtained. As a result, when the model is applied to a new dataset, its detection performance is hardly as good as expected. Moreover, complex models have complex intermediate computations, which consume considerable frame buffers. Therefore, the parameters need to be tuned for optimal performance of the model before training. However, it is challenging to make all parameters reach the optimal value when there are many of them, which may compromise the model's generalization ability. Therefore, to reduce the number of parameters and improve the feature extraction ability of the model, we replace the convolution block in the UNet++ encoding process with a lightweight network, RegNet.
As shown in Fig. 4, the structure of RegNet comprises three main parts: stem, body, and head [Fig. 4(a)]. The stem is a convolutional layer with a 3 × 3 kernel, a stride of 2, and 32 convolutional kernels. The head is a classifier for outputting different classes. As shown in Fig. 4(b), the body consists of four stages; from stage 1 to stage 4, the resolution of the feature maps is halved in turn; the number of feature maps output at each stage (i.e., w1, w2, w3, and w4) is a hyperparameter determined by searching; each stage consists of several blocks (Fig. 5), whose hyperparameters also need to be determined by searching.

B. Shuffle Attention
In this article, shuffle attention modules (SANet) (Fig. 6) are added at the junction of each level of encoding and decoding structures; this improves the model's ability to understand the semantic information of landslide remote sensing images, enhances the model's attention to landslides, reduces the interference of other landforms, and guides the model to learn the features of landslides better. Furthermore, by weighting the feature map, this module suppresses the information that negatively impacts landslide detection and retains the information that helps analyze feature information. As mixed-domain attention, in the spatial domain, SANet transforms the spatial information in the original picture into another space and preserves the key information; in the channel domain, it algorithmically compares channels and allocates resources to channels that require more attention.
SANet is a lightweight mixed-domain attention network. It splits the input data X into G groups along the channel dimension and processes each group of subfeatures in parallel using channel splitting. During the parallel processing, the channel domain branch uses GAP to generate channel statistics and then uses a pair of parameters to scale and shift the channel vector; the spatial domain branch uses the group norm to generate spatial statistics and then creates compact features similar to the channel branch. After the parallel processing is completed, concat fusion and intergroup communication are performed so that the output of SANet is the same size as the input, making SANet perfectly embedded in the model.
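The two ingredients described above, a channel gate driven by global average pooling and a final intergroup communication step, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the helper names `channel_gate` and `channel_shuffle`, the per-channel `weight` and `bias` parameters, the sigmoid gate, and the use of a channel shuffle for intergroup communication are assumptions made here for illustration.

```python
import math


def sigmoid(x):
    """Logistic function used as the gating nonlinearity (an assumption here)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(feature_maps, weight, bias):
    """Channel branch sketch: GAP per channel, scale and shift, then gate.

    feature_maps: list of 2-D lists (one H x W map per channel).
    weight, bias: one hypothetical learnable scalar pair per channel.
    """
    gated = []
    for fmap, w, b in zip(feature_maps, weight, bias):
        pixels = [v for row in fmap for v in row]
        stat = sum(pixels) / len(pixels)   # global average pooling
        gate = sigmoid(w * stat + b)       # scale and shift, squashed to (0, 1)
        gated.append([[v * gate for v in row] for row in fmap])
    return gated


def channel_shuffle(channels, groups):
    """Interleave channels across groups.

    Equivalent to viewing the channel list as (groups, n), transposing to
    (n, groups), and flattening, so the output has the same length as the input.
    """
    n = len(channels) // groups
    assert n * groups == len(channels), "channel count must divide evenly"
    return [channels[g * n + i] for i in range(n) for g in range(groups)]


if __name__ == "__main__":
    print(channel_shuffle(list(range(6)), groups=2))
```

Because both steps preserve the number and size of the feature maps, a module built from them can be dropped between an encoder and a decoder block without changing tensor shapes, which is the property the text relies on.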

C. Matthews Correlation Coefficient Loss Function
Landslide remote sensing images often suffer from data imbalance. Unless accounted for when training a deep-learning-based segmentation model, such an imbalance can lead to the model converging toward a local minimum of the loss function, yielding suboptimal segmentation results biased toward the background. Loss functions based on overlapping metrics are often the first choice for such problems. Currently, Dice loss (DIC) and IOU loss are frequently used overlapping metric-based loss functions; however, neither penalizes misclassifications of true negative pixels, making it challenging to optimize for accurate background prediction. Therefore, a loss function based on the Matthews correlation coefficient (MCC) was proposed [24]. The MCC for a pair of binary classification predictions is defined as

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

where TP, TN, FP, and FN denote the numbers of true positive, true negative, false positive, and false negative pixels, respectively. The Matthews correlation coefficient loss (MCL) can be defined as follows:

$$\mathrm{MCL} = 1 - \mathrm{MCC}$$

In a previous article [24], MCL was shown to achieve better results than the above two loss functions, so this article introduces MCL to improve remote sensing image landslide detection results.
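As a rough illustration of the loss described above, the following pure-Python sketch computes an MCC-based loss from per-pixel landslide probabilities. It assumes the common formulation in which the loss is one minus the MCC and the confusion-matrix counts are replaced by "soft" sums of probabilities so that the expression stays differentiable; the function name `mcc_loss` and the stabilising constant `eps` are choices made here, not details from the source.

```python
import math


def mcc_loss(probs, labels, eps=1e-7):
    """Soft Matthews correlation coefficient loss for binary segmentation.

    probs:  predicted landslide probabilities in [0, 1], one per pixel.
    labels: ground-truth values, 1 for landslide and 0 for background.
    eps:    small constant (an assumption) that avoids division by zero.
    """
    # Soft confusion-matrix entries: probabilities stand in for hard decisions.
    tp = sum(p * y for p, y in zip(probs, labels))
    tn = sum((1 - p) * (1 - y) for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    fn = sum((1 - p) * y for p, y in zip(probs, labels))

    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) + eps
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    # A confident, correct prediction gives a loss close to 0.
    print(mcc_loss([0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0]))
```

Unlike Dice or IoU, the numerator contains the true negative term, so background pixels that are classified correctly contribute to the score directly.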

IV. EXPERIMENT
In the experimental part of this article, the landslide dataset mentioned in the "Dataset" chapter is divided into training, test, and validation sets in a ratio of 7:2:1; the experimental environment is PyTorch 1.7 framework, the operating system is Windows 10, the CPU is a 12-core Intel Xeon Gold 6226, the primary frequency is 2.70 GHz, the memory is 64 GB DDR4, the GPU is NVIDIA's Quadro RTX 5000, the frame buffer is 2 × 16 GB, the CUDA version is 11.0, and the cuDNN version is 7.6.5.
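A 7:2:1 split such as the one described above could be produced along the following lines. This is only a sketch: the shuffling, the fixed `seed`, the rounding rule, and the helper name `split_dataset` are assumptions, since the source does not say how the images were assigned to the three subsets.

```python
import random


def split_dataset(items, ratios=(7, 2, 1), seed=0):
    """Shuffle items and split them into training, test, and validation sets.

    ratios follow the order used in the text (training : test : validation).
    Any remainder from integer division goes to the validation set.
    """
    items = list(items)
    random.Random(seed).shuffle(items)   # reproducible shuffle (seed is arbitrary)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_test = len(items) * ratios[1] // total
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]
    return train, test, val


if __name__ == "__main__":
    # Hypothetical file names standing in for the 1215 landslide images.
    names = [f"img_{i:04d}.png" for i in range(1215)]
    train, test, val = split_dataset(names)
    print(len(train), len(test), len(val))
```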
During the training process, the input image size was set to 512 × 512; the optimizer was Adam, the initial learning rate was 0.001, the weight decay was 0.0005, the loss function was MCL, and the learning rate decay strategy was cosine annealing, which is defined as follows:

$$\eta_t = \eta^{i}_{\min} + \frac{1}{2}\left(\eta^{i}_{\max} - \eta^{i}_{\min}\right)\left(1 + \cos\left(\frac{T_{cur}}{T_i}\pi\right)\right)$$

where $\eta_t$ is the current learning rate; $i$ represents the index of the run; $\eta^{i}_{\max}$ represents the maximum value of the learning rate; $\eta^{i}_{\min}$ represents the minimum value of the learning rate; $T_{cur}$ represents the number of cycles executed; and $T_i$ represents the total number of cycles in the $i$th run.
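The cosine annealing schedule described above can be written as a small function. The sketch below assumes a minimum learning rate of 0 and a run length of 200, neither of which is stated in the source; only the maximum of 0.001 matches the initial learning rate given in the text. The name `cosine_annealing_lr` is chosen here.

```python
import math


def cosine_annealing_lr(t_cur, t_i, eta_max=1e-3, eta_min=0.0):
    """Learning rate after t_cur of t_i cycles under cosine annealing.

    eta_max: maximum learning rate (0.001 is the initial rate in the text).
    eta_min: minimum learning rate (0.0 is an assumption).
    """
    cosine = math.cos(math.pi * t_cur / t_i)
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + cosine)


if __name__ == "__main__":
    # The rate starts at eta_max, passes the midpoint halfway through the
    # run, and reaches eta_min at the end.
    for t in (0, 50, 100, 150, 200):
        print(t, cosine_annealing_lr(t, t_i=200))
```

The slow decay near the start and end of the run, with a faster drop in the middle, is what distinguishes this schedule from a linear or step decay.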
In this experiment, max_epoch was set to 200, and the validation set was used for evaluation after each epoch. The training was terminated if the evaluation index did not improve for 10 consecutive epochs. Notably, apart from the targeted improvements adopted by some of the compared methods, these experimental settings were also used for those methods.
To quantitatively illustrate the performance of our model, we evaluated the model by the overall accuracy (OA), mean intersection over union (MIoU), balanced score (F1 score), number of parameters (Params), floating-point operations (FLOPs), and frames per second (FPS). OA represents the ratio of the number of pixels with the correct predicted category to the total number of pixels. MIoU is the ratio of the intersection and union of the true value set and the predicted value set, averaged over all classes. F1 score is the harmonic mean of precision and recall, and it is frequently used as an evaluation metric for imbalanced sample classification. The value range of the above three indicators is between 0 and 1, and the larger the value, the better the result. Params refer to the sum of weights, bias terms, and other parameters processed during model training, and their unit is millions (M); the fewer the parameters, the lower the computational cost. The number of FLOPs is the number of single operations required to run the model, and the unit used here is giga multiply-accumulate operations (GMACS); the larger its value, the greater the required amount of computation. FPS represents the number of frames processed per second, which reflects the real-time nature of the model. Let $p_{ij}$ represent the number of pixels whose true value is class $i$ and whose predicted value is class $j$, and let $N$ be the total number of classes. The calculation method of each evaluation standard can be described as follows:

$$\mathrm{OA} = \frac{\sum_{i=1}^{N} p_{ii}}{\sum_{i=1}^{N}\sum_{j=1}^{N} p_{ij}}$$

$$\mathrm{MIoU} = \frac{1}{N}\sum_{i=1}^{N}\frac{p_{ii}}{\sum_{j=1}^{N} p_{ij} + \sum_{j=1}^{N} p_{ji} - p_{ii}}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$

$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where TP, FP, and FN denote the numbers of true positive, false positive, and false negative pixels of the landslide class, respectively.
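The three accuracy metrics described above can all be derived from a pixel-level confusion matrix. The following sketch assumes a two-class problem in which index 0 is the background and index 1 is the landslide class, and computes the F1 score for the landslide class only; the function name `segmentation_metrics` and the example counts are illustrative, not values from the experiments.

```python
def segmentation_metrics(conf, positive=1):
    """Compute OA, MIoU, and F1 from a confusion matrix.

    conf[i][j] is the number of pixels whose true class is i and whose
    predicted class is j. positive is the index of the landslide class
    (an assumption of this sketch). Every class is assumed to occur at
    least once, otherwise a division by zero is raised.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    oa = sum(conf[i][i] for i in range(n)) / total

    ious = []
    for i in range(n):
        true_i = sum(conf[i])                       # pixels labelled i
        pred_i = sum(conf[j][i] for j in range(n))  # pixels predicted as i
        ious.append(conf[i][i] / (true_i + pred_i - conf[i][i]))
    miou = sum(ious) / n

    tp = conf[positive][positive]
    fp = sum(conf[j][positive] for j in range(n)) - tp
    fn = sum(conf[positive]) - tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return oa, miou, f1


if __name__ == "__main__":
    # Made-up counts: rows are true classes, columns are predictions.
    print(segmentation_metrics([[50, 10], [5, 35]]))
```

Because background pixels usually dominate landslide imagery, OA alone can look high even when the landslide class is poorly segmented, which is why MIoU and F1 are reported alongside it.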

A. Ablation Experiments
This section presents a demonstration of the effectiveness of the improvements through ablation experiments. Table I lists the ablation experiment results. To compare the models, we performed four groups of experiments as follows: 1) Group A: Using the UNet++ model. 2) Group B: Using a model that only changes convolutional blocks without increasing the attention module, referred to as Reg-UNet in Table I. 3) Group C: Using a model that only increases the attention module without changing the convolutional blocks, referred to as SA-UNet in Table I. 4) Group D: Using the RS-UNet model. Table I shows that adding the attention module improves the evaluation indices more than replacing the convolution block, indicating that the attention module plays a significant role in landslide detection.
To observe the effect of the attention module, we used the Grad-CAM [25] visualization method to conduct a qualitative analysis of the three models (UNet++, Reg-UNet, and RS-UNet). Grad-CAM traces the reverse gradient flow to discover which locations of the feature map have a more significant impact on the final output, as shown in Fig. 8. Although Reg-UNet improves the ability to guide feature learning compared with UNet++, its feature guiding force for small and complex boundary landslides remains weak; RS-UNet can learn the features of landslides more accurately. As shown in Fig. 8, the heat distribution of UNet++ is uneven, and other landforms, such as bare soil and sparse vegetation [ Fig. 8(a)-(c)], were also highlighted. As the shape of the landslide becomes more complex, UNet++ pays less attention to it, as shown in Fig. 8. The heatmap of RS-UNet has a clear boundary between the landslide body and the background, as shown in Fig. 8(a)-(d).
For the very complex-shaped landslide in Fig. 8(e), RS-UNet can focus most of the attention on the landslide body. Therefore, RS-UNet can better guide the model for feature learning.

B. Effect of Loss Function on the Result
To illustrate the superiority of the loss function chosen in this article, DIC [26], IOU loss (IOU) [27], weighted cross-entropy loss (WCE) [28], focal loss (FOL) [29], asymmetric loss (ASY) [30], and focal Tversky loss (FTL) [31] were selected for comparison experiments. DIC, IOU, and MCL belong to the loss functions based on similarity measures; ASY, FTL, WCE, and FOL are frequently used for sample-imbalanced semantic segmentation. One of the detection results in the test set is selected as an example and shown in Fig. 9, indicating that the best visual results can be achieved when MCL is used in training. The result data of the whole test set are shown in Table II. Although the loss function has a minor impact on the model results, when the loss function is MCL, both OA and MIoU achieved the best values, which are 0.48% and 2.54% higher than the suboptimal values, respectively; the F1 score was the suboptimal value, which is 0.72% lower than the best value.

C. Contrast of Different Attention Mechanisms
As detailed in this section, four attention modules (SCSE [32], SE [33], CBAM [34], and SKA [35], [36]) were used for comparative experiments. One of the detection results in the test set is selected as an example shown in Fig. 10, indicating that when the attention module is SA, the result closest to the label image can be obtained, and when the attention module is SKA, the obtained result is considered invalid. The result data of the whole test set is shown in Table III. To distinguish models using different attention modules, they are called Reg-UNet-XXX, where XXX is an abbreviation for different attention modules. SCSE improves upon SE; however, the model using SE achieved better results, suggesting that the improvement of the attention module may not be suitable for landslide detection. Although the model using SKA had the most parameters and FLOPs, it produced the worst results; therefore, the quality of the model does not depend on the number of parameters and computation. Compared with the results of Reg-UNet in Fig. 8, the detection accuracy of the model changes to varying degrees after adding different attention modules, implying that the use of attention modules may not have a positive impact on landslide detection. The parameters and FLOPs of the model using SA did not change significantly; however, both OA and MIoU reached the optimal value, which was 0.93% and 0.81% higher than the suboptimal value, respectively; the F1 score reached the suboptimal value, which was 0.94% lower than the optimal value. In conclusion, the model with SA had the best performance.

D. Comparison With Other Methods
A landslide detection model based on remote sensing images, ResU-Net [15], and six classical semantic segmentation models (FPN [34], PSPNet [37], DeepLabv3+ [38], UNet++ [23], Res2-UNet [39], and SegFormer [40]) were selected for comparative experiments. Some landslide detection results were selected and are shown in Fig. 11. Fig. 11(a) and (b) show the landslides in different scenes and their label images, respectively. As shown in Line I of Fig. 11, all methods can obtain accurate results when the landslide area is large and the background is forest. With the presence of water bodies and roads in the background (as shown in Line II of Fig. 11  of the landslide with a quite small mistake. When there are a large number of trees in the landslide area (as shown in Line IV of Fig. 11), DeepLabv3+ and SegFormer depicted more accurate landslide areas [Fig. 11(e) and (g)] than other methods. It can be inferred from the above analysis that the detection results of RS-UNet were better than those of other models.
The results of evaluation metrics for the test set are listed in Table IV. RS-UNet improved on all metrics relative to UNet++, with 20.42 M fewer parameters, 51.56 fewer GMACS, 3.94% higher OA, 10.05% higher MIoU, and a 10.07% higher F1 score. The OA, MIoU, and F1 score of RS-UNet were 2.3%, 2.38%, and 2.59% higher than those of ResU-Net, respectively; moreover, its parameters and FLOPs dropped by 18.77 M and 9.24 GMACS, respectively. Compared with the metrics of other models in Table IV, the number of parameters of RS-UNet decreased considerably. Compared with the suboptimal values of various indicators, the OA, MIoU, and F1 score of RS-UNet increased by 1.03%, 1.60%, and 1.45%, respectively; additionally, its parameters reduced by 10.75 M, and FLOPs reached an acceptably low level. Although Res2-UNet achieved landslide detection results close to those of RS-UNet, the Params and FLOPs of Res2-UNet greatly exceeded those of RS-UNet with 27.01 M and 21.43 GMACS, respectively. Among all models, the FLOPs of PSPNet are only 9.59 GMACS. It is speculated that models with fewer FLOPs cannot meet the computational requirements of feature maps. In contrast, UNet++ had the highest number of parameters and substantial computation; however, its indicators could not reach suboptimal values, demonstrating that more parameters and calculations do not determine the efficiency of the model. Among all the aforementioned methods, PSPNet reached the highest FPS value (158.38), but it has quite low landslide detection accuracy. Res2-UNet has the smallest FPS value of 21.19, while the FPS value of RS-UNet is 2.43 higher, which keeps the whole inference process short. Since there is no hard requirement for real-time landslide detection after a disaster, the inference speed of RS-UNet is acceptable given its high landslide detection accuracy.
To analyze the impact of image size on landslide detection results, we resized all the images of the test dataset to 0.1× (i.e., 10% of the original image scale), 0.2×, 0.3×, 0.4×, 0.5×, 0.6×, 0.7×, 0.8×, and 0.9× of their original size and tested the performance of the landslide detection models at these nine scales. We selected Res2-UNet, SegFormer, and ResU-Net, which performed better in Table IV, for comparative experiments. Their quantitative indicators were calculated and plotted as a line chart (Fig. 12).
As shown in Fig. 12, almost all models maintained good performance when the image scale was changed from 1.0× to 0.5×, except that the F1 score and MIoU of SegFormer decreased as the image scale decreased. Furthermore, the OA, MIoU, and F1 score values of RS-UNet were the highest within a certain range of image scales. With the development of remote sensing technology, the resolution of remote sensing images is improving. Under the condition of higher resolution, the image scale will not excessively interfere with the performance of the models.
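The rescaling step of this experiment can be sketched as follows. The interpolation method is not stated in the text, so nearest-neighbor sampling is an assumption here, and `resize_nearest` and `SCALES` are hypothetical names; the image is represented as a plain list of rows to keep the sketch dependency-free.

```python
def resize_nearest(image, scale):
    """Downscale a 2-D image (list of rows) by `scale`.

    Nearest-neighbor sampling is an assumption; it is a common choice for
    label masks because it never creates intermediate class values.
    """
    height, width = len(image), len(image[0])
    new_h = max(1, round(height * scale))
    new_w = max(1, round(width * scale))
    return [
        [
            image[min(height - 1, int(r * height / new_h))]
                 [min(width - 1, int(c * width / new_w))]
            for c in range(new_w)
        ]
        for r in range(new_h)
    ]


# The nine test scales described in the text: 0.1x, 0.2x, ..., 0.9x.
SCALES = [s / 10 for s in range(1, 10)]
```

In such a setup, each test image and its label mask would be resized with the same scale before the metrics are recomputed, so that predictions and labels stay aligned pixel by pixel.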

E. Application in Landslide Mapping
To verify the practicability of RS-UNet in landslide mapping, the landslides caused by earthquakes in Hokkaido, Japan [41], were selected for landslide mapping tests. The basic situation of the landslides is shown in Fig. 13; the landslides occurred mainly in vegetation-covered mountains, and farmland and residential areas were present in the background. The image of the landslide region was obtained using Bigemap software. However, the image is large, making it difficult to input directly into the network model. Therefore, it was first cropped into multiple subimages with a size of 512 × 512 pixels, which were then input into the network model for training and testing; subsequently, the test results of the subimages were stitched together to obtain the detection results for the entire area. Furthermore, 20% of the cropped images were randomly selected as the training set, and the mapping test was performed on the whole image.
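The crop-and-stitch procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: non-overlapping tiles and zero padding at the image border are choices made here, since the text does not specify how border subimages or overlaps were handled, and the function names are hypothetical.

```python
TILE = 512  # subimage size stated in the text


def tile_origins(height, width, tile=TILE):
    """Top-left corners of non-overlapping tiles covering the whole image."""
    return [(top, left)
            for top in range(0, height, tile)
            for left in range(0, width, tile)]


def crop(image, top, left, tile=TILE, fill=0):
    """Cut one tile x tile subimage, padding with `fill` past the border."""
    height, width = len(image), len(image[0])
    return [
        [image[r][c] if r < height and c < width else fill
         for c in range(left, left + tile)]
        for r in range(top, top + tile)
    ]


def stitch(tiles, origins, height, width):
    """Write per-tile predictions back into one mask, discarding padding."""
    mosaic = [[0] * width for _ in range(height)]
    for tile_pred, (top, left) in zip(tiles, origins):
        for dr, row in enumerate(tile_pred):
            r = top + dr
            if r >= height:
                break  # remaining rows are padding
            for dc, value in enumerate(row):
                c = left + dc
                if c >= width:
                    break  # remaining columns are padding
                mosaic[r][c] = value
    return mosaic
```

In use, each subimage returned by `crop` would be passed through the trained network, and the predicted masks would be passed to `stitch` in the same order as `tile_origins`. Overlapping tiles with blended borders are a common alternative when seams between subimages are visible in the stitched map.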
As shown in Fig. 14(a) and (b), both methods completed large-scale landslide mapping with high accuracy, but different degrees of misjudgment were noted in the results. These errors mainly occurred in two types of areas: 1) the boundary of the landslide; and 2) bare soil, fields, and low vegetation in the background. The incorrect judgment of the boundary may be due to the discrepancy between the manual annotation and the actual landslide area; the same discrepancy can also cause some landslides to go undetected. These errors are minor and have little effect on the detection results. Misjudgments in the background are caused by landforms resembling landslide features (fields and bare soil resembling a landslide mass; low vegetation resembling a vegetated area destroyed by a landslide). The missed areas in the background appear mostly in the vegetation areas damaged by landslides; it is difficult for the model to determine whether these areas belong to the landslides because of the different degrees of damage. Fewer misjudged and missed areas can be observed in the RS-UNet results, as shown in Fig. 14(c) and (d); therefore, RS-UNet is more effective than ResU-Net in landslide mapping.
The OA, MIoU, and F1 scores of the two methods are shown in Table V. Both methods achieved high detection rates (95% or more) in landslide mapping with a relatively simple background. Compared with ResU-Net, the OA, MIoU, and F1 score of RS-UNet's landslide detection results improved by 0.33%, 1.25%, and 1.86%, respectively.

VI. CONCLUSION
In this article, we developed a remote sensing image landslide detection model, RS-UNet, based on the attention mechanism and a lightweight network (RegNet). The model was tested on the constructed dataset with different loss functions and attention modules to observe their impact on model performance. The conclusions of this article are outlined as follows:
1) RegNet was used to replace the convolution block in the UNet++ encoding process, and an improved landslide recognition performance was achieved.
2) The addition of the attention module enabled the model to better guide the feature learning process.
3) MCL utilized all the elements of the confusion matrix to achieve the best landslide detection performance.
RS-UNet effectively accomplished landslide detection using remote sensing images and showed superior performance compared with the existing models. However, the small amount of landslide data used in this article and the human errors in regional labeling affected the detection accuracy of the model. In future work, we aim to establish a larger dataset to further improve the generalization ability and detection accuracy of the model. In addition, occlusion by clouds and fog is a significant factor that interferes with landslide detection. Thus, one of the problems to be solved in future work is how to accurately detect landslides under the interference of clouds and fog.

He is currently an Assistant Engineer of machine vision with the Zhengzhou Coal Machine Hydraulic Electric Control Co., Ltd., Zhengzhou. His research interests include intelligent interpretation of remote sensing images and image processing.
Wanjie Lu received the B.S. degree in photogram-