A Multi-Input Channel U-Net Landslide Detection Method Fusing SAR Multisource Remote Sensing Data

Accurate and efficient landslide identification is an important basis for landslide disaster prevention and control. Due to the diversity of landslide features, vegetation occlusion, and the complexity of the surrounding surface environment in remote sensing images, deep learning models (such as U-Net) for landslide detection based only on optical remote sensing images will lead to false and missed detection. The detection accuracy is not high, and it is difficult to satisfy the demand. Synthetic aperture radar (SAR) has penetrability, and SAR images are highly sensitive to changes in surface morphology and structure. In this study, a multi-input channel U-Net landslide detection method fusing SAR, optical, and topographic multisource remote sensing data is proposed. First, a multi-input channel U-Net model fusing SAR multisource remote sensing data is constructed, then an attention mechanism is introduced into the multi-input channel U-Net to adjust the spatial weights of the feature maps of the multisource data to emphasize the landslide-related features, and finally, the proposed model is applied to the experimental scene for validation. The experimental results demonstrate that the proposed model combined with SAR multisource remote sensing data improves the perception ability of landslide features, focuses on learning landslide-related features, improves the accuracy of landslide detection, and reduces the rate of false detections and missed detections. Compared with the traditional U-Net landslide detection method based on SAR multisource remote sensing data and the traditional U-Net method that disregards SAR multisource remote sensing data, the proposed method has the best quantitative evaluation indicators. 
Among these indicators, the proposed method obtained the highest F1 value (66.18%), indicating that fusing SAR remote sensing data provides rich and complementary landslide feature information and that a multi-input channel U-Net model taking multisource remote sensing data as input can effectively process this information. The proposed method can provide theoretical and technical support for landslide disaster prevention and control.

Hesheng Chen, Yi He, Lifeng Zhang, Wang Yang, Yaoxiang Liu, Binghai Gao, Qing Zhang, and Jiangang Lu

I. INTRODUCTION
Landslides are considered one of the most serious natural disasters in the world. They occur when soil, rocks, and other objects on unstable slopes move downward under the influence of factors such as rainfall, earthquakes, and human activities [1], [2], [3], [4]. Landslides occur suddenly and are highly destructive, causing many casualties, economic losses, and damage to the surface environment [5], [6], [7], [8]. After a landslide occurs, accurately obtaining and recording information on its location and size is critical for disaster emergency response, disaster damage estimation, and postdisaster reconstruction [9], [10]. In addition, landslide detection is extremely important for updating landslide inventory datasets with accurate location and extent information and is also necessary for further landslide susceptibility modeling and mapping for disaster warning and risk assessment [11], [12], [13].
With the development of remote sensing technology, visual interpretation and pixel-based and object-based methods using optical remote sensing images such as Landsat, RapidEye, and Sentinel-2A have been widely used in landslide interpretation [14], [15]. Visual interpretation generally produces more accurate results while avoiding hazardous field investigations, but it relies on the knowledge of the interpreter [16]. Pixel-based and object-based methods are often combined with support vector machines, random forests, and other machine learning methods to achieve landslide detection [17]. The pixel-based method mainly utilizes the pixel information of remote sensing images, but it considers only the characteristics of single pixels, neglects the correlation between adjacent pixels, and is sensitive to noise [18], [19]. The object-based approach can make wide use of the shape, statistics, texture features, and contextual information of landslides for further analysis. However, it is difficult to determine a reasonable segmentation scale for different regions with this approach, which affects the effectiveness of landslide detection [20], [21]. In recent years, the development of deep learning has provided new ideas for landslide detection, and deep learning is gradually being applied to the task with better results than traditional machine learning methods. Among these approaches, methods based on convolutional neural networks (CNNs) and U-Net combined with optical remote sensing images are more frequently used in landslide detection tasks [22], [23], [24], [25], [26], [27].
However, landslides are often accompanied by adverse weather conditions and complex geographical environments, such as heavy rainfall and obstruction caused by cloud or vegetation cover. Therefore, landslide detection based solely on optical remote sensing images combined with deep learning methods may have certain limitations. Synthetic aperture radar (SAR) can image normally under bad weather conditions and has a certain ability to penetrate clouds, vegetation, and other cover, so it can observe the ground day and night in all weather [28], [29]. In addition, the formation of landslides is closely related to topographic features [30], and SAR images are highly sensitive to changes in surface morphology and structure, so SAR images can be a better data source for landslide detection [31]. Nava et al. [32] used a CNN with SAR data to detect landslides in Iburi Prefecture, Hokkaido, Japan. Meena et al. [33] studied automated landslide detection in the Himalayan region using RapidEye optical data and ALOS-PALSAR-derived topographic data (DEM, slope) and analyzed the potential of U-Net and machine learning methods for automated landslide detection in this region.
Although landslide detection methods based on U-Net and its variants combined with various types of remote sensing data have achieved good results, little attention has been paid to deep learning models for landslide detection that take SAR data fused with optical, terrain, and other multisource remote sensing data as input. In addition, multisource remote sensing data contain abundant information while also introducing redundant data. When extracting landslides, U-Net directly fuses the extracted shallow and deep feature information through the skip connection structure. The semantic differences between the two are significant, resulting in feature redundancy and a semantic gap that can interfere with the landslide features learned by the model [34]. Therefore, how to improve the deep learning model (U-Net) so that it effectively inputs and processes multisource remote sensing data, strengthens the learning of landslide features, distinguishes landslides from similar ground objects, and improves the accuracy of landslide detection in complex scenes is a challenging problem in landslide detection with multisource remote sensing data.
This study constructs a multi-input channel U-Net landslide detection method based on multisource remote sensing data such as SAR, optical, and terrain data. The goals of this article are to 1) construct a multi-input channel U-Net landslide detection framework that integrates SAR multisource remote sensing data; 2) verify the proposed model in the Bailong River Basin experimental scene; and 3) compare and analyze the results and performance of landslide detection against traditional U-Net models and different types of remote sensing data.

A. Data Sources
The data used in this study include landslide vector data, SAR (Sentinel-1A) data, optical remote sensing image (Sentinel-2A) data, and terrain (slope, aspect) data. The landslide vector data are processed as labeled data, which define the true class of each pixel in the image for subsequent training and testing of the deep learning model. The SAR, optical remote sensing image, and terrain data are used as the image data for landslide detection, which provide the visual information required by the model to understand the semantic content of the image and classify it at the pixel level.
1) Landslide Vector Data: Based on the historical landslide data and combined with fieldwork verification, landslides in the study area were vectorized and labeled by visual interpretation on Google Earth to obtain landslide boundary information, and a total of 732 landslide vectors were obtained. The vectorized and labeled landslide data were then imported into ArcMap 10.7, converted into raster data, and exported as a binary image, which was used as label data for deep learning model training. The labeled image contains two types of pixels: white for landslide areas and black for nonlandslide areas.
2) SAR Remote Sensing Data: The SAR data are Sentinel-1A data obtained from the Alaska Satellite Facility (ASF, https://asf.alaska.edu/) website. The Sentinel-1A satellite is equipped with a SAR sensor, which can observe the Earth's surface in all weather, day and night. Compared with optical sensors, SAR has the advantage of providing high-quality observation data even at night, under cloud cover, and in low-visibility conditions. The downloaded data were acquired on May 29, 2021, in L1 single look complex mode, using the "Vertical Transmit and Horizontal Receive" (VH) and "Vertical Transmit and Vertical Receive" (VV) polarizations and the interferometric wide acquisition mode. SAR intensity images are heavily affected by noise, so this study filters the SAR data to minimize the impact of noise. SAR images are also affected by geometric distortions; this study used terrain correction for compression and other distortions, while shadows are difficult to correct. However, this study mainly uses multisource data (optical, SAR, and terrain) for landslide identification, with the aim of utilizing the rich and complementary information of these data to comprehensively present diverse landslide features and improve the performance of landslide detection. First, the Sentinel-1A data and digital elevation model (DEM) data were imported into SARscape software, the VH and VV polarization modes were selected, and then multilooking, filtering, geocoding, and radiometric calibration were carried out sequentially (specific parameters are shown in Tables I-III). Finally, the preprocessed data were imported into ArcMap 10.7, and the results were exported in PNG format after resampling to obtain the Sentinel-1A image data of the VH and VV polarization modes in the experimental scene (see Fig. 1).
3) Optical Remote Sensing Data: Sentinel-2A data from the European Space Agency (ESA, https://scihub.copernicus.eu/) were used as optical remote sensing data. Cloud-free Sentinel-2A data acquired on April 29, 2021 were selected. Sentinel-2A is a high-resolution multispectral imaging satellite that covers visible, near-infrared, and midinfrared bands and provides image data at three resolutions of 10, 20, and 60 m. The Sentinel-2A data obtained from ESA at the L1C level were not atmospherically corrected. To eliminate atmospheric effects, they were processed with the Sen2Cor plug-in in the SNAP software developed by ESA to obtain atmospherically corrected L2A data. Subsequently, the SNAP software was used to resample the L2A-level data to 10 m resolution to obtain Sentinel-2A data in 12 bands. Finally, the resampled images of each band were imported into ArcMap 10.7, and the experimental scene was mosaicked and cropped to obtain the Sentinel-2A images of each band, which were exported in PNG format. Some of the band images are shown in Fig. 1(c) and (d).
4) Terrain Data: Terrain data include slope and aspect, which provide important auxiliary terrain information for subsequent landslide detection. Slope and aspect were calculated from the ALOS DEM with a resolution of 12.5 m. The ALOS 12.5 m DEM is elevation data produced using the phased array L-band synthetic aperture radar (PALSAR) of the advanced land observing satellite (ALOS). After the slope and aspect data were generated, they were imported into ArcMap 10.7 for resampling, and the results were exported in PNG format. Through the above steps, slope and aspect image data in the experimental scene were obtained [see Fig. 1(e) and (f)].
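As an illustration of how slope and aspect can be derived from a DEM grid, the following is a minimal NumPy sketch. It is not the GIS workflow used in this study: the function name `slope_aspect`, the finite-difference scheme of `np.gradient`, the north-up grid orientation, and the aspect convention (compass bearing of the downslope direction) are all assumptions made for illustration, and GIS packages may use different kernels and conventions.

```python
import numpy as np

def slope_aspect(dem, cell_size):
    """Return slope and aspect (both in degrees) for a north-up DEM grid.

    Assumed conventions: rows run north to south, columns west to east,
    and aspect is the compass bearing of the downslope direction
    (0 = north, 90 = east). Aspect is not meaningful on flat cells.
    """
    # np.gradient returns the derivative along rows first, then columns.
    d_row, d_col = np.gradient(dem.astype(float), cell_size)
    dz_east = d_col
    dz_north = -d_row  # the row index grows southward
    slope = np.degrees(np.arctan(np.hypot(dz_east, dz_north)))
    aspect = np.degrees(np.arctan2(-dz_east, -dz_north)) % 360.0
    return slope, aspect

# Toy DEM rising 12.5 m per 12.5 m cell toward the east:
# a 45-degree slope whose downslope direction is west (270 degrees).
dem = np.tile(np.arange(5) * 12.5, (5, 1))
slope, aspect = slope_aspect(dem, cell_size=12.5)
```

The toy surface is chosen so that the expected values are easy to check by hand; real DEMs additionally require handling of no-data cells and of the map projection.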

B. Methods
First, the acquired Sentinel-1A, Sentinel-2A, slope, and aspect data were preprocessed and converted into image data suitable for deep learning model training, and the corresponding datasets were created. Then, the multi-input channel U-Net network model was constructed in Python with TensorFlow as the base framework. In the experimental scene, the model was trained using the dataset from the training area and tested on the testing area. Finally, the landslide detection results of different models combined with different data types were evaluated according to the visual effect and relevant quantitative evaluation indexes, and the model performance was analyzed. The specific flow is shown in Fig. 2, and the following sections provide detailed information about the methodology of this study.

1) Dataset Construction:
Because the original remote sensing images are too large to be input directly into the model for training, they were cut into subimages of the same size. The subimage size affects the performance of the model: subimages that are too small fail to cover large-scale landslide areas, while subimages that are too large cause GPU memory overflow [35]. Based on experimental tests and considering the landslide scale and hardware constraints, the subimage size was set to 256×256 pixels in this study. An overlapping cutting strategy with a step size of 128 was adopted. The Sentinel-1A images, Sentinel-2A images, slope images, aspect images, and ground truth images were cut into subimages of 256×256 pixels according to the same cutting strategy, and each subimage contains at least one landslide area to ensure that the model learns landslide characteristics effectively.
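The overlapping cutting strategy can be sketched as follows. This is a minimal illustration rather than the code used in this study: the function name `cut_tiles`, the channel-last array layout, and the choice to drop windows that do not fit entirely inside the scene are assumptions.

```python
import numpy as np

def cut_tiles(image, label, size=256, stride=128):
    """Cut aligned (image, label) tiles and keep those with landslide pixels.

    image: array of shape (H, W, C); label: binary array of shape (H, W).
    Windows that would extend past the scene border are skipped (assumption).
    """
    tiles = []
    height, width = label.shape
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            lab = label[top:top + size, left:left + size]
            if lab.any():  # at least one landslide pixel in the window
                img = image[top:top + size, left:left + size, :]
                tiles.append((top, left, img, lab))
    return tiles

# Toy scene: 16 channels and one small "landslide" patch near the
# upper-left corner, so only the window at (0, 0) is kept.
image = np.zeros((512, 512, 16), dtype=np.float32)
label = np.zeros((512, 512), dtype=np.uint8)
label[100:120, 100:120] = 1
tiles = cut_tiles(image, label)
```

Storing the `top` and `left` offsets with each tile makes it straightforward to place predictions back into the full scene later.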
To increase the number and diversity of samples, improve the generalization ability and robustness of the model, and make the deep learning model better adapt to different data distributions and real scenes, this study used data enhancement strategies to expand the dataset [36]. The data enhancement methods included mirroring and rotating the cut images and ground truth (90° counterclockwise rotation). Through these operations, 3153 samples were finally obtained. The samples were then randomly divided into a training set and a validation set at a ratio of 8:2. After the division, the training set contained 2850 samples, and the validation set contained 573 samples. The training set and the validation set were mutually exclusive (see Fig. 3). The training set is used to train the parameters and weights of the model, and the validation set is used to evaluate the performance of the model and select the optimal model parameters [37].
2) U-Net Model: U-Net, proposed by Ronneberger et al. [39] in 2015, is a 2-D image semantic segmentation network based on fully convolutional networks [38]. Owing to its robustness, the network is also widely used in remote sensing image segmentation. U-Net has a classic "U"-shaped structure, which is mainly composed of an encoder, a decoder, and skip connections. The encoder includes convolution layers and pooling layers. The convolution layers extract the spatial features of the image, and the pooling layers reduce the dimension of the features and increase the receptive field while reducing the size of the feature map. The decoder, which maps the deep abstract features back to the image space, includes upsampling layers and convolution layers: upsampling restores the spatial size, and the convolution layers gradually restore the spatial details. The skip connection is one of the key ideas of U-Net. By copying the encoder feature map at the corresponding depth and concatenating it with the upsampling results in the decoder along the channel dimension, it fuses shallow and deep features during encoding and decoding. In this study, U-Net is used as the base network for the landslide detection model, and its structure is shown in Fig. 4.
3) Multi-Input Channel U-Net Model: In this study, how to fuse SAR multisource remote sensing data for landslide detection and how to input the data effectively into the model are the problems to be considered. In addition, the nonlandslide background occupies most of the area in the image, and background information other than landslides also affects the accuracy of landslide detection. Therefore, this study first constructs a multiple input channel that abstracts the multisource remote sensing data into multiple input channels so that they can be input more effectively. In the field of computer vision, the attention mechanism enables the model to selectively focus on the subject feature information and perform weighted aggregation in the spatial dimension of the feature map. Its basic function is to highlight the subject-related part of the feature map so that the related information has higher weights in subsequent calculations [40], [41]. Therefore, to handle the nonlandslide background, this study abandons the direct splicing of the traditional U-Net in the skip connection between the encoder and the decoder and introduces an attention mechanism module that adjusts the spatial weights of the feature map to focus on the landslide features [42]. The module contains a concatenate operation, two 1×1 convolution operations, and a final Hadamard product operation. The structure of the attention mechanism module is shown in Fig. 5, and the structure of the multi-input channel U-Net model is shown in Fig. 6.
The specific coding form of the multi-input channel U-Net model is as follows. First, a multi-input channel is constructed, and the multisource data are divided into two categories (SAR and optical remote sensing images; slope and aspect images) that enter the network as multiple inputs. After one scale of convolution-pooling feature extraction, the feature maps of the two types of data are concatenated so that they merge into a single feature map in the channel dimension, which facilitates the subsequent step-by-step encoding of landslide features. The attention mechanism module operates as follows. The feature maps produced by each feature extraction stage of the U-Net encoder are further processed by convolution and serve as the input (x) of the attention mechanism module. The feature maps of the same size from the U-Net decoder at the corresponding depth (g) are concatenated with the processed input feature maps in the channel dimension to realize feature fusion between the encoder and the decoder. The fused feature maps then pass through two 1×1 convolution operations, with ReLU and Sigmoid selected sequentially as activation functions. The ReLU function is used to suppress overfitting in the attention mechanism module, and the Sigmoid function then forms a single-channel spatial attention matrix that suppresses redundant features. Next, the weight matrix and the feature map input in the first step undergo the Hadamard product operation, forming a feature map whose spatial weights are computed jointly with the upsampling results of the decoder, so that higher weights are assigned to landslide areas and the model focuses on landslide feature areas. Finally, the output map of the attention module is fused with the upsampling results of the decoder at the corresponding level, and the detail reconstruction and decoding of the feature map continue. This mechanism not only completes the feature fusion at the corresponding depth of the encoder and decoder in the semantic segmentation process of U-Net, but also addresses the fixed spatial weights and the semantic gap between encoder and decoder feature maps in the traditional U-Net structure, thereby improving the model's learning of landslide features and, in turn, the accuracy of the landslide detection results.
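The data flow of the attention mechanism module described above (concatenate, two 1×1 convolutions with ReLU and Sigmoid, Hadamard product) can be sketched in NumPy as follows. This is only an illustration of the operations, not the TensorFlow implementation of this study: the names `conv1x1` and `attention_gate`, the feature map sizes, the hidden channel count, and the random untrained weights are all hypothetical.

```python
import numpy as np

def conv1x1(feature, weight, bias):
    """1x1 convolution: a per-pixel linear map over the channel axis."""
    return feature @ weight + bias

def attention_gate(x, g, w1, b1, w2, b2):
    """Weight encoder features x by a spatial mask computed from x and g."""
    fused = np.concatenate([x, g], axis=-1)           # channel-wise concatenation
    hidden = np.maximum(conv1x1(fused, w1, b1), 0.0)  # first 1x1 conv + ReLU
    logits = conv1x1(hidden, w2, b2)                  # second 1x1 conv, 1 channel
    mask = 1.0 / (1.0 + np.exp(-logits))              # Sigmoid, shape (H, W, 1)
    return x * mask, mask                             # Hadamard product (broadcast)

rng = np.random.default_rng(0)
h, w, cx, cg, mid = 32, 32, 64, 64, 32  # hypothetical sizes
x = rng.normal(size=(h, w, cx))         # encoder feature map (input x)
g = rng.normal(size=(h, w, cg))         # decoder feature map of the same size (g)
w1, b1 = rng.normal(size=(cx + cg, mid)) * 0.1, np.zeros(mid)
w2, b2 = rng.normal(size=(mid, 1)) * 0.1, np.zeros(1)

gated, mask = attention_gate(x, g, w1, b1, w2, b2)
# Fuse the attention output with the decoder branch before further decoding.
decoder_input = np.concatenate([gated, g], axis=-1)
```

Because the mask has a single channel, the same spatial weight is applied to every channel of x at a given pixel, which matches the single-channel spatial attention matrix described in the text. In a trained network the weights would be learned rather than random.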

4) Model Training:
Table IV shows the hardware and operating system configuration used for the experiment, and Table V lists the details of the software configuration. After the pre-experimental test, considering the hardware limitations and the computational efficiency of the model, the number of iterations (epochs) is set to 128, the batch size is set to 16, the learning rate is set to 0.0001, and adaptive moment estimation (Adam) is selected as the optimizer [43]. The Adam algorithm builds on the AdaGrad and RMSProp gradient descent algorithms. Its parameter updates are not affected by rescaling of the gradient, and it adjusts the learning rate automatically, is easy to implement, and has high computational efficiency [44]. This article studies the binary classification of image pixels, so binary cross-entropy is selected as the loss function. It is defined as follows:

$$\mathrm{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p(y_i) + (1 - y_i)\log\left(1 - p(y_i)\right)\right]$$

where N is the number of categories (N = 2 in this article), y is the binary label 0 or 1, and p(y) is the probability that the output belongs to the y label.
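The behavior of the binary cross-entropy loss can be illustrated numerically with a small NumPy sketch. The function name `binary_cross_entropy`, the clipping constant `eps`, and the averaging over pixels are assumptions for illustration; the toy labels and probabilities are invented and are not results from this study.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities to avoid log(0).
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    per_pixel = -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    return per_pixel.mean()

# Four toy pixels: 1 = landslide, 0 = nonlandslide.
labels = np.array([1, 0, 1, 0])
confident = binary_cross_entropy(labels, [0.9, 0.1, 0.8, 0.2])
uncertain = binary_cross_entropy(labels, [0.6, 0.4, 0.5, 0.5])
```

Predictions that are closer to the true labels give a smaller loss, so `confident` is lower than `uncertain`, which is the property the optimizer exploits during training.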
During training, the area under the curve (AUC) is monitored to track the training process of the model. All the deep learning models in this article are implemented in the Python programming language based on the TensorFlow API and the Keras library.

5) Evaluation Index:
To quantitatively evaluate the landslide detection results of the model, this study first calculated the confusion matrix of the landslide detection results (see Fig. 7). Subsequently, based on the confusion matrix, four quantitative evaluation indicators were calculated, namely precision, recall, F1 score, and mean intersection over union (MIoU), to evaluate the model more comprehensively. In the formulas below, TP, FP, FN, and TN denote the numbers of true positive, false positive, false negative, and true negative pixels in the confusion matrix, with landslides as the positive class.
Precision is the proportion of samples predicted by the model as positive cases (landslides) that are actually positive. It measures the accuracy of the model in predicting positive cases. The calculation formula is as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

Recall is the proportion of all actual positive examples (landslides) that are correctly predicted as positive by the model. It measures the model's ability to cover positive examples. The calculation formula is as follows:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

The F1 score (F1) is the harmonic mean of precision and recall and therefore considers both indicators. For imbalanced datasets, F1 is a more balanced indicator that better reflects the comprehensive performance of the model. The larger the F1, the better the performance of the model. The calculation formula is as follows:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

MIoU is a commonly used evaluation metric for semantic segmentation models. It evaluates the overall segmentation accuracy by considering the pixel-level overlap between prediction and ground truth in each category and averaging over the categories. For the two categories considered here (landslide and nonlandslide), the calculation formula is as follows:

$$\mathrm{MIoU} = \frac{1}{2}\left(\frac{TP}{TP + FP + FN} + \frac{TN}{TN + FN + FP}\right)$$

III. RESULTS AND ANALYSIS
In this section, an experimental scene is selected to apply and comprehensively evaluate the proposed method. First, the proposed model is trained on datasets of different data types in the training area, and the trained model is then used to detect landslides in the test area. The landslide detection results obtained with different types of data are compared and analyzed in terms of visual effect and quantitative evaluation indexes to demonstrate the effectiveness of fusing SAR multisource remote sensing data for landslide detection. In addition, to demonstrate the effectiveness of the multi-input channel U-Net model fusing SAR multisource remote sensing data and to compare its landslide detection results from different perspectives, we compute the TP, FP, and FN diagrams of the landslide detection results of the different network models fusing SAR multisource remote sensing data. Together, the comparative experiments and visualization analysis comprehensively evaluate the landslide detection performance of the proposed method in the experimental scene.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

A. Experimental Scene
The Bailong River Basin is located in the southeast of Gansu Province, China. The Bailong River is a tributary of the Jialing River, which is in turn a tributary of the Yangtze River. The basin lies in the transition zone from the second-level ladder to the third-level ladder of mainland China's terrain and in the transition area from the Qinghai-Tibet Plateau to the Qinba Mountains and the Loess Plateau (see Fig. 8). The climate in the basin is mild, with an annual average temperature of 2 °C-15 °C and an annual average precipitation of about 700 mm. The topography and geomorphology of the basin are complex: alpine gorge landforms and loess landforms alternate, neotectonic movement is strong, active faults are well developed, and weak rock and soil bodies (phyllite, etc.) are widely distributed, forming a fragile and sensitive geological environment. In addition, frequent seismic activity and deep valleys provide favorable conditions for landslides, making landslide disasters frequent [45], [46]. Because of the complex terrain, vegetation cover, and weather conditions in the basin, it is difficult to detect landslides using optical remote sensing data alone. Therefore, other types of remote sensing data must be used in combination to compensate for the limitations of optical remote sensing data.

B. Analysis of Landslide Detection Results Fusing SAR Multisource Remote Sensing Data
To further validate the usability and effectiveness of the proposed method fusing SAR multisource data for landslide detection, a total of four sets of experiments were designed for pairwise comparison in this study, as follows:
1) traditional U-Net combined with 12 bands of Sentinel-2A images (U-Net + 12 channels);
2) traditional U-Net combined with SAR multisource remote sensing data (two polarization modes of Sentinel-1A images, 12 bands of Sentinel-2A images, slope, and aspect images, U-Net + 16 channels);
3) multi-input channel U-Net combined with 12 bands of Sentinel-2A images (multi-input channel U-Net + 12 channels);
4) multi-input channel U-Net combined with SAR multisource remote sensing data (two polarization modes of Sentinel-1A images, 12 bands of Sentinel-2A images, slope, and aspect images, multi-input channel U-Net + 16 channels).
In all four experiments, the model was trained on the training dataset produced in the same training area and tested in the same testing area. To obtain an overview of the landslide detection results of the different methods, all the subimages of the detection results were combined to form a complete landslide image of the testing area based on their geographic coordinates. Meanwhile, we calculated the four evaluation indexes, recall, precision, F1, and MIoU (see Table VI), based on the confusion matrix to further quantitatively evaluate the landslide detection results (see Figs. 9 and 10).
Comparing Fig. 9(a), (b) with Fig. 9(c), (d), and Fig. 10(a), (b) with Fig. 10(c), (d), it can be seen that, for both the traditional U-Net model and the multi-input channel U-Net model, the landslide detection results obtained by fusing the SAR multisource remote sensing data are visually better than those obtained using only the Sentinel-2A optical remote sensing images: the detections are more refined, with fewer false detections and missed detections. The results with fused SAR multisource remote sensing data are also superior to the results using only Sentinel-2A optical remote sensing images on all four quantitative evaluation indexes. Among these indexes, high recall means that the coverage rate of landslide detection is high, that is, more landslides are detected, regardless of how many objects are mistakenly classified as landslides (in Fig. 11, the multi-input channel U-Net model combined with SAR multisource remote sensing data obtained the maximum landslide area, indicating that this method can detect landslides comprehensively and completely). High precision indicates that an object detected as a landslide has a high probability of being a landslide. F1 considers both the model's detection coverage of landslides and the accuracy of the detections, and provides an overall performance measure based on recall and precision. MIoU measures the pixel-level similarity between the model's predictions of landslides and nonlandslides and the real labels. In this scene, the number of background pixels representing nonlandslides is much higher than the number of pixels representing landslides, so the distribution between sample categories is unbalanced. Therefore, MIoU is higher than recall, precision, and F1 in all four sets of experiments. On the traditional U-Net and multi-input channel U-Net models, the F1 values of the landslide detection results incorporating SAR multisource remote sensing data are higher by about 6.6% and 14.2%, respectively, which demonstrates that incorporating SAR multisource remote sensing data handles false detections and missed detections better in complex scenes.
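The relationship between the four indexes can be illustrated with a minimal sketch. It assumes the usual pixel-level definitions, with MIoU taken as the mean of the landslide and nonlandslide IoU; the function name `binary_metrics` and the pixel counts are hypothetical and are not taken from the experiments in this article.

```python
def binary_metrics(tp, fp, fn, tn):
    """Pixel-level precision, recall, F1, and MIoU for landslide vs. nonlandslide.

    Assumption: MIoU is the mean of the per-class IoU over the two classes.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # IoU of the landslide class and of the nonlandslide (background) class
    iou_landslide = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    iou_background = tn / (tn + fp + fn) if (tn + fp + fn) else 0.0
    miou = (iou_landslide + iou_background) / 2
    return {"precision": precision, "recall": recall, "f1": f1, "miou": miou}


# Hypothetical, strongly imbalanced scene: 1000 landslide pixels out of 100000.
m = binary_metrics(tp=600, fp=300, fn=400, tn=98700)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With these illustrative counts the background IoU is close to 1 because nonlandslide pixels dominate, which pulls MIoU above precision, recall, and F1, consistent with the pattern described above.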
The comparison between different models shows that, due to the complex geographic environment of the Bailong River (complex topography, vegetation cover, and cloud cover), it is difficult to provide enough landslide feature information for the model to learn from optical remote sensing images alone. The landslide detection method fusing SAR multisource remote sensing data can effectively introduce and utilize the rich and comprehensive landslide feature information across the multisource data. This helps the model understand and learn landslide features from different perspectives and differentiate landslides from similar, complex surrounding features, thereby improving the landslide detection accuracy. The visualization and quantitative evaluation results demonstrate the usability and effectiveness of fusing SAR multisource remote sensing data for landslide detection.

C. Accuracy Analysis of the Proposed Model
In this study, a multi-input channel U-Net is constructed as a deep learning model for landslide detection and combined with multisource remote sensing data such as SAR (Sentinel-1A), optical (Sentinel-2A), and terrain (slope and aspect) data for model training and landslide detection. To validate the effectiveness of the landslide detection model in this study and to demonstrate the superiority of its performance, we compare the multi-input channel U-Net landslide detection results with the traditional U-Net landslide detection results for the test area. The results show that both models can detect most of the landslides in the test area. Some areas are randomly selected and enlarged (see Fig. 12).
Comparing the ground truth with the landslide detection results of the multi-input channel U-Net and the traditional U-Net, it can be found that the traditional U-Net model can detect relatively complete landslide boundary information, but there are many false detections [see yellow circles in Fig. 12(c)] and missed detections [see red circles in Fig. 12(c)]. The false detections mostly occur near known landslide areas, and the objects mistakenly detected as landslides are relatively small; some landslides that are close to each other are wrongly connected at their boundaries. At the same time, missed detection is more pronounced for small-scale landslides, and the detection results are not precise enough, indicating that the traditional U-Net model has certain shortcomings in landslide feature extraction and learning. Compared with the traditional U-Net model, the overall detection performance of the multi-input channel U-Net model is better [see Fig. 12(d)] and closer to the ground truth. The landslide detection results are more complete, and there are fewer false detections and missed detections. This reflects an improved ability to distinguish landslide from nonlandslide features and the comprehensive ability of the multi-input channel U-Net model in the landslide detection task of this study. The multi-input channel U-Net model can better learn landslide features and achieve more accurate landslide detection. To further analyze the accuracy of the landslide detection results of the proposed model, we calculate the TP, FP, and FN of the detection results of the traditional U-Net and multi-input channel U-Net models against the ground truth, and visualize them (see Fig. 13). The accuracy of the detection results is first compared over the whole large-scale area, with the traditional U-Net results in Fig. 13(a) and (b) set against the multi-input channel U-Net results in Fig. 13(c) and (d).
Overall, this study constructs a multi-input channel U-Net model that can effectively introduce SAR multisource remote sensing data by adding multiple input channels. The attention mechanism is introduced in the U-Net skip connections to adjust the spatial weights of the feature maps, so that the model can better deal with complex background information and focus on and learn the landslide features. While performing the U-Net feature fusion, the attention mechanism addresses two deficiencies of U-Net: the fixed spatial weights used when fusing corresponding deep features, and the semantic gap between the encoder and decoder feature maps. As a result, landslide and nonlandslide features can be better distinguished. The multi-input channel U-Net model combined with SAR, optical, and terrain factor data achieves better landslide detection.

A. Model Performance Analysis
Gradient-weighted class activation mapping (Grad-CAM) is a deep network visualization method based on gradient localization. It generates heat maps that visualize the model's regions of interest, which helps explain the decision basis of the model and identify important features. Grad-CAM can help users better understand and validate the behavior and performance of models [47], [48]. Before training the multi-input channel U-Net model, all types of data were abstracted into 16 image bands, giving a total of 16 inputs, and all types of data were jointly input into the model for training. To enhance the interpretability of the proposed method, the heat map of each input band image was generated separately using the Grad-CAM algorithm. Each heat map was then superimposed on the corresponding band image to visualize the region of interest of the multi-input channel U-Net model on that band and to explain the model's decision basis for it. Fig. 15 shows the corresponding results.
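The core weighting step of Grad-CAM can be sketched as follows. This is a generic illustration under stated assumptions, not the exact per-band procedure used in this study: the feature maps and their gradients are assumed to have been extracted from the network already (for a segmentation model, the gradients might be taken with respect to the summed landslide score), and the function name `grad_cam` and the toy values are hypothetical.

```python
def grad_cam(activations, gradients):
    """Generic Grad-CAM weighting step on plain nested lists.

    activations, gradients: K feature maps, each an H x W list of floats.
    Returns an H x W heat map scaled to [0, 1].
    """
    k = len(activations)
    h = len(activations[0])
    w = len(activations[0][0])
    # Channel weight = global average of the gradient over all positions.
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Weighted sum of the feature maps, followed by ReLU.
    cam = [
        [max(0.0, sum(weights[c] * activations[c][i][j] for c in range(k)))
         for j in range(w)]
        for i in range(h)
    ]
    # Scale to [0, 1] so the map can be overlaid on a band image.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Toy example with two 2 x 2 feature maps (values are illustrative only).
acts = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
grads = [
    [[0.5, 0.5], [0.5, 0.5]],      # positive evidence, channel weight 0.5
    [[-1.0, -1.0], [-1.0, -1.0]],  # negative evidence, channel weight -1.0
]
heat = grad_cam(acts, grads)
print(heat)
```

In the toy example, positions supported only by the negatively weighted channel are suppressed to zero by the ReLU, while the position with the strongest positively weighted activation receives the highest value.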
In Fig. 15, the red region is the part that the model pays more attention to. A darker color indicates that the model pays more attention to the region, which is considered to contribute more to the landslide pixel classification results. This indicates that the features in this region play an important role in the landslide detection task. In contrast, the blue region is the part that the model pays less attention to. A lighter color indicates that the model pays less attention to the region and considers that it contributes less to the landslide pixel classification results. This indicates that the features in this region play a minor role in the landslide detection task [49]. Based on the heat maps, it can be found that each band of the image makes a certain contribution to landslide detection.
This study fuses SAR multisource remote sensing data for landslide detection. Multisource remote sensing data contain different types of rich and diverse information due to the differences in data sources, physical characteristics, resolution, and sensor parameters. In particular, SAR can penetrate vegetation and clouds to obtain subsurface information, which is valuable for landslide identification because some landslides are covered by vegetation and clouds [50], [51], [52]. Meanwhile, SAR has a strong ability to perceive changes in the surface and surface structure, and landslides usually lead to changes in surface morphology and destruction of surface structure [53]. These phenomena can be captured in SAR images, thus contributing to the identification and monitoring of landslides. This study used SAR (Sentinel-1A) data with two polarization modes, VH and VV. VH polarization transmits electromagnetic waves in the vertical direction and receives them in the horizontal direction. VV polarization transmits and receives electromagnetic waves in the vertical direction [54]. The echo intensity of the same object varies under different polarization modes, thus increasing the information available on landslide targets. The combined use of SAR data with different polarization modes can provide rich polarization information from multiple perspectives. This polarization information can be used to extract more details of surface features, enhance landslide recognition ability, reduce false and missed detection rates, and obtain better visual effects and evaluation indicators [see Figs. 9 and 10]. From Fig. 15, the model's attention to landslide features is relatively rough in some bands of the optical images, while it is more detailed in the SAR images [see Fig. 15(a), (c), and (h)]. In Fig. 15(h), the model generates more erroneous attention in some bands of the optical images (Band4, Band8), while the attention on the SAR images is more accurate and clearer. Slope and aspect data provide terrain information, which can supplement missing or difficult-to-observe terrain features in optical images. Landslides are usually related to the steepness and aspect of the terrain. Adding slope and aspect data can provide a more comprehensive and clear understanding of terrain changes, thereby better helping deep learning models capture the terrain features of landslide areas [see Fig. 15(e), (g), and (h)].
In this study, we construct a multi-input channel U-Net model for landslide detection by introducing the multi-input channel and attention mechanism module, which realizes the effective input and processing of multisource data. The model has strong information processing and feature learning ability, and can learn the information related to landslide features in different types of data by means of adaptive weighting. The introduced attention mechanism module (see Fig. 5) can solve the problem of fixed spatial weights in the traditional U-Net when fusing corresponding deep features. It can effectively capture the important landslide-related features in the input multisource data, automatically learn and focus on the key areas (darker color area, red area in Fig. 15), and enhance the learning ability of landslide features. Ultimately, it improves the ability to accurately identify and localize landslide areas [42], [55]. By fusing information from different data sources, the multi-input channel U-Net can obtain landslide feature information from multiple viewpoints and data sources. This information is then processed effectively to improve the accuracy and robustness of landslide detection, thus enhancing the performance of landslide detection.
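The adaptive spatial weighting described above can be written compactly. The following is a sketch that assumes an additive attention gate of the kind used in Attention U-Net; the exact module of Fig. 5 may differ, and the symbols $x_i$, $g_i$, $W_x$, $W_g$, $\psi$, $b_g$, and $b_\psi$ are illustrative notation rather than the notation of this article.

```latex
% x_i : encoder (skip connection) feature vector at spatial position i
% g_i : decoder (gating) feature vector resampled to the same position
% W_x, W_g, \psi, b_g, b_\psi : learnable parameters (assumed form)
\begin{align}
  q_i        &= \psi^{\top}\,\mathrm{ReLU}\!\left(W_x x_i + W_g g_i + b_g\right) + b_\psi, \\
  \alpha_i   &= \sigma(q_i) \in (0, 1), \\
  \hat{x}_i  &= \alpha_i \, x_i .
\end{align}
```

Under this assumed form, $\alpha_i$ is a learned, position-dependent weight: skip features at positions judged relevant to landslides are passed to the decoder almost unchanged, while background positions are attenuated, instead of every position being fused with the same fixed weight.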
In addition, Section III-B shows that the landslide detection accuracy can be improved to a certain extent by optimizing either the network structure or the dataset. It is worth noting that the landslide detection results of the traditional U-Net combined with SAR multisource remote sensing data are better than those of the multi-input channel U-Net combined with optical images on all four evaluation indexes. The multi-input channel U-Net realizes the effective input of multisource remote sensing data and focuses on landslide features by introducing multiple input channels and adding the attention mechanism. The SAR multisource remote sensing data can provide rich information to help the model better understand and learn landslide features. Therefore, it can be inferred that, in this experiment, fusing multisource data has a greater effect than changing the network structure of the model. This suggests that the proposed approach of enriching data information by introducing multisource data to improve model performance is feasible in landslide detection.

B. Prospect
This study utilizes SAR multisource remote sensing data combined with the multi-input channel U-Net model for landslide detection and applies it to the Bailong River Basin. The multi-input channel U-Net model can effectively input the SAR multisource remote sensing data and better learn the rich landslide feature information provided by these data. The detection results demonstrate its validity, but some errors and omissions remain (see Figs. 9 and 10). This is because landslides are geological phenomena whose spatial scales vary [56], [57]. The spatial resolution of the data used in this study is limited, so some smaller landslides may be overlooked, resulting in missed detections. Meanwhile, deep learning methods are data-driven and require large amounts of training data [58], [59], [60], whereas the number of landslide samples used for training in this study is relatively small. Therefore, in future work we will consider using remote sensing data with higher spatial resolution to improve the accuracy of landslide detection. In addition, when data samples are insufficient in future application scenarios, how to effectively expand the data samples or accurately detect landslides based on the available samples is also a problem to be considered in future work.

V. CONCLUSION
A multi-input channel U-Net landslide detection method fusing SAR multisource remote sensing data is proposed and utilized for landslide detection. SAR multisource remote sensing data can provide rich information. The multi-input channel U-Net model can input the data effectively and process the most valuable and noteworthy parts of the multisource data. The landslide-related parts are then given higher weights to improve the model's ability to perceive and recognize the key features of landslides. This improves the model's ability to learn landslide features. Mutually independent training and testing regions are selected for the landslide detection experiments, and the multi-input channel U-Net method fusing SAR multisource remote sensing data is compared with the traditional U-Net with SAR multisource remote sensing data and the traditional U-Net without SAR multisource remote sensing data. The proposed method can comprehensively utilize the rich information provided by different types of multisource remote sensing data, with a focus on learning landslide features. As a result, better performance was achieved, with higher landslide detection accuracy and lower false and missed detection rates. The quantitative results show that the proposed method obtains the best value on every evaluation index, including the highest F1 value (66.18%), which verifies its effectiveness. The proposed method can provide technical support for landslide disaster assessment.

Fig. 2. Flowchart of a multi-input channel U-Net landslide detection method fusing SAR multisource remote sensing data.

Fig. 9. Landslide detection results of traditional U-Net fusion of different data in the test area. (a) Landslide detection results from traditional U-Net fused optical remote sensing imagery. (b) Specific area in (a). (c) Landslide detection results from traditional U-Net fused SAR multisource remote sensing data. (d) Specific area in (c).

Fig. 10. Landslide detection results of multi-input channel U-Net fusion of different data in the test area. (a) Landslide detection results from multi-input channel U-Net fused optical remote sensing imagery. (b) Specific area in (a). (c) Landslide detection results from multi-input channel U-Net fused SAR multisource remote sensing data. (d) Specific area in (c).
In Fig. 13, compared with the traditional U-Net [see Fig. 13(a) and (b)], the multi-input channel U-Net has more TP pixels [see Fig. 13(c) and (d), blue area: the model predicted landslide, actual landslide], fewer FP pixels [see Fig. 13(c) and (d), red area: the model predicted landslide, actual nonlandslide], and fewer FN pixels [see Fig. 13(c) and (d), yellow area: the model predicted nonlandslide, actual landslide]. This indicates that the multi-input channel U-Net model better balances precision and recall and achieves more accurate landslide detection, with fewer false detections and missed detections. This supports the effectiveness of the multi-input channel U-Net model in the method of this article. In addition, we randomly select some subregions, plot their TP, FP, and FN results, and enlarge them (see Fig. 14) for visual comparison of accuracy at different scales. The results also demonstrate the superiority of the multi-input channel U-Net model for landslide detection (compared with the traditional U-Net, the multi-input channel U-Net results have more TP and fewer FP and FN).
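The TP, FP, and FN maps can be derived from a predicted mask and a ground-truth mask as in the following minimal sketch. The color assignment follows the description of Fig. 13 (blue TP, red FP, yellow FN); the function names `confusion_map` and `count_labels`, the `COLORS` table, and the tiny masks are hypothetical illustrations, not the code or data of this study.

```python
# Color convention taken from the description of Fig. 13 (TN left uncolored).
COLORS = {"TP": "blue", "FP": "red", "FN": "yellow", "TN": None}


def confusion_map(pred, truth):
    """Label every pixel as TP, FP, FN, or TN from two binary masks (1 = landslide)."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("pred and truth must have the same shape")
    labels = {(1, 1): "TP", (1, 0): "FP", (0, 1): "FN", (0, 0): "TN"}
    return [
        [labels[(p, t)] for p, t in zip(pred_row, truth_row)]
        for pred_row, truth_row in zip(pred, truth)
    ]


def count_labels(cmap):
    """Count how many pixels fall into each of the four categories."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for row in cmap:
        for label in row:
            counts[label] += 1
    return counts


# Tiny illustrative masks (1 = landslide, 0 = nonlandslide).
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [1, 1, 0]]
cmap = confusion_map(pred, truth)
counts = count_labels(cmap)
print(cmap)
print(counts)
```

The resulting counts can be passed directly to precision, recall, and F1 formulas, while the label map can be rendered with the colors in `COLORS` for a visual comparison of two models over the same area.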

Fig. 12. Partial landslide detection results of different models fusing SAR multisource remote sensing data in the test area. (a) Optical (Sentinel-2A) image. (b) Ground truth. (c) U-Net results. (d) Multi-input channel U-Net results.

Fig. 13. Visualization of TP, FP, and FN for landslide detection results. (a) U-Net results. (b) Specific area in (a). (c) Multi-input channel U-Net results. (d) Specific area in (c).

Therefore, fusing SAR multisource remote sensing data can provide rich and complementary information to help the model perceive landslides from different perspectives, analyze the surface situation more comprehensively, and capture more comprehensive and diversified landslide features, thus improving the performance of landslide detection [see Fig. 15(e), (g), and (h)].

Fig. 15. Heat maps of landslide detection by multi-input channel U-Net combined with SAR multisource remote sensing data.