The Outcome of the 2022 Landslide4Sense Competition: Advanced Landslide Detection from Multi-Source Satellite Imagery

The scientific outcomes of the 2022 Landslide4Sense (L4S) competition organized by the Institute of Advanced Research in Artificial Intelligence (IARAI) are presented here. The objective of the competition is to automatically detect landslides based on large-scale multiple sources of satellite imagery collected globally. The 2022 L4S aims to foster interdisciplinary research on recent developments in deep learning (DL) models for the semantic segmentation task using satellite imagery. In the past few years, DL-based models have achieved performance that meets expectations on image interpretation, due to the development of convolutional neural networks (CNNs). The main objective of this article is to present the details and the best-performing algorithms featured in this competition. The winning solutions are elaborated with state-of-the-art models like the Swin Transformer, SegFormer, and U-Net. Advanced machine learning techniques and strategies such as hard example mining, self-training, and mix-up data augmentation are also considered. Moreover, we describe the L4S benchmark data set in order to facilitate further comparisons, and report the results of the accuracy assessment online. The data is accessible on \textit{Future Development Leaderboard} for future evaluation at \url{https://www.iarai.ac.at/landslide4sense/challenge/}, and researchers are invited to submit more prediction results, evaluate the accuracy of their methods, compare them with those of other users, and, ideally, improve the landslide detection results reported in this article.


I. INTRODUCTION
Landslides are a frequent natural hazard observed in mountainous terrains across the globe [1]. There are several mechanisms by which soil, rock, and objects located on the ground or underground on an unstable hill slope can move downward and create a landslide [2]. Landslides mainly occur in response to natural processes like heavy rainfalls and earthquakes or human-induced activities [3]. The downward movement of the most catastrophic landslides is fast. They can travel large distances and take down everything in their path, creating scars on higher slopes and accumulating as deposits in valleys [4]. Landslides in mountainous areas are responsible for substantial losses, including damage to buildings and infrastructure and even fatalities [5]. Ongoing climate change, population growth, and rapid urbanization in areas vulnerable to natural hazards have also increased the occurrence of landslides and their consequences [6]. As a result, in recent years, considerable attention has been paid to gaining a better understanding of the mechanisms of these catastrophic hazards [7]. The most vital information regarding these catastrophic events is the awareness of past movements and their exact locations and extents, ideally recorded in a landslide inventory data set [5]. Such a data set is an essential requirement for extracting advanced information, developing knowledge in the field, and predicting the unstable slopes that are prone to landslides [8]-[10]. Prediction maps generated from such a data set can be used for potential mitigation measures for the region under study [11]. Therefore, a more accurate and detailed landslide inventory data set is a prerequisite for precise disaster mitigation action [12].
In the past decade, deep learning (DL) has gained a great deal of attention, both in computer vision and remote sensing (RS) image analyses. The application of DL and convolutional neural networks (CNNs) to the detection of landslides emerged in early 2019, primarily using very high resolution (VHR) [14] and hyperspectral RS data [15]. The prospect of generating landslide maps with more accuracy than can be achieved with traditional methods such as semiautomated approaches [16] and machine learning classifiers [17] has encouraged researchers in this field to develop and apply more sophisticated DL algorithms. To the best of our knowledge, no DL algorithm has been designed specifically for the distinct characteristics of landslide detection. Therefore, the application of existing DL models and their variations to this task raises some new concerns, namely their transferability to new geographical areas with different land covers and morphologies and the lack of any comprehensive open-source benchmark data set [12].
The artificial intelligence for remote sensing (AI4RS) group of the Institute of Advanced Research in Artificial Intelligence (IARAI) is a small international group of scientists working on the development and application of state-of-the-art DL solutions and algorithms for satellite imagery interpretation. This group organized the Landslide4Sense (L4S) competition to foster ideas and progress in DL algorithms for the specific Earth observation application of landslide detection. The competition provides participants with a landslide benchmark data set of globally distributed multisource satellite imagery. The benchmark data set is prepared and introduced as an explicit norm for evaluating alternative DL approaches. The training set, which is a subset of the whole benchmark data set, is released and thoroughly described by Ghorbanzadeh et al. in [6]. That study evaluates this subset of the benchmark data set using 11 different state-of-the-art DL segmentation models.
The L4S competition fosters interdisciplinary research in computer vision, artificial intelligence (AI), and RS image analysis for image classification and landslide detection. The global objective is to build DL-based models for understanding the differentiating characteristics of landslides based on the provided optical, digital elevation model (DEM), and slope layers from freely available satellite imagery acquired by Sentinel-2 sensors and ALOS PALSAR. During the L4S competition, along with the highest accuracy assessment results, a special prize was also awarded for the most creative and innovative solution. The competition is organized by IARAI and aims to improve automatic landslide detection DL algorithms using multisource satellite imagery. In this competition, the main objective is the creation of landslide inventory maps using only the specified labeled landslide benchmark data set as training data.
The main focus of this article is on the scientific outcomes of the L4S competition. The rest of the paper is organized as follows. Section II describes the L4S benchmark data set used in the competition. Section III provides statistics of submissions and the overall results of the competition. In the next four sections, we discuss the DL algorithms proposed by the first- to third-ranked teams and the special prize team. Finally, we summarize our concluding points in Section VIII.

II. THE DATA AND BASELINE OF LANDSLIDE4SENSE COMPETITION 2022

A. Data Set
The benchmark data set for the L4S competition comprises 14 layers of data: multi-spectral data from Sentinel-2 (band1-band12), and digital elevation model (DEM) and slope data from ALOS PALSAR. All 14 layers in the landslide benchmark data set are resized to a resolution of about 10 m per pixel and are labeled pixel-wise as landslide or non-landslide. The landslide benchmark data consist of training, validation, and test sets that encompass events occurring across a wide range of geographical locations throughout the world's mountainous regions. Specifically, the training set is acquired from four different sites: the Iburi-Tobu area of Hokkaido, the Kodagu district of Karnataka, the Rasuwa district of Bagmati, and western Taitung County. The data collected from these four sites provide 3,799 image patches with a size of 128 × 128 pixels (see Fig. 1). The validation and test sets contain 245 and 800 image patches of the same size, respectively, which were acquired from other geographical sites. Details about the 14 layers of the landslide benchmark data set are given below.
• ALOS PALSAR. The ALOS phased array type L-band synthetic aperture radar (PALSAR) layers have a spatial resolution of 12.5 m and were acquired from 2006 to 2019. The Alaska Satellite Facility (ASF) is one of the distributed active archive centers that provides high-resolution DEM from ALOS PALSAR at no cost to the user. The slope layer is derived from the ALOS PALSAR DEM, and both the DEM and slope layers are converted to 10 m spatial resolution.
More details about the landslide benchmark data set can be found in [6].
The task of the L4S competition is to predict landslides from the data set provided. Labels are provided only for the 3,799 image patches of the training data set. The landslide detection results are evaluated with the pixel-wise F1 score on the landslide category in both the validation and test phases. Rankings for the competition were determined using only this accuracy assessment metric. However, competitors also received precision and recall metrics during the validation phase to get more meaningful feedback on their landslide detection results.
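The metric described above can be illustrated with a minimal sketch. It assumes that predictions and labels are flattened binary sequences (1 for landslide, 0 for background) pooled over all pixels; whether the official evaluation pools pixels over all patches or averages per patch is not stated here, and the function name `landslide_prf` is hypothetical.

```python
from typing import Sequence, Tuple


def landslide_prf(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the landslide class (label 1)."""
    # Count true positives, false positives, and false negatives pixel by pixel.
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: six pixels, two correct detections, one false alarm, one miss.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0]
precision, recall, f1 = landslide_prf(pred, truth)
print(precision, recall, f1)
```

Because the background class is excluded from the score, a model that predicts no landslides at all receives an F1 score of zero regardless of how many background pixels it classifies correctly.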

B. Baseline
We provided a simple baseline in our public GitHub repository prior to the start of the L4S competition. For this purpose, a state-of-the-art DL model for semantic segmentation was implemented in PyTorch. The baseline contains a user-configurable training script for U-Net [18] and a data loader for reading the training and test data sets. U-Net was first applied to biomedical image segmentation, followed by numerous semantic segmentation applications that demonstrated successful results. This model is also common for the landslide detection task and has been applied in a number of studies [12], [19], [20]. U-Net comprises an encoder route capable of capturing low-level representations and a decoder route designed to capture high-level representations. The decoder route uses an asymmetrical design to restore the localization content that vanishes during encoding, whereas the encoder route follows a standard CNN design assembled from consecutive convolution blocks. In each block, two convolutional layers with a filter size of 3 × 3, each using the rectified linear unit (ReLU) activation function, are followed by a max-pooling layer with a filter size of 2 × 2 and a stride of 2 [18]. The baseline model implemented in the L4S competition includes 23 convolutional layers, of which 4 are convolutional-transpose layers. The baseline U-Net model is trained using the training data set and tested on the 245 and 800 image patches of the validation and test data sets, respectively. The resulting baseline accuracy for the validation and test data sets is reported in Table I. We used all 14 bands for training and testing, and no additional measures were applied (e.g., data augmentation, pre- or post-processing). Adding any external auxiliary data, such as very high-resolution images, was forbidden, as specified in the L4S competition terms and conditions. The best performance of the baseline model is an F1 score of 59.92% on the test set.

III. SUBMISSIONS AND RESULTS
There were 439 unique users within 85 teams that submitted 7,775 landslide detection results to the validation phase of the L4S competition website. The total number of submissions to the test leaderboard was 219 landslide detection results. We limited submissions per team to ten for the test phase. The final ranking was determined based on the highest F1 score of each team during the test phase. Moreover, a special prize was awarded for the most creative and innovative solution in landslide detection, in the view of the L4S scientific committee. The competitors were from 37 different countries or regions. The largest group of competitors was from mainland China, with 134 unique users, followed by 62 from Hong Kong, 50 from the USA, and 42 from Germany. Fig. 2 shows the distribution and the approximate number of unique users per country or region.
The first three ranked teams that were selected based on their highest F1 Scores and the team recipient of the special prize were named winners of the L4S competition and presented their solutions during IJCAI-ECAI 2022, the 31st International Joint Conference on Artificial Intelligence, and the 25th European Conference on Artificial Intelligence at the Workshop on Complex Data Challenges in Earth Observation, CDCEO 2022.

IV. FIRST-PLACE TEAM

A. Analysis of the Characteristics of Landslide
The first-place team proposed a progressive label refinement-based distribution adaptation framework for large-scale landslide detection. The unique characteristics of landslides create two particular challenges for large-scale landslide detection from remote sensing images: (i) small objects and class imbalance and (ii) distribution inconsistency.
The first challenge, small objects and class imbalance, is shown in Fig. 3. In remote sensing images, the morphology of landslides is very complex, especially with many small branches, which constitute small objects (Fig. 3a). Furthermore, the landslide is not the dominant ground object in large-scale remote sensing images, as shown in Fig. 3b, which illustrates the statistics of the training data set: the proportion of pixels occupied by landslides is only 2%, so the number of pixels of other ground objects (background) is 49 times that of the landslide pixels. Both small objects and class imbalance lead to lower recall scores.
Distribution inconsistency is another difficult challenge for large-scale landslide detection from remote sensing imagery. In real-world large-scale landslide detection applications, images of landslides to be detected come from all over the world. These images are collected at different times, which leads to different imaging conditions. This spatio-temporal difference leads, in turn, to great differences in radiation values or pixel values of different remote sensing images, especially in the mountains [21], and is characterized by statistical

B. Progressive Label Refinement-based Distribution Adaptation Framework
To address the challenges of large-scale landslide detection, the first-place team proposes a progressive label refinement-based distribution adaptation framework. As shown in Fig. 5, the proposed framework includes data preprocessing, model ensemble, model training, model inference, and pseudo-label refinement.
1) Data preprocessing: Scale promotion is used to counter the weak representation caused by small landslide branches; the original images are scaled up from 128 × 128 pixels to 512 × 512 pixels. Random flip, random rotation, and color perturbation are also adopted for data augmentation. Color perturbation is only used for the multi-spectral data, not for the DEM and slope data.
Separated normalization is proposed to alleviate the distribution inconsistency challenge in the data preprocessing stage; it uses the mean and standard deviation of each domain to normalize that domain's data. For example, in the model validation stage, the two domains are the training domain (the training data set) and the validation domain (the validation data set). The mean and standard deviation are calculated separately for the two data sets, and the data in each domain are then normalized with that domain's own statistics. Separated normalization is similar to the normalization for cross-sensor transfer learning [22], but the domain-specific statistics are applied in the data preprocessing stage.
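The idea of domain-specific statistics can be sketched as follows. This is an illustration under stated assumptions rather than the team's implementation: it treats one band at a time as a flat list of pixel values, and the helper name `normalize_domain` and the stabilizing constant `eps` are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List


def normalize_domain(values: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize one band of one domain with that domain's own statistics."""
    mean = fmean(values)
    std = pstdev(values)
    # eps avoids division by zero for constant bands (an assumption of this sketch).
    return [(v - mean) / (std + eps) for v in values]


# Toy band values: the validation domain is shifted relative to the training domain.
train_band = [0.10, 0.20, 0.30, 0.40]
val_band = [1.10, 1.20, 1.30, 1.40]

train_norm = normalize_domain(train_band)  # uses training-domain mean and std
val_norm = normalize_domain(val_band)      # uses validation-domain mean and std
print(train_norm)
print(val_norm)
```

In this toy case the constant offset between the two domains disappears after normalization, which is the effect the separated statistics are meant to achieve; normalizing the validation band with the training statistics would leave the offset in place.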
2) Model ensemble and training: Three segmentation models are combined to produce the final landslide detection results. The U-Decoder architecture, which considers multiple scales, is selected as the decoder to further alleviate the small object problem, and Swin Transformer [23] and EfficientNetV2 [24] are selected as encoders to capture complex features of the landslide. The framework also uses SegFormer [25], which utilizes self-attention operations to fit the variant shapes of landslides, and the MLP, which is used to enhance the features of difficult samples. To further increase the generalization of the model, the batch normalization in the three segmentation models is replaced by cross-sensor normalization [22] to encode the statistical consistency between the training data set and the validation (testing) data set.
As for model training, Lovasz loss [26] and an online hard example mining strategy are used to address the problem of class imbalance, and soft cross-entropy loss [27] is used to solve the problem of noisy labels in the pseudo labels.
3) Model inference and pseudo label refinement: The probability values output by the above three models are averaged as the final prediction results in the inference stage.
To further alleviate the distribution inconsistency problem, the validation (testing) data set is used in the training process, and progressive pseudo-label refinement is proposed to generate pseudo labels for the validation or testing images. Based on the prediction of the ith round, pseudo labels for the (i + 1)th round are generated using a probability threshold of 0.7. The models of the (i + 1)th round are then trained on the training data set together with the validation (testing) images and their pseudo labels. The domain adaptive consistency training and the generation of pseudo labels are performed iteratively, so the pseudo labels are refined progressively.
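The inference and pseudo-labeling steps can be sketched as below. The averaging of three probability outputs and the threshold of 0.7 come from the text; how pixels below the threshold are treated is not specified, so this sketch assumes they are labeled as background. The function names are hypothetical.

```python
from typing import List


def average_probabilities(model_probs: List[List[float]]) -> List[float]:
    """Average per-pixel landslide probabilities over several models."""
    n_models = len(model_probs)
    return [sum(pixel) / n_models for pixel in zip(*model_probs)]


def pseudo_labels(probs: List[float], threshold: float = 0.7) -> List[int]:
    """Assumption: pixels at or above the threshold become landslide (1), all others background (0)."""
    return [1 if p >= threshold else 0 for p in probs]


# Toy per-pixel probabilities from three models for three pixels.
probs_a = [0.9, 0.6, 0.2]
probs_b = [0.8, 0.7, 0.1]
probs_c = [0.7, 0.5, 0.3]

avg = average_probabilities([probs_a, probs_b, probs_c])
labels = pseudo_labels(avg)
print(avg, labels)
```

In an iterative scheme, the labels produced here for round i would be added to the training data of round i + 1, and the procedure would be repeated with the retrained models.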

C. Experimental Results
We conducted a series of experiments to evaluate our proposed method on the Landslide4Sense data set. All bands in the multi-source images were used as inputs during training and testing.
The results in Fig. 6 show that each proposed module improves landslide detection accuracy in a different respect. In particular, separated normalization achieves the greatest improvement by addressing the distribution inconsistencies in multi-source data. As the number of label refinement rounds increases, overall performance improves. After the final round, we combine these models into an ensemble to obtain the highest F1 score of 80.41%.
As for the test leaderboard, the best model in the validation experiments is used as the baseline and achieves an F1 score of 73.07% (Fig. 7). Consistent with the validation phase, detection accuracy improves progressively as the number of rounds increases. Finally, the best model obtains the highest F1 score of 74.54%.

Fig. 7. Results on the test leaderboard (F1 score × 100).

V. SECOND-PLACE TEAM
The network structure we propose for the Landslide4Sense competition is shown in Fig. 8. Each component of the proposed structure is discussed in detail below.

A. Framework Introduction
The main framework of our model is an encoder-decoder network, which uses a U-Net-like [18] skip connection structure and can better integrate shallow and deep features. Influenced by the rapid development of transformer-based models in the field of computer vision [23], [25], we introduce Swin Transformer [23] as the encoder in this structure. To enable the Swin Transformer to reasonably capture the associations between landslide regions in multi-spectral data, we performed spectral selection experiments to identify the spectra suitable for the self-attention mechanism. Subsequently, in order to alleviate the imbalance between positive and negative samples in landslide detection, we design a balanced training strategy that first uses imbalance-aware losses to train compact feature representations and then uses these feature representations to fine-tune the classifier. Finally, we adopt a self-training strategy to further enhance the generalization of the model in the test domain.
1) Spectral selection: The Vision Transformer-based model performs feature aggregation using the self-attention mechanism to capture relations among pixels [23]. If irrelevant spectral information dominates the input, it degrades the performance of the model. In multi-spectral data, the responses of different spectra to the landslide area are quite different, and some spectra are even insensitive to landslides. These "unspecific" spectra therefore interfere with the self-attention mechanism. We performed spectral selection experiments, as shown in Table II, and found that the fully convolutional model U-Net performs better with more spectral inputs, while the Transformer model works better when only the RGB spectra are input. We further visualized the negative effect on self-attention when a spectrum insensitive to landslides is fed into the model, as shown in Fig. 9. Finally, we use the RGB spectra as the input to the model.
Note (Table II): RGB denotes the red, green, and blue spectra. SWIR denotes the three short-wave infrared bands of Sentinel-2. NGB denotes the near-infrared, green, and blue spectra. NIR denotes the near-infrared spectrum. PCA refers to the dimensionality reduction technique [28] used to compress the original 14 bands into 3 bands.

2) Balanced training: We design a two-stage training method to reduce the impact of the imbalance between positive and negative samples. In the first stage, the encoder and the decoder are trained simultaneously. For any input sample $x_i \in \mathbb{R}^{w \times h \times 3}$, the model is trained by minimizing the weighted cross-entropy loss $L_{wce}$ and the Lovasz loss $L_{lov}$ [26] for balanced training, assisted by $L_{ice}$, an image-level loss applied to the high-level semantic features of the encoder. In $L_{ice}$, $\delta$ is an indicator function: if there is a positive sample (landslide) in the label $y$, the value of $\delta$ is 1; otherwise, its value is 0. $M_P(\cdot)$ is a fully connected layer with a global pooling operation, and $X$ denotes the total data set. Optimizing $L_{ice}$ can increase the model's attention to landslides, since deciding whether an image contains a landslide is much easier than finding where the landslide is. $L_{wce}$ reweights the learning of negative and positive samples using $N_{neg}$, the number of negative samples (non-landslides), and $N_{pos}$, the number of positive samples (landslides), in any input image $x$.
As mentioned in [29], the re-weighting loss $L_{wce}$ plays a positive role in balancing the feature distribution of positive and negative samples. However, the classifier will still be biased. Therefore, in the second stage, we fix the trained encoder $E$ and use the standard cross-entropy loss $L_{ce}$ to train only the decoder $D$. Once we have balanced feature representations, they can be further exploited to de-bias the classifier.
3) Test data self-training: Remote sensing imaging often faces the problem of data distribution shifts due to differences in geography and sampling time. To fully adapt the model to the distribution of the test data, we adopt a self-training strategy [30] to enhance the generalization of the model. We sort the output probabilities predicted in the previous stage, select the top λ% high-confidence pixel-level pseudo-labels, and add them to the training data for self-training.
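The selection of high-confidence pseudo-labels can be sketched as follows. The text only states that the output probabilities are sorted and the top λ% are kept; this sketch assumes that confidence is measured as max(p, 1 - p) for a per-pixel landslide probability p and that labels are assigned with a 0.5 cut-off. The function name `select_top_confident` is hypothetical.

```python
from typing import List, Tuple


def select_top_confident(probs: List[float], lam_percent: float) -> List[Tuple[int, int]]:
    """Return (pixel_index, pseudo_label) pairs for the lam_percent% most confident pixels."""
    # Assumption: confidence of a binary prediction is the larger of p and 1 - p.
    scored = [(max(p, 1.0 - p), i, 1 if p >= 0.5 else 0) for i, p in enumerate(probs)]
    scored.sort(key=lambda item: item[0], reverse=True)
    keep = int(len(probs) * lam_percent / 100.0)
    return [(index, label) for _, index, label in scored[:keep]]


# Toy landslide probabilities for five pixels; keep the most confident 40%.
probs = [0.95, 0.55, 0.10, 0.40, 0.80]
selected = select_top_confident(probs, lam_percent=40)
print(selected)
```

A larger λ admits more pseudo-labels, including less reliable ones, which is one way in which the choice of λ can shift the balance between precision and recall.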

B. Experimental Results
In this subsection, we report the performance of the balanced training and self-training methods. Table III shows that the two-stage balanced training method attenuates the influence of the imbalance problem better than focal loss [31] and other common methods [26]. Table IV shows that the proposed self-training method can enhance the performance of the model and can also balance precision and recall by adjusting the value of λ.
In summary, the transformer-based solution we use can effectively detect landslide areas in multi-spectral remote sensing scenes. We believe that adaptive spectral selection or fusion is needed to further explore the performance of this transformer model, and it will be a follow-up research focus of our team.

VI. THIRD-PLACE TEAM
The solution of the third-place team is illustrated in Fig. 10. The methodology is detailed in the following sections.

A. Problem Formulation
Technically, the landslide detection problem can be formulated as a binary semantic segmentation problem. The training, validation, and test data sets can be denoted by $D_{train} = \{x_{tr}, y_{tr}\}$, $D_{val} = \{x_{val}\}$, and $D_{test} = \{x_{te}\}$, where $x_{tr}$, $y_{tr}$, $x_{val}$, and $x_{te} \in \mathbb{R}^{H \times W}$ correspond to the training patch, training label, validation patch, and test patch, respectively. Here $H$ and $W$ denote the spatial size of the patches. The goal of the landslide detection task is to train a semantic segmentation model on $D_{train}$ and $D_{val}$ so that the best performance can be achieved on $D_{test}$. Since the data are collected from different regions across the world, better exploitation of the unlabeled validation data can help mitigate the domain gap between the labeled and unlabeled data. To this end, we propose to train the network with both a mixed supervised loss $L^{mix}_{sup}$ and a mixed pseudo-label loss $L^{mix}_{pse}$. The detailed formulation of $L^{mix}_{sup}$ and $L^{mix}_{pse}$ is given in Section VI-D.

B. Supervised Losses
A combination of the cross-entropy loss $L_{cet}$ and the Jaccard loss $L_{jac}$ [32] is used as the supervised loss, computed on the student model's predictions for the labeled training data. Here $M_s(\cdot)$ denotes the mapping function defined by the student model $M_s$.

C. Self-training
We adopt a self-training strategy [30] to exploit the unlabeled data. First, the teacher model $M_t$ is trained solely on the training data $D_{train}$. It is then used to generate pseudo labels on the unlabeled data $D_{val}$ to supervise the student model $M_s$.
However, the raw predictions from $M_t$ are likely to be incorrect. To prevent the student model from overfitting to those wrong predictions, a pseudo-label selection strategy is needed to filter out misclassified pixels.
To achieve this, the Monte Carlo dropout strategy [33] is first used to generate an uncertainty map for each unlabeled image patch. More specifically, the unlabeled validation patch $x_{val}$ is input to the teacher model $M_t$ for 10 different runs. During each run, a dropout layer with a dropping rate of 0.3 is applied after the first convolution layer to perturb the network. The variance of the 10 outputs is then calculated as the uncertainty map.
Next, the uncertainty map is used to mask out the uncertain predictions of the teacher model $M_t$. Inspired by class-balanced self-training (CBST) [30], the selection is conducted in a class-wise manner: the 90% of the background pixels and the 70% of the landslide pixels with the lowest uncertainty are selected as pseudo labels, and the remaining predictions with higher uncertainty are ignored when calculating the losses. The pseudo-label loss $L_{pse}$ is then computed on the selected pixels, where $\hat{y}_{te}$ denotes the pseudo labels of $x_{te}$ generated by the teacher model and filtered by the pseudo-label selection process.
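The uncertainty-based selection can be sketched as follows. The sketch takes the per-run probability outputs as given (the dropout itself is not simulated), uses two runs instead of ten to keep the toy data small, and assumes that classes are assigned from the mean probability with a 0.5 cut-off. The ignore value 255 and all function names are hypothetical.

```python
from statistics import fmean, pvariance
from typing import Dict, List

IGNORE = 255  # hypothetical label for pixels excluded from the loss


def uncertainty_map(runs: List[List[float]]) -> List[float]:
    """Per-pixel variance over the stochastic forward passes."""
    return [pvariance(pixel_runs) for pixel_runs in zip(*runs)]


def select_pseudo_labels(runs: List[List[float]], keep_percent: Dict[int, int]) -> List[int]:
    """Keep, per class, the given percentage of pixels with the lowest variance."""
    mean_prob = [fmean(pixel_runs) for pixel_runs in zip(*runs)]
    labels = [1 if m >= 0.5 else 0 for m in mean_prob]  # assumed 0.5 cut-off
    variance = uncertainty_map(runs)
    selected = [IGNORE] * len(labels)
    for cls, percent in keep_percent.items():
        indices = sorted((i for i, lab in enumerate(labels) if lab == cls),
                         key=lambda i: variance[i])
        for i in indices[: len(indices) * percent // 100]:
            selected[i] = cls
    return selected


# Toy data: pixels 0-9 look like background, pixels 10-19 like landslide.
run_a = [0.1] * 10 + [0.9] * 10
run_b = [0.1] * 9 + [0.4] + [0.9] * 7 + [0.6] * 3  # four pixels disagree between runs

# 90% of background (class 0) and 70% of landslide (class 1) pixels are kept.
selected = select_pseudo_labels([run_a, run_b], keep_percent={0: 90, 1: 70})
print(selected)
```

Keeping a smaller share of the landslide class reflects the stricter filtering described above for the class that is harder to predict.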

D. Mix-up Strategy
To prevent overfitting and further improve the generalizability of the landslide detection model, a mix-up strategy [34] is applied to both the labeled and the unlabeled data. Given a batch of the training data $x_{tr}$ and the validation data $x_{val}$, the mixed data are obtained by linearly mixing two image patches $x_i$ and $x_j$ from the corresponding data set with a mixing coefficient $\lambda$ that is randomly sampled from a beta distribution at each training step. After applying the mix-up strategy, the supervised and pseudo-label losses are reformulated on the mixed data as $L^{mix}_{sup}$ and $L^{mix}_{pse}$. By training on mixed images, the model is less likely to be overconfident about its predictions and hence generalizes better to unseen data.
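A minimal sketch of the linear mixing is given below, following the standard mix-up formulation [34], in which the mixed input is λ x_i + (1 - λ) x_j. The corresponding loss is assumed here to be the same convex combination of the two per-sample losses; this is the usual mix-up choice and is not confirmed by the text above. The beta distribution parameters (1.0, 1.0) and the function names are hypothetical.

```python
import random
from typing import List


def mixup(x_i: List[float], x_j: List[float], lam: float) -> List[float]:
    """Linearly mix two flattened image patches with coefficient lam."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]


def mixup_loss(loss_i: float, loss_j: float, lam: float) -> float:
    """Assumed loss mixing: the same convex combination applied to the two losses."""
    return lam * loss_i + (1.0 - lam) * loss_j


rng = random.Random(0)
lam = rng.betavariate(1.0, 1.0)  # hypothetical Beta(1, 1) sampling per training step

x_mix = mixup([0.0, 1.0], [1.0, 0.0], lam)
print(lam, x_mix)
```

Because λ is resampled at every step, the model rarely sees the same blend twice, which is what discourages overconfident predictions.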

E. Post-processing
The dense conditional random field (DenseCRF) [35] technique is applied to the model's output as post-processing. This step helps to better match the predicted landslide contours with the ground truth. Finally, the best model obtains an F1 score of 73.5%.

VII. SPECIAL PRIZE TEAM
Landslide4Sense provides data with 14 bands, while most deep learning semantic segmentation models, such as [18], [36]-[38], require an RGB image as the input. This means we cannot utilize pre-trained weights to improve the model performance and shorten training time. On the Landslide4Sense data set, we tried three types of models, U-Net, Deeplabv3, and Deeplabv3+, but none of them yields very high performance, with F1 scores of only 65%, 66%, and 67%, respectively. We therefore explore the use of multi-spectral satellite imagery for the deep learning-based landslide segmentation task.

A. Multi-spectral U-Net
Considering the different resolutions of the bands in the Landslide4Sense data set, we introduce a novel model called Multi-spectral U-Net, which has two input branches for the different resolution inputs. The model structure is illustrated in Fig. 11. Multi-spectral U-Net comprises two branches, the High Resolution Branch (upper part) and the General Resolution Branch (lower part), whose features are merged and then contribute jointly to the final segmentation prediction.
Fig. 11. The structure of the Multi-spectral U-Net. The inputs of the upper branch consist of band2, band3, band4, and band8, which have a resolution of 10 meters, and the inputs of the lower branch mix all 14 bands, including the 10, 20, and 60 meter resolutions. Specifically, we implement the downsampling layer with a stride-two convolution and the upsampling layer with bilinear resizing, and all skip connections are additive. CBA denotes the sequential block of convolution, batch norm, and activation.
The High Resolution Branch is used for the high-resolution data and yields refined feature maps containing more marginal information. Specifically, we implement this branch using the Inverted Residuals and Linear Bottlenecks introduced in MobileNetV2 [39], consisting of two point-wise convolution layers and one depth-wise convolution layer. To avoid a dramatic increase in the dimensions of the feature maps, from 4 to 128 dimensions, we first apply two simple convolution layers. The feature dimensions are expanded and then restored to the original dimension after the depth-wise convolution layer. Additionally, there is no downsampling layer in this branch, as its only aim is to extract additional marginal information in order to get a better segmentation prediction.
In the General Resolution Branch, we apply some modifications to the original U-Net [18]. U-Net is an extensible segmentation model with a symmetrical architecture; this kind of architecture has been widely used for other segmentation tasks. It is very convenient to replace individual components of U-Net, which is the main reason we chose it for our model. The specific modifications are as follows. First, we reduce the number of downsampling layers due to the limited size (128 × 128 pixels) of the input image. To ensure the smallest feature size is at least 16 × 16, we use only three downsampling operations in the U-Net model. Second, the skip connection introduced in ResNet is used widely in the model to mitigate the vanishing gradient problem. Finally, we change the activation function to SMU [40], which can improve model performance without reducing inference speed, as shown in Eqs. (10) and (11).
In the Multi-spectral U-Net model, we input all 14 bands to the General Resolution Branch and only the 10 meter resolution bands (band2, band3, band4, and band8) to the High Resolution Branch. In order to balance the feature dimensions of the two branches, we give the High Resolution Branch and the General Resolution Branch the same output shape, 128 × 128 × 128. The features from the two branches are concatenated into a feature map with a shape of 128 × 128 × 256, which is used for the final pixel-level prediction.
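The shape arithmetic behind these design choices can be checked with a short sketch. It assumes that each stride-two downsampling halves the spatial size exactly (that is, padding preserves the size before striding); the helper names are hypothetical.

```python
def downsampled_size(size: int, n_downsamples: int, stride: int = 2) -> int:
    """Spatial size after repeated stride-based downsampling."""
    for _ in range(n_downsamples):
        size //= stride
    return size


def concat_channels(*channels: int) -> int:
    """Channel count after concatenating feature maps along the channel axis."""
    return sum(channels)


# Three downsampling steps on a 128 x 128 input: 128 -> 64 -> 32 -> 16.
print(downsampled_size(128, 3))   # 16
# Concatenating the two 128-channel branch outputs gives 256 channels.
print(concat_channels(128, 128))  # 256
```

A fourth downsampling step would reduce the smallest feature map to 8 × 8, below the 16 × 16 minimum stated above, which is why only three steps are used.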

B. Experiments
We trained the models on an NVIDIA GeForce RTX 3090 GPU and an Intel(R) Core(TM) i7-7800X CPU @ 3.50 GHz. To compare the performance of the three models more clearly, we use the same settings for all of them: a batch size of 8, the Adam optimizer, a learning rate schedule with warmup and restarted cosine decay (shown in Fig. 12), and the cross-entropy loss.
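A learning rate schedule with warmup followed by restarted cosine decay can be sketched as below. All numeric values (base rate, minimum rate, warmup length, cycle length) and the function name are hypothetical placeholders; the settings actually used are the ones plotted in Fig. 12.

```python
import math


def lr_at_step(step, base_lr=1e-3, min_lr=1e-6, warmup_steps=500, cycle_steps=5000):
    """Learning rate at a given step; every default here is an assumed value."""
    if step < warmup_steps:
        # Linear warmup from near zero up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Position inside the current cosine cycle; the modulo causes the restart.
    t = (step - warmup_steps) % cycle_steps
    cosine = 0.5 * (1 + math.cos(math.pi * t / cycle_steps))
    return min_lr + (base_lr - min_lr) * cosine


if __name__ == "__main__":
    for s in (0, 250, 499, 500, 3000, 5499, 5500):
        print(s, lr_at_step(s))
```

Each restart returns the rate to its peak, which can help the optimizer leave a poor local minimum before the next decay phase.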
We split the official training dataset into two parts, with 3539 images for training and 260 images for testing. We then compared the performance of the Multi-spectral U-Net with that of Deeplabv3+ and U-Net on this split, after training each model for 200 epochs. The Multi-spectral U-Net achieves a significantly higher recall and F1 score than the other two models, but its precision is lower than that of U-Net. Notably, the precision of Deeplabv3+ is dramatically lower than that of the Multi-spectral U-Net and U-Net; we think the likely reason is that its large number of downsampling layers leads to the loss of marginal information.
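For reference, pixel-wise precision, recall, and F1 score for the landslide class can be computed from a predicted mask and a ground-truth mask as in the sketch below. The function name and the toy masks are illustrative and do not come from the competition data.

```python
def precision_recall_f1(pred, truth):
    """Pixel-wise metrics for the positive (landslide) class of two binary masks."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    truth = [[1, 1, 0, 0], [0, 0, 0, 0]]
    wide_prediction = [[1, 1, 1, 0], [0, 1, 0, 0]]  # covers the landslide but spills over
    print(precision_recall_f1(wide_prediction, truth))
```

The toy prediction covers every landslide pixel but also marks background pixels, so recall is perfect while precision drops, which is the pattern a model with coarse boundaries tends to show.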
For a better understanding of the different models' performance on the validation dataset, we analyze the predicted segmentation results and show three representative examples in Fig. 13. In the first example, the landslide segmentation results of the three models are similar, but it is clear that Deeplabv3+ tends to predict a wider area and lacks refined edge information about the landslide. This may explain why Deeplabv3+ has a higher recall than U-Net but a significantly lower precision. In the second example, the landslide cannot be seen directly in the image; however, all three models predict it very well, which means that the bands other than the RGB bands (band2-band4) also contribute to the final prediction. The third example is a very complex landslide scenario, in which the superior performance of the Multi-spectral U-Net is clearly visible.
In the test phase of the Landslide4Sense competition, we first use the trained model to predict the validation data set and derive annotations from the prediction results. Then, we build a new training data set by combining the annotated validation data set with the original training data set. Finally, after training the Multi-spectral U-Net on the new training data set, we obtain an F1 score of 71.29% on the test set.
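The pseudo-labeling step described above can be sketched as follows. The helper names, the `predict_fn` callable, and the 0.5 threshold are assumptions made for illustration; the text does not specify how predictions were converted into annotations.

```python
def pseudo_label(prob_map, threshold=0.5):
    """Turn per-pixel landslide probabilities into a binary mask (threshold is assumed)."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]


def extend_training_set(train_set, unlabeled_images, predict_fn, threshold=0.5):
    """Return a new list: the original (image, mask) pairs plus pseudo-labeled pairs.

    predict_fn is a hypothetical callable mapping an image to a probability map.
    """
    pseudo = [(img, pseudo_label(predict_fn(img), threshold)) for img in unlabeled_images]
    return list(train_set) + pseudo


if __name__ == "__main__":
    train = [("train_image_0", [[0, 1]])]
    validation_images = [[[0.9, 0.2]]]
    # Stand-in model that treats the image values as probabilities.
    extended = extend_training_set(train, validation_images, predict_fn=lambda img: img)
    print(len(extended), extended[-1][1])
```

The model is then retrained on the extended set; since pseudo labels inherit the errors of the first model, this works best when the initial predictions are already reasonably accurate.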

VIII. CONCLUSIONS
In recent decades, remote sensing techniques have been predominantly used for natural hazard-related applications, e.g., landslide detection. There are many advantages to using Earth observation and remote sensing products in these applications, but the most critical ones are their timeliness and objectivity. Early detection is vital for a rapid response and effective management of the consequences of a landslide event. Due to the increasing number and quality of space-borne sensors, the remote sensing community has recently gained access to high-quality images with higher spatial and temporal resolution. In light of the improved availability of data, attention has turned towards the methodologies for retrieving information and knowledge from the data itself [41]. Therefore, there has been a great desire to replace experts' knowledge-based physical methodologies with automatic methods for interpreting remote sensing images.
Although promising results have been obtained by DL models for a wide range of remote sensing applications, the need for solutions to landslide detection challenges such as extracting landslides from remote sensing data has only been brought to the attention of the machine learning and computer vision communities in recent years. The solutions, however, have only been implemented at the local level and have followed a common procedure that includes training the DL model using an annotated data set of landslides covering a relatively small area [12], [42]. The local level is taken into consideration for several reasons related to how model generalization handles high-level issues, such as the impacts caused by different triggers, the types of mass movements, and the geology and morphology of the region, as well as the source of inventory data sets and the method by which they were developed. The landslide inventory data sets that are used for training modern DL models are usually created based on manual or knowledge-based physical semi-automated methods. Thus, implementing such methods for semantic annotations and creating inventory data sets at a large scale is generally a tedious and expensive process. In preparing a precise inventory of landslides, an even greater amount of work is required since it involves not only the analysis of one image but also a comparison of two images, from before and after the event, for each case study area. Therefore, it is very unlikely that landslide inventory data sets with highly accurate annotations can be found on a large scale. As a result of the lack of these data sets, serious concerns about the performance of currently available landslide detection DL solutions are warranted, particularly when they are applied directly to a new case study area that has not yet been investigated. To address all the above-mentioned issues, the L4S competition has been organized by the IARAI and provides a globally distributed landslide inventory data set. The competition promotes the development and demonstration of innovative algorithms for automatic landslide detection using remote sensing images throughout the world, and provides fair and objective comparisons of different DL solutions for automatic landslide detection.
This paper presents a summary of the top winners of the 2022 L4S competition. The competition was dedicated to developing DL solutions for unsolved challenges in the detection of landslides using remote sensing images collected from various regions around the world. Different strategies and algorithms were brought to light by our winning teams. The first-ranked team identified three main challenges: the large number of small landslides, the huge class imbalance between landslides and non-landslides, and the distribution inconsistency of the landslides in the study areas and, consequently, in the image patches. In addressing these challenges, they conducted a series of experiments to obtain the competition's highest F1 score of 73.07%. To address the weak representation of small landslides, they upscaled the original image patches from 128 × 128 pixels to 512 × 512 pixels. This team integrated three models, Swin Transformer, EfficientNetV2, and SegFormer, with an emphasis on self-attention operations. Further landslide detection improvements were achieved by the second-place team using the Swin Transformer as the encoder and the self-attention mechanism. In addition, a self-training strategy was used to enhance the generalization of their proposed model on the competition's test data. To overcome the imbalance between landslide and non-landslide classes, the first-place team adopted the Lovasz loss and an online hard example mining strategy. An unbalanced approach to training, however, led to the second team's success. The third-place team proposed an integrated approach combining a mixed supervised loss with self-training based on pseudo labels and the Monte Carlo dropout strategy to train their network for landslide detection. Using DenseCRF, this team post-processed the network's outputs to improve the borders of landslides. The special prize team introduced a multi-spectral U-Net inspired by MobileNetV2 to handle the multi-spectral Sentinel-2 and ALOS PALSAR data provided by the competition for landslide detection. As part of the competition's test phase, they generated annotations for the validation data set using the trained model and, by adding the newly labeled data to the training data set, retrained their U-Net. The DL solutions provided by the competition's four winners were presented by the corresponding authors at the CDCEO 2022 workshop, a satellite event at IJCAI-ECAI 2022, the 31st International Joint Conference on Artificial Intelligence and the 25th European Conference on Artificial Intelligence.
The data remain accessible after the L4S competition, and the Future Development Leaderboard at https://www.iarai.ac.at/landslide4sense/challenge/ remains active for future evaluation to allow further research developments and contributions. In this way, anyone can submit landslide detection results on the test data set, compare their performance with that of other users, and, ideally, improve on the accuracy presented in this outcome paper. It is noteworthy that L4S was the first competition to be based on multi-source satellite imagery for landslide detection and had a significant impact on this field; furthermore, participants agree that the competition is also an extremely interesting challenge from a computer vision and machine learning perspective.
As the consequences of climate change pose a growing number and range of challenges to the world's scientists, they may not have sufficient time and resources to generate landslide inventory data sets based on fieldwork. Yet modern DL solutions, particularly those based upon such a large source of remote sensing data, must be able to cope with natural hazard monitoring and risk assessment. Therefore, developing innovative DL solutions and training them on a global data set will be crucial to generating timely information from remote sensing data for future landslide events. The L4S 2022 data provide a valuable benchmark data set for evaluating all new DL algorithms developed for landslide detection, and the algorithms developed as part of the L4S competitions will, it is hoped, inspire the development of increasingly efficient and accurate algorithms.

Fig. 1. The locations of the training sites on a global image of landslide susceptibility generated by [13] and the visualization of every image layer in the 128 × 128 window size patches of the landslide training data set. Multi-spectral Sentinel-2 data is represented by bands 1-12, and slope and DEM data is represented by bands 13-14. The patches in the last column refer to the corresponding ground truth polygons.

Fig. 2. Global distribution and number of unique users per country or region, created in https://app.datawrapper.de.

Fig. 3. The problem of small objects and class imbalance. The landslide has some smaller branches and the background has 49 times as many pixels as the landslide.

Fig. 4. The problem of distribution inconsistency. The statistical results are calculated band by band, and are significantly different among the training data set, the validation data set, and the testing data set.

Fig. 7. The experimental results on the test leaderboard.

Fig. 8. Model structure for landslide detection proposed by Seek team.

Fig. 9. Visualization of the feature activation map of the Swin Transformer when inputting different spectral bands.

Fig. 10. Network architecture of the landslide detection method proposed by Tanmlh team. The overall architecture follows a self-training scheme, which consists of a teacher model branch and a student model branch. For the teacher model branch, a teacher model pre-trained on the training data is applied to generate pseudo labels based on the unlabeled images, which will later be used to supervise the training of the student model. For the student model branch, both the labeled and the unlabeled images are input to the student model after some data augmentation and mix-up operations. The training losses are then calculated based on both the training labels and the pseudo labels. During the training phase, the teacher model is fixed.