Integrative Few-Shot Classification and Segmentation for Landslide Detection

There has been an ongoing demand for monitoring landslides due to the heavy economic losses and casualties caused by such natural disasters. In this paper, we introduce a swift landslide detection system that can detect and segment landslides occurring on roads. To tackle the challenges of data collection, we propose an automatic annotation procedure to create a new landslide dataset consisting of 2963 images, termed the LandslidePTIT dataset. Additionally, we construct a novel deep-learning architecture that performs both classification and segmentation tasks well from only a few annotated images of landslides. Specifically, the model consists of four main modules that are delicately designed to solve the few-shot segmentation problem on landslide images, namely hypercorrelation construction, an attentive squeeze block, a cross-feature layer, and a broadcast and squeeze layer. Experimental results exhibit the superiority of the proposed method over competitive baselines, in both quantitative and qualitative terms.


I. INTRODUCTION
Landslides, typically a consequence of climate change [1] and urban expansion [2], are among the most common natural disasters today and cause severe harm to human life and infrastructure all over the world. For example, a severe landslide occurred in mid-2020 in Vietnam, causing dozens of deaths.¹ Landslides block roads, which not only hinders traffic flow but also generates various traffic problems in the form of congestion [3]. Therefore, there is a need to detect and warn of landslides as quickly as possible, so that proper counter-measures can be identified and unfortunate consequences avoided.

Previous studies deal with two related landslide problems, termed landslide detection and landslide prediction, in which methods built upon machine learning and deep learning are quite common [4], [5]. However, practical applications of such methods are often limited because deep learning models typically require enormous amounts of labeled data to work well, while landslides may only occur several times a year; it may thus take years to collect a sufficient amount of data. Besides, as landslides tend to occur in mountainous areas, collecting landslide images is difficult due to the danger as well as the lack of equipment in such underdeveloped areas. Several prior works tackle this problem by relying on images captured by satellites [6], [7], which are, however, unable to respond promptly to a real-life landslide event unless we own the satellites.

¹ Source: https://blogtuan.info/2022/05/20/serious-landslide-on-theroute-da-lat-mui-ne/
The associate editor coordinating the review of this manuscript and approving it for publication was Jeon Gwanggil.
In our work, we introduce a landslide monitoring system that can quickly detect whether any landslide is occurring on roads. Fig. 1 illustrates our newly designed system consisting of four layers, namely the collection, pre-processing, cloud, and application layers. Specifically, the system deploys specialized drones to collect images of roads in different areas, inspired by several recent methods that enable an energy-efficient routing schedule for capturing road images by drones [8], [9]. As one of our main contributions, we propose a data generation procedure, where the generated images are pre-processed and annotated automatically. The generated images are then used to train a deep learning-based artificial intelligence (AI) model in the cloud layer that can rapidly detect landslides causing damage on roads. After heavy rain stops, the operators deploy the drones to scan the surveillance areas and transfer the images to the AI model. The locations of landslides are then detected and extracted to create a map in the application layer for immediate response by local authorities. It is worth noting that even with the pre-processing layer that generates images for training, it is still challenging to train the model well, for two reasons. First, a landslide can occur in any part of a road, making it impossible for the data generation module to exhaustively generate landslides in every possible position for training. Second, the augmented landslides in generated images may not generalize well in practice due to the different types of landslides. Thus, as another important contribution, we propose a novel detection and segmentation method based on few-shot learning. In particular, the proposed method, termed Cross Feature and Attentive Squeeze Network (CF-ASNet), combines the recent state-of-the-art ASNet model with a new cross-feature mechanism.
By virtue of the transferability between few-shot learning classification and few-shot learning segmentation tasks, the classification and segmentation accuracy are greatly boosted.
Our contributions are fourfold and are summarized as follows:
• We introduce a new landslide detection system that aims to swiftly identify and measure the damage caused by landslides;
• We propose a novel data generation procedure, where labels of landslides are automatically assigned for training;
• We design a new model that can generalize to new types of landslides based on few-shot segmentation techniques;
• We empirically validate the effectiveness of the proposed data generation procedure as well as the newly designed model.
The organization of this paper is as follows. Section II presents relevant previous studies. In Section III, we describe the process of collecting landslide data to create our new dataset for training and testing on two tasks, namely classification and segmentation. In Section IV, we introduce a deep learning model for landslide detection based on a few-shot segmentation approach. Finally, we evaluate the proposed methods in Section V.

II. RELATED WORK
Our study is related to three broad categories, namely landslide detection, landslide segmentation, and few-shot learning.

A. LANDSLIDE DETECTION
1) MACHINE LEARNING-BASED APPROACHES
Machine learning techniques used in landslide detection follow both supervised and unsupervised settings. In supervised learning, typical methods employ support vector machines (SVM) [10], [11], k-nearest neighbors (KNN) [11], Logistic Regression (LR), Random Forest (RF) [11], [12], and several other conventional classification techniques such as Decision Tree [13], Naive Bayes [14], and EML [15]. These studies aim at finding the relationship between known inputs and unknown outputs to classify each image with one of two labels, ''landslide'' or ''non-landslide''. In unsupervised methods, landslide samples are grouped based on their similarity. In [16], the authors proposed an approach utilizing six well-known unsupervised methods, including K-means, K-medoids, hierarchical cluster (HC) analysis, expectation-maximization using Gaussian mixture models (EM/GMM), affinity propagation, and mini-batch K-means, to find cluster patterns of landslides, which then act as training data for the landslide detection problem.

2) DEEP LEARNING-BASED APPROACHES
The convolutional neural network (CNN) is the most common technique among these approaches. A recent study on landslide identification [10] showed that CNN outperforms most machine learning-based approaches such as RF, LR, and SVM. Bui et al. [17] proposed a system combining a CNN for the classification task with a transformation algorithm, Hue Bi-dimensional empirical mode decomposition (H-BEMD), to locate the landslide region and size. Interestingly, this study found that the landslide size depends on time.
Another approach proposed in [18] combined a CNN and a region-growing algorithm for two main tasks, detecting and classifying landslides, and reached 97% in terms of F1 score. On the other hand, time series data were also utilized to detect structural changes [19], where non-contributing areas including vegetation, water, and buildings were removed from the pre-landslide and post-landslide images, followed by a CNN model to detect the changes in image patches.

B. LANDSLIDE SEGMENTATION
For landslide problems, semantic segmentation is a prevalent task. Several studies employed U-net [20], a state-of-the-art deep learning model for semantic segmentation, as the main method for landslide detection and segmentation. A study of landslide detection in the Himalayas from satellite images [21] compared the performance of U-net and common machine learning-based approaches on two popular datasets, including five optical bands from the RapidEye satellite images and ALOS-PALSAR derived topographical data. To evaluate the generalization of models across datasets, a survey of rainfall-induced landslides in Brazil [22], performed on three datasets including RapidEye satellite images, the Normalized Difference Vegetation Index (NDVI), and a digital elevation model (DEM), found that large patch sizes perform better in detecting landslides in areas similar to the training area, while small patches are more effective in areas with different environmental aspects. Li et al. [23] designed a two-phase framework, F-RCNN for detection and U-net for segmentation of landslides from satellite images, where skip connections are deployed to replace the inception block in the second-phase U-net architecture. Moreover, the authors in [24] proposed a method that combines MobileNetV2 and PSPnet to accelerate inference and reduce the number of parameters, which reduced misclassification errors and separated the objects more precisely.

C. FEW-SHOT LEARNING
Few-shot learning (FSL) is a special case of meta-learning [25], which aims to train a model that can perform well on unseen data using only a few samples from related tasks. The main idea of FSL is to determine a hypothesis space and estimate an optimal hypothesis within it. FSL includes interesting variants such as one-shot learning [26], [27], [28], which classifies each label with only one sample per class, and zero-shot learning [29], [30], [31], [32], which deals with unseen data using data descriptions without any labeled samples.
FSL has been widely adopted in classification, object detection, and recognition. For example, the authors in [33] and [34] showed that FSL achieves better performance in hyperspectral image (HSI) classification, which typically requires hundreds or thousands of labeled samples. Following up, Liu et al. proposed deep few-shot learning for hyperspectral image classification [35], in which the crucial concept is to exploit the training dataset to provide a metric space that can generalize to the classes in the unseen dataset. The suggested approach achieved higher classification accuracy than traditional semi-supervised methods when evaluated on four popular HSI datasets. Additionally, in object detection tasks, a class-imbalanced scenario was considered for road object detection using FSL [36], which demonstrated the application of few-shot learning approaches to real-world images in a driving context. Furthermore, in the recognition task, Das et al. proposed a two-stage approach based on few-shot learning for image recognition [37]. In the first training stage, the authors captured the structure of the data and obtained an embedding space while also predicting the variance of each class. In the second training stage, the proposed method learned to map the mean-sample representation to a class prototype representation in the embedding space.
Few-shot segmentation [38], [39], [40] is a sub-field of few-shot learning that applies FSL to segmentation tasks and has attracted considerable attention in recent years. An early prototype-learning approach to few-shot segmentation was proposed by Dong [38]. In that article, the authors introduced a framework based on prototype learning and metric learning that significantly outperformed the baselines on the PASCAL VOC 2012 dataset. Wang et al. proposed a novel prototype alignment network, termed PANet, that can effectively utilize the support set's data [39]. Interestingly, PANet introduced prototype alignment regularization between the support and query by performing few-shot segmentation reversely from query to support. Most recently, a method of FSL without meta-learning was proposed in [40], which adopts only a transductive inference technique for a given query image while taking advantage of the statistics of its unlabeled pixels by maximizing a new loss containing three complementary terms: 1) the cross-entropy obtained from the labeled support pixels, 2) the Shannon entropy of the posteriors on the unlabeled query-image pixels, and 3) a global KL-divergence regularizer. The method achieved competitive performance in the 1-shot setting and noticeably improved performance in the 5- and 10-shot scenarios by 5% and 6%, respectively, in comparison with state-of-the-art episodic training approaches.

III. PRE-PROCESSING LAYER
In this section, we describe the data pre-processing step after obtaining images from UAVs/drones. A landslide is a dangerous natural phenomenon that occurs during heavy rain and floods and can cause a lot of damage to people and infrastructure. Furthermore, a landslide may block the whole road, obstructing traffic circulation. Due to difficult traffic conditions, terrain, and a shortage of equipment such as UAVs and drones, collecting data is challenging, and thus the amount of data collected each year is very small. Since deep learning models typically require a huge amount of data to perform well, we need to generate more data to compensate for the limited number of collected images.
To this end, we build a newly generated dataset via the following three steps, namely data crawling, data generation, and data annotation. In the following, we present these three steps for generating landslide images and building a landslide dataset consisting of the generated images and their annotations.

A. DATA CRAWLING
We collect videos of roads recorded by UAVs and drones in several mountain and forest areas in Vietnam. Frames are extracted from the videos, yielding a dataset of 767 road images. For landslide data, we collect 149 images from the internet by manually selecting landslide images taken from a high-ground position, which simulates the actual deployment situation when the UAVs/drones take pictures during surveillance. We also annotate the images with several types of landslides, such as rock falls, mudslides, earth flows, and depressions. The data statistics are shown in Tables 1 and 2.

B. DATA GENERATION AND AUGMENTATION
In this section, we explain the procedure to generate images containing landslides that can be utilized to train machine learning/deep learning models. Each synthetically generated image contains a road augmented with a type of landslide listed in Table 3.
To this end, we first need to determine the centerline of the road in road images and the landslide region in landslide images. For extracting the centerline, we adopt the pre-trained RoadNet++ model [41] and then apply post-processing to the model output, resulting in a binary image with pixel value 1 on the centerline and 0 otherwise. For the landslide region, we crop out the region corresponding to the landslide from the four types of landslide images using the Labelme application.² Finally, we randomly insert the landslide region on the centerline of the road in the image using the Seamless Cloning algorithm [42]. The region is blended into the road image, making it more realistic. The above procedure is illustrated in Fig. 2. After conducting the data generation phase, from the 767 road and 149 landslide images collected, we obtain a generated dataset consisting of 2963 images, termed the LandslidePTIT dataset.
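The insertion step can be sketched as follows. Note that this is a minimal illustration using a naive alpha blend in place of the Seamless Cloning algorithm [42] (OpenCV's `seamlessClone` provides a Poisson-blending implementation in practice); the function and variable names are illustrative, not the paper's actual code.

```python
import numpy as np

def insert_landslide(road_img, centerline, patch, patch_mask, rng=None):
    """Paste a landslide patch at a random road-centerline position.

    road_img:   (H, W, 3) float array, the road image.
    centerline: (H, W) binary array, 1 on the extracted centerline.
    patch:      (h, w, 3) float array, the cropped landslide region.
    patch_mask: (h, w) float array in [0, 1], 1 inside the landslide polygon.
    A naive alpha blend stands in for Seamless Cloning here.
    """
    rng = rng or np.random.default_rng()
    h, w = patch.shape[:2]
    H, W = road_img.shape[:2]
    # Candidate centers: centerline pixels far enough from the image border.
    ys, xs = np.nonzero(centerline)
    ok = (ys >= h // 2) & (ys < H - h // 2) & (xs >= w // 2) & (xs < W - w // 2)
    ys, xs = ys[ok], xs[ok]
    i = rng.integers(len(ys))
    y0, x0 = ys[i] - h // 2, xs[i] - w // 2
    out = road_img.copy()
    alpha = patch_mask[..., None]
    # Alpha-blend the patch onto the chosen region of the road image.
    out[y0:y0 + h, x0:x0 + w] = alpha * patch + (1 - alpha) * out[y0:y0 + h, x0:x0 + w]
    return out
```

In the actual pipeline, the last blending line would be replaced by a seamless-cloning call so that the inserted region matches the surrounding illumination and texture.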
After a landslide occurs, light rain is typically observed in the surrounding area. Besides, landslides tend to occur in mountainous areas, so the images are often captured in foggy weather. To simulate these two weather conditions, we further augment fog and rain into the generated images of the LandslidePTIT dataset, which divides the dataset into three parts: normal, fog, and rain.³

C. DATA ANNOTATION
The images are labeled pixel-wise, i.e., each pixel of an image is annotated with a value from 0 to 255 that indicates the class id. For example, in Fig. 2c, ids 0, 1, and 2 denote background, road, and earth flow, respectively. Table 3 presents the list of classes deployed in our study. The annotations are JSON files that store the coordinates of the polygon forming each landslide area.
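To illustrate how such a polygon annotation can be turned into a pixel-wise class mask, the sketch below rasterizes a Labelme-style polygon with a simple even-odd scanline fill; in practice, Labelme's own utilities or an image library would be used, and all names here are illustrative.

```python
import numpy as np

def polygon_to_mask(points, height, width, class_id=2):
    """Rasterize a polygon (list of (x, y) vertices) into a class-id mask
    using the even-odd (ray casting) rule, sampling at pixel centers."""
    mask = np.zeros((height, width), dtype=np.uint8)
    pts = np.asarray(points, dtype=float)
    xs, ys = pts[:, 0], pts[:, 1]
    for row in range(height):
        y = row + 0.5  # pixel-center scanline
        crossings = []
        for i in range(len(pts)):
            x1, y1 = xs[i], ys[i]
            x2, y2 = xs[(i + 1) % len(pts)], ys[(i + 1) % len(pts)]
            if (y1 <= y) != (y2 <= y):  # edge straddles the scanline
                crossings.append(x1 + (y - y1) * (x2 - x1) / (y2 - y1))
        crossings.sort()
        # Fill between alternate pairs of crossings (even-odd rule).
        for a, b in zip(crossings[::2], crossings[1::2]):
            mask[row, int(np.ceil(a - 0.5)):int(np.ceil(b - 0.5))] = class_id
    return mask
```

A pixel keeps value 0 (background) unless it falls inside a polygon, mirroring the id scheme described above.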

IV. CLOUD LAYER
In this layer, we design a model to tackle the landslide detection problem. Specifically, we need to answer the following questions. First, does a landslide appear in a given input image? Second, if a landslide occurs, does it block the road and possibly obstruct traffic? Third, what is the class of the landslide? Intuitively, if we know the current state of the landslide and its type, e.g., rockfall, mudslide, etc., then we can issue warnings and take appropriate action based on the situation. To answer these questions, we not only solve the classification task, which determines whether an image contains a landslide, but also need to locate the area on the road where the landslide takes place, which is equivalent to the segmentation problem.
Besides, since collecting landslide data is challenging, it is difficult to acquire a dataset large enough to train conventional deep learning models. Additionally, in reality, each region has different characteristics and there are many types of landslides other than those presented in our LandslidePTIT dataset. For example, in the northwest mountainous areas of Vietnam, the terrain is mainly high mountain forests. In this type of terrain, flash floods often occur and can be considered a new landslide type that we need to detect. Therefore, we propose a few-shot segmentation framework to tackle the landslide detection problem, in which we also newly design a cross-feature attentive squeeze network architecture that is customized for the LandslidePTIT dataset.

² Labelme image polygonal annotation tool: https://github.com/wkentaro/labelme.git
³ In this study, we generate fog using FoHIS (https://github.com/noahzn/FoHIS) and generate rain using monodepth2 (https://github.com/nianticlabs/monodepth2).

A. PROBLEM FORMULATION
Few-shot learning aims at performing tasks with only a few labeled data samples. In our study, we are given two image sets: 1) a base set, denoted as D_T (with base classes), and 2) a novel set, D_E (with novel classes). We note that the novel set also includes road and landslide images; however, the classes in the novel set are different from those in the base set. Specifically, in the training process, the LandslidePTIT dataset is divided into a train set and a test set, which are D_T and D_E, respectively. We denote C_T and C_E as the class sets of D_T and D_E, respectively. In the specific case of the LandslidePTIT dataset, we have |C_T| + |C_E| = 6. To detect landslides under the few-shot setting, we follow the idea of episodic learning [43], one of the most well-known approaches in the field. Specifically, multiple few-shot tasks are created in the training process, each of which takes several data samples randomly drawn from the train set D_T and divides them into a support set, denoted as S_T, and a query set, denoted as Q_T. The model performs classification and segmentation of the data in the query set based on the data and label information in the support set. For each few-shot task in the training process, the support set S_T contains information about the landslide and the corresponding labels, while the query set Q_T includes unlabeled images, i.e., we need to classify and segment roads and landslides from images in Q_T.
After training is complete, we evaluate the model using the test set D_E, which is also the novel set. The support set S_E and the query set Q_E are taken from the test set in the same way as in the training process. We use the few-shot learning model to predict the labels of data samples in the query set. During inference, the query set is the set of images obtained from the UAVs/drones in the field, which have no label data, and we need to predict the type of landslide and its segmentation. The support set includes several images with a new type of landslide and their labels.
In more detail, we create episodes containing the two following sets of samples:
• The support set S* = {(x_i^s, a_i^s)}_{i=1}^{N×K}, where x_i^s and a_i^s represent a support image and its corresponding label from S*, the superscript * represents T and E for the train and test sets, respectively, N is the number of classes in the support set, and each class contains K labeled instances, i.e., the so-called N-way K-shot problem. Each value in the annotation matrix a_i^s is the class id of the corresponding pixel in the image x_i^s. We also denote H, W, C as the height, width, and number of channels of the image x_i^s, respectively.
• The query set Q* = {x_j^q}_{j=1}^{M}, where x_j^q is a query landslide image and M is the number of images in Q* that need to be predicted. The superscript * represents T and E for the train and test sets, respectively, and j indicates the j-th data sample in Q*.
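The episodic sampling described above can be sketched as follows; the dataset format (a flat list of image/class pairs), the function name, and the parameter defaults are illustrative assumptions, not the paper's implementation.

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=2, k_shot=1, m_query=1, seed=None):
    """Sample one N-way K-shot episode from `dataset`, a list of
    (image_id, class_id) pairs. Returns a support set of N*K labeled
    samples and a disjoint query set drawn from the same classes."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for img, cls in dataset:
        by_class[cls].append(img)
    # Pick N classes, then K support + M query images per class.
    classes = rng.sample(sorted(by_class), n_way)
    support, query = [], []
    for cls in classes:
        imgs = rng.sample(by_class[cls], k_shot + m_query)
        support += [(img, cls) for img in imgs[:k_shot]]
        query += [(img, cls) for img in imgs[k_shot:]]
    return support, query
```

During training, many such episodes would be drawn from D_T; at test time the same routine would draw episodes from D_E.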
For the classification task, we aim to identify the multi-hot class occurrence vector ŷ_C ∈ R^N via a function f_C; for the segmentation task, we predict the segmentation mask Ŷ_S ∈ R^{H×W} corresponding to the classes via another function f_S. The two objectives are expressed as follows:

ŷ_C = f_C(x_j^q, S*; θ_C),    Ŷ_S = f_S(x_j^q, S*; θ_S),    (1)

where θ_C, θ_S are the learnable parameters of the classification model and the segmentation model, respectively. In this study, instead of optimizing the two functions f_C and f_S in (1) separately, we aim at jointly finding a function f_CS that combines and generalizes the two tasks of few-shot classification and segmentation (FS-CS): it predicts multi-label, background-aware class occurrences as well as segmentation maps. The integrative FS-CS model f_CS (with learnable parameters θ_CS) takes as input a query image x_j^q and a support set S*:

(ŷ_C, Ŷ_S) = f_CS(x_j^q, S*; θ_CS),    (2)

where ŷ_C ∈ R^N is the multi-hot class occurrence vector and Ŷ_S ∈ R^{H×W} is the class-wise segmentation mask. We note that FS-CS is more general than few-shot classification (FS-C) and also exhibits two major advantages over both FS-C and few-shot segmentation (FS-S) as follows:
• FS-CS can classify query images belonging to none or multiple target classes (i.e., the query is classified into a background class, none, if none of the target classes is detected). Therefore, in a real-life use case, if a landslide does not occur, the system still operates properly without any warnings.
• FS-CS relaxes the assumption so that the query class set can be a subset of the support class set, while conventional FS-S [39], [44], [45] assumes the query class set exactly matches the support class set.
To solve (2), we need to extract N probability maps corresponding to the classes in the support set, typically referred to as the class-wise foreground map set Y, comprising Y^(n) ∈ R^{H×W} for the N classes, where Y^(n) is the probability map of a class (each position on the map represents the probability of that position being in a foreground region of the corresponding class) and has the same size H × W as the input image. We have:

Y = f(x_j^q, S*; θ),    (3)

where f is the model before the post-processing step and θ is the learnable parameter of the model. Y is then post-processed to extract ŷ_C and Ŷ_S (see Section IV-C1 for further details).

B. MODEL ARCHITECTURE
To solve (3), we propose a new model architecture, termed CF-ASNet, built upon the state-of-the-art ASNet model [46]. The overall procedure of CF-ASNet is presented in Fig. 3.
As illustrated in the figure, we first extract feature maps of a query image (depicted in red) and a support image (depicted in green) from a backbone network, which is illustrated by a trapezoid shape.⁴ In this backbone network, three features of an image are extracted from the three last blocks, i.e., blocks 2, 3, and 4 in Fig. 3. Each feature-map pair at the same level is then used to construct hypercorrelations, shown as the first pyramidal correlation box in the figure. Secondly, the model learns to transform the correlations through an attentive squeeze block, whose details are presented in Fig. 4, by gradually squeezing the support dimensions at each query dimension, yielding the high-level hypercorrelations that are later employed to produce the mask prediction map. Finally, in the producing process, two adjacent correlations are cross-featured using a network termed the cross feature layer. Each high-level correlation tensor pair, after processing, results in a feature map, which is upsampled and combined with the correlation of the same query dimension size using broadcast and squeeze layers, whose details are described in Sections IV-B3 and IV-B4, respectively. The earliest feature map is fed to a convolutional decoder, which consists of bi-linear upsampling and interleaved 2D convolutions that map the number of channels to 2 (foreground and background) and the output spatial size to the input query image size. The detailed implementations are described in the following subsections.

1) HYPERCORRELATION CONSTRUCTION
Following [45], we construct hypercorrelations between the query image and the support image. First, we extract the features of each image from a pre-trained backbone network (the backbone is frozen during the training process). We denote F_q^(l) and F_s^(l) as the query and support feature maps extracted at level l, respectively. The correlation at level l is computed from the cosine similarity between every pair of query and support positions:

C^(l)(p_q, p_s) = ReLU( ⟨F_q^(l)(p_q), F_s^(l)(p_s)⟩ / (‖F_q^(l)(p_q)‖ ‖F_s^(l)(p_s)‖) ),    (4)

where C is a hypercorrelation and p denotes a matrix position hereafter. Finally, we have a hypercorrelation pyramid C = {C^(l)}.
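As a minimal sketch, the cosine-similarity correlation in (4) can be computed for a single pyramid level with NumPy; in practice the features would come from the frozen backbone, and the shapes and names here are illustrative.

```python
import numpy as np

def hypercorrelation(feat_q, feat_s, eps=1e-8):
    """4D correlation between query features (Hq, Wq, C) and support
    features (Hs, Ws, C): ReLU of the cosine similarity between every
    query/support position pair. Returns shape (Hq, Wq, Hs, Ws)."""
    # L2-normalize the channel dimension so dot products become cosines.
    q = feat_q / (np.linalg.norm(feat_q, axis=-1, keepdims=True) + eps)
    s = feat_s / (np.linalg.norm(feat_s, axis=-1, keepdims=True) + eps)
    corr = np.einsum('ijc,klc->ijkl', q, s)  # cosine similarity
    return np.maximum(corr, 0.0)             # ReLU clamps negatives
```

Stacking this tensor over the backbone levels yields the hypercorrelation pyramid that the AS blocks consume.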

2) ATTENTIVE SQUEEZE BLOCK
The AS blocks consist of the AS layers introduced in [46]. As illustrated in Fig. 4, the AS blocks transform each correlation tensor C ∈ R^{H_q×W_q×H_s×W_s×C_in} into tensors with a fixed support dimension size H'_s × W'_s, where C_in and C_out denote the numbers of channels of the input and output tensors, respectively, H'_s ≤ H_s, and W'_s ≤ W_s. We can consider C as a block matrix of size H_q × W_q. Each element of this block matrix, called a support correlation tensor, corresponds to a correlation tensor between a query position p_q ∈ [H_q] × [W_q] and every support position. In Fig. 4, the rearrange-tensor operator expresses the transformation between the correlation tensor and the block matrices. Each support correlation tensor is then fed to the AS layers to analyze the global context. Finally, after rearranging, we have correlation tensors with a reduced support dimension while the query dimension is preserved, which are called high-level correlations and denoted as C̃.

3) CROSS FEATURE LAYER
In our study, via empirical experiments, we find that in many cases, the predicted road segmentation encroaches into the ground truth segmentation of the landslide. This observation is due to the fact that the landslide area is quite small compared to the road area and the whole image. Therefore, we propose cross feature layers (CF layers) between two adjacent correlations in a high-level correlation pyramid to enhance the model's ability to segment small objects.
In more detail, the CF layers take as input two high-level correlation tensors C̃^(b−1) and C̃^(b) from adjacent pyramid levels. First, we rearrange the bigger tensor C̃^(b−1) as a block matrix of size H_s × W_s with elements of size H_q × W_q × C_out. The elements then go through convolution layers for downsizing to the query dimension of the smaller tensor C̃^(b). The result is rearranged and combined with the smaller tensor in the ratio α, which is empirically determined via experiments. The mixed representation is rearranged and then fed to two sequential AS layers until it becomes a point feature of size 1 × 1. The detailed architecture of the CF layers is illustrated in Fig. 4b.
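The downsize-and-mix step can be sketched as follows; this is a simplified illustration in which 2x average pooling stands in for the learned downsizing convolutions, the ratio α is an assumed value, and the tensors are collapsed to their query dimensions only.

```python
import numpy as np

def cross_feature_mix(corr_big, corr_small, alpha=0.5):
    """Mix two adjacent high-level correlation tensors over the query
    dimensions. `corr_big` has query size (2*Hq, 2*Wq, C); it is downsized
    to `corr_small`'s query size (Hq, Wq, C) by 2x average pooling (a
    stand-in for the learned convolutions), then combined in ratio alpha."""
    Hq, Wq, C = corr_small.shape
    # 2x average pooling over the two query spatial dimensions.
    down = corr_big.reshape(Hq, 2, Wq, 2, C).mean(axis=(1, 3))
    # Blend: alpha keeps the finer-level tensor, (1 - alpha) mixes in the
    # coarser one, helping the model attend to small landslide regions.
    return alpha * corr_small + (1 - alpha) * down
```

The blended tensor would then be rearranged and passed through the two AS layers as described above.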

4) BROADCAST AND SQUEEZE LAYER
For each high-level correlation C̃^(b) (b = 3, 4), after processing through the CF layers, we have a feature map with the same size as the query dimension of the correlation C̃^(b). We then apply bi-linear upsampling to map it to the query dimension size of C̃^(b−1). Next, the resulting map and the correlation C̃^(b−1) are input to the broadcast and squeeze layer (BS layer). The layer first uses a broadcast element-wise addition operator to combine the two inputs, then rearranges the result, and finally feeds it to two sequential AS layers until the output becomes a point feature of size 1 × 1. Fig. 4c illustrates the detailed architecture of the BS layers.

C. TRAINING PROCEDURE
1) PREDICTION
After obtaining the set of class-wise foreground maps, Y, we perform the prediction/inference step to obtain the multi-hot class occurrence vector ŷ_C and the segmentation mask Ŷ_S.

2) FOR CLASSIFICATION.
With each class probability map Y^(n), if the maximum value of the matrix is greater than a pre-defined threshold δ, then the object with class n is present in the query image:

ŷ_C^(n) = 1[ max_p Y^(n)(p) > δ ],    (5)

where p represents a position in the matrix.

3) FOR SEGMENTATION.
We compute the final segmentation mask Ŷ_S by choosing, for each pixel position, the class that has the highest probability:

Ŷ_S(p) = argmax_{n ∈ [N+1]} Y^(n)(p),    (6)

where Y^(N+1) is the background probability map derived from the class-wise foreground maps, expressed as follows:

Y^(N+1)(p) = 1 − max_{n ∈ [N]} Y^(n)(p).    (7)
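Both post-processing rules can be sketched together as follows; the threshold value is illustrative, and the background derivation used here (one minus the maximum foreground probability) is one plausible reading of the description above rather than the paper's exact formula.

```python
import numpy as np

def predict(fg_maps, delta=0.5):
    """Post-process class-wise foreground maps Y of shape (N, H, W).

    Classification: class n occurs if max over positions of Y[n] exceeds
    the threshold delta. Segmentation: per-pixel argmax over the N
    foreground maps plus a derived background map; id N denotes
    background in the returned mask."""
    n_cls = fg_maps.shape[0]
    # Multi-hot class occurrence vector via max-pooling and thresholding.
    y_cls = (fg_maps.reshape(n_cls, -1).max(axis=1) > delta).astype(int)
    # Background map: assumed here to be 1 - max foreground probability.
    bg = 1.0 - fg_maps.max(axis=0, keepdims=True)
    y_seg = np.argmax(np.concatenate([fg_maps, bg], axis=0), axis=0)
    return y_cls, y_seg
```

A pixel where every foreground map is weak is thus assigned to the background class, matching the background-aware behavior of FS-CS.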

4) LOSS FUNCTION
The FS-CS learner uses a segmentation loss in training, which is formulated as the average cross-entropy between the class distribution at each individual position and its ground-truth segmentation annotation:

L = − (1 / HW) Σ_p log Y^(Y_gt(p))(p),    (8)

where Y_gt denotes the ground-truth segmentation mask.
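A NumPy sketch of this average cross-entropy, assuming the maps have already been normalized into a per-pixel class distribution (including the background channel); names are illustrative.

```python
import numpy as np

def segmentation_loss(fg_maps, gt_mask, eps=1e-8):
    """Average cross-entropy between the per-pixel class distribution
    and the ground-truth mask.

    fg_maps: (N+1, H, W) probabilities (foreground maps plus background),
             assumed normalized over the class axis.
    gt_mask: (H, W) integer class ids in [0, N]."""
    # Probability assigned to the ground-truth class at each position.
    p_gt = np.take_along_axis(fg_maps, gt_mask[None], axis=0)[0]
    # Negative log-likelihood, averaged over all H*W positions.
    return -np.log(p_gt + eps).mean()
```

A perfect prediction drives the loss toward zero, while confidently wrong pixels are penalized heavily by the log term.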

V. PERFORMANCE EVALUATION

A. BASELINES AND PERFORMANCE METRICS
We compare our proposed method with four state-of-the-art approaches:
• Path Aggregation Network (PANet) [39]: PANet reached 1st place in the COCO 2017 Challenge Instance Segmentation task and 2nd place in the Object Detection task without large-batch training.
• Prior Guided Feature Enrichment Network (PFENet) [44]: one of the state-of-the-art FSL methods.
• Hypercorrelation Squeeze Network (HSNet) [45]: a novel few-shot segmentation framework that analyzes complex feature correlations in a fully-convolutional manner using light-weight 4D convolutions.
• Attentive Squeeze Network (ASNet) [46]: the state-of-the-art integrative FS-CS model upon which our CF-ASNet is built.
For the multi-label classification evaluation metric, we use the 0/1 exact ratio, ER = 1[ŷ_C = y_gt], where y_gt is the ground-truth multi-hot class occurrence vector and ŷ_C is the vector predicted by the model. For segmentation, we use the mean IoU, mIoU = (1/N) Σ_n IoU_n, where IoU_n denotes the IoU [47] value of the n-th class.
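The two metrics can be sketched as follows; averaging the exact ratio over a batch of queries is a natural reading of the definition, and the names are illustrative.

```python
import numpy as np

def exact_ratio(y_pred, y_true):
    """0/1 exact ratio: fraction of queries whose predicted multi-hot
    class vector matches the ground truth exactly. Inputs: (M, N)."""
    return float(np.mean(np.all(y_pred == y_true, axis=1)))

def mean_iou(seg_pred, seg_true, n_classes):
    """Mean IoU over classes for a predicted segmentation mask.
    Classes absent from both masks are skipped to avoid 0/0."""
    ious = []
    for n in range(n_classes):
        inter = np.logical_and(seg_pred == n, seg_true == n).sum()
        union = np.logical_or(seg_pred == n, seg_true == n).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```

These are the per-query quantities; the reported numbers would average them over the whole test split.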

B. EXPERIMENTAL SETUP
We choose ResNet50 and ResNet101 pre-trained on ImageNet as our backbone networks for comparison with other methods. CF-ASNet is trained using the Adam optimizer [48] with a learning rate of 10^−4 for the segmentation loss. We train the model in two settings: 1-way 1-shot and 2-way 1-shot.

C. EXPERIMENTAL RESULTS
In this section, we are interested in the following five research questions, and we design experiments to investigate their answers.

1) EFFECTIVENESS OF THE PROPOSED DATA GENERATION PROCEDURE (Q1)
To validate the effectiveness of the proposed data generation method presented in Section III, we first train our model on two datasets created by two variants of the data generation procedure, as follows:
• (1) LandslidePTIT dataset: We insert the landslide with blending, as described in Section III.

• (2) Landslide-Normal (LN) dataset: We directly insert the landslide at the designated location of the road image, without the blending step.
Then, we evaluate the performance of all methods on real-world data from [49], termed Landslide-Premise, which consists of 400 real-world landslide images captured by UAVs; that is, the Landslide-Premise dataset acts as the test data. Table 4 shows the experimental results of all few-shot models, including CF-ASNet and the competitive baselines, trained on the two datasets that we created.
• The performance of all models trained on the LandslidePTIT dataset is higher than that of models trained on LN, for both the classification and segmentation tasks. This indicates that blending the inserted landslides affects the accuracy of the resulting models.
• We illustrate qualitative examples from the two datasets in Fig. 5. As shown in Fig. 5a, the landslide overlaid on the original image using procedure (1) is blended, making the data smoother and more realistic. As for (2), the inserted landslide in Fig. 5b looks coarser and less like a real landslide. Overall, the proposed data generation method achieves better results on actual data, showing the applicability of this synthetic dataset as training data.
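The difference between (1) and (2) comes down to whether the inserted patch is blended into the background. A minimal per-pixel sketch of alpha blending follows; the grayscale row-list image representation, mask, and weights are illustrative, and our actual procedure is the one described in Section III:

```python
def blend_patch(background, patch, mask, alpha=0.8):
    """Blend `patch` into `background` where `mask` is 1.
    Images are lists of grayscale rows; `alpha` weighs the patch."""
    out = [row[:] for row in background]
    for i, row in enumerate(mask):
        for j, m in enumerate(row):
            if m:
                out[i][j] = round(alpha * patch[i][j] + (1 - alpha) * background[i][j])
    return out

bg = [[100, 100], [100, 100]]
patch = [[200, 200], [200, 200]]
mask = [[1, 0], [0, 1]]  # insert the patch only on the diagonal
print(blend_patch(bg, patch, mask, alpha=0.5))  # → [[150, 100], [100, 150]]
```

With `alpha=1.0` the patch simply overwrites the background, producing the coarse, unblended look of variant (2).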

2) PERFORMANCE COMPARISON OF CF-ASNet AND OTHER BASELINES (Q2)
From this part onwards, we use the LandslidePTIT dataset for both training and evaluation, as it is larger and more diverse than the Landslide-Premise dataset, making it more suitable for evaluating the competitive models. We divide LandslidePTIT into train and test sets; the train set consists of 1893 images belonging to 4 classes: Rockfall, Mudslide, Earth flow, and Road. Table 5 presents the performance on both the classification and segmentation tasks of all competitive schemes, including PANet, PFENet, HSNet, ASNet, and our proposed method CF-ASNet. We discuss the interesting points as follows.
• We find that our CF-ASNet model achieves the best performance in terms of both 0/1 ER and mIoU among all the few-shot learning models in the 1-way 1-shot setting. Notably, CF-ASNet achieves 1% higher ER and 0.5% higher mIoU than ASNet, which demonstrates the contribution of the new cross-feature layer.
• The performance of CF-ASNet is much higher than those of PANet and HSNet, with gains of 10 to 15% in the classification metric (0/1 ER) and 3 to 7% in the segmentation metric (mIoU). This is due to the well-designed architecture combining ASNet and hypercorrelation computation.

3) COMPARISON BETWEEN ALL COMPETITIVE SCHEMES WITH RESPECT TO THE NUMBER OF WAYS (Q3)
In Fig. 6, we perform an experiment in which the number of ways is varied. When the number of ways is 1, we randomly choose one type of landslide to put in the test set. When the number of ways N > 1, i.e., N ∈ {2, 3, 4}, we randomly divide the classes so that the training set includes N − 1 landslide labels, which means the test set includes the 4 − (N − 1) other labels.
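The class partition above can be sketched as follows; the sketch splits only the three landslide labels (with Road counted among the remaining test labels, per the description above), and the label names are those of our dataset:

```python
import random

def split_classes(landslide_classes, n_way, seed=0):
    """For N-way evaluation, place N-1 landslide labels in the training
    split; the remaining landslide labels go to the test split."""
    rng = random.Random(seed)
    shuffled = rng.sample(landslide_classes, len(landslide_classes))
    return shuffled[: n_way - 1], shuffled[n_way - 1 :]

classes = ["Rockfall", "Mudslide", "Earth flow"]
train_cls, test_cls = split_classes(classes, n_way=2)
# N=2: one landslide label for training, two left for testing
# (plus the shared Road label, giving 4 - (N - 1) = 3 test labels).
print(len(train_cls), len(test_cls))  # → 1 2
```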
As shown in Fig. 6a, in the 1-way 1-shot case, our CF-ASNet method works best with an accuracy of 81%. We also notice a decrease in both ER and mIoU as the number of ways increases from 1 to 4. Overall, the performance of CF-ASNet is always the highest, except when N is set to 3, where ASNet is slightly superior.
In Fig. 6b, we illustrate the superiority of CF-ASNet for the segmentation task when varying the number of ways.

4) HYPERPARAMETER SENSITIVITY ANALYSIS (Q4)
In Fig. 7, we present a sensitivity analysis of the parameter α of the cross-feature layer and the prediction threshold δ. When analyzing α, we set δ to its default value of 0.5 in all experiments.
When we increase α from 0.1 to 0.7, the performance on both the segmentation and classification tasks exhibits an upward trend. As shown in Fig. 7a, with α = 0.8, the segmentation mIoU reaches its highest value (approximately 45). In Fig. 7b, when α is set to 0.7, the classification accuracy peaks at approximately 82%. In general, we empirically find that the model performs well when α ranges from 0.7 to 0.8. We note that the choice of α can be validated using standard methods such as k-fold cross-validation.
Next, we adjust the threshold δ while α is fixed at 0.7. When δ < 0.5, the ER is high while the mIoU stays at a lower value. When δ > 0.5, both the mIoU and the ER decrease. In our experiments, we therefore set the threshold δ = 0.5.
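The two hyperparameters interact at prediction time: α weighs the combination of features, and δ binarizes the resulting per-pixel foreground score. The following schematic sketch illustrates this interaction; combining two scalar score streams with a convex weight is a simplification of the actual cross-feature layer:

```python
def predict_mask(scores_a, scores_b, alpha=0.7, delta=0.5):
    """Mix two per-pixel foreground scores with weight `alpha`,
    then threshold the mixture at `delta` to get a binary mask."""
    mask = []
    for a, b in zip(scores_a, scores_b):
        p = alpha * a + (1 - alpha) * b  # cross-feature combination
        mask.append(1 if p > delta else 0)
    return mask

# Three pixels: mixed scores 0.87, 0.31, 0.69 against delta = 0.5.
print(predict_mask([0.9, 0.4, 0.6], [0.8, 0.1, 0.9], alpha=0.7, delta=0.5))
# → [1, 0, 1]
```

Lowering δ admits more pixels into the foreground (raising recall and often ER), while raising it prunes uncertain pixels, which explains the trade-off observed in Fig. 7.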

5) QUALITATIVE STUDY (Q5)
We perform a qualitative study by showing the results obtained from the CF-ASNet model on landslide images captured by drones in Fig. 8. The CF-ASNet model takes two sets, the query and support sets, as input. On the left side of Fig. 8, we illustrate the query sets, which are images extracted from LandslidePTIT. In the middle of the figure, we show examples of the support set used in the training process, where red and blue marks represent pixels containing landslide and road, respectively. The prediction results of the CF-ASNet model are displayed on the right side of the figure, side by side with the ground-truth images. From these examples, one can see that the CF-ASNet model provides well-segmented roads and landslides.

VI. CONCLUSION
In this study, we introduced a swift detection system that can locate the occurrence of landslides on roads. Due to the difficulties of collecting real-world landslide images, the system was trained on a synthetic dataset created by our newly designed generation procedure. We also proposed a new few-shot segmentation method, termed CF-ASNet, to enhance the capacity of the system for detecting landslides in real-world situations. Experimental results showed the promising applicability of both the generated dataset and the proposed CF-ASNet model in classifying and segmenting damage caused by landslides on roads.
Future avenues include the enrichment of the synthetic dataset using additional real-world conditions and terrains. We also plan to conduct an empirical study when deploying this system in the mountainous areas of Vietnam.