Re-Net: Multibranch Network With Structural Reparameterization for Landslide Detection in Optical Imagery

Deep learning techniques have been widely adopted in landslide detection by offering powerful feature extraction capabilities and automated processes. However, the pursuit of higher accuracy has led to increasingly complex network structures, which limits the efficiency of models in landslide detection. To tackle this challenge, we have developed a dynamic module, called the five-branch feature extraction module (FFEM), based on the theory of structural reparameterization. This module is designed to reconstruct the encoder of the U-shaped network. Our novel network, Re-Net, effectively integrates information from multiple scales during training by utilizing its multibranch structure, which is facilitated by the FFEM. During inference, leveraging structural reparameterization, the FFEM in Re-Net transforms into a convolutional layer, achieving an impressive 52.9% reduction in parameters and a 34.98% reduction in floating point operations. The efficiency improvement of Re-Net does not come at the expense of landslide recognition accuracy. On the public dataset (Bijie dataset), Re-Net achieved improvements of 2.81% in intersection over union (IoU) and 1.93% in F1-Score. In postearthquake landslide detection tasks (Luding dataset), Re-Net exhibited respective improvements of 2.29% and 1.52%. Moreover, in the task of landslide detection, Re-Net demonstrates superior segmentation accuracy compared to other convolutional neural networks, such as UNet++. When compared to other reparameterization modules, FFEM shows significant improvements in IoU and F1-Score on the Bijie dataset, with average increases of 0.65% and 0.45%, respectively. Similarly, on the Luding dataset, FFEM demonstrates average improvements of 1.56% in IoU and 1.04% in F1-Score.


Rui-xuan Zhang, Wu Zhu, Zhen-hong Li, Senior Member, IEEE, Bo-chen Zhang, and Bo Chen

I. INTRODUCTION
LANDSLIDES, characterized by the downward sliding of surface soil layers or rocks, are highly destructive geological disasters that pose significant threats to human life and socio-economic development [1], [2]. Annually, thousands of people lose their lives globally due to landslide disasters, resulting in economic losses ranging from $4 to $9 billion [3]. Secondary disasters triggered by landslides, such as mudslides and dammed lakes, further contribute to casualties and economic damages [4], [5]. Therefore, it is crucial to provide timely and accurate landslide inventories and analyze landslide susceptibility in the affected regions. These measures are essential for resource allocation, disaster assessment, and postdisaster reconstruction [6].
In the past, landslide detection primarily relied on field geological surveys [7], which posed challenges such as heavy workload, low efficiency, and high costs. However, the rapid advancement of remote sensing technology has enabled a noncontact approach to landslide detection by combining remote sensing imagery with computer vision algorithms [8]. Remote sensing platforms, including satellites and unmanned aerial vehicles, provide a wealth of imagery data, harnessing the advantages of broad coverage and high temporal resolution. This integration of technologies introduces innovative solutions for landslide detection.
The detection of landslides through remote sensing imagery involves three technological approaches: pixel-based, object-based, and deep learning-based methods. Pixel-based methods rely on individual pixels as processing units, assigning each pixel a specific category to recognize landslides [9]. However, pixel-based methods solely consider the characteristics of individual pixels, neglecting spatial relationships between pixels. In addition, they exhibit poor noise resistance and may produce discontinuities in the extracted results [10]. Object-based methods are knowledge-driven techniques that employ aggregated regions as processing units, taking into account the spectral and textural properties of the imagery as well as the topological relationships with neighboring objects. These methods can better overcome the limitations of pixel-based methods [11], [12]. However, the results of object-based methods are somewhat dependent on expert knowledge for selecting segmentation scales, requiring iterative experimentation to determine appropriate segmentation criteria [13]. Building on the outstanding performance of convolutional neural networks (CNNs) in image processing [14], [15], landslide detection methods based on deep learning have been gaining recognition. These methods utilize advanced deep learning techniques such as object detection [16], [17] and semantic segmentation [18], [19] to accurately identify and segment landslides. The continuous development of CNN models has provided a solid foundation for landslide detection and extraction, introducing more intelligent approaches to landslide detection.
In recent studies, researchers have incorporated various modules within CNNs to enhance their feature extraction capabilities, for instance residual modules [20], attention mechanisms [21], and modules that fuse multiscale features [22]. Taking the application of UNet in landslide detection scenarios as an example: Dong et al. [23] introduced a multiscale feature-fusion module and a residual attention mechanism to improve the model's ability to capture multiscale information and represent landslide features effectively; Chen et al. [24] employed a channel attention mechanism, the squeeze-and-excitation network, to recognize and extract landslides from Sentinel-2A remote sensing imagery; Chen et al. [25] introduced a residual shrinkage building unit in the encoder to achieve effective recognition of active landslides in SAR imagery; and Yang et al. [26] incorporated a transformer and the convolutional block attention module (CBAM) to strengthen the network's learning ability for landslide features.
However, the incorporation of various modules to enhance the network's ability in landslide detection also leads to increased parameter complexity. This poses challenges in applying the research findings and places higher demands on devices [27]. To effectively utilize the models in real-world scenarios, it is crucial to minimize hardware requirements while ensuring accuracy. Therefore, finding a learning mechanism that can strike a balance between efficiency and accuracy has become a critical concern in the field of landslide detection.
Structural reparameterization is an essential technique in deep learning for simplifying model structures [28]. Leveraging the linear characteristics of convolutional operations, structural reparameterization grants CNNs the capability of separating the training and inference processes. During training, it allows complex multibranch network structures to learn dataset features; during inference, it compresses the complex modules into a single convolutional layer, reducing the inference cost while retaining the accuracy gains.
Based on structural reparameterization and the advantages of Inception at each stage, we propose a five-branch feature extraction module (FFEM) and develop a mathematical mechanism that allows it to be transformed into a regular convolution during the inference stage. Furthermore, we incorporate this module into the encoder of the U-shaped network, resulting in a dynamic segmentation network called Re-Net. The main contributions of this research can be summarized as follows. In Re-Net, each block of the encoder undergoes a multiscale integration process using FFEM. This integration process enhances the connectivity between different features. Moreover, due to the unique inference mechanism of FFEM, we no longer need to worry about the increase in parameters and computations resulting from merging multiscale information. Both the principle and the results demonstrate that during inference, Re-Net transforms into a regular symmetric CNN.

A. Encoder-Decoder Architecture
Since the fully convolutional network (FCN) [29], using CNNs under the encoder-decoder framework for semantic segmentation tasks has become a major trend across various industries. FCN is an end-to-end network trained pixel-by-pixel, and the input to the network undergoes an hourglass-shaped transformation, where its size first becomes smaller and then larger. Networks after FCN follow a general process of compressing the image to obtain features and then restoring the size to form predictions. Several segmentation networks have been developed based on FCN, such as UNet [30] and Deeplab [31]. UNet introduced and popularized the symmetric encoder-decoder structure, in which a U-shaped structure was designed and further developed into a series of networks. In the encoder of UNet, convolution and pooling are used alternately to downsample, and the spatial features of the image are then restored in multiple stages through the decoder. During this process, four skip connections are established to communicate shallow and deep information. This study draws on the design philosophy of the encoder-decoder structure and proposes a symmetric semantic segmentation network, Re-Net.

B. Multibranch Structure
The concept of parallel multibranch structures has been validated since the introduction of Inception [32]. In Inception v1, the input information undergoes multiscale convolution and pooling synchronously, and features are extracted from multiple perspectives. Inception v2 [33] focused on the problem of overfitting and proposed a classic convolutional block architecture (convolution-batch normalization-ReLU [34]) that effectively avoids overfitting. Inception v3 [35] further explores the problem of convolution decomposition and advocates for the feasibility of factorizing large convolutions into smaller and asymmetric ones (e.g., replacing an n × n convolution with an n × 1 convolution followed by a 1 × n convolution).

C. Structural Reparameterization
The introduction of RepVGG [37] has brought significant attention to the structural reparameterization technique, which enables the separation of network training and inference processes. In RepVGG, a multibranch structure is designed to enhance the network's ability to capture features. During the inference stage, each branch (a combination of batch normalization or convolution-batch normalization) is converted to a convolution and then merged into a single path. Before and after the introduction of RepVGG, scholars have designed classic structural reparameterization networks (or modules) such as the asymmetric convolutional network (ACNet) [38] and the diverse branch block (DBB) [39]. This study draws on the idea of structural reparameterization and uses multiple reparameterization methods based on the linear operation characteristics of convolution during inference. Furthermore, we propose a reparameterization method that combines 1 × 1 convolution and asymmetric convolution in series.

A. Linear Characteristics of Convolution
In a convolution (abbreviated as "conv" when describing structures later; 1 × 1 conv refers to a convolution with a 1 × 1 kernel), the input M ∈ R^(U×V×C) (U and V represent the length and width of the input, C the number of input channels) is convolved with a kernel F ∈ R^(H×W×C×D) (H and W represent the length and width of the kernel, D the number of output channels) and a bias matrix B ∈ R^(R×T×D) (R and T represent the length and width of the output; B_z is the bias for the zth output channel, and B_(:,:,z) is a constant matrix everywhere equal to B_z) to produce an output O ∈ R^(R×T×D) (see Fig. 1).
The value of the zth channel of the output O can be defined as

O_(:,:,z) = M * F_(:,:,:,z) + B_(:,:,z)   (1)

meanwhile, the value of O's zth channel at (x, y) can be further defined based on the principle of convolution calculation as

O_(x,y,z) = Σ_{c=1}^{C} Σ_{h=1}^{H} Σ_{w=1}^{W} M_(x+h−1, y+w−1, c) F_(h,w,c,z) + B_z.   (2)

From (1) and (2), it can be inferred that convolution involves only linear operations throughout the entire computation process, making it a linear operation. For convolutions with the same structure, this linearity yields additivity and distributivity

M * F^(1) + M * F^(2) = M * (F^(1) + F^(2))   (3)

(M^(1) + M^(2)) * F = M^(1) * F + M^(2) * F.   (4)
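The additivity property of same-shaped convolutions can be checked numerically. The following is a minimal NumPy sketch (an illustration, not the authors' code) using a naive valid multi-channel convolution: summing two kernels first gives the same output as summing the two convolution outputs.

```python
import numpy as np

def conv2d(M, F):
    """Valid multi-channel convolution (cross-correlation, as in CNNs).
    M: (U, V, C) input, F: (H, W, C, D) kernel -> (U-H+1, V-W+1, D) output."""
    U, V, C = M.shape
    H, W, _, D = F.shape
    O = np.zeros((U - H + 1, V - W + 1, D))
    for x in range(O.shape[0]):
        for y in range(O.shape[1]):
            # Inner product of the local patch with each of the D kernels.
            O[x, y, :] = np.tensordot(M[x:x+H, y:y+W, :], F, axes=([0, 1, 2], [0, 1, 2]))
    return O

rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8, 3))
F1 = rng.standard_normal((3, 3, 3, 4))
F2 = rng.standard_normal((3, 3, 3, 4))

# Additivity: merging kernels is equivalent to merging outputs.
merged = conv2d(M, F1 + F2)
separate = conv2d(M, F1) + conv2d(M, F2)
assert np.allclose(merged, separate)
```

This is exactly the property that lets parallel branches be collapsed into one convolution after training.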

B. Five-Branch Feature Extraction Module
Based on the design principles of the Inception architecture, we propose the FFEM, which consists of five parallel branches of operations [see Fig. 2(a)]. The 1 × 1 conv-bn (where bn stands for batch normalization) branch not only enriches the representation of features but also accelerates the convergence of the model. Two asymmetric convolutional branches (1 × 1 conv-bn-1 × 3 conv-bn and 1 × 1 conv-bn-3 × 1 conv-bn) are used to capture the local relationships of feature maps in the width and length directions. The 1 × 1 conv-bn-3 × 3 conv-bn branch processes features in the channel dimension, achieving feature fusion and interaction. The 3 × 3 conv-bn branch serves as the baseline branch of FFEM and receives additional weight and bias values from the other branches.

Fig. 2. Structure of FFEM and the reparameterization process ("a" represents the structure of FFEM during training; after the network completes training, FFEM is reparameterized to its inference-stage state through the structure shown in "b"; as shown in "c," the resulting structure consists of a single 3 × 3 conv; "d" demonstrates the process of padding the 1 × 1 conv and asymmetric convolutions to a 3 × 3 conv).
Based on the linear property of convolution, we can convert FFEM to a 3 × 3 conv during the inference stage [see Fig. 2(b)]. It is worth noting that we do not add bias to the convolutions during training, but instead use the merged residual terms as the bias during the inference stage. The conversion mechanism can be described as follows. First, the output O of any conv-bn operation in a branch for the zth channel can be expressed as

O_(:,:,z) = ((M * F_(:,:,:,z)) − μ_z) · γ_z / √(σ_z² + ε_z) + β_z   (5)

where σ_z², μ_z, γ_z, β_z, and ε_z represent the variance normalization value, mean normalization value, scaling factor, shifting factor, and a small constant to prevent division-by-zero errors for the zth channel. Based on the linear property of convolutional operations, (5) can be simplified to

O_(:,:,z) = M * (γ_z / √(σ_z² + ε_z) · F_(:,:,:,z)) + (β_z − μ_z γ_z / √(σ_z² + ε_z))   (6)

so that the conv-bn is converted to a single conv, and any bn contained in each branch is merged into the convolution preceding it. Subsequently, for branches containing concatenated convolutions, when the 1 × 1 conv is concatenated with another convolution, let the 1 × 1 conv be denoted as F^(1) ∈ R^(1×1×P×P) and the second conv as F^(2) ∈ R^(K1×K2×P×L) (where P represents the input and output channels of the 1 × 1 conv, K1 and K2 the length and width of the second kernel, and L the output channels of the second conv). After each convolution's bn has been merged via (6), giving weights F^(1), F^(2) and biases b^(1), b^(2) (variables related to the 1 × 1 conv-bn carry superscript (1), those related to the K1 × K2 conv-bn superscript (2)), the value of the zth channel of the output after passing through the branch can be defined as

O_(:,:,z) = (M * F^(1) + b^(1)) * F^(2)_(:,:,:,z) + b^(2)_z.   (7)

At this point, the output value consists of three parts: the input convolved with both kernels in sequence, the constant bias b^(1) convolved with F^(2), and the bias b^(2)_z. Because the 1 × 1 conv performs only linear operations along the channel dimension, the operation between the two convolutions is equivalent to convolving F^(2) with F^(1); the constant term Σ_k b^(1)_k F^(2)_(:,:,k,z), together with b^(2)_z, serves as the bias term of the merged convolution after inference. Substituting the merged kernel F'_(:,:,:,z) into (7), each branch then contains only one conv (1 × 1 conv, 3 × 3 conv, 1 × 3 conv, or 3 × 1 conv). For the 1 × 1 conv and asymmetric convolutions (1 × 3 conv, 3 × 1 conv), we can expand them to 3 × 3 conv by padding with zeros [see Fig. 2(d)]. Now, each branch of the module contains only one 3 × 3 conv, and the five branches can be further merged into one convolution operation based on the principle of linearity.
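The conv-bn folding of (5)-(6) can be verified numerically. Below is a minimal NumPy sketch (an illustration under the paper's formulas, not the authors' implementation): folding the bn statistics into the preceding convolution's weight and bias leaves the output unchanged.

```python
import numpy as np

def conv2d(M, F, b=None):
    """Valid multi-channel convolution: M (U,V,C), F (H,W,C,D) -> (U-H+1, V-W+1, D)."""
    U, V, C = M.shape
    H, W, _, D = F.shape
    O = np.zeros((U - H + 1, V - W + 1, D))
    for x in range(O.shape[0]):
        for y in range(O.shape[1]):
            O[x, y, :] = np.tensordot(M[x:x+H, y:y+W, :], F, axes=([0, 1, 2], [0, 1, 2]))
    return O if b is None else O + b

def fuse_conv_bn(F, gamma, beta, mu, var, eps=1e-5):
    """Fold bn into the preceding conv, per (6):
    F'_z = F_z * gamma_z / sqrt(var_z + eps),  b'_z = beta_z - mu_z * gamma_z / sqrt(var_z + eps)."""
    scale = gamma / np.sqrt(var + eps)          # one factor per output channel (D,)
    return F * scale, beta - mu * scale          # broadcast over the last (output-channel) axis

rng = np.random.default_rng(2)
M = rng.standard_normal((7, 7, 3))
F = rng.standard_normal((3, 3, 3, 5))
gamma, beta = rng.standard_normal(5), rng.standard_normal(5)
mu, var = rng.standard_normal(5), rng.random(5) + 0.1

out_bn = (conv2d(M, F) - mu) * (gamma / np.sqrt(var + 1e-5)) + beta   # conv followed by bn, eq. (5)
F_fused, b_fused = fuse_conv_bn(F, gamma, beta, mu, var)
out_fused = conv2d(M, F_fused, b_fused)                               # single conv with bias, eq. (6)
assert np.allclose(out_bn, out_fused)
```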

C. Re-Net Network
Re-Net is a CNN that comes in two forms (a training form and an inference form) and utilizes a U-shaped framework for landslide segmentation in remote sensing imagery. The architecture consists of a symmetric encoder-decoder structure. The encoder consists of five convolutional modules (Conv Blocks) and Maxpooling layers stacked together to perform downsampling. During training, each Conv Block includes two FFEMs (each followed by a ReLU activation function). The decoder consists of upsample blocks corresponding to the encoder hierarchy and a 3 × 3 conv used to map to the number of classes. When a feature map passes through an upsample block, it first goes through an upsample layer (Upsample2×-3 × 3 conv-bn-ReLU), where the feature map is enlarged while the number of channels is reduced. The feature maps are then fused with the corresponding feature maps from the encoder by concatenation (cat); the spatial size remains unchanged while the number of channels is doubled. Then, through a feature fusion module consisting of two convolutional layers (3 × 3 conv-bn-ReLU), the number of channels in the feature maps is restored to the number before concatenation (see Fig. 3).
During training, Re-Net continuously densifies the information about landslides through the Conv Blocks. FFEM observes the features of landslides along various dimensions through its multiple branches and maps them to more dimensions. After undergoing multiple downsampling operations, the features are transmitted to the decoder of Re-Net. The conv in the upsample block assists Re-Net in expanding the size of the feature maps and restoring the number of channels in a supervised way. During inference, all FFEMs in the encoder are transformed into 3 × 3 convs based on the aforementioned reparameterization principle, compressing the number of parameters required for inference.
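The inference-time collapse of FFEM into one 3 × 3 conv can be sketched end to end: once each branch's bn has been folded into its conv (per (6)), the remaining 1 × 1 and asymmetric kernels are zero-padded to 3 × 3 [Fig. 2(d)] and the five kernels are summed. The NumPy sketch below (hypothetical helper names, not the authors' code) checks that the single merged kernel reproduces the sum of the five branch outputs under "same" padding.

```python
import numpy as np

def conv2d_same(M, F):
    """3x3 'same' convolution: pad the input by 1, then valid conv.
    M: (U, V, C), F: (3, 3, C, D) -> (U, V, D)."""
    Mp = np.pad(M, ((1, 1), (1, 1), (0, 0)))
    U, V, _ = M.shape
    O = np.zeros((U, V, F.shape[3]))
    for x in range(U):
        for y in range(V):
            O[x, y, :] = np.tensordot(Mp[x:x+3, y:y+3, :], F, axes=([0, 1, 2], [0, 1, 2]))
    return O

def pad_to_3x3(F):
    """Center a 1x1, 1x3, or 3x1 kernel inside a zero 3x3 kernel [Fig. 2(d)].
    Centering keeps the padded kernel equivalent to the original one under 'same' padding."""
    H, W = F.shape[:2]
    out = np.zeros((3, 3) + F.shape[2:])
    out[(3 - H) // 2:(3 - H) // 2 + H, (3 - W) // 2:(3 - W) // 2 + W] = F
    return out

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6, 2))
# Five branch kernels (bn assumed already folded in): 1x1, 1x3, 3x1, and two 3x3.
branches = [rng.standard_normal(s + (2, 4)) for s in [(1, 1), (1, 3), (3, 1), (3, 3), (3, 3)]]

K = sum(pad_to_3x3(F) for F in branches)              # single merged 3x3 kernel
merged = conv2d_same(M, K)
separate = sum(conv2d_same(M, pad_to_3x3(F)) for F in branches)
assert np.allclose(merged, separate)
```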

IV. EXPERIMENTS
A. Datasets

1) Bijie Dataset: In the study, we utilized the Bijie dataset provided by Ji et al. [40] as the public dataset. This dataset consists of TripleSat satellite images captured between May and August 2018 and includes annotations for 770 landslides obtained through a human-machine interaction approach (see Fig. 4). We resized all images to a uniform size of 224 pixels and randomly divided the dataset into train, validation, and test sets at a ratio of 7:2:1. The train set was used for network weight training, the validation set was used to assess the effectiveness of the training process and determine whether weight updates were necessary, and the test set was employed to evaluate the network's performance in real-world scenarios.
2) Luding Dataset: In western China, earthquake-induced landslides pose a significant geological hazard. Studies have demonstrated that in mountainous regions, earthquakes with a magnitude exceeding 4 often lead to a high likelihood of landslide occurrences [41]. To assess the performance of Re-Net in diverse scenarios, we selected a noteworthy postearthquake landslide event that took place in Luding County, Sichuan Province, China on September 5, 2022. The epicenter of this earthquake was situated in Moxi Town (29.59°N, 102.08°E), with the Moxi Fault identified as the seismogenic fault [42]. The earthquake triggered numerous landslides of varying magnitudes in the surrounding area, resulting in significant impacts on human lives and property safety [43].
In the study, we utilized GF-2 satellite data (orbit number 43565) collected on September 10, 2022. The GF-2 data consist of both multispectral imagery and panchromatic imagery. To ensure data accuracy, we conducted essential correction processes such as radiometric calibration and orthorectification. Furthermore, we employed the nearest neighbor diffusion pan sharpening technique to fuse the multispectral and panchromatic imagery. This fusion process resulted in high-resolution multispectral images with enhanced spatial resolution and vibrant color information [44]. Next, based on visual interpretation of the images, we identified and labeled 283 landslide areas within the study region. To maintain consistency, we uniformly scaled these labeled areas to a size of 224 pixels. Subsequently, we divided the dataset into train, validation, and test sets at a ratio of 7:2:1 for the relevant experiments (see Fig. 5).
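The 7:2:1 random split used for both datasets can be reproduced with a small helper. This is a hypothetical sketch (the paper does not give its splitting code); `split_dataset` and the fixed seed are our own illustration.

```python
import random

def split_dataset(items, ratios=(0.7, 0.2, 0.1), seed=42):
    """Randomly split a list of samples into train/val/test sets at 7:2:1."""
    items = list(items)
    random.Random(seed).shuffle(items)          # deterministic shuffle for reproducibility
    n = len(items)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

# 770 annotated landslides in the Bijie dataset -> 539 / 154 / 77 samples.
train_set, val_set, test_set = split_dataset(range(770))
```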

B. Evaluation Criteria
In the study, we conducted a comprehensive evaluation of our model, focusing on two key aspects: model efficiency and the accuracy of landslide extraction. To assess the model's efficiency, we employed parameters (Params) and FLOPs as measures of model redundancy. For evaluating the completeness of landslide extraction, we utilized two metrics derived from the confusion matrix, intersection over union (IoU) and F1-Score

IoU = TP / (TP + FP + FN)   (11)

F1-Score = 2TP / (2TP + FP + FN)   (12)

where TP represents the number of pixels correctly identified as landslides; TN represents the number of pixels correctly identified as background; FP represents the number of pixels extracted as landslide but labeled as background; and FN represents the number of pixels extracted as background but labeled as landslide.
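Both metrics follow directly from the confusion-matrix counts; a short sketch with illustrative counts (not values from the paper's experiments):

```python
def iou_f1(tp, fp, fn):
    """IoU (eq. (11)) and F1-Score from pixel-level confusion-matrix counts.
    Note F1 = 2*IoU / (1 + IoU), so the two always move together."""
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou, f1

# Illustrative counts only:
iou, f1 = iou_f1(tp=720, fp=150, fn=130)
print(f"IoU={iou:.4f}, F1={f1:.4f}")  # IoU=0.7200, F1=0.8372
```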

C. Training Details
In the study, we established the necessary environment using the PyTorch framework. To enhance computational efficiency, we utilized a GPU (NVIDIA GeForce RTX 3090) for accelerated processing. During training, we adopted stochastic gradient descent [45] as our optimizer, with a momentum of 0.9 and a weight decay of 0.005. The loss function employed was the negative log-likelihood loss, where the output was transformed into logarithmic probabilities using the log softmax function before calculating the loss. To control the learning process, we set the initial learning rate to 0.01 and employed Cosine Annealing [46] as the learning rate decay strategy. The batch size was set to 4, and we conducted a total of 50 epochs of training. During Re-Net training, we followed the design principle of DBB's concatenated convolutions and used padding to supplement the surrounding area after the bn following the 1 × 1 conv (for the 3 × 3 conv branch, we padded all four sides, while for the asymmetric convolutions, we padded only the top and bottom or the left and right sides, using the corresponding channel's bias values).
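The cosine annealing schedule [46] with the paper's settings (initial learning rate 0.01, 50 epochs) can be written out directly. This is a sketch of the standard formula; the decay to a minimum of 0 is our assumption, since the paper does not state the final learning rate.

```python
import math

def cosine_annealing_lr(t, T=50, lr_max=0.01, lr_min=0.0):
    """Cosine annealing: smoothly decay the lr from lr_max to lr_min over T epochs.
    lr(t) = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t / T))."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / T))

lrs = [cosine_annealing_lr(t) for t in range(51)]
# Starts at 0.01, reaches the halfway value 0.005 at epoch 25, and ends near 0 at epoch 50.
```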

A. Landslide Detection From Bijie Dataset
Re-Net exhibited impressive performance on the Bijie dataset, achieving an IoU of 72.23% and an F1-Score of 83.88%. To comprehensively evaluate the model's performance, we utilized both gradient-weighted class activation mapping (Grad-CAM) [47] and statistical values derived from the confusion matrix (FN, FP, TN, TP). As depicted in Fig. 6, Grad-CAM visually showcased the ability of the trained Re-Net to focus attention on landslide areas during inference. When the boundaries of landslides were clear and there was a noticeable contrast with the surrounding terrain (III and IV in Fig. 6), Re-Net effectively segmented the landslides. However, upon analyzing the statistical values for quantitative analysis, we identified some challenges in the extraction process. First, external factors such as image sampling or unique geological structures of the area (as indicated by the yellow box in II of Fig. 6) led to a reduction in the distinctive characteristics of landslides and the surrounding terrain, impacting Re-Net's ability to recognize and delineate boundaries and resulting in an increase in FN. Second, Re-Net struggled to accurately classify areas where the landslide "wrapped" around other terrain, causing a color shift within the landslide. For instance, in the upper left yellow box in I of Fig. 6, where the landslide included buildings and roads, experts classified it as a landslide area, but Re-Net misclassified it as a nonlandslide area. Similarly, in the lower right yellow box, the landslide was located on the periphery of a vegetation area without damaging the vegetation. Experts determined it to be a nonlandslide area, whereas Re-Net classified it as a landslide area.

B. Landslide Detection From Luding Dataset
We utilized Re-Net for segmenting landslide areas in the well-calibrated Luding dataset, and the experimental results are presented in Fig. 7. Re-Net achieved an impressive IoU of 74.79% and an F1-Score of 85.58% in the task. The dataset's high resolution and clear depiction of landslide morphology made the detection process relatively easier compared to the Bijie dataset. By observing the Grad-CAM visualizations, it is evident that Re-Net accurately focused on the main body of the landslides. However, we also encountered some challenges in the results. First, when the landslide was obstructed by terrain or other factors causing color variations within the landslide, Re-Net's attention decreased accordingly, as indicated by the red boxes in Fig. 7(a) and (b). Although the network could still recognize these areas with reduced attention, the boundary definition of such regions was not precise. Second, as depicted by the red boxes in Fig. 7(c)-(f), Re-Net faced difficulties in accurately delineating the landslide boundaries without relying on external auxiliary data such as the digital elevation model (DEM), slope, and terrain undulation. Therefore, we recommend incorporating such auxiliary terrain data in future landslide detection work.

A. Ablation
We conducted the ablation experiment to compare the impact of using FFEM on Re-Net's inference of landslide features before and after. In the control group, all FFEMs in Re-Net were replaced with 3 × 3 conv-bn (without merging bn into conv, i.e., the basic U-shaped network structure). The experimental results, presented in Table I, demonstrate that FFEM brings a significant and cost-free improvement to the network's performance. In the Bijie dataset, the IoU and F1-Score increased by 2.81% and 1.93%, respectively. Similarly, in the Luding dataset, the IoU and F1-Score increased by 2.29% and 1.52%, respectively. These results highlight the crucial role of FFEM in enhancing Re-Net's ability to accurately infer landslide features.
The introduction of FFEM to Re-Net has provided an opportunity to compress the parameters. During training, Re-Net initially has a parameter count of 73.36 M and FLOPs of 154.08 G, which is significantly higher than UNet's parameter count of 34.54 M and FLOPs of 100.20 G. However, during the inference stage, the structure of Re-Net undergoes dynamic changes, in which FFEM is simplified to a regular convolution. As a result, the reparameterized Re-Net experiences a reduction of 52.9% in parameters and 34.98% in FLOPs, resulting in a parameter count of 34.52 M and FLOPs of 100.18 G, which is nearly equivalent to that of UNet (even slightly smaller, considering the merging of batch normalization after reparameterization).
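The reported reductions follow directly from the before/after figures:

```python
# Re-Net figures reported above (Params in M, FLOPs in G), training vs. inference.
params_train, params_infer = 73.36, 34.52
flops_train, flops_infer = 154.08, 100.18

param_reduction = (params_train - params_infer) / params_train * 100
flop_reduction = (flops_train - flops_infer) / flops_train * 100
print(f"{param_reduction:.1f}% params, {flop_reduction:.2f}% FLOPs")  # 52.9% params, 34.98% FLOPs
```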

B. Comparison With Other CNNs
The study compares Re-Net with FCN and DeepLabV3+ [48]. In addition, considering that Re-Net is based on the U-Net network design, we also compared it with two excellent variants of U-Net. Attention U-Net (Att-UNet) [49] enhances the local regions of interest by introducing attention gates, while UNet++ [50] is a nested U-Net architecture that weakens the semantic gap between network layers through a series of nested dense skip connections. It is worth noting that we used VGG as the encoder for FCN, while Xception [51] was chosen for DeepLabV3+.
The segmentation accuracy of the various networks is presented in Table II. In the Bijie dataset, Re-Net outperforms FCN and DeepLabV3+, achieving improvements of 4.29% and 3.61% in IoU, and 2.97% and 2.49% in F1-Score, respectively. When compared to Att-UNet and UNet++, Re-Net also shows improvements of 0.33% and 1.23% in IoU, and 0.23% and 0.84% in F1-Score, respectively. These performance trends are similarly observed on the Luding dataset. It is worth noting that while Att-UNet and UNet++ show enhancements over U-Net on the Bijie dataset, they do not achieve further improvements on the Luding dataset. In contrast, Re-Net consistently demonstrates performance improvement on both datasets, showcasing its effectiveness as proposed in this study.
Table III presents the parameter details of the various CNNs, where Params I and Params II represent the model parameters before and after reparameterization, respectively, and FLOPs I and FLOPs II represent the model FLOPs before and after reparameterization, respectively. Among them, FCN, being an early and relatively simple network, has the fewest parameters. DeepLabV3+, utilizing ASPP [52] for multiscale feature fusion, exhibits a significant increase in parameters. Indeed, the FLOPs of the networks in these two frameworks are significantly lower than those of the UNet framework.
Regarding the two improved UNet models, Att-UNet and UNet++, they increase parameters by 0.35 to 2.10 M and FLOPs by 1.67 to 111.07 G compared to UNet (UNet++ is notably complex). Re-Net, with its multibranch structure, has higher parameters and FLOPs during training. However, during inference, the network structure of Re-Net can compress the overall network parameters to match the size of a regular UNet, guided by the theory of structural reparameterization. This property of Re-Net makes it particularly advantageous in emergency scenarios such as landslides.

C. Comparison With Other Reparameterization Modules
To further demonstrate the benefits brought by the structural reparameterization mechanism in FFEM, we conducted a comparative analysis with two other reparameterization modules: the asymmetric convolution block (ACB) and DBB. ACB is a classic structural reparameterization module in ACNet, comprising three branches (3 × 3 conv-bn, 1 × 3 conv-bn, and 3 × 1 conv-bn) that are merged during inference. DBB is a more intricate reparameterization module featuring four branches (1 × 1 conv-bn, 1 × 1 conv-bn-k × k conv-bn, 1 × 1 conv-bn-Avgpooling-bn, and k × k conv-bn). For consistency in our experiment, we set the kernel size (k) in DBB to 3. We compared the use of ACB and DBB as replacements for FFEM in Re-Net. The accuracy of the three models across the datasets is presented in Table IV. Overall, all three reparameterization modules contribute to the improvement of the U-shaped network. However, the results indicate that FFEM exhibits a more significant enhancement effect. Specifically, in the Luding dataset, FFEM outperforms the other two modules, increasing the IoU and F1-Score by an average of 1.56% and 1.04%, respectively. Nonetheless, this improvement is less pronounced in the Bijie dataset, where the average increase in the two indicators is only 0.65% and 0.45%, respectively. We attribute this phenomenon to the complexity of the datasets. The Luding dataset comprises images captured within one week after an earthquake, with all selected landslides being bare landslides. In contrast, the Bijie dataset may include both potential and already covered landslide areas, making landslide detection more challenging. As the dataset or training task becomes more complex, the training benefits offered by reparameterization modules tend to diminish. Therefore, we recommend that in more intricate landslide detection tasks, simpler reparameterization modules (such as ACB) be chosen as an additional boost to detection performance, rather than being relied upon alone.
We also analyzed the compression effects of the three structural reparameterization modules on the number of parameters (Params) and computational complexity (FLOPs). The results are summarized in Table V, where Params I and FLOPs I denote the model parameters and FLOPs before reparameterization, and Params II and FLOPs II denote the corresponding values after reparameterization. Since the three modules share the same structure during the inference stage, their Params II and FLOPs II values are equal. Re-Net achieved a parameter reduction of 52.9% between training and inference, while ACB-Unet saved 26.76% and DBB-Unet saved 41.87%. In terms of FLOPs, Re-Net achieved a reduction of 34.98%, compared with 14.83% for ACB-Unet and 25.60% for DBB-Unet. Therefore, while maintaining segmentation accuracy, Re-Net attains the highest compression rate in both parameter count and computational cost.
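The per-module side of these savings can be worked out directly by counting learnable weights before and after fusion. The sketch below does this for an ACB-style block with an assumed channel width of 64 (our choice for illustration; module-level percentages will not match the whole-network figures in Table V, which aggregate over every layer of the encoder):

```python
def conv_bn_params(c_in, c_out, kh, kw):
    # conv without bias, plus BN's learnable gamma and beta
    return c_out * c_in * kh * kw + 2 * c_out

def acb_train_params(c_in, c_out):
    # training-time ACB: 3x3, 1x3, and 3x1 conv-bn branches
    return (conv_bn_params(c_in, c_out, 3, 3)
            + conv_bn_params(c_in, c_out, 1, 3)
            + conv_bn_params(c_in, c_out, 3, 1))

def fused_params(c_in, c_out):
    # inference-time form: one 3x3 conv with bias
    return c_out * c_in * 9 + c_out

def reduction_pct(before, after):
    # percentage of parameters removed by reparameterization
    return 100.0 * (before - after) / before
```

Because FLOPs of a convolution scale with the same kernel terms times the spatial resolution, the analogous FLOP reduction follows the same arithmetic per layer.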

VII. CONCLUSION
In this study, we presented a dynamic multibranch module called FFEM, which utilizes structural reparameterization techniques. FFEM is integrated into a dynamic encoder-decoder semantic segmentation network named Re-Net, specifically designed for landslide detection in emergency scenarios. Our findings demonstrate that Re-Net effectively addresses the challenge of complex structures, improving network inference efficiency without compromising training accuracy. However, we acknowledge two limitations. First, Re-Net may be limited in effectively capturing boundary information during landslide detection. To overcome this, future research will explore incorporating boundary constraints into the training process, or leveraging additional remote sensing data such as DEM, slope, aspect, and terrain relief to enrich the input representation and thereby improve overall accuracy. Second, it is crucial to expand the testing scenarios for the model, as demonstrated in [53] and [54]. By evaluating the model's ability to recognize landslides of different scales and origins, and analyzing the resulting differences, we can gain valuable insights for further improving the model in future research.

Fig. 4. Image (left) and DEM (right) data of the study area (Bijie city); the red points mark the locations of identified landslides (from reference [40]).

Fig. 6. Results of Re-Net on the Bijie test set. I-IV depict four examples from the test set, with yellow boxes indicating regions where FP and FN occurred. (a) Input image. (b) Ground-truth label. (c) Segmentation result obtained from Re-Net. (d) Grad-CAM visualization corresponding to Re-Net's output layer, where redder areas indicate higher network attention. (e) Visualization of the statistical values derived from the confusion matrix, with TP labeled in green, FN in red, TN in black, and FP in blue.

Fig. 7. Results of Re-Net on the Luding test set. (a)-(f) represent examples from the Luding test set; from top to bottom, each column shows a landslide case from the Luding test dataset, its Grad-CAM visualization, and the corresponding result extracted by Re-Net.

TABLE I ACCURACY COMPARISON OF ABLATION EXPERIMENT