Real-Time Damaged Building Region Detection Based on Improved YOLOv5s and Embedded System From UAV Images

Detecting damaged building regions is vital to humanitarian assistance and disaster recovery after a disaster. Deep-learning techniques based on aerial and unmanned aerial vehicle (UAV) images have been extensively applied in the literature to detect damaged building regions and have proven to be effective for fast response actions and rescue work. However, most existing damaged building region detection methods only consider the extraction accuracy of damaged regions from aerial or UAV images; they are not real time and can hardly meet the practical needs of emergency response. To address this problem, a new real-time damaged building region detection method from UAV images, named DB-YOLOv5, is proposed based on an improved YOLOv5 and adapted to an embedded system. First, a residual dilated convolution module is employed to extract spatial features, which enlarges the receptive field. Then, a feature fusion module (BDCAM) is designed to enhance the expressive ability of object features, which improves the classification performance of the detector. Finally, a double-head method, which integrates a fully connected head and a convolution head for classification and bounding box regression, executes the localization task. The proposed DB-YOLOv5 method was evaluated using postdisaster UAV images collected over Ludian, China, after the 2014 earthquake and over Beichuan, China, after the 2008 earthquake. The experimental results demonstrate that the proposed method is accurate and efficient for damaged building region detection and assessment on the embedded system. This approach is robust and suitable for practical application in disaster scenarios.

Field investigation of building damage after a disaster is a slow, dangerous, and difficult task since various communication facilities are severely damaged. Nowadays, diverse remote sensing data from different sensors, such as optical images, synthetic aperture radar (SAR), and Lidar, have been widely utilized for building damage assessment because they are safe and efficient. Specifically, unmanned aerial vehicle (UAV) images have been preferred for detecting and assessing building damage since UAV flights can focus on damaged areas of interest in a much more controlled way. Moreover, owing to their higher portability and higher resolution, UAVs can capture more detailed building damage information than manned platforms [1]. Therefore, damaged building region detection using UAV images can quickly and accurately provide disaster information for decision making and disaster management.
Over the past decade, many studies have attempted to use remote sensing images to detect damaged building regions. These methods can be divided into three main categories: visual interpretation, automated change detection from multitemporal images, and single-temporal images with auxiliary data. The visual interpretation method [2], [3], [4] utilizes GIS data and various remote sensing images to extract building damage information but depends on many different auxiliary tools, such as ArcGIS. It is the most widely used method for building damage detection since it currently has the highest accuracy. Unfortunately, it requires trained operators and is time-consuming, so it cannot meet the requirement of rapid damage assessment. The change detection method [5], [6], [7], [8], [9] obtains building damage information by comparing certain distinguishing features of pre- and postdisaster images, including shape and texture changes in optical images, correlation coefficients and coherence in SAR images, and height changes in stereo images and Lidar. It can lead to more accurate and reliable results. However, multitemporal methods are limited in many regions since many developing countries do not have predisaster images. Furthermore, the registration between pre- and postdisaster images also remains a challenge. The single-temporal approach detects damaged building regions from only postevent remote sensing data [10], [11], [12], [13], [14], [15], [16]. It mainly uses various features, such as shape, spectra, texture, and shadow, to detect damaged building areas with the aid of pre-earthquake GIS vector data. Such methods are more beneficial for rapid damaged building region detection during the fast response. Furthermore, object detection methods based on deep learning have revolutionized computer vision tasks. Many object detection methods, such as faster R-CNN [17], YOLOv3 [18], YOLOv4 [19], and YOLOv5 [20], have been successfully employed to detect damaged building regions from postdisaster images. These end-to-end object detection models are more attractive than traditional methods due to their high detection accuracy, fast detection speed, and lightweight models, and they have been widely employed to detect damaged building areas in terms of speed, automation, and accuracy [21]. Therefore, end-to-end object detection methods are better suited to the needs of emergency response applications.
Approaches to damaged building region detection based on object detection from postdisaster images have achieved satisfactory accuracy and efficiency, but most works perform detection on high-performance servers after the UAV has collected the images. The main reason is that these methods have high computational cost and large recognition models, which makes them unsuitable for deployment on embedded devices for real-time detection, while a UAV cannot carry high-performance computers due to its limited power budget. Therefore, the existing methods cannot meet the actual needs of damaged building region detection. In this article, motivated by these realistic observations, we detect damaged building regions while the UAV is flying, which results in faster detection. A lightweight damaged building region detection model based on improved YOLOv5s for embedded devices is proposed, which realizes real-time automatic detection of damaged building regions on the UAV. The proposed method is thus better suited to actual damage detection needs.
The main contributions of our work are summarized as follows.
1) A new real-time and accurate detection method, named DB-YOLOv5, is proposed based on YOLOv5 and an embedded system to detect damaged building regions. DB-YOLOv5 can detect damaged building regions in real time on the UAV, instead of detecting them after the images have been collected. This better meets postdisaster emergency response requirements.
2) The proposed DB-YOLOv5 is a lightweight model for damaged building areas that balances detection accuracy and model complexity under the constraints of an embedded system with limited computational resources and memory. To address these challenges, we improve the original YOLOv5 architecture in three ways: the network structure, the feature fusion, and the detection head. The network structure of DB-YOLOv5 uses a hybrid residual dilated convolution (Res-DConv) module and removes the focus module of the original YOLOv5, yielding a lightweight network model for deployment on embedded devices. The feature fusion combines spatial and channel attention based on the attention mechanism, which addresses the varying image scales encountered during UAV flight. The detection head employs an fc-head (fully connected head) for classification and a conv-head (convolution head) for bounding box regression, which improves the accuracy of classification and localization.
3) We use two remote sensing datasets to assess the proposed method on an embedded system, revealing that it outperforms state-of-the-art results quantitatively, qualitatively, and in efficiency on postdisaster remote sensing images.
The rest of this article is organized as follows. Section II gives an overview of the related work in the field. Section III describes the datasets. Section IV presents the proposed method for detecting damaged building regions from UAV images. Section V shows the extensive experiments and results for detecting damaged building regions. Finally, Section VI concludes this article.

A. Object Detection
Object detection is a popular computer vision task that involves detecting and localizing objects within an image or video. It requires not only recognizing the presence of objects but also determining their precise location within the image. Object detection methods based on convolutional neural networks (CNNs) fall into the following two types.
1) Two-stage object detectors consist of a region proposal network (RPN) and a classification network, such as R-CNN, SPPNet, and faster R-CNN. In the first stage, the RPN generates a set of candidate object regions, also known as region proposals. These regions are proposed based on features extracted from the image using CNNs. The RPN is responsible for identifying regions in the image that are likely to contain an object, regardless of its class. In the second stage, the classification network takes each of the region proposals generated by the RPN and classifies it into one of the predefined object categories. The classification network also uses CNNs to extract features from the region proposals, which are then passed through one or more fully connected layers to generate a class score for each proposal. Two-stage object detectors have been shown to be highly accurate, particularly in complex scenes containing many objects of different sizes and orientations. However, they can be computationally expensive due to the large number of region proposals generated in the first stage.
2) One-stage object detectors do not require a separate proposal generation step. In contrast to two-stage object detectors, which typically use an RPN to identify potential object locations before classifying and refining them, one-stage detectors operate on the entire image at once and use anchor boxes to identify object locations (a sketch of the anchor decoding is given after this list). Popular one-stage object detectors include You Only Look Once (YOLO), the single-shot detector (SSD), and RetinaNet. One-stage detectors are generally faster and simpler than two-stage detectors, making them well suited for real-time applications, such as autonomous driving, robotics, and surveillance. However, they can be less accurate than two-stage detectors, particularly when dealing with small objects or scenes with a high degree of clutter.
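To make the anchor mechanism concrete, the following minimal PyTorch sketch shows the YOLO-style box decoding used by such one-stage detectors; the tensor names and shapes are illustrative, not taken from any particular implementation.

```python
import torch

def decode_anchors(tx, ty, tw, th, anchors, grid_x, grid_y, stride):
    """Decode YOLO-style offset predictions into absolute boxes.

    tx, ty, tw, th: raw network outputs for one grid cell, shape (num_anchors,)
    anchors: prior box sizes in pixels, shape (num_anchors, 2)
    grid_x, grid_y: integer cell indices; stride: downsampling factor.
    """
    # Box center: the grid-cell corner plus a sigmoid-bounded offset.
    bx = (torch.sigmoid(tx) + grid_x) * stride
    by = (torch.sigmoid(ty) + grid_y) * stride
    # Box size: the anchor priors scaled exponentially by the predictions.
    bw = anchors[:, 0] * torch.exp(tw)
    bh = anchors[:, 1] * torch.exp(th)
    return torch.stack([bx, by, bw, bh], dim=-1)  # (num_anchors, 4)
```

Each anchor thus acts as a size prior: the network only predicts small corrections, which is what makes single-pass detection over the whole image feasible.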

B. Building Damaged Region Detection
With the rapid development of object detection technology, object detection in optical remote sensing imagery, namely geospatial object detection, has attracted much attention in recent decades. Geospatial object detection based on CNNs is used to detect and locate ground objects in remote sensing images, such as buildings, roads, vegetation, and vehicles. It has many practical applications, such as urban planning, natural resource management, environmental monitoring, and disaster response. Typical geospatial object detection methods include the ORSIm detector [22], UIU-Net [23], and Fourier-based rotation-invariant feature boosting [24]. The ORSIm detector uses an object-based approach that takes advantage of both the spectral and spatial characteristics of the image data. It segments the image into regions of interest, or objects, and then extracts features from these objects to classify them into different categories. It has been shown to produce accurate and reliable results in a variety of applications, including detecting land cover changes, mapping urban areas, and identifying flooded areas in disaster response scenarios. UIU-Net is designed to address the challenges of object detection in aerial imagery, including the variability of object sizes and shapes, as well as variations in lighting and perspective. The framework consists of an unsupervised feature learning module followed by a supervised object detection module.
The unsupervised feature learning module learns features directly from the raw image data using a convolutional autoencoder. The learned features are then used to train a supervised object detection module based on the U-Net architecture. UIU-Net has been shown to achieve state-of-the-art performance on several benchmark datasets for aerial object detection, including the Vaihingen and Potsdam datasets. The framework is also highly interpretable, meaning that it is possible to visualize the learned features and understand how the model is making its predictions. This can be particularly useful for applications, such as urban planning and environmental monitoring, where understanding the reasoning behind the model's predictions is important. Fourier-based rotation-invariant feature boosting is based on the use of the Fourier transform to convert the input image into the frequency domain, where the rotation of the image is represented by a phase shift in the Fourier spectrum. By analyzing the Fourier spectrum, the method identifies a set of rotation-invariant features that are stable across different orientations of the object. The output of this algorithm is a set of classification scores for each object in the image, which can be used to identify the location and orientation of the objects. Because the method is based on rotation-invariant features, it is less sensitive to changes in orientation than other feature extraction methods, making it particularly useful for applications where the objects may be oriented in different directions.
End-to-end object detection methods, such as faster R-CNN [18], YOLOv3 [19], YOLOv4 [20], and YOLOv5 [21], can directly detect damaged building regions in postdisaster images without a predisaster reference image, which is better suited to the needs of emergency response applications. Therefore, these methods are more attractive than traditional methods in terms of speed, automation, and accuracy. However, these methods have so far been applied to postdisaster images only after collection by the UAV: in actual rescue and disaster relief applications, the postdisaster images are first collected by the UAV and only then searched for damaged building areas, so the detection efficiency is low. Hence, a new way of detecting damaged building regions is proposed here: we detect damaged building regions while the UAV is flying, which results in faster detection. To achieve this working mode, however, the detection method must not only ensure detection accuracy but also have a lightweight model and low computational complexity so that it can be deployed in an embedded system. In this way, the UAV can carry embedded devices for on-the-fly detection. This is the motivation of our article.

III. STUDY AREA AND DATA SOURCES
To verify the effectiveness of the proposed method, around 3738 postdisaster images collected by UAVs during earthquake events in Sichuan and Yunnan provinces, China, were obtained. The image size was 6000 × 4000 pixels for Sichuan and 5365 × 3936 pixels for Yunnan. These images were used to train a deep-learning model for detecting damaged building areas and to detect damaged building regions in real time. Table I presents the postearthquake images of the two study areas collected during two different earthquake events.
The 2014 Ludian earthquake occurred in the far western Chinese province of Yunnan with a moment magnitude of 6.1 at 16:03:01 China Standard Time (CST) on August 3. The Wenchuan earthquake struck Sichuan Province of China at 14:28:01 CST on May 12, 2008 with a magnitude of 7.9 and caused tremendous damage. While Beichuan city was rebuilt, some earthquake ruins were preserved for research purposes; the ruin area is 2867.83 km². In April 2014, we investigated and measured the Beichuan earthquake ruins, obtaining postdisaster UAV images on April 10, 2014. The flying height ranged from 300 to 500 m, and the three-channel RGB images had a resolution of about 10-15 cm.
To prepare the training samples for DB-YOLOv5, 1230 damaged building regions were selected from 500 image samples, each of which includes at least one damaged building region. The preprocessing consists of the following steps: data enhancement, annotation of damaged building regions, and classification. The input image size of the original YOLOv5 architecture is 640 × 640 pixels. However, we employed a smaller input size in view of training efficiency and for a fair comparison with YOLOv3 and YOLOv4, whose original input size is 416 × 416 pixels. Therefore, the subimages cropped from the UAV images are 416 × 416 pixels. Each subimage contains at least one damaged building area; the subimages were manually and randomly selected as samples. An open-source tool called LabelImg was used to annotate each damaged building area with a vertical bounding box. As shown in Fig. 1, the green boxes mark the damaged building areas annotated with LabelImg, and the annotation result is saved as XML files.
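LabelImg saves annotations in Pascal VOC XML, whereas YOLO-family trainers expect one normalized `class cx cy w h` line per box, so a conversion step is needed between annotation and training. A minimal sketch, assuming a single hypothetical class name `damaged_building` was used during annotation:

```python
import xml.etree.ElementTree as ET
from pathlib import Path

CLASSES = {"damaged_building": 0}  # assumed class name; adapt to the labels used

def voc_to_yolo(xml_path: str, out_dir: str = "labels") -> None:
    """Convert one LabelImg VOC XML file to a YOLO .txt label file."""
    root = ET.parse(xml_path).getroot()
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls = CLASSES[obj.find("name").text]
        box = obj.find("bndbox")
        xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
        xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
        # YOLO format: class index, normalized center x/y, normalized width/height.
        cx = (xmin + xmax) / 2.0 / img_w
        cy = (ymin + ymax) / 2.0 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    out = Path(out_dir) / (Path(xml_path).stem + ".txt")
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("\n".join(lines))
```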

IV. METHODS
Rapid and accurate building damage assessment is critical for humanitarian assistance and disaster response when disasters occur suddenly. The key to damaged building region detection is to find a suitable lightweight detector that balances detection accuracy and model complexity under the constraints of an embedded system with limited memory and computational resources. This section details the main ideas of DB-YOLOv5.

A. Network Structure of YOLOv5
YOLOv5 is a fast, high-performance single-stage object detection method with a straightforward and flexible structure that can be broken down, adjusted, and extended. The main network structure of YOLOv5 comes in four different scales: YOLOv5s (small), YOLOv5m (medium), YOLOv5l (large), and YOLOv5x (extra large). YOLOv5s is fast and has a lightweight network structure, which makes it suitable for deployment on embedded devices for real-time detection. The architecture of YOLOv5 is shown in Fig. 2.
The YOLOv5 network includes three parts: backbone, neck, and output. The backbone network utilizes multiple convolution and pooling layers to extract various feature maps from the input image. Unlike other network models, this backbone integrates gradient change information into the feature maps, which reduces the inference time, improves the precision, and shrinks the model size by reducing the parameters. The neck network uses a path aggregation network (PANet) to fuse feature maps at different levels, which decreases information loss and acquires more contextual information. The output network generates feature maps at three different scales to achieve multiscale detection, which also helps the model predict objects from small to large efficiently. In short, CSPDarknet53 first extracts the image features, PANet then fuses them, and finally the output layer of YOLOv5 generates the results.
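To make this three-part flow concrete, the following skeleton sketches the pipeline at the module level; the class and argument names are illustrative placeholders, not the actual Ultralytics implementation.

```python
import torch.nn as nn

class YOLOv5Skeleton(nn.Module):
    """Conceptual three-part YOLOv5 pipeline: backbone -> neck -> heads."""
    def __init__(self, backbone: nn.Module, neck: nn.Module, heads: nn.ModuleList):
        super().__init__()
        self.backbone = backbone  # e.g., CSPDarknet53: multiscale feature extraction
        self.neck = neck          # e.g., PANet: top-down + bottom-up feature fusion
        self.heads = heads        # three detection heads for small/medium/large objects

    def forward(self, x):
        p3, p4, p5 = self.backbone(x)       # features at strides 8, 16, 32
        f3, f4, f5 = self.neck(p3, p4, p5)  # fused multiscale features
        return [h(f) for h, f in zip(self.heads, (f3, f4, f5))]
```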

B. DB-YOLOv5
The YOLOv5s model has the advantages of small size and fast speed, but its detection accuracy is lower than that of YOLOv4 and EfficientDet [25]. Therefore, the damaged building region detection network is designed as an improvement of the YOLOv5s [26] architecture. As shown in Fig. 3, the original YOLOv5s is improved in three aspects: the network structure, the feature fusion, and the detection head. The improved YOLOv5s is named DB-YOLOv5. After the input images pass through the improved DB-YOLOv5 network structure for a series of convolution operations, BDCAM samples and fuses the computed feature maps and sends the result to SDCM, which outputs the detection results and draws boxes around the detected regions. The schematic diagram of BDCAM feature fusion is shown in the lower right corner of Fig. 3. BDCAM fuses high-scale and low-scale feature maps and delivers the result to SDCM. SDCM classifies the fused features, draws the detection area according to the obtained information, and outputs the detection results.

C. Network Structure of DB-YOLOv5
The damage detection method for UAVs not only needs to locate damaged building regions accurately but also must be suitable for implementation on an embedded device. Because of the weak computational power of embedded devices carried by UAVs, inference with large convolutional models takes a long time. Therefore, when using a UAV to locate damaged building areas, the model must achieve both high detection accuracy and a small model size. To improve accuracy, one can increase the number of convolutional layers; however, as the number of network layers grows, the real-time performance of the CNN model degrades seriously, which hinders deployment in embedded systems. Meanwhile, the YOLOv5s model has few low-level feature maps and small receptive fields, which results in a lower recall rate and low accuracy for large targets. To address these problems, the network structure of YOLOv5s is improved in our article. The whole network structure of DB-YOLOv5 is shown in Fig. 4. The detailed steps are as follows.
1) Since there are only 32-dimensional features in the low-level feature extraction module, ResNet is utilized to extract the low-level features, the width of YOLOv5s is increased, and its Focus module is removed, which enhances the expressive ability of the low-level features.
2) Since the receptive field of the low-level features of YOLOv5s is small, local detail information is lost and the accuracy of bounding box regression is low. Res-DConv [27] is introduced to enlarge the receptive field. As shown in Fig. 5, the dilation rate of Res-DConv is 3, which is equivalent to the receptive field of four convolution layers at only half the computation (a sketch of such a block is given after this list).
3) Since the anchor size is constrained by the feature receptive field and the number of downsampling steps, K-means is used to determine the approximate range of the prior bounding boxes from the heights and widths of the damaged building regions, and the prior bounding boxes are then classified according to the preset range. In practice, the bounding box size is about 30 × 30 given the resolution of the UAV images and the building areas; hence, the downsampling factor is 4, the maximum receptive field is 255, and the prior bounding boxes range from 32 × 32 to 85 × 85.
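The paper does not provide reference code for Res-DConv; a plausible PyTorch sketch of a residual block with dilation rate 3, matching step 2) above, is shown below. Channel widths and activation choices are our own assumptions.

```python
import torch
import torch.nn as nn

class ResDConv(nn.Module):
    """Sketch of a residual dilated convolution block (Res-DConv).

    A 3x3 convolution with dilation 3 covers a receptive field comparable
    to stacking several plain 3x3 convolutions (four layers, per the paper)
    at a fraction of the computation; the skip connection keeps gradients
    flowing as the network deepens.
    """
    def __init__(self, channels: int, dilation: int = 3):
        super().__init__()
        self.body = nn.Sequential(
            # padding == dilation keeps the spatial size unchanged
            nn.Conv2d(channels, channels, 3, padding=dilation,
                      dilation=dilation, bias=False),
            nn.BatchNorm2d(channels),
            nn.SiLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.SiLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(x + self.body(x))  # residual connection
```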

D. Feature Fusion Based on Attention Mechanism
Since the flying height of a UAV changes constantly, it is difficult to obtain images at the same initial resolution during UAV flight. Therefore, fusing image features at different scales can improve the performance of the damaged building detection network. The PANet of YOLOv5s, which uses upsampling and downsampling, yields good results for multiscale fusion, but its computational cost is very high. To balance detection accuracy and computational cost, an attention module (BDCAM) [28] for damaged building regions is proposed in our article. The key idea behind BDCAM is that the spatial attention in a low-resolution feature map is utilized to screen the weights of the high-resolution feature map, which improves object detection performance. As shown in Fig. 3, BDCAM is implemented in the following four steps.
1) The mask attention feature map: Given the low-level feature map $F_l \in \mathbb{R}^{C\times H\times W}$, where $W$ is the width of the feature map, $H$ is its height, and $C$ is the number of channels, the mask attention feature map $M_a \in \mathbb{R}^{C\times 1\times 1}$ is calculated as

$$M_a = \delta\big(D_c(F^c_{avg}) + D_c(F^c_{max})\big) \tag{1}$$

where $F^c_{avg}$ and $F^c_{max}$ represent the average- and max-pooled feature maps, $D_c$ represents the hybrid dilated convolution, and $\delta$ is the sigmoid activation function.
2) Down transform: As shown in Fig. 5, the high-level feature map $F_h \in \mathbb{R}^{C\times H\times W}$ is transformed to match the low-level map, where $H$ and $W$ are the height and width of the high-level feature map and $C$ is the number of channels.
3) The channel self-attention feature map: The channel self-attention feature map is acquired by multiplying the mask attention feature map $M_a$ with the channel feature map:

$$F_{ca} = M_a \otimes F_h. \tag{2}$$

4) The feature fusion: The low-level feature map and the channel self-attention feature map are concatenated to form the fused feature map

$$F_{fuse} = \mathrm{Concat}(F_l, F_{ca}) \tag{3}$$

where the Concat module indicates the operation of tensor concatenation.
BDCAM combines the proposed spatial and channel attention and is a lightweight attention mechanism module. Its advantages are as follows. The feature fusion of BDCAM is further optimized, which enhances the ability to represent multiscale information given the different scales of damaged building regions. In addition, the attention mechanism of BDCAM captures cross-channel interaction information, such as the dependence relationships between channels, to achieve a performance improvement.
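A minimal PyTorch sketch of the computation in steps 1)-4), under the assumption that the high- and low-level maps have already been brought to the same shape by the down transform of step 2); the module and variable names are ours, not the paper's.

```python
import torch
import torch.nn as nn

class BDCAM(nn.Module):
    """Sketch of the BDCAM attention fusion described above."""
    def __init__(self, channels: int):
        super().__init__()
        # Shared transform D_c applied to both pooled descriptors; a 1x1
        # convolution stands in for the hybrid dilated convolution here.
        self.dconv = nn.Conv2d(channels, channels, 1, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, f_low: torch.Tensor, f_high: torch.Tensor) -> torch.Tensor:
        # Eq. (1): mask attention from average- and max-pooled features.
        f_avg = torch.mean(f_low, dim=(2, 3), keepdim=True)  # (B, C, 1, 1)
        f_max = torch.amax(f_low, dim=(2, 3), keepdim=True)  # (B, C, 1, 1)
        m_a = self.sigmoid(self.dconv(f_avg) + self.dconv(f_max))
        # Eq. (2): channel self-attention reweights the high-level map.
        f_ca = m_a * f_high
        # Eq. (3): concatenate the low-level map with the attended map.
        return torch.cat([f_low, f_ca], dim=1)  # (B, 2C, H, W)
```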

E. Head of Object Detection
As a classification network, the backbone alone cannot accomplish the localization task, since classifying and locating objects during detection are contradictory processes. Thus, the detection head is responsible for predicting the location and class of the object using the feature maps extracted by the backbone. The features required for classification should be invariant to object translation, scaling, and rotation, partially invariant to illumination changes, and robust to local geometric distortion, whereas location information changes with feature scale and translation.
Inspired by the pioneering work discussed in [29] and [30], a split decoupled module (SDCM) is used to detect the damaged building regions, which completes different tasks in different stages. Our SDCM splits classification and localization into an fc-head (fully connected head) and a conv-head (convolution head), respectively. The fc-head for classification has two fully connected layers, following the design in CSP-Convs [31], with two output dimensions. The conv-head for bounding box regression stacks K residual blocks [32] and inserts a nonlocal block. This split is effective and uses fewer parameters than a single shared head. Although the object classification and box regression branches share a common structure, they use separate parameters.
Both the conv-head and the fc-head are jointly trained with CSP-Convs end to end. The overall loss is computed as

$$L = W_{fc} L_{fc} + W_{conv} L_{conv} + L_{CSP\text{-}Convs} \tag{4}$$

where $W_{fc}$ and $W_{conv}$ are the weights for the fc-head and conv-head, respectively, and $L_{fc}$, $L_{conv}$, and $L_{CSP\text{-}Convs}$ are the losses for the fc-head, the conv-head, and CSP-Convs, respectively.
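As a hedged illustration of this split, the sketch below separates a pooled region feature into an fc-head for class scores and a conv-head for box offsets. The layer widths, the 7 × 7 pooled size, and the number of stacked blocks are placeholders, and the nonlocal block is omitted for brevity.

```python
import torch
import torch.nn as nn

class DoubleHead(nn.Module):
    """Sketch of the split decoupled module: fc-head for classification,
    conv-head for bounding box regression, with separate parameters."""
    def __init__(self, channels: int, num_classes: int,
                 k_blocks: int = 3, fc_dim: int = 256):
        super().__init__()
        # fc-head: two fully connected layers -> class scores.
        self.fc_head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(channels * 7 * 7, fc_dim), nn.ReLU(inplace=True),
            nn.Linear(fc_dim, num_classes),
        )
        # conv-head: stacked conv blocks (standing in for the K residual
        # blocks of the paper) -> 4 box offsets.
        convs = []
        for _ in range(k_blocks):
            convs += [nn.Conv2d(channels, channels, 3, padding=1),
                      nn.ReLU(inplace=True)]
        self.conv_head = nn.Sequential(*convs, nn.AdaptiveAvgPool2d(1),
                                       nn.Flatten(), nn.Linear(channels, 4))

    def forward(self, feat: torch.Tensor):
        # feat: pooled region feature, assumed shape (B, C, 7, 7).
        return self.fc_head(feat), self.conv_head(feat)
```

At training time, the two head losses are combined with the backbone loss as in (4), e.g. `loss = w_fc * l_fc + w_conv * l_conv + l_backbone`.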

V. EXPERIMENTAL RESULTS AND ANALYSIS
The experimental results are evaluated from qualitative, quantitative, and efficiency viewpoints to assess the performance of the proposed method for damaged building region detection. Detection results are shown subjectively by visual inspection. An objective evaluation using precision, recall, F1 score, mAP, and frames per second (FPS) is utilized to verify the effectiveness of the proposed method against the other methods. These metrics are defined as

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}, \quad F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where TP represents the number of damaged building region pixels correctly extracted by the proposed method, FN represents the number of missed pixels, and FP represents the number of erroneously extracted pixels. Furthermore, parameter selection and analysis were performed to further evaluate the proposed approach. The proposed method was trained on a server, while the actual operations were performed on the embedded system. Therefore, the experimental environment in our article is divided into two parts: the server and the embedded system. For the server, we selected a Dell PowerEdge R730 running the CentOS Linux 7 operating system. For the embedded system, a Raspberry Pi 4B and an NVIDIA Xavier NX, both based on the Linux kernel, were employed as the master boards. Table II presents the specific parameters of the experimental environment.
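For reference, these counting-based indices can be computed directly from the TP, FP, and FN tallies; a minimal sketch with illustrative numbers:

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall, and F1 from true/false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Example: 90 correct detections, 10 spurious, 15 missed.
print(detection_metrics(tp=90, fp=10, fn=15))
# -> precision 0.900, recall ~0.857, F1 ~0.878
```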

A. Qualitative Evaluation
To verify the effectiveness of our method, the detection results were assessed visually. As shown in Figs. 6 and 8, the performance of the proposed approach is demonstrated on the sites of the Ludian and Beichuan earthquake ruins. Red boxes mark true positive regions, blue boxes mark false negative regions, and yellow boxes mark false positive regions.
The detection results for damaged building regions in the Ludian area are shown in Fig. 6. Most of the damaged areas were correctly detected, but a few falsely detected areas remain. For instance, four image pairs of falsely detected areas are shown in Fig. 7: two are false negative regions and two are false positive regions. As shown in Fig. 7(a) and (b), a building area was converted to a tent scene after the disaster, so this case cannot be identified as a damaged area by our method. As shown in Fig. 7(c) and (d), the buildings were only slightly damaged, but rubble or debris on the ground led our method to identify these cases as damaged areas even though no real damage occurred.
The detection results for damaged building regions in the Beichuan area are shown in Fig. 8. Most of the damaged areas were correctly detected, but a few falsely detected areas remain. For instance, three image pairs of falsely detected areas are shown in Fig. 9. Fig. 9(a) shows a damaged roof whose damaged region is small. Fig. 9(b) shows a very flat damaged building region, so the proposed method judges this case to be undamaged, although damage did occur. Fig. 9(c) shows a lot of debris or rubble in the grass, so the proposed method judges this case to be damaged, although no damage occurred.

B. Quantitative Evaluation
In this experiment, we computed four indices (precision, recall, F1, and mAP) to further evaluate the performance of the proposed method in comparison with damaged building region detection methods based on change detection and object detection.
First, the four indices were compared with those of the YOLOv3 method in [18], the YOLOv4 method in [20], and the YOLOv5 method in [21]. It can be seen from Table III that the proposed method achieves better indices than all the alternative object detection methods for damaged building regions. Since damaged building areas exhibit many different morphologies, such as debris, spalling, and rubble piles, the proposed BDCAM feature fusion based on the attention mechanism is better suited for describing these different damage types, which explains why the proposed method achieves more accurate results.
Second, change detection methods have often been utilized for damaged building detection since they can obtain the location of the building in advance. Therefore, our experiment compared the proposed method with other change detection methods: the semantic scene change method in [10], the adopted CNN classification method in [33], and the improved CNN classification method in [34]. Table IV presents the detection accuracy of the various methods. Tu et al. [10] utilize a traditional multifeature machine learning classifier to detect damaged building areas from multitemporal remote sensing images, achieving high detection accuracy of 91.7% and 93.14%. However, handcrafted feature extraction is employed to construct the damaged region features, so the generalization ability of this kind of method is low. Compared with traditional supervised learning methods, Duarte et al. [33] and Ma et al. [34] use CNN methods to detect damaged building areas, and their detection accuracy reaches about 91% and 92%, respectively. These methods yield satisfactory detection results, but they require predisaster data as prior information, which is difficult to meet in practice. As shown in Table IV, our method surpasses all the compared methods without requiring predisaster datasets, which better satisfies the practical demands of building damage assessment. Moreover, Section V-C presents the advantages of our method in terms of operating efficiency.

C. Operating Efficiency
After a disaster occurs, another requirement for damaged building region detection is detection speed, owing to the demand for fast response and rescue actions. If the algorithm can run efficiently on an embedded platform, it is more likely to meet the demands of practical applications. Table V presents the detection times of different approaches on different hardware platforms for building damage detection. YOLOv3, YOLOv4, YOLOv4-tiny, YOLOv5s, and our method all achieve good results under parallel GPU computation, but the experimental results demonstrate that YOLOv5s and the proposed method run considerably faster than the other existing methods on the Raspberry Pi and NVIDIA Xavier NX. Furthermore, the proposed method has the smallest complexity and the highest accuracy without predisaster data. Therefore, taking detection accuracy and efficiency into consideration, our method is more suitable for real-time damaged building region detection on embedded platforms.

D. Ablation Study
Ablation studies are of great importance for object detection research, as they can assess the contribution of each improvement and the interactions between them. In this section, we present ablation studies showing the performance of each improved network component. The impact of each component is listed in Table VI. In the second row, the improved YOLOv5s refers to the model with the improved network structure, whose accuracy was 0.73% higher than that of the original YOLOv5s. In the third row, the improved YOLOv5s refers to the model with the added feature fusion layers, which was 1.93% higher than the original YOLOv5s. In the fourth row, the improved YOLOv5s refers to the model with the added detection head, which was 2.12% higher than the original YOLOv5s. In the sixth row, applying all three improvements to the original YOLOv5s yields the highest precision and mAP values, whereas applying two improvements yields the highest FPS (the fifth row). Taken together, these results show that the proposed DB-YOLOv5 not only ensures recognition accuracy but also effectively realizes a lightweight network.
In summary, the proposed method is employed to detect the locations of damaged building regions rather than the detailed condition of individual damaged buildings, because emergency managers need to focus on the approximate regions of damaged buildings for humanitarian assistance work at the early stage of a disaster. The experimental results demonstrate that our proposed method has high accuracy and efficiency for damaged building area detection. Specifically, our method can be deployed on different embedded platforms and detect damaged building regions while the UAV is flying, which better meets the demands of practical applications for fast response and humanitarian assistance work.

VI. CONCLUSION
In this article, a novel method has been presented to detect damaged building regions from UAV images with an improved YOLOv5 and an embedded system, which is particularly applicable for fast response and humanitarian assistance in future disasters. First, a residual dilated convolution module (Res-DConv) is employed to extract spatial features, which enlarges the receptive field. Then, a feature fusion module, namely BDCAM, is designed to enhance the expressive ability of object features, which improves the classification performance of the detector. Finally, a double-head method, which has a fully connected head for classification and a convolution head for bounding box regression, is proposed for the localization task. The experimental evaluation shows satisfactory results on UAV images of building damage from the Wenchuan and Ludian earthquakes. The method has great potential for real-time damaged building region detection and assessment, especially when predisaster data are hard to obtain. Remote sensing data from multiple platforms and sensors differ considerably in type and scale; future studies should therefore make this model flexible enough to exploit different scales and data sources for detecting damaged building regions. Transfer learning and incremental learning will be employed to address these challenges. This will be the direction of our future efforts.

Yunlong Wang is currently working toward the bachelor's degree in electronic information engineering with the School of Electronic Information, Yangtze University, Jingzhou, China.
He joined the National Demonstration Center for Experimental Electrical and Electronic Education in 2020. His research interests include image processing and intelligent algorithm.
Wenqing Feng received the B.S. degree in geographic information system from Huazhong Agricultural University, Wuhan, China, in 2013, and the Ph.D. degree in photogrammetry and remote sensing from Wuhan University, Wuhan, China, in 2020.
He is currently an Associate Professor with the School of Computer Science, Hangzhou Dianzi University, Hangzhou, China. His major research interests include object-based image analysis, change detection of remote sensing images, and deep learning.
Kun Jiang is currently working toward the bachelor's degree in electronic information engineering with the School of Electronic Information, Yangtze University, Jingzhou, China.
He joined the National Demonstration Center for Experimental Electrical and Electronic Education in 2021. His research interests include embedded system design and machine learning.

Qianchun Li is currently working toward the bachelor's degree in electronic information engineering with the School of Electronic Information, Yangtze University, Jingzhou, China.
She joined the National Demonstration Center for Experimental Electrical and Electronic Education in 2021. Her research interests include machine learning and intelligent algorithm.
Ruipeng Lv received the M.S. degree in photogrammetry and remote sensing from the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China, in 2016.
He is currently a Senior Algorithms Engineer with NavInfo Co., Ltd., Beijing, China. His research interests include mobile mapping system, point cloud processing, and photogrammetry.

Jihui Tu (Member, IEEE) received the Ph.D. degree in photogrammetry and remote sensing from Wuhan University, Wuhan, China, in 2017.
He is currently an Associate Professor with Electronics and Information School, Yangtze University, Jingzhou, China. His research interests include deep learning, computer vision, and natural language processing.