Forest-fire response system using deep-learning-based approaches with CCTV images and weather data

An effective forest-fire response is critical for minimizing the losses caused by forest fires. The purpose of this study is to construct a model for early fire detection and damage area estimation for response systems based on deep learning. First, a large-scale fire dataset with approximately 400,000 images is used to train and test object-detection models. The optimal backbone for the faster region-based convolutional neural network (Faster R-CNN) model is determined using a DetNAS-based architecture search algorithm. Then, the searched light-weight backbone is compared with well-known backbones, such as ResNet, VoVNet, and FBNetV3. In addition, data pertaining to six years of historical forest fire events are employed to estimate the damaged area. Subsequently, a weather API is used to match the recorded events. A Bayesian neural network (BNN) model is used as a regression model to estimate the damaged area. Additionally, the trained model is compared with other widely used regression models, such as decision trees and neural networks. The Faster R-CNN with a searched backbone achieves a mean average precision of 27.9 on 40,000 testing images, outperforming existing backbones. Compared with other regression models, the BNN estimates the damage area with less error and increased generalization. Thus, both proposed models demonstrate their robustness and suitability for implementation in real-world systems.


I. INTRODUCTION
FORESTS contribute to significant ecological and economic functions in ecosystems [1]. In addition, forests are important heritage sites for human beings. However, forest fires can cause tremendous damage to human life and property and adversely affect forest ecosystems in the long term. Therefore, forest fires must be prevented; however, they are difficult to prevent because of their diverse causes. According to the Republic of Korea's Forest Fire Statistics Yearbook [2], the risk of forest fires increased in 2020 owing to an increase in the number of dry days and a considerable decrease in the number of precipitation days. A total of 620 forest fires occurred in 2020, causing damage over an area of 2919.84 hectares. Compared with the figures for the last decade (474 cases and 1120 hectares), the number of forest fires increased by 31%, and the damaged forest area increased by 161%. The initial response to forest fires is an important factor in reducing accidents. Thus, comprehensive real-time monitoring and damage assessment are necessary for minimizing forest-fire-related losses.
Owing to the advancement of deep-learning-based vision approaches, the limitations of traditional methods for classification [3]-[7], object detection [8]-[14], and segmentation [15]-[17] can be overcome. Object-detection models that exhibit high performance can be categorized into two types, based on the model structure. The first type is one-stage detection models, such as the You Only Look Once (YOLO) series [18], single-shot detectors [19], and RetinaNet [20]. The second type is two-stage detection models, such as faster region-based convolutional neural networks (Faster R-CNNs) [21], DenseNet [22], and Mask R-CNN [23]. In one-stage models, region proposal and classification are performed simultaneously in one step. For example, some researchers [24] used YOLOv3 [25] with unmanned aerial vehicle (UAV) images to detect fires and smoke. This model achieved a recognition rate and frame rate of approximately 8% and 3.2 frames per second (fps), respectively. Other researchers [26] combined two one-stage models, YOLOv5 and EfficientDet [9], for fire detection. The ensemble model produced an average precision (AP) of 0.79 and a latency of 66.8 ms. However, these single-stage models were insufficient for detecting forest fires covering large areas. Therefore, we adopt two-stage models, in which region proposal and classification are performed sequentially in two steps. A recently developed two-stage model, ATT Squeeze U-Net [11], obtained an accuracy of 0.93 and a detection frame rate of 1.1 fps (0.89 s per image). In addition, the Faster R-CNN was used with three different base networks, AlexNet, VGG16, and ResNet101 [8], for forest-fire detection. Researchers [27] used the Faster R-CNN with data from a real monitoring system; the results showed that the trained model can achieve an F1-score of approximately 80%.
Recently, an efficient convolutional neural network (CNN) architecture was developed for classifying input images into eight distinct fire-scenario classes [28]. However, these models have several limitations.

1) Unreasonable Hyperparameters: The original Faster R-CNN models based on various backbones, such as ResNet [29] and VoVNet [30], are trained with hyperparameters set according to the experience and intuition of the researchers. This negatively affects the accuracy and speed of the model.
2) The searched backbone, such as FBNetV3 [31], may fall into a sub-optimal point when trained on an object-detection dataset.
3) Low Inference Rate: Two-stage detection models are inherently slow. For a real-time forest-fire detection system, both the detection accuracy and rate are important.
In this study, DetNAS [32] is used to search for an optimal backbone for the smoke and forest-fire detection model. The NAS-based [33] backbone search algorithm is considered a strong tool for finding an optimal CNN architecture suitable for domain datasets. ShuffleNet V2 [34], an efficient light-weight architecture, is used for the searchable components. A publicly available large-scale fire dataset of approximately 350,000 images is used for training. The searched backbone is evaluated on approximately 40,000 images and compared with the original Faster R-CNN models using the well-known ResNet [29], light-weight VoVNet [30], and NAS-based FBNetV3 [31] backbones.
Estimating the damage area is critical for providing the appropriate response. According to [2], forest-fire response scenarios are divided into four levels. The first is the early response scenario, which represents the case when the estimated damage area is less than 10 hectares, the fire duration is less than 3 hours, and the wind speed is less than 2 m/s. In this case, local authorities utilize 50% of their firefighters and equipment and 100% of their helicopters. The second is the scenario in which the estimated burned area is between 10 and 30 hectares, the duration is between 3 and 8 hours, and the wind speed is between 2 and 4 m/s. In this scenario, all firefighters, equipment, and helicopters are used, along with 50% of the helicopter and drone capabilities from neighboring provinces, boosting the response performance. The third is the scenario in which the estimated damage area is between 30 and 100 hectares, the duration is between 8 and 24 hours, and the wind speed is between 4 and 7 m/s. In this case, 50% of firefighting personnel, 30% of equipment, and 100% of helicopter and drone resources from neighboring provinces are used. The fourth is the scenario in which the estimated burned area exceeds 100 hectares, the duration exceeds 24 hours, and the wind speed is greater than 7 m/s. In this case, 50% of equipment is acquired from surrounding provinces, helicopters are acquired from metropolitan areas, and all other facilities are acquired from local governments.
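The thresholds of the four scenarios can be written as a small lookup, sketched below in Python. The function names are hypothetical, and two choices are assumptions that the source does not state: a value exactly on a boundary (for example, 10 hectares) is assigned to the higher level, and when the three criteria disagree, the most severe one decides the level.

```python
# Upper bounds separating the four scenarios described above.
AREA_BOUNDS_HA = (10.0, 30.0, 100.0)    # damage area in hectares
DURATION_BOUNDS_H = (3.0, 8.0, 24.0)    # fire duration in hours
WIND_BOUNDS_MS = (2.0, 4.0, 7.0)        # wind speed in m/s


def _level_for(value, bounds):
    """Return 1..4: the first scenario whose upper bound exceeds value."""
    for level, upper in enumerate(bounds, start=1):
        if value < upper:
            return level
    return len(bounds) + 1


def response_level(area_ha, duration_h, wind_ms):
    """Assumption: the most severe single criterion decides the level."""
    return max(
        _level_for(area_ha, AREA_BOUNDS_HA),
        _level_for(duration_h, DURATION_BOUNDS_H),
        _level_for(wind_ms, WIND_BOUNDS_MS),
    )
```

For instance, a fire estimated at 20 hectares, lasting 5 hours, with a wind speed of 3 m/s falls into the second scenario under these assumptions.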
Fire characteristics that are influenced by environmental conditions, such as wind speed, air temperature, humidity, ground temperature, and atmospheric pressure, affect the burned area during a forest fire. Thus, understanding the relationship between the damaged area, its components, and the available manpower factors (firefighter and helicopter involvement) is critical for providing the best possible response to these natural disasters. To account for the interdisciplinary domain knowledge necessary to estimate the damage area, model frameworks that can consider the interdependencies between the processes involved are required. Bayesian neural networks (BNNs) are well suited to combining multidisciplinary models [35]. Bayesian networks (BNs) were applied in previous studies on tsunamis [36] and rock fall hazards [35]. Researchers [37] used a BNN to predict and assess wildfire occurrences and burn severity and modeled the wildfire spread [38], wildfire ecological consequences [39], and risk of human fatality from fires in buildings [40]. In 2017, an approach that utilized BNs for wildfire economic losses was presented [41]. BNs were recently used to forecast and analyze the causes of forest fires [42]. Owing to the complexity of forest fires and the many factors involved, assessing forest-fire damage remains problematic. In this study, historical forest-fire data were used to establish a baseline for implementing a BNN. Weather and response variables were included in the input data. The BNN algorithm was also evaluated and compared with other machine-learning algorithms.
To summarize, the main contributions of this study are as follows:
1) Propose an approach for forest-fire response systems by using deep learning.
2) Search for optimal light-weight backbones on a large-scale fire and smoke dataset.
3) Implement a BNN for estimating the damage area using six years of forest-fire records and weather data.
The remainder of this paper is organized as follows. Section II provides an outline of the theoretical background. Section III explains the proposed system. Section IV describes our experiments. Section V presents a summary of the findings. Finally, Section VI highlights the outcomes.

II. THEORETICAL BACKGROUND
A. DETNAS-BASED OBJECT DETECTION
The method for detecting forest fires using closed-circuit television (CCTV) footage is primarily based on object detection. Figure 1 shows the architecture of the Faster R-CNN. The model receives an image as the input, and the image then passes through the backbone of the model. After convolution in the backbone, the feature maps of the image are extracted. Then, a region proposal network (RPN) is applied to the feature maps. The outputs of the RPN are proposals, such as the red boxes in the RPN. These proposals and the feature maps from the backbone are used to perform region-of-interest pooling, the output of which is passed to a fully connected layer. Finally, object detection is completed using softmax and bounding box regression.
Object detectors rely heavily on the backbone [10]. However, a backbone that is handcrafted and optimized for image classification datasets is frequently applied directly to the object detection model. This practice may have a negative impact on performance [43]. Recent advances in deep learning have resulted in state-of-the-art models in various fields by using neural architecture search (NAS) [33]. NAS was intended to address the issues that arise when researchers build models based on their experience, and it enables the creation of a model architecture without human interaction.
Many studies on backbones and hyperparameters were conducted to improve the performance of object detection models. In the case of MobileNetV2 (Fig. 2(a)), both the backbone network and parameters were determined by many experiments, not by NAS. Models based on human experience are potentially problematic in terms of rate and accuracy. In the case of FBNet-C (Fig. 2(b)), deep learning creates the backbone network, but the hyperparameters are also determined experimentally. However, in the case of FBNetV3 (Fig. 2(c)), both the backbone network and the hyperparameters are determined by the search [31].
FIGURE 2. Visualization of some backbone networks: (a) MobileNetV2, (b) FBNet-C, and (c) FBNetV3. The rectangular boxes represent blocks for each layer, labeled (k, e) = (kernel size, expansion). Two colors represent the kernel size of each layer: orange for 3 and purple for 5; a blank indicates a skip [31], [44].
Numerous studies were conducted on automatically determining backbones, but limits exist as to what can be applied to forest fire detection, including FBNetV3. The majority of previous studies attempted to identify a backbone network for image classification by using ImageNet, a dataset for image classification. They then attempted to apply these image-classification backbones to an object detection task. Because of this mismatch, performance is limited. DetNAS [43] is the first study to investigate NAS for determining the object detection backbone. The researchers applied a single-path one-shot method [45] for NAS and split the search process into three steps: supernet pretraining, supernet fine-tuning, and an evolutionary search on the trained supernet. Figure 3 shows each step in NAS for searching for a forest fire detection backbone. First, supernet pretraining determines the supernet, which is a set of many subnets. The supernet is trained using ImageNet. In this step, a path-wise method is applied to ensure the relationship between the supernet and subnets. Second, the supernet, which includes a head and metrics, is fine-tuned using the COCO dataset, which is an object detection dataset. To customize the backbone, we must change the detection dataset and hyperparameters. Third, one subnet is determined from the supernet. During this search, an evolutionary algorithm is used to select a candidate in the supernet.

B. BAYESIAN NEURAL NETWORK
As discussed earlier, the final aim was to build a model capable of estimating the damage area of forest fires based on current weather conditions and historical weather records. Therefore, the damaged areas were divided into classes, each with increasing values. Consequently, the problem can be classified as a regression problem from a machine-learning perspective. An artificial neural network (ANN) is a powerful tool for classification and regression problems [46]. However, this point-estimate approach tends to lack reasonability and may generalize in an unexpected and overconfident manner on data points from outside the training distribution [47]. As mentioned earlier, BNNs are widely used in risk-assessment fields. A BNN is a common type of stochastic neural network, which can better capture the uncertainty of the underlying processes [48]. Figure 4 shows the distinction between a point-estimate neural network and a BNN: instead of adopting fixed numbers as model weights, the BNN represents each weight by a distribution. A BNN can be summarized as follows:
$$\theta \sim p(\theta), \qquad y = \mathrm{NN}_{\theta}(x) + \varepsilon,$$
where θ represents the model parameters, and ε represents random noise. To build a BNN, a neural network architecture is selected first. Then, a prior distribution over the possible model parameters, p(θ), and a prior confidence in the predictive power of the model, p(y|x, θ), must be selected. Thus, the Bayesian posterior can be written as
$$p(\theta \mid D) = \frac{p(D_y \mid D_x, \theta)\, p(\theta)}{\int p(D_y \mid D_x, \theta')\, p(\theta')\, d\theta'} \propto p(D_y \mid D_x, \theta)\, p(\theta),$$
where D, D_x, and D_y are the training set, training features, and training labels, respectively. In practice, computing this distribution is typically difficult. Thus, sampling methods (Markov chain Monte Carlo) and variational inference are applied as approximation methods. A more detailed discussion of these techniques is given in [48]. The TensorFlow probabilistic deep-learning framework is used in this study to implement a BNN.
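To illustrate how a distribution over weights turns into a distribution over predictions, the following minimal Python sketch replaces the network with a single linear unit whose weights are independent Gaussians. It is not the TensorFlow Probability model used in this study; all function names, the single-unit model, and the default number of samples are assumptions made only for illustration.

```python
import random
import statistics


def sample_weights(means, stds, rng):
    """Draw one weight vector from independent Gaussians (one per weight)."""
    return [rng.gauss(m, s) for m, s in zip(means, stds)]


def predict_once(x, weights, bias):
    """Point-estimate forward pass of a single linear unit."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predictive_distribution(x, means, stds, bias, n_samples=300, seed=0):
    """Monte Carlo estimate of the predictive mean and standard deviation."""
    rng = random.Random(seed)
    outputs = [
        predict_once(x, sample_weights(means, stds, rng), bias)
        for _ in range(n_samples)
    ]
    return statistics.mean(outputs), statistics.stdev(outputs)
```

When every standard deviation is zero, the sketch reduces to a point-estimate network; a nonzero spread in the weights produces a spread in the output, which is the uncertainty estimate that a point-estimate model cannot provide.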

III. PROPOSED METHOD
The four phases of forest-fire management [49] are mitigation, preparation, response, and recovery. The focus of this study is on the implementation of a forest-fire response system with two primary aspects: detection and damage estimation. As shown in Figure 5, if smoke or fire occurs, AI-powered CCTV cameras detect smoke and fire in the forest in real time and then send the results to a database server. Subsequently, the server sends a UAV to the forest fire location to scan for damage. Then, the regression model gathers data on the extent of the damage area and weather data from the fire location to estimate and visualize it in an integrated system.
The most critical aspect of a forest-fire response system is rapid and accurate detection. To achieve this, we propose a novel forest fire detection backbone network derived from NAS. Previous research on forest fire detection has relied exclusively on various object detection models that perform well on the COCO dataset [50], not on a forest fire detection dataset. Because our backbone is tailored to the large-scale forest fire dataset, our detection model with a searched backbone outperforms other fire detection models. Additionally, the ShuffleNetV2 block is used as the searchable component in the backbone search. Therefore, the searched backbone can be considered a light-weight model capable of real-time inference and deployable on edge devices [34].
After a forest fire is detected early, the next step is to estimate the damage area in real time. To estimate the damage area, a BNN-based regression model is constructed using historical forest fire records and weather data. BNNs are well known for handling uncertainty in datasets. Consequently, the trained model can generate a distribution representing the probability of the damage area. Additionally, using a UAV and segmentation algorithm, we could estimate the total damage area and create a 3D forest fire damage map using our previous research [17].

IV. EXPERIMENTS
In this study, we proposed forest fire detection and damage area estimation to establish a forest-fire response system. We conducted two experiments, one for each proposal. For forest fire detection, we prepared a fire image dataset for training. In this process, we analyzed and preprocessed the training dataset to address the imbalance between classes. The DetNAS algorithm was then used to search for an optimal backbone. In addition, we compared our model with ResNet, VoVNet, and NAS-based FBNetV3 models. For damage area estimation, we matched fire records with data from a weather API to collect the time-series numerical dataset. Subsequently, we trained and validated our BNN-based damage area estimation model.

A. FOREST FIRE DETECTION
Dataset. The AIHub fire detection dataset (https://aihub.or.kr/aidata/34121) was used in the experiments. As shown in Figure 6(a), 349,774 images were used to train the model, and the remaining 39,243 images were used to evaluate its performance. The number of instances per class in the training and testing datasets is shown in Figure 6(b). This dataset was divided into four distinct categories: black smoke, gray smoke, white smoke, and fire. Figure 7 depicts data samples with their corresponding classes. The data were created in a naturalistic environment, near a mountain, similar to a forest fire. Consequently, this dataset was well-suited for developing a model for the early detection of forest fires using CCTV data.
Preprocessing. We employed image augmentation techniques, such as random rotation and vertical and horizontal flipping, together with the corresponding label transformations, for two reasons. The first was the irregularity of the smoke shape. Figure 7 shows smoke with various shapes. Unlike fixed-shape objects, such as people and cars, smoke adopts various shapes and directions. Therefore, image augmentation can be used effectively to augment the training data. The second was the unbalanced distribution of the training dataset across classes. Figure 6(b) shows that the number of instances for each class is uneven. To solve this, we performed image augmentation with a different number of instances for each class. Using image augmentation in this manner improves the reasonability of the detection model.
Metrics. The COCO AP (average precision) was used to evaluate the model. The COCO AP is computed from the precision-recall (PR) curve based on true positive (TP), false positive (FP), and false negative (FN) results. Because our goal is to detect fire and not to classify the colors of smoke or fire, we use the mean average precision (mAP), which is the most commonly used metric for object detection models [10].
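The relationship between TP, FP, and the PR curve can be made concrete with a short sketch. The Python code below computes the intersection over union (IoU) of two boxes and the area under the PR curve for one class at a single IoU threshold. It is a simplified illustration with hypothetical function names: the official COCO AP additionally averages over IoU thresholds from 0.50 to 0.95 and uses 101-point interpolation, which this sketch does not reproduce.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(is_true_positive, n_ground_truth):
    """Area under the PR curve for detections sorted by descending score.

    is_true_positive[i] is True when the i-th ranked detection matched a
    ground-truth box (for example, IoU >= 0.5) and False otherwise.
    """
    if n_ground_truth == 0:
        return 0.0
    precisions, recalls = [], []
    tp = fp = 0
    for hit in is_true_positive:
        tp += hit
        fp += not hit
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_ground_truth)
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

Ground-truth boxes that are never matched are the false negatives; they enter the computation through `n_ground_truth`, which caps the achievable recall.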
Implementation details. For the model architecture, we used the Faster R-CNN model with ResNet, VoVNet, FBNetV3, and our searched backbone. A network with a feature pyramid network (FPN) attached to its name has a modified structure that creates feature maps step-by-step in the existing network layers, combines the features in a top-to-bottom manner, and then proceeds with object detection. For the training configuration, the batch size was 16, the number of iterations was 10K, and the initial learning rate was 0.15. The input image was resized to 512 × 512 pixels without changing the aspect ratio using bilinear interpolation.
In our experiment for NAS, as shown in Figure 3, the ImageNet dataset was used in step 1, nearly 350,000 images were used to fine-tune the supernet, and 10,000 images were used to search the subnet. We used an evolutionary algorithm to determine the best subnet from the candidates. During the evolutionary search, we used a candidate pool size of 50 and a mutation number of 20. After determining the subnet, we retrained the Faster R-CNN model using the subnet on ImageNet and fine-tuned it on 350,000 forest fire images. All search processes were run using the MMRazor open-source library [51]. In all steps, we used an image size of 512 × 512 to compare backbones. The input image was normalized and augmented with a random flip (p=0.5). Stochastic gradient descent with a learning rate of 0.001 was used as the optimizer. Other configurations were adopted from the original research [43]. The models were implemented in PyTorch and run on CentOS Linux 8 with two NVIDIA Tesla V100 graphics processing units, each with 32 GB of memory. Approximately 8 days were required for the two GPUs to train and fine-tune all DetNAS processes.
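The evolutionary search step can be sketched in a few lines of Python. This is a toy illustration, not the MMRazor implementation: the per-layer choices in `CHOICES`, the number of parents, the mutation probability, the crossover used to refill the pool, and the default number of epochs are all assumptions, and `fitness` is a caller-supplied stand-in for evaluating a subnet with the fine-tuned supernet weights. Only the pool size of 50 and the mutation number of 20 are taken from the text above.

```python
import random

CHOICES = ("k3", "k5", "k7", "skip")  # hypothetical per-layer options


def mutate(arch, rng, prob=0.1):
    """Resample each layer's choice with probability prob."""
    return tuple(rng.choice(CHOICES) if rng.random() < prob else c for c in arch)


def crossover(a, b, rng):
    """Pick each layer's choice from one of the two parents."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def evolution_search(fitness, n_layers, pool_size=50, n_mutation=20,
                     n_parents=10, epochs=20, seed=0):
    """Return the best architecture found and its fitness score."""
    rng = random.Random(seed)
    pool = [tuple(rng.choice(CHOICES) for _ in range(n_layers))
            for _ in range(pool_size)]
    best = max(pool, key=fitness)
    for _ in range(epochs):
        # Keep the top candidates as parents of the next generation.
        parents = sorted(pool, key=fitness, reverse=True)[:n_parents]
        children = [mutate(rng.choice(parents), rng) for _ in range(n_mutation)]
        # Assumption: the rest of the pool is filled by crossover.
        while len(children) < pool_size:
            children.append(crossover(rng.choice(parents), rng.choice(parents), rng))
        pool = children
        best = max(pool + [best], key=fitness)
    return best, fitness(best)
```

In a real search, each fitness evaluation requires running the detector on validation images, so the scores would normally be cached rather than recomputed as they are in this sketch.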

B. DAMAGE AREA ESTIMATION
Dataset. In this study, we used three datasets from the Korea Forest Service Organization [2]. The first is a forest fire outbreak dataset that includes records of 2,128 forest fires from 2014 to 2020. The second is a topographic dataset, and the last is a mountain weather dataset from the 365weather observation source API. Because these three datasets have different observation points, preprocessing is required. Figure 8 shows the preprocessing. First, we discarded the rows with NaN values and then integrated the three datasets based on the distances between the closest observation points. If the distance to the nearest point is less than 10 km, we regard it as the same event point. After filtering, to integrate the fire outbreak and weather data, we averaged the weather data over the fire outbreak time and duration. We used 11 variables, as shown in Figure 9, namely, the month, duration, number of helicopters involved, number of firefighters, temperature, surface temperature, pressure, humidity, wind speed, altitude, and daily weather index (DWI).
VOLUME 4, 2016. This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication.
The raw dataset was divided into categories, with the corresponding ranges shown in Table 1. This procedure was performed in accordance with the corresponding variable distribution and the guideline for forest-fire response levels [2]. Figure 10 shows the number of instances for each variable. The sample distribution for the damage area class is uneven. Only a few samples are present in classes 8, 9, and 10, which correspond to significant forest-fire losses. The SMOTE [52] approach was used to oversample the minority classes.
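The 10 km matching rule used to integrate the datasets can be sketched as follows in Python. The source does not state how distances were computed; the haversine great-circle distance, the function names, and the dictionary layout of the stations are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_station(fire, stations, max_km=10.0):
    """Return (station_id, distance_km) of the closest station within max_km, else None.

    fire is a (lat, lon) pair; stations maps a station id to a (lat, lon) pair.
    """
    best = None
    for station_id, (lat, lon) in stations.items():
        d = haversine_km(fire[0], fire[1], lat, lon)
        if d < max_km and (best is None or d < best[1]):
            best = (station_id, d)
    return best
```

A fire record for which `nearest_station` returns `None` has no observation point within 10 km and would be filtered out before the weather data are averaged.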
Metrics. The mean square error (MSE) and root mean square error (RMSE) were chosen as the loss and training metrics, respectively. The k-fold cross-validation technique was used to evaluate the model performance.
Implementation details. For the decision tree, the maximum depth was left unbounded so that the tree could grow until all leaves were pure. For the ANN, the input was first normalized using a batch normalization layer, followed by two hidden layers with eight hidden units; the model was then trained using the root mean square propagation (RMSprop) optimizer with a learning rate of 2 × 10⁻³. The standard BNN architecture was used in this study. A Gaussian prior with mean 0 and diagonal covariance σ²I on the network parameters θ was used as follows:
$$p(\theta) = \mathcal{N}(\theta;\, 0,\, \sigma^{2} I).$$
Given the Gaussian prior, a Gaussian approximate posterior was selected. After the input was normalized using a batch normalization layer, two dense variational layers with eight hidden units were created. The model was optimized using an RMSprop optimizer with a learning rate of 0.01.

V. RESULTS
A. FOREST FIRE DETECTION
Figure 11 shows the mAP curve of the search for the subnet in the trained supernet. We used 20 epochs and 50 candidates for each epoch. The highest candidate score in each epoch increased as the evolutionary search proceeded. From this, the best-performing architecture was selected. Figure 12 shows the selected architecture. The searched backbone is a combination of ShuffleBlocks, the building unit of ShuffleNetv2 [53]. The ShuffleBlock is a light-weight block designed with respect to the inference rate, not the FLOPs. Therefore, ShuffleNetv2 shows the best performance among light-weight object detection backbones, such as MobileNetv2, DenseNet, Xception, and NASNet. In this study, we define the search space using the ShuffleBlock.
Each layer of our search space is a ShuffleBlock with a different kernel size or a skip layer. As shown in Figure 12, in the early stage 1, the searched backbone uses large kernels, whereas in the middle and final stages, small kernels remain. This pattern reflects the importance of feature extraction in object detection: as much information as possible must be extracted from the input images to create a pattern usable for detection.
The test data contain nearly 40,000 forest fire images. Our searched backbone shows the highest mAP score among light-weight backbone networks. This shows that our searched backbone is specialized in smoke and fire detection. That is, by using NAS, we avoided the accuracy limit of previous models. Figure 13 shows the inference images and the corresponding ground-truth images for our searched backbone. Most prediction images are similar to the ground truth, but Figure 14 presents some exceptions that show why our model has a limited mAP score. As shown in the bounding boxes of the prediction images, the model can detect several classes for one object in a single image because of the visual similarities of smoke. Different prediction results for one object can lower the mAP score, which is affected by the false positive rate. However, the intention is to detect the risk factors of forest fires. From this perspective, our model can detect the risk factors well for a forest-fire response system despite its low mAP score.

B. DAMAGE AREA ESTIMATION USING WEATHER DATA
With the BNN, the model produces a conditional probability distribution instead of a point-estimate prediction, from which an optimal estimate can be retrieved. Unlike the other algorithms, the trained BNN makes predictions over 300 iterations, and the results are then averaged. As illustrated in Figure 15, the decision tree, with an RMSE of approximately 2.6 for each fold, is likely to overfit the training set and provide incorrect predictions on the test set. This issue arises because of the lack of training data and the inherent uncertainty in capturing forest fire history. Although the ANN and BNN achieve better results, with an RMSE of approximately 1.7 for each fold, they still struggle with overfitting. As discussed earlier, whether the BNN has a higher accuracy or can improve performance is unclear, but it can deal with uncertainty in datasets, particularly in small datasets such as ours. To evaluate the proposed approach, a real-world forest fire, the March 19, 2020 forest fire in Ulsan, South Korea, was used. According to the records, the estimated damage area of this fire was approximately 519 ha. As shown in Figure 16, the damaged region can be approximated using a distribution. This results in the following probabilities for the four levels of forest fire response: initial level, 0.6%; level 1, 10%; level 2, 34.69%; and level 3, 54.7%. This allows for the deployment of the relevant response scenario.
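One way to turn a predictive distribution into per-level probabilities is to count the share of sampled predictions that fall into each level, as sketched below in Python. This is an illustration under stated assumptions rather than the procedure used in the study: it assumes that the samples are damage areas in hectares, that the four level names map onto the 10, 30, and 100 hectare thresholds of the response scenarios described in the Introduction, and that boundary values belong to the higher level. The function names are hypothetical.

```python
from collections import Counter

# Damage-area thresholds in hectares separating the four response levels.
LEVEL_BOUNDS_HA = (10.0, 30.0, 100.0)
LEVEL_NAMES = ("initial", "level 1", "level 2", "level 3")


def level_of(area_ha):
    """Map a damage area to one of the four response levels."""
    for name, upper in zip(LEVEL_NAMES, LEVEL_BOUNDS_HA):
        if area_ha < upper:
            return name
    return LEVEL_NAMES[-1]


def level_probabilities(area_samples_ha):
    """Fraction of predictive samples that fall into each response level."""
    total = len(area_samples_ha)
    if total == 0:
        raise ValueError("at least one sample is required")
    counts = Counter(level_of(a) for a in area_samples_ha)
    return {name: counts[name] / total for name in LEVEL_NAMES}
```

With samples drawn from repeated stochastic forward passes of the BNN, the returned dictionary plays the role of the level percentages reported above.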

VI. CONCLUSION
An appropriate forest-fire response is critical for mitigating losses and providing authorities with an effective solution. The first two stages of a forest-fire response system are early fire detection and damage area estimation. Because of the advantages of the DetNAS-based backbone search algorithm for object detection models, the searched backbone outperforms existing backbones, from handcrafted backbones, such as ResNet and the light-weight VoVNet, to the NAS-based FBNetV3. With an acceptable mAP of 27.9, smoke types and fire can be detected. In addition, ShuffleNetV2 blocks are considered light-weight and effective backbone components for real-time object detection. Owing to these characteristics, the searched backbone can be implemented in real-time monitoring systems. Furthermore, the damaged area can be assessed in real time using a BNN model and weather data. As illustrated in Figure 17, a web-based visualization platform was created, and weather data were updated in real time using a weather station API. When a forest fire occurs and is detected using an early fire detection model, the damaged area is approximated using the current state of the forest.