Efficient Video Fire Detection Exploiting Motion-Flicker-Based Dynamic Features and Deep Static Features

Since fire is one of the most serious types of accidents that can occur, there is always a need for improvement in fire detection capabilities. Convolutional neural networks (CNNs) have been used for a variety of high-performance computer vision tasks. The use of CNNs to extract deep static features of fire has greatly improved the accuracy of fire detection. However, the implementation of CNNs in the real world is limited by their high computational cost. In addition, fire detection methods based on the classification of images alone using CNNs cannot account for the dynamic features of fire. Therefore, in this paper, a method that exploits both motion-flicker-based dynamic features and deep static features is proposed for video fire detection. First, dynamic features are extracted by analyzing the differences in motion and flicker features between fire and other objects in videos. Second, an adaptive lightweight convolutional neural network (AL-CNN) is proposed to extract the deep static features of fire. Finally, the dynamic and static features of fire are combined to establish a video fire detection method with improved operational efficiency in terms of accuracy and run time. To prove the validity of our method, its accuracy and run time are evaluated on three test datasets, and the results reveal that our method exhibits better performance than state-of-the-art methods. Moreover, our method is shown to be feasible in complex video scenarios and for devices with resource constraints.


I. INTRODUCTION
Fire is one of the most dangerous types of disasters, threatening human life and property, the ecological environment, and infrastructure. Reducing the damage caused by fire has important theoretical and practical significance [1], [2]. With the increasing popularity of video surveillance equipment and the development of computer vision techniques, video fire detection methods based on fire features have attracted widespread attention from researchers [3]-[5].
The features of fire can be divided into static features and dynamic features. Static features include spectral information and spatial structure information, such as brightness, color, texture, and edges. Dynamic features include the overall motion features and random motion features, such as motion and flicker features [6]. Early methods of fire detection usually identify fire on the basis of one or more of these features, such as methods based on the construction of color models using static features, including RGB, HSI, and YCbCr features [7]-[9]. In addition, methods involving the combination of multiple color models have been applied for fire detection. Zaidi et al. performed video fire detection based on RGB and YCbCr features by setting thresholds [10]. However, color-based fire detection methods are often susceptible to a variety of environmental factors, such as sunlight, other light sources, and red or orange objects, which can lead to high false alarm rates.
The associate editor coordinating the review of this manuscript and approving it for publication was Naveed Akhtar.
To overcome this susceptibility, researchers have conducted further investigations by combining color features, shape features, dynamic features, etc. Seebamrungsat et al. performed fire detection by combining multiple feature rules, considering the HSV, YCbCr, and interframe features of fire [11]. Lascio et al. combined color and motion features in an expert system for fire detection based on the analysis of surveillance videos [12]. Marbach et al. analyzed video frame sequences. The features of video sequences were extracted and used to determine whether a fire had occurred [13]. Chen et al. used a background detection method to obtain the moving areas associated with fire and smoke in videos and then determined the color features of these moving areas to identify the presence of a fire [8]. Foggia et al. used an expert system to build a rule set based on fire color, shape, and motion features, which offered improved accuracy but also suffered from a high false alarm rate [6]. Yan et al. extracted multiple features for forest fire recognition, including color, texture, area, and shape features [14]. Kosmas et al. built an SVM classifier for fire detection based on motion features, texture features, flicker features, and color probability features [15]. Toreyin et al. carried out a series of studies on fire discrimination and successfully used a hidden Markov model to realize the real-time detection of fire in videos [16]. The researchers whose methods are reviewed above built their own extractors to improve the accuracy of fire detection. Such ''hand-crafted'' dynamic features, for example, motion and flicker features, have promoted the development of video fire detection. However, motion detection or flicker frequency analysis alone is insufficient to effectively extract dynamic features. In addition, because of the high complexity of fire scenes in videos, artificially designed static features are highly redundant. 
It is therefore impossible to intelligently extract as many deep static features as possible with hand-crafted extractors. However, a deep neural network can effectively extract the deep static features of an image through automatic learning, which can help to improve performance.
Hinton et al. proposed the theory of deep learning in 2006 [17]. Deep learning involves extracting high-level abstract features of data through nonlinear expressions and building mathematical models to achieve improved classification and detection accuracy; hence, it has become a popular area of research in the artificial intelligence community. In recent years, a large number of neural network models have been proposed, such as convolutional neural networks (CNNs) [18], recurrent neural networks (RNNs) [19], and deep belief networks (DBNs) [20]. These networks have been used for a variety of high-performance tasks, such as image processing [18], [21], object detection [22], natural language processing [23], speech recognition, and other applications [24]-[27]. Among them, CNNs have achieved superior results in image classification.
More recently, many methods using neural network algorithms to extract the static features of fire have been applied for fire detection. Frizzi et al. proposed a CNN-based method for fire and smoke detection and tested it on video sequences [28]. Sharma et al. used higher-performing network models, i.e., VGG16 and ResNet50, for fire detection [29]. Shen et al. used the popular YOLO network framework for fire detection and compared the results with those of shallow learning methods to prove the effectiveness of deep learning [30]. Hu et al. proposed a long-period neural network model and an optical flow method for the real-time detection of fire and smoke [31]. Zhang et al. jointly trained a CNN on complete images and image blocks for the detection and localization of fire in an image [32]. Muhammad et al. proposed a CNN-based early fire recognition method for early real-time fire detection in surveillance videos and established a more efficient CNN-based fire detection framework based on SqueezeNet [33]. In addition, Muhammad et al. conducted further research and established a fire detection framework combined with 5G network transmission to achieve fire detection in uncertain environments [34].
Although these works using CNNs are notable, they do not take advantage of the dynamic features of fire. In addition, CNN models face challenges in terms of popularization because of their high memory consumption. Furthermore, the accuracy of fire detection still requires improvement due to its critical importance for disaster management. Moreover, achieving high robustness of deep learning models for video fire detection in complex video scenarios remains challenging. The main contributions of our work are summarized below.

1) An efficient video fire detection method is proposed that exploits both motion-flicker-based dynamic features and deep static features to achieve improved performance in terms of its accuracy and false alarm rate. In addition, experiments prove that our method can be applied to a variety of complex video scenarios.
2) Our method considers both motion and flicker features, which is helpful for more effectively extracting dynamic features while reducing time consumption.
3) Our method uses an adaptive lightweight CNN to extract the deep static features of fire, which can reduce the computational burden while avoiding the loss of image features caused by fixed-size image input.
The remainder of this article is structured as follows. The proposed method is presented in Section II, including a detailed introduction to the dynamic and static feature acquisition methods. In Section III, the hyperparameter settings and experimental dataset, the evaluation metrics, and the experimental results are described in detail. The results of this paper are discussed in Section IV. Finally, the conclusion and plans for future work are presented in Section V.

II. THE PROPOSED METHOD
A. ACQUISITION OF DYNAMIC FEATURES
As a nonrigid moving object, a fire has obvious dynamic features in a video [35]. To make full use of these dynamic features, a motion-flicker-based algorithm that considers both motion and flicker features is proposed for the acquisition of dynamic features, inspired by the work of Chen et al. [36]. This algorithm includes background subtraction and flicker detection. First, background subtraction, which is often used for the extraction of moving areas in videos [37], is applied to extract motion features. Research has shown that the KNN-based approach offers desirable performance in outdoor scenarios [38], [39]. Similarly, this approach is suitable for the background subtraction of fire. Video stream processing and background subtraction are implemented through OpenCV, which is an open-source algorithm library [40]. A moving area in a video extracted through background subtraction is called a suspected region of interest in our method. The second step is flicker detection. Compared with ordinary objects, a fire produces a disordered, continuous, high-frequency time series of changes due to the combustion process [41], [42]. These changes manifest as flicker or pulsations, which are called the flicker features of fire. This step can determine whether a suspected region of interest exhibits flicker features; if so, the moving area is considered to have the dynamic features of fire and is called a region of interest. The overall algorithm is explained as follows.
1) Obtain the coordinate position (x, y, w, h) of each suspected region of interest in the current video frame based on background subtraction, where (x, y) represents the coordinates of the upper-left corner of the suspected region of interest and (w, h) represents the width and height.
2) Create a pixel frequency matrix SUM of the same size as each suspected region of interest, which will be used to analyze the brightness changes of each pixel, with coordinates (x, y), in the moving area. The brightness calculation method is shown in equation (1). The equal-weighted average of the three channels is used to avoid floating-point calculations and thus reduce the number of calculations required [36].
where I_t represents the pixel brightness value at time t; R_t, G_t, and B_t represent the pixel values in each band at time t; and (x, y) represents the coordinates of the pixel in the image.
3) If the brightness value of the pixel at (x, y) changes between time t and time t-1, the value of the corresponding element in the frequency matrix, SUM_t(x, y), is increased by 1; otherwise, it is increased by 0, as shown in equation (2).
where ΔI(x, y) represents the change in brightness at (x, y) between time t and time t-1 and T_I is a positive real number that represents the global change threshold.
4) If the oscillation count for a pixel within a certain time exceeds a set threshold, that pixel is considered to have a fire flicker feature, as shown in equation (4).
where n is the specified counting period, the length of which is set to 3, and the interval between counting periods is set to 1. SUM_T is the dynamic flicker threshold. With these settings, if there is at least one above-threshold brightness difference between three consecutive frames of video at the same pixel coordinates, this pixel is considered to have a flicker feature.
5) The final regions of interest are determined on the basis of a threshold λ, as shown in equation (5).
where T_f is the number of pixels satisfying equation (4) in the candidate fire region and T_rect is the total number of pixels in the candidate fire region. λ is an experimental threshold. Finally, any area that satisfies equation (5) is identified as a region of interest.
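The flicker-detection steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the brightness is taken as the integer mean of the three channels, a pixel is counted when its absolute brightness change exceeds T_I, a pixel flickers when its count over the window reaches SUM_T, and a region is kept when the flickering fraction exceeds λ. The function names, the strict or non-strict comparisons, and the value `t_i=15` are hypothetical; `sum_t=1` and the three-frame window follow the description in step 4, and `lam=0.5` is the value selected in Section IV.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B) values of one pixel
Frame = List[List[Pixel]]      # one cropped suspected region, as rows of pixels


def brightness(pixel: Pixel) -> int:
    """Equal-weighted channel mean with integer division (assumed form of equation (1))."""
    r, g, b = pixel
    return (r + g + b) // 3


def flicker_ratio(frames: Sequence[Frame], t_i: int = 15, sum_t: int = 1) -> float:
    """Fraction of pixels whose brightness changes by more than t_i
    at least sum_t times across the consecutive frames of one window."""
    height, width = len(frames[0]), len(frames[0][0])
    counts = [[0] * width for _ in range(height)]  # the SUM matrix
    for prev, curr in zip(frames, frames[1:]):
        for y in range(height):
            for x in range(width):
                delta = abs(brightness(curr[y][x]) - brightness(prev[y][x]))
                if delta > t_i:          # assumed form of equations (2) and (3)
                    counts[y][x] += 1
    flickering = sum(1 for row in counts for c in row if c >= sum_t)  # T_f
    return flickering / (height * width)                             # T_f / T_rect


def is_region_of_interest(frames: Sequence[Frame], lam: float = 0.5,
                          t_i: int = 15, sum_t: int = 1) -> bool:
    """Keep the region when the flickering fraction exceeds lam (assumed form of equation (5))."""
    return flicker_ratio(frames, t_i, sum_t) > lam
```

Passing three consecutive crops of the same suspected region corresponds to a counting period of n = 3, which yields two frame-to-frame differences per pixel.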

B. ACQUISITION OF DEEP STATIC FEATURES
To extract the deep static features of fire, we propose an adaptive lightweight convolutional neural network (AL-CNN), as shown in Fig. 1. The core of the lightweight network is a depthwise separable convolution structure, which realizes the separate mapping of channels and regions and reduces the required number of parameters and memory consumption. The AL-CNN consists of three parts: a network initialization stage, an inverted residual block stage and a spatial pyramid pooling stage. The first part is the network initialization stage, as shown in Fig. 1 (A), which consists of three modules: a convolutional layer, a batch normalization (BN) layer and a hard version of the swish (h-swish) activation function. BN is a neural network training optimization method proposed by Google [43]. It has been widely used in neural networks to accelerate network convergence and improve the stability of training. The h-swish activation function draws on the latest achievements of MobileNetV3; it offers increased accuracy while ensuring a low computational cost [44], [45]. During training, the network initialization stage can enhance the ability of the network to learn sparse features and improve the robustness of the extraction of deep static features.
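Two of the building blocks named above lend themselves to a short numerical illustration. The sketch below uses the h-swish definition from MobileNetV3, x · ReLU6(x + 3) / 6, and the usual parameter counts for a standard convolution versus a depthwise separable convolution (bias terms ignored). The kernel size and channel counts in the usage note are hypothetical and are not taken from the AL-CNN configuration.

```python
def relu6(x: float) -> float:
    """ReLU clipped at 6."""
    return min(max(x, 0.0), 6.0)


def h_swish(x: float) -> float:
    """Hard swish: x * ReLU6(x + 3) / 6, a piecewise-linear stand-in for swish."""
    return x * relu6(x + 3.0) / 6.0


def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k standard convolution (no bias)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k depthwise convolution followed by a 1 x 1 pointwise convolution (no bias)."""
    return k * k * c_in + c_in * c_out
```

For a hypothetical 3 × 3 layer mapping 32 channels to 64 channels, the standard convolution needs 18432 weights, whereas the depthwise separable version needs 2336, roughly an eightfold reduction.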
The second part is the inverted residual block stage, which is inspired by MobileNetV2 [46] and consists of two types of components: inverted residual blocks and downsampling blocks. Each inverted residual block consists of three steps, as shown in Fig. 1 (B1). First, the dimensionality is expanded by a 1 × 1 convolution, as a depthwise convolution itself does not have the ability to change the number of channels. Then, image features are extracted through depthwise separable convolutions. Finally, multiple features are obtained through shortcut connections. The structure of a downsampling block is shown in Fig. 1 (B2). The purpose of downsampling is achieved by setting the stride to 2. The inverted residual block and the downsampling block have the same structure, except for the shortcut connection and the stride. The inverted residual block stage enables us to extract the deep static features of fire.
The third part is the spatial pyramid pooling (SPP) stage [47]. It can enable a CNN to process images of any scale while avoiding the loss of static features caused by cropping and warping operations; this is why we call our network an adaptive network. In addition, the maximum pooling function is used to suppress local noise and improve the accuracy of target recognition. Adding the SPP structure at the end of the network avoids the need for fixed-size image input and improves the ability of the network to detect fire.
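The reason that spatial pyramid pooling removes the fixed-size input requirement can be seen in a small pure-Python sketch: each pyramid level divides the feature map into n × n bins and max-pools every bin, so the output length depends only on the levels, not on the input size. The function name and the pyramid levels (1, 2, 4) are assumptions chosen for illustration; the levels used in the AL-CNN are not stated here.

```python
from typing import List, Sequence


def spp_max_pool(feature_map: List[List[float]],
                 levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Max-pool one channel over n x n grids for each level and concatenate.

    The result always has sum(n * n for n in levels) values,
    whatever the height and width of feature_map are.
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Bin rows: floor for the start, ceiling for the end, so no bin is empty.
            y0 = (i * h) // n
            y1 = -((-(i + 1) * h) // n)
            for j in range(n):
                x0 = (j * w) // n
                x1 = -((-(j + 1) * w) // n)
                pooled.append(max(feature_map[y][x]
                                  for y in range(y0, y1)
                                  for x in range(x0, x1)))
    return pooled
```

With levels (1, 2, 4), a 5 × 7 map and a 13 × 9 map both produce 21 values per channel, so the layers after the pooling stage always see an input of the same length.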

C. VIDEO FIRE DETECTION EXPLOITING MOTION-FLICKER-BASED DYNAMIC FEATURES AND DEEP STATIC FEATURES
To improve the accuracy and efficiency of video fire detection, a method that exploits motion-flicker-based dynamic features and deep static features is proposed, as shown in Fig. 2. The proposed framework is divided into two main phases. First, region-of-interest acquisition is carried out based on the dynamic features. This process involves background subtraction and flicker detection. Background subtraction helps to obtain the suspected regions of interest, while flicker detection helps to obtain the regions of interest. In this phase, the images of interest are extracted, and the coordinates of interest in the video frames are recorded. Second, fire detection is carried out based on the static features. This phase involves extracting the deep static features of fire using the AL-CNN, which can fully extract these static features by means of inexpensive computations while avoiding the loss of image features due to fixed-size image input. The AL-CNN is used to identify whether each region of interest identified in the first phase is, in fact, a fire region; if so, an alarm is generated, and the fire coordinates in the video frame are output.
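The two-phase structure described above can be outlined as a cascade in which the inexpensive dynamic tests run first and the CNN classifies only the regions that survive. The sketch below is a hypothetical outline rather than the authors' code: the three stages are passed in as callables, and the names `detect_fire`, `propose_regions`, `has_flicker`, and `is_fire` are placeholders. In an OpenCV-based setup, `propose_regions` could wrap the KNN background subtractor created by `cv2.createBackgroundSubtractorKNN()`, and `is_fire` could wrap the trained AL-CNN.

```python
from typing import Any, Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) of a region in the frame


def detect_fire(frame: Any,
                propose_regions: Callable[[Any], List[Box]],
                has_flicker: Callable[[Any, Box], bool],
                is_fire: Callable[[Any, Box], bool]) -> List[Box]:
    """Return the boxes that pass motion, flicker, and static-feature checks, in that order."""
    # Phase 1a: background subtraction yields suspected regions of interest.
    suspected = propose_regions(frame)
    # Phase 1b: flicker detection keeps only regions with fire-like dynamics.
    regions_of_interest = [box for box in suspected if has_flicker(frame, box)]
    # Phase 2: the classifier runs only on the remaining regions.
    return [box for box in regions_of_interest if is_fire(frame, box)]
```

Because the classifier is invoked only on regions that already show motion and flicker, static frames and most non-flickering moving objects never reach the network, which is where the run-time saving comes from.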

III. EXPERIMENTS AND RESULTS
A. HYPERPARAMETER SETTINGS AND DATASET DESCRIPTIONS
1) HYPERPARAMETER SETTINGS
All training and testing are implemented using TensorFlow and Keras on the Windows 10 platform with an Nvidia GeForce GTX 1060 6 GB graphics card. The values of the hyperparameters are shown in Table 1.
By monitoring the value of the loss function, the learning rate was reduced by 0.9 after every 5 consecutive epochs in which the performance did not improve. In addition, the transfer learning strategy was applied during the training process. First, the AL-CNN was pretrained on the 1000 classes of the ImageNet dataset to determine the initial weights. Then, to classify fire and nonfire regions, the number of neurons in the last layer of our network was changed from 1000 to 2.
2) DATASET DESCRIPTIONS
[15], [49]-[51]. Although the existing datasets are large, the training datasets mostly consist of video frame images, leading to a large number of repeated images. Considering the features of only a single fire type will result in a feature representation that is insufficiently discriminative. By contrast, the combustion of different substances will produce fires with different representative color features, as shown in Fig. 3. Therefore, to improve the robustness of the neural network model, the fire training dataset was refined by adding different categories of fire images. The final training dataset included 22586 images, of which 9332 were fire images and 13254 were nonfire images. A detailed description of the training and testing datasets is shown in Table 2. Note that the images used during training and testing do not overlap.

B. EVALUATION METRICS
To quantitatively evaluate the performance of our proposed method and compare it with the results of other researchers, the false positive rate (also referred to as the false alarm rate) (equation (6)), false negative rate (equation (7)) and accuracy (equation (8)) are used as evaluation metrics in this paper [52]. The goals in this paper are to achieve a high accuracy, a low false positive rate and a low false negative rate. In addition, the run time necessary for detection is evaluated in terms of the frame rate (fps), which is the average number of video frames that can be processed per second.
To prove the performance of the dynamic feature extraction method, several videos in the test dataset were used for experiments. Fig. 4 (a) shows the original video frames, including images of a fire and other moving objects. Fig. 4 (b) shows the suspected regions of interest obtained after background subtraction, which include many sources of interference, such as pedestrians, car lights, and sunlight. Fig. 4 (c) shows the resulting regions of interest obtained after background subtraction and flicker detection. As shown, most of the fire-like sources of interference are eliminated in these results. However, this extraction method is based on hand-crafted features, and some sources of interference may not be eliminated in complex video scenarios, such as the moving fog in the last row of Fig. 4 (c). These sources of interference are eliminated by the deep neural network in the next step.
The run time needed for the acquisition of dynamic features is also considered in this paper. Background subtraction can eliminate static frames and avoid the need for time-consuming flicker detection over an entire image. In Table 3, we compare the run times for dynamic feature acquisition using our method (background subtraction and flicker detection) and using only flicker detection without background subtraction. Our method achieves a frame rate of 67 fps, whereas flicker detection without background subtraction has a frame rate of 43 fps, showing that the strategy for dynamic feature acquisition presented in this paper results in a faster run time.

2) RESULTS OF DEEP STATIC FEATURE EXTRACTION
To verify the performance of static feature extraction based on the AL-CNN, a small image dataset (DS1) was used to test the model in a separate experiment. In addition, several excellent existing lightweight networks were selected for comparison, namely, SqueezeNet, ShuffleNet, ShuffleNetV2, MobileNet, and MobileNetV2 [53]-[56], [46]. The results are shown in Table 4. Compared with the other methods, our method achieves a false positive rate that is lower by 0.94-9.21%, a false negative rate that is lower by 2.6-6.73%, and an accuracy rate that is higher by 1.83-7.48%. In addition, the average time needed to process an image using our method is 0.014 s. Nevertheless, although the overall performance of our method is better than that of the existing lightweight network methods, its accuracy is still limited. The false positive and false negative rates are still undesirably high, reaching 14.15% and 6.72%, respectively. Similarly, although the accuracy is also improved, it is only 89.78%.
The test results were further analyzed to identify the specific shortcomings of our method. Examples of correct detection results are shown in Fig. 5 (a). Examples of misclassification are shown in Fig. 5 (b); these cases mostly correspond to small fires at long distances or with occlusions. Fig. 5 (c) shows examples of erroneous detection, which is mostly caused by fire-like sunlight or other light sources. The region-of-interest acquisition process can be used to identify small moving objects and recognize whether such moving objects exhibit flicker features, which helps our method to avoid misclassification and erroneous detection.
[6]. Examples of images extracted from DS2 are shown in Fig. 6. Videos 1-3 show small fires from a long distance, videos 4 and 5 contain fire-like objects, and videos 6-8 contain forest fires, thus constituting a good test of our method's robustness. In addition, the dataset contains a large number of nonfire interference videos. Videos 9 and 10 contain red fire-like objects, videos 11 and 12 contain sunlight and fire-like objects, videos 13 and 14 show common outdoor scenes that contain interference from fire-like objects and smoke, and videos 15 and 16 were recorded in the mountains and contain moving fog. Hence, this dataset is challenging for both color-based and motion-based fire detection methods. In addition, this dataset is often used in fire detection studies, which makes it easier to compare the method proposed in this paper with other existing methods.
We selected 8 related algorithms that yield excellent results for comparison with our method, including methods based on both deep learning and hand-crafted features for fire detection. The accuracy comparison results are shown in Table 5. Among the compared methods, the algorithms of Foggia, Lascio, and Celik have the best false negative rates, but among them, the highest false positive rate is 29.41%, and the lowest is 6.67%. These algorithms have high false positive rates, and their accuracy is not optimal. From the perspective of the false positive rate, the best result is achieved by the method of Habiboğlu, i.e., 5.88%, but the corresponding false negative rate is 14.29%, which is the worst among all of the detection methods. The method proposed by Muhammad et al. has the best accuracy, but its false positive rate is 8.87%, and its false negative rate is also still somewhat high. By contrast, the false positive rate of our method is 2.33%, representing a reduction of 3.55%-38.85% relative to the other methods. The accuracy of our method is 97.94%, corresponding to an increase of 3.44%-23.74%, and the false negative rate is 0.84%. A further analysis of the experimental results showed that the false negatives correspond to frames in which the fires are about to burn out, whereas the fires can be detected effectively in the early stage. In summary, our method shows the best performance.
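The three rates compared above can be computed from confusion-matrix counts as in the following sketch. It assumes that equations (6)-(8) follow the standard definitions, with fire as the positive class; the function names and the counts in the usage note are hypothetical and are not values from Table 5.

```python
def false_positive_rate(fp: int, tn: int) -> float:
    """Share of nonfire samples that raise an alarm: FP / (FP + TN)."""
    return fp / (fp + tn)


def false_negative_rate(fn: int, tp: int) -> float:
    """Share of fire samples that are missed: FN / (FN + TP)."""
    return fn / (fn + tp)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all samples classified correctly: (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)
```

For example, with 90 detected fires, 10 missed fires, 5 false alarms, and 95 correctly rejected nonfire samples, the false positive rate is 5%, the false negative rate is 10%, and the accuracy is 92.5%. The example shows why the three metrics are reported together: a method can lower one rate at the expense of another.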

E. EXPERIMENTS ON VIDEO DATASET 3
We built a new dataset from complex video scenarios to further demonstrate the performance of our method. Sample frames from the videos in DS3 are shown in Fig. 7.
The example frames show that DS3 contains many challenges, such as fires viewed from far away, an occluded fire, a tunnel fire, and fires obscured by smoke and light. The nonfire videos include complex video scenarios with many types of interferences, such as artificial lights, sunlight, red objects, and bad weather. A detailed explanation of each video in DS3 is given in Table 6.
In addition to our proposed method, four commonly used deep neural networks were selected for use in place of the network proposed in this paper for comparison. Among them, ShuffleNetV2 [55] and MobileNetV2 [46] are excellent lightweight neural networks that have been recently proposed, and VGG16 [58] and ResNet50 [59] are ordinary CNNs with many applications. In Table 7, we compare the five models in terms of run time and accuracy. Our proposed method is superior to ShuffleNetV2 and MobileNetV2 in terms of its false positive rate, false negative rate, and accuracy: the false positive rate is lower by 1.41%-2.30%, the false negative rate is lower by 0.78%-1.5%, and the accuracy is higher by 1.45%-1.69%. The frame rate of our method is somewhat lower than that of MobileNetV2, although it is still far higher than general video frame rate requirements. The resulting accuracy is not very different from that of VGG16 or ResNet50, but our method achieves a shorter run time than these methods do. Thus, our proposed method can better balance accuracy and run time for fire detection.

IV. DISCUSSION
A. COMPARISON OF FIRE DETECTION RESULTS WITH RECENT RESEARCH
We tested our method with three different device configurations: an Intel Core i7-8750H CPU with 8 GB of RAM and an Nvidia GeForce GTX 1060 with 6 GB of onboard memory; an Intel(R) Core(TM) i7-4810MQ CPU with 8 GB of RAM; and an Intel(R) Core(TM) i7-5500U CPU with 8 GB of RAM. We compared our method with the most advanced methods reported to date in terms of the frame rate, accuracy, and false positive rate on DS2. Our method can balance time and accuracy better than the other methods, as shown in Table 8.

B. APPLICABILITY OF OUR METHOD IN SPECIAL SCENARIOS
The ultimate goal of fire detection is to increase the accuracy while reducing the rates of false positives and false negatives. However, the situations depicted in real videos are complex, including interfering factors such as fire viewed from far away, artificial lights, red objects and other moving objects, as shown in Fig. 8. A high false positive rate still occurs when deep learning models alone are used for fire detection. Our method of exploiting both motion-flicker-based dynamic features and deep static features can solve these problems. First, region-of-interest acquisition is performed based on dynamic features. Second, fire detection is performed based on static features.
In the first phase, the acquisition of regions of interest based on dynamic features can enable the extraction of fires or other moving objects viewed from a long distance, as shown in Fig. 8 (a), enabling us to focus on the spectral and textural features of such fire regions. Furthermore, the introduction of flicker detection can eliminate some fire-like interferences from among the candidate moving objects, such as artificial lights and red objects, as shown in Fig. 8 (b). However, some sources of interference, for example, moving fog or a car, may still not be eliminated during this first phase of detection in complex video scenarios.
In the second phase, static features are used to further identify fire, which can eliminate these interferences, as shown in Fig. 8 (c) and (d). To improve the robustness of the proposed AL-CNN, different categories of fire images were used in the training process, thereby improving the accuracy and reducing the false positive rate for fire detection.

C. ANALYSIS OF DYNAMIC FEATURE EXTRACTION
During dynamic feature extraction, the recall should be as high as possible to ensure that the AL-CNN model can subsequently be effectively applied for fire detection. Notably, the recall during dynamic feature extraction depends on the value chosen for λ (equation (5)). To investigate the impact of different thresholds on the recall, multiple types of fire videos, including videos of fires from the burning of solid fuel, liquid fuel, and gaseous fuel, were used in experiments to determine the optimal threshold. The receiver operating characteristic (ROC) curve for region-of-interest acquisition with different thresholds for the extracted dynamic features is shown in Fig. 9. The preliminary threshold range is set to 0-1, and the step size is 0.05. It can be seen from the figure that the true positive rate remains the same when the value of λ is in the range of 0-0.5 and decreases as λ increases in the range of 0.5-1. By contrast, the false positive rate continuously decreases as λ increases in the range of 0-1. To obtain a high true positive rate while balancing the true positive rate and false positive rate, the value of λ is set to 0.5.
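The threshold sweep described above can be reproduced in outline with a few lines of Python. The sketch assumes that each candidate region has already been reduced to its flicker ratio T_f / T_rect and that a region is accepted when the ratio exceeds λ; the function name and the sample ratios in the usage note are hypothetical.

```python
from typing import List, Sequence, Tuple


def sweep_lambda(fire_ratios: Sequence[float],
                 nonfire_ratios: Sequence[float],
                 step: float = 0.05) -> List[Tuple[float, float, float]]:
    """Return (lambda, true positive rate, false positive rate) for lambda in [0, 1]."""
    steps = round(1 / step)
    points = []
    for k in range(steps + 1):
        lam = k / steps  # avoids accumulating floating-point error
        tpr = sum(r > lam for r in fire_ratios) / len(fire_ratios)
        fpr = sum(r > lam for r in nonfire_ratios) / len(nonfire_ratios)
        points.append((round(lam, 2), tpr, fpr))
    return points
```

With hypothetical fire ratios of 0.62, 0.71, 0.83, and 0.94 and nonfire ratios of 0.08, 0.17, 0.33, and 0.57, the true positive rate stays at 1.0 up to λ = 0.5 while the false positive rate has already dropped to 0.25, which mirrors the shape of the trade-off described for Fig. 9.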
To further prove the effectiveness of the dynamic feature extraction procedure, we performed 4 experiments on DS2 and DS3 without considering dynamic features (with only the AL-CNN) and with the consideration of dynamic features (with background subtraction, flicker detection and the AL-CNN). The results are shown in Table 9. On both datasets, better results are achieved by considering dynamic features. Without dynamic feature extraction, the false positive rate is increased by 9.1%-11.85%, the false negative rate is increased by 6.69%-8.79%, and the accuracy is reduced by 8.67%-10.62%. As seen from the above analysis, considering both the dynamic and static features of fire can effectively improve the accuracy of fire detection and reduce the rates of false positives and false negatives.

D. EFFECTIVENESS OF OUR METHOD
Early in the development of a fire is the best time to extinguish it, and the time elapsed between the ignition of a fire and its detection is an important factor to consider when evaluating the ability to achieve early fire detection. Therefore, five fire videos were divided into five stages in accordance with the changes in the fire characteristics during the burning process, i.e., ignition, development, fierce burning, decay, and extinction, as shown in Fig. 10. These five fire videos represent different scenarios, including indoor and outdoor scenes.
In addition to the commonly used evaluation criteria of accuracy and run time, the time from ignition to detection was recorded. In addition, we recorded the detection accuracy during the fire evolution process for each video separately, as shown in Fig. 11. The amounts of elapsed time until fire detection is achieved for the 5 videos are 2.5 s, 2 s, 1 s, 1.5 s, and 2 s. The average accuracies are 97.97%, 98.53%, 98.3%, 97.80%, and 98.11%. For all videos, it is possible to detect the fire within 3 s with high accuracy.
To analyze the detection accuracy of the method for different fire combustion stages, we analyzed the fire detection accuracy in each of the five stages; the results are shown in Fig. 12. The average accuracy across all five videos is 97.25%, 99.93%, 100%, 98.93%, and 94.63% for the first through the fifth stages, respectively. For the ignition stage, the accuracy is 97.25%, which can meet the needs of early fire detection. For the development, fierce burning, and decay stages, the average accuracy results are 99.93%, 100%, and 98.93%, respectively, which are also suitable for achieving accurate fire detection. During the extinction stage, the accuracy is 94.63%. This analysis reveals that the method proposed in this paper can achieve more accurate detection in the early stages of a fire.

V. CONCLUSION AND FUTURE WORK
In recent years, with the development of computer vision technology, deep learning has been applied for fire detection by many researchers. Although such applications are feasible under certain conditions, their efficiency needs to be improved, and complex video scenarios must be considered. Motivated by these considerations, an efficient fire detection method is proposed in this paper. Our method offers several advantages over other recent fire detection methods. First, both motion and flicker features are considered, which enables us to more effectively extract dynamic features. In addition, our method relies on an adaptive lightweight neural network, which can effectively extract deep static features with a low computational cost. Finally, experimental results prove that our method achieves state-of-the-art performance in terms of its accuracy and false alarm rate and that our method is applicable to complex video scenarios. In general, our proposed method has better application prospects than other state-of-the-art methods and is suitable for use in public safety management systems.
In this study, we concentrated solely on fire detection. In future work, we will conduct in-depth research on fire spread prediction and spatial positioning based on existing research. We hope that our research can support the intelligent suppression of fire in its early stages and provide improved fire detection and fire suppression methods for fire management in the area of public safety.