An Attention Enhanced Bidirectional LSTM for Early Forest Fire Smoke Recognition

Detecting forest fire smoke during the initial stages is vital for preventing forest fires. Recent studies have shown that exploiting the spatial and temporal features of image sequences is important for this task. Nevertheless, because long-distance wildfire smoke usually moves slowly and lacks salient features, accurate smoke detection remains challenging. In this paper, we propose a novel Attention Enhanced Bidirectional Long Short-Term Memory Network (ABi-LSTM) for video-based forest fire smoke recognition. The proposed ABi-LSTM consists of a spatial feature extraction network, a bidirectional long short-term memory network (Bi-LSTM), and a temporal attention subnetwork. It not only captures discriminative spatiotemporal features from image patch sequences but also pays different levels of attention to different patches. Experiments show that our ABi-LSTM achieves the best accuracy and fewer false alarms across different types of scenarios. The ABi-LSTM model achieves an accuracy of 97.8%, a 4.4% improvement over the image-based deep learning model.


I. INTRODUCTION
An efficient and stable vision-based smoke detection algorithm is critical for detecting forest fires at their initial stage. On the one hand, forest fires pose a significant threat to human life and the natural ecological environment. If a forest fire cannot be promptly extinguished, it can affect a wide area, and reaction time is one of the key factors that determine the success of forest fire suppression. On the other hand, although there has been extensive research on photoelectric- or ionization-based fire smoke detectors, these sensors act as point sensors in space, which makes them unsuitable for monitoring large areas, as required for early forest fire detection. The limitations of current smoke sensors have prompted research on vision-based smoke detection methods.
Pan-tilt-zoom (PTZ) IP cameras are well suited to viewing large areas. They can be placed in auto-patrol mode, in which they automatically step through predetermined positions. This paper proposes a novel method to detect forest fires using PTZ IP cameras. Figure 1 illustrates the pan-tilt-zoom (PTZ) long-range camera for forest fire detection and a snapshot of typical forest fire smoke at the initial stage, captured by a forest watch tower. Because of tree cover and terrain, the main visible manifestation of an early forest fire is smoke. Therefore, forest fire monitoring systems usually focus on smoke identification.
The associate editor coordinating the review of this manuscript and approving it for publication was Xian Sun.
Over the last decade, a considerable volume of research has focused on identifying specific features of smoke. Existing smoke detection methods can be divided into two categories: image-based smoke detection [1]-[3] and video-based smoke detection [4], [5]. General smoke detection algorithms usually combine motion detection, feature extraction and classification. Image-based smoke detection methods are usually independent of inter-frame context information, whereas video-based methods not only analyze spatial features in single frames but also extract temporal features between frames.
When it is difficult to obtain a stable and reliable image sequence, a single-frame detection method is a good choice. Tian et al. [1] recently proposed separating a frame into quasi-smoke and quasi-background components by convex optimization. Deep learning with convolutional neural networks (CNNs) has achieved great success in image classification and object detection. In [2], researchers proposed a 14-layer deep normalization and convolutional neural network (DNCNN) to implement automatic feature extraction and classification. Yuan et al. [3] proposed a smoke detection method that combines local binary pattern (LBP)-like features, kernel principal component analysis (KPCA), and Gaussian process regression (GPR).
However, dynamic features are among the essential cues in forest fire smoke recognition. The human visual system is remarkably good at recognizing complex moving smoke in image sequences because it analyzes dynamic characteristics when making a judgment. Extracting and modeling dynamic features better should therefore help improve recognition accuracy. Dimitropoulos et al. [4] introduced a higher order linear dynamical system (h-LDS) descriptor for multidimensional dynamic texture analysis. Other researchers have applied deep learning to forest fire and smoke identification. Lin et al. [5] proposed a joint detection framework based on Faster R-CNN and 3D CNN. However, the practical application of this algorithm is restricted by its high computational complexity.
Because the moving speed and direction of smoke in the image depend on the monitoring distance and the weather, the model must adapt to a variety of scenes. The difficulty of accurate forest fire smoke recognition lies in two aspects: (1) learning an efficient spatiotemporal representation of fire smoke; and (2) early forest fire smoke has different motion saliency in different frames, so the model should pay different attention to each frame.
Given the aforementioned concerns, we propose a novel attention enhanced bidirectional LSTM network (ABi-LSTM) for forest fire smoke recognition. A foreground detection algorithm is used to extract candidate image patch sequences from the video, and a block-based detection scheme is used to expand the recognition scope (so that the background information around the moving pixels can be obtained effectively) and to roughly locate the smoke area. This paper focuses on classifying the candidate image patch sequences. The contributions of this paper are summarized as follows:
• We propose a novel attention enhanced bidirectional LSTM network (ABi-LSTM) to tackle the early forest fire smoke recognition problem.
• We model the spatiotemporal representation of smoke candidate patches by applying a CNN and a bidirectional long short-term memory network in both the forward and backward time directions.
• This is the first publication to apply an attention mechanism to video-based forest fire smoke recognition. In our implementation, an attention network is designed to adaptively focus on discriminative frames with a soft attention mechanism that automatically emphasizes motion information in the temporal domain.
• We construct more challenging forest fire smoke datasets to increase the reliability of the experiments. Experimental results demonstrate that the proposed method outperforms existing methods for forest fire smoke recognition.
The rest of this paper is organized as follows. The proposed ABi-LSTM framework is described in Section III. The first part of that section describes the spatial feature extraction network, which is an Inception V3 network [23]; the second part briefly reviews the recurrent neural network (RNN) and long short-term memory (LSTM), and builds a multi-layer bidirectional LSTM model that takes the spatial feature of each patch and extracts temporal features in the forward and backward order; the third part proposes an attention network that optimizes the classification process with a soft attention mechanism. In Section IV, experimental results are presented, and the ABi-LSTM framework is compared with other smoke recognition algorithms. Finally, conclusions are drawn in Section V.

II. RELATED WORK
Although there is little literature on early forest fire smoke detection, there is a substantial body of literature on video-based smoke detection and fire detection [6], [7]. Many researchers have addressed smoke detection by focusing mainly on the recognition of spatiotemporal features of smoke. As mentioned in the previous section, existing smoke detection methods can be divided into two categories: image-based methods and video-based methods.

A. IMAGE-BASED METHOD
Among image-based methods, there is a vast literature investigating the static characteristics of smoke. Inspired by the airlight-albedo ambiguity model, a novel approach that detects smoke using transmission is proposed in [8]. To improve performance, Yuan [9] proposed a double mapping framework that concatenates histograms of edge orientation, edge magnitude and Local Binary Pattern (LBP) bits, and densities of edge magnitude, LBP bits, color intensity and saturation. Tian et al. [1], [10], [11] formulated smoke separation as a convex optimization problem that solves a sparse representation problem. In [10], three different models that constrain the smoke component are proposed to separate the smoke component from a given frame. In [11], the sparse coefficients associated with an over-complete dictionary representation are used as a new feature to detect smoke. Furthermore, Tian et al. [1] solved the sparse representation problem using dual dictionaries for the smoke and background components, respectively, and developed a method based on image matting to separate the smoke and background components from a single image frame.
However, a single frame often loses important dynamic information, which is one of the main reasons image-based methods struggle.

B. VIDEO-BASED METHOD
As one of the essential features of smoke, motion information should in principle improve smoke recognition accuracy. In [12], a smoke detection method using color, motion and growth properties is proposed. Dimitropoulos et al. [4] introduced a higher order linear dynamical system (h-LDS) descriptor to analyze the smoke candidate image patches in each subsequence. Extracting dynamic information during recognition improves the performance of these models to some extent.
The above methods usually rely on hand-crafted features that focus on the boundary of smoke or on the effect of smoke on the edges of the objects it covers. Traditional hand-crafted feature-based smoke detection methods can achieve high accuracy on a small number of samples, but their generalization performance is less than satisfactory because they are sensitive to the parameter settings of the detection algorithm. Moreover, hand-crafted feature-based methods usually recognize smoke from small blocks (often < 50 × 50 pixels), which limits the accuracy of smoke recognition.

C. DEEP LEARNING METHOD
In recent years, deep learning approaches (e.g., convolutional neural networks and recurrent neural networks) have led to very good performance on a variety of problems, such as visual recognition [13], speech recognition [14] and natural language processing [15]. Yin et al. [2] proposed a deep normalization and convolutional neural network (DNCNN) with batch normalization to extract features for smoke detection. In [16], researchers demonstrated the effectiveness of a saliency detection method combined with a CNN for localizing and recognizing wildfire in aerial images. Liu et al. [17] proposed a dual convolution network using the dark channel prior (DarkC-DCN) to further improve the recognition accuracy of image-based CNN models. To ease the shortage of smoke image samples, an end-to-end trainable smoke detection framework based on the fast detector SSD and MSCNN has been proposed, which can optimize the model using both synthetic and real smoke samples.
Moreover, there are also video-based methods using deep learning [5]. A joint detection framework based on Faster R-CNN [19] and 3D CNN [20] was proposed to detect smoke, in which an improved Faster R-CNN with non-maximum annexation is responsible for locating the smoke target, and a 3D CNN is responsible for smoke recognition by combining dynamic spatiotemporal information. Although this video-based method takes the dynamic characteristics between frames into account, it can hardly be used in practical scenarios because of its high computational cost.
Besides CNNs, the recurrent neural network (RNN) is another important deep learning structure, which has made significant breakthroughs in various tasks, especially sequence processing [21]. However, the vanishing gradient problem makes it difficult to train recurrent neural networks with back-propagation through time. Long short-term memory (LSTM) is specifically designed to tackle this problem [22], [24]. There have been several meaningful works on RNNs and LSTMs [32]-[34]. The attention mechanism is another of the most influential ideas in the deep learning community, and it has been used in various problems such as neural machine translation and human action recognition [25], [26]. Because the attention mechanism can focus on discriminative features in a long sequence, it is useful in many difficult tasks.
Our key motivations for ABi-LSTM are as follows: a) compared with hand-crafted features, CNNs have a more powerful feature extraction ability, and the block size used for forest fire smoke recognition in this paper is larger, which is helpful for CNN

III. APPROACH
In this section, we propose a novel Attention Enhanced Bidirectional LSTM architecture for forest fire smoke recognition.

A. OVERVIEW OF METHODOLOGY
As illustrated in Figure 2, the proposed ABi-LSTM is mainly composed of three components: the spatial feature extraction network, the bidirectional LSTM network, and the temporal attention subnetwork. The spatial feature extraction network extracts spatial features from candidate patches, which are obtained by the ViBe [32] background subtraction method. The bidirectional LSTM network learns long-term smoke-related information from the spatial features. To make full use of both the past and future context information of a sequence in classification, a bidirectional LSTM is employed to extract temporal features in both the forward and backward order. In this model, the orange arrows indicate the direction of information flow in the forward LSTM and the blue arrows indicate the direction of information flow in the backward LSTM. To concentrate on the discriminative frames that contribute more to forest fire smoke recognition, an attention subnetwork is designed to automatically emphasize motion information in the temporal domain with a soft attention mechanism. Each component is explained in detail below.
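To make the data flow of the three components concrete, the following Python sketch traces the shape produced by each stage for one candidate sequence. It is not the authors' code: the 20-frame length, the 3 × 299 × 299 input and the 2048-dimensional ''avg_pool'' feature come from the text, while the 64 units per LSTM direction, the concatenation of forward and backward outputs at each step, and the names `ABiLSTMShapes` and `trace_shapes` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ABiLSTMShapes:
    """Assumed sizes for one candidate sequence (hypothetical helper)."""
    seq_len: int = 20      # frames per candidate sequence (from the text)
    patch: int = 299       # 299 x 299 RGB patches (from the text)
    feat_dim: int = 2048   # Inception V3 avg_pool output size (from the text)
    hidden: int = 64       # ASSUMED number of LSTM units per direction
    num_classes: int = 2   # smoke / non-smoke


def trace_shapes(cfg: ABiLSTMShapes) -> dict:
    """Return the shape produced by each stage of the pipeline."""
    step_dim = 2 * cfg.hidden  # ASSUMED: forward and backward outputs concatenated
    return {
        "input_patches": (cfg.seq_len, 3, cfg.patch, cfg.patch),
        "spatial_features": (cfg.seq_len, cfg.feat_dim),
        "bilstm_outputs": (cfg.seq_len, step_dim),
        # attention rescales each time step but keeps its shape
        "attended_outputs": (cfg.seq_len, step_dim),
        # all time steps concatenated before the softmax layer
        "classifier_input": (cfg.seq_len * step_dim,),
        "softmax_output": (cfg.num_classes,),
    }


shapes = trace_shapes(ABiLSTMShapes())
for stage, shape in shapes.items():
    print(f"{stage:17s} {shape}")
```

Under these assumed sizes, the softmax layer receives a 20 × 128 = 2560-dimensional vector per sequence.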

B. SPATIAL FEATURES EXTRACTION
CNNs have achieved excellent performance in computer vision tasks. The Inception network was an important milestone in the development of CNN classifiers. GoogLeNet is known as Inception V1 [27], and researchers subsequently proposed improved models such as Inception V2 [28] and Inception V3 [23]. In this paper, instead of building a model from scratch, we use a pretrained Inception V3 model to capture spatial information from each individual frame.
Inception V3 is a heavily engineered network that incorporates several upgrades to increase accuracy and reduce computational complexity, the first of which is (1) factorizing each 5 × 5 convolution into two 3 × 3 convolutions to improve computational speed. In this study, the output of the ''avg_pool'' layer of Inception V3, rather than the fully connected layer, is used as the spatial feature. The 2048-dimensional image features at successive time steps form a spatial feature sequence, which is learned by the subsequent bidirectional LSTM.
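The sketch below illustrates two points from the paragraph above in plain Python. First, the ''avg_pool'' layer averages a feature map over its spatial positions; for a 299 × 299 input, the last Inception V3 feature map is 8 × 8 × 2048, which pools to the 2048-dimensional vector used as the spatial feature. Second, factorizing convolutions reduces the number of weights. The function names are hypothetical, feature maps are nested lists rather than tensors, and biases are ignored in the weight counts.

```python
def global_average_pool(feature_map):
    """Average an H x W x C feature map (nested lists) over H and W -> C values."""
    h, w = len(feature_map), len(feature_map[0])
    sums = [0.0] * len(feature_map[0][0])
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                sums[c] += value
    return [s / (h * w) for s in sums]


def conv_weights(kh, kw, c_in, c_out):
    """Weights in a kh x kw convolution from c_in to c_out channels (bias ignored)."""
    return kh * kw * c_in * c_out


def factorized_weights(n, c):
    """Weights of one n x n conv versus a 1 x n conv followed by an n x 1 conv."""
    full = conv_weights(n, n, c, c)
    factored = conv_weights(1, n, c, c) + conv_weights(n, 1, c, c)
    return full, factored


# A 5 x 5 conv versus two stacked 3 x 3 convs (same receptive field), one channel
print(conv_weights(5, 5, 1, 1), 2 * conv_weights(3, 3, 1, 1))
# A 7 x 7 conv versus 1 x 7 + 7 x 1
print(factorized_weights(7, 1))
```

With equal channel counts, two stacked 3 × 3 convolutions cover the same 5 × 5 receptive field with 18 instead of 25 weights per channel pair (28% fewer), and replacing a 7 × 7 convolution with 1 × 7 and 7 × 1 convolutions reduces 49 weights to 14.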

C. BIDIRECTIONAL LSTM
In this section, we briefly review the recurrent neural network (RNN) and long short-term memory (LSTM) to make the paper self-contained. The RNN is an extension of feed-forward neural networks and has yielded promising results in sequence learning. Figure 3-a shows an RNN neuron. The input to the RNN is a data sequence $\{x_1, x_2, \ldots, x_T\}$.
As shown in Figure 3-a, the hidden state of the RNN units at the $t$th time step is determined by the current input $X_t$ and the previous hidden state $h_{t-1}$ at the $(t-1)$th time step:

$$h_t = \sigma(W_{xh} X_t + W_{hh} h_{t-1} + b_h), \qquad o_t = g(W_{ho} h_t + b_o)$$

where $\sigma$ is a nonlinear activation function, $g$ denotes the operation of the fully connected layer, $b_h$ and $b_o$ are bias vectors, and $W_{xh}$, $W_{hh}$ and $W_{ho}$ denote the weight matrices from the current input layer to the hidden layer, from the previous hidden layer to the current hidden layer, and from the current hidden layer to the output layer, respectively. The RNN is an important deep learning model for sequential data. However, it has difficulty modeling long-term dependencies because of the vanishing and exploding gradient problems during training. Our model builds on LSTM cells, an advanced RNN architecture explicitly designed to tackle this problem. Our key motivation for choosing LSTM is that it can learn long-term dependencies and mitigates the vanishing gradient problem that traditional RNNs suffer from during back-propagation. LSTM has been successfully applied to handwriting recognition, machine translation and other tasks. The difference between LSTM and the plain RNN is that LSTM adds several gates to the cell to decide whether information is useful [39]. As illustrated in Figure 3-b, an LSTM neuron updates its memory cell state $C_t$ at a given time step $t$ from different sources: the current input $X_t$, its own hidden state from the previous time step $h_{t-1}$, and the previous memory cell state $C_{t-1}$.
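Before turning to the LSTM gate equations, the vanilla RNN recurrence defined above can be sketched in plain Python. This is an illustrative sketch rather than the paper's implementation: tanh is assumed for the nonlinearity $\sigma$, the read-out $g$ is assumed to be a linear fully connected layer, and matrices are plain lists of rows; `rnn_step` and `rnn_forward` are hypothetical names.

```python
import math


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_step(x_t, h_prev, W_xh, W_hh, W_ho, b_h, b_o):
    """h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h); o_t = W_ho h_t + b_o."""
    pre = [a + b + c for a, b, c in zip(matvec(W_xh, x_t), matvec(W_hh, h_prev), b_h)]
    h_t = [math.tanh(p) for p in pre]                       # sigma assumed to be tanh
    o_t = [a + b for a, b in zip(matvec(W_ho, h_t), b_o)]   # g assumed to be linear
    return h_t, o_t


def rnn_forward(xs, h0, params):
    """Unroll the RNN over a sequence; params = (W_xh, W_hh, W_ho, b_h, b_o)."""
    h, outputs = h0, []
    for x_t in xs:
        h, o_t = rnn_step(x_t, h, *params)
        outputs.append(o_t)
    return h, outputs
```

Because the same weights are reused at every step, gradients flowing back through many applications of $W_{hh}$ can shrink or grow geometrically, which is the difficulty the LSTM gates address.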
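At each time step, the LSTM neuron can choose to input, forget, and output the memory cell state, governed by four components: the input gate $i_t$, the output gate $o_t$, the forget gate $f_t$ and the candidate cell state $\tilde{C}_t$. Based on these components, the memory cell state and output of the LSTM neuron are computed by

$$i_t = \sigma(W_{xi} \cdot X_t + W_{hi} \cdot h_{t-1} + b_i)$$
$$f_t = \sigma(W_{xf} \cdot X_t + W_{hf} \cdot h_{t-1} + b_f)$$
$$o_t = \sigma(W_{xo} \cdot X_t + W_{ho} \cdot h_{t-1} + b_o)$$
$$\tilde{C}_t = \tanh(W_{xC} \cdot X_t + W_{hC} \cdot h_{t-1} + b_C)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
$$h_t = o_t \odot \tanh(C_t)$$

where $\sigma$ is the sigmoid function, tanh is the hyperbolic tangent function, '$\cdot$' is the matrix multiplication operator, '$\odot$' denotes the element-wise product with a gate value, and $b_i$, $b_f$, $b_o$ and $b_C$ are bias vectors.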
The subscripts of the weight matrices are self-explanatory. For example, $W_{hi}$, $W_{xo}$ and $W_{ho}$ denote the hidden-to-input-gate, input-to-output-gate and hidden-to-output-gate matrices, respectively. In the proposed ABi-LSTM, multiple LSTM layers are stacked to learn long-term dependencies in sequence data.
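A single LSTM step following the gate equations above might look like this in plain Python. It is a sketch under stated assumptions: $\sigma$ is the logistic sigmoid, each gate's parameters are passed as a hypothetical `(W_x, W_h, b)` triple keyed by 'i', 'f', 'o' and 'C', and vectors are plain lists rather than tensors.

```python
import math


def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(W_x, W_h, b, x, h):
    """Compute W_x x + W_h h + b for matrices given as lists of rows."""
    return [sum(w * xi for w, xi in zip(rx, x)) + sum(w * hi for w, hi in zip(rh, h)) + bi
            for rx, rh, bi in zip(W_x, W_h, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; p maps 'i', 'f', 'o', 'C' to (W_x, W_h, b) triples (hypothetical layout)."""
    i = [sigmoid(z) for z in affine(*p["i"], x_t, h_prev)]            # input gate
    f = [sigmoid(z) for z in affine(*p["f"], x_t, h_prev)]            # forget gate
    o = [sigmoid(z) for z in affine(*p["o"], x_t, h_prev)]            # output gate
    c_tilde = [math.tanh(z) for z in affine(*p["C"], x_t, h_prev)]    # candidate state
    # element-wise gating: keep part of the old state, add part of the candidate
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_tilde)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t
```

Stacking layers, as in ABi-LSTM, amounts to feeding the `h_t` sequence of one layer as the `x_t` sequence of the next.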
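To make full use of both the past and future context information of a sequence in classification, we build a bidirectional LSTM model that takes the spatial feature of each patch in turn and extracts temporal features in both the forward and backward order. The bidirectional LSTM model consists of two parts, a forward LSTM and a backward LSTM, as illustrated in Figure 4. The forward LSTM updates its memory cell state $\overrightarrow{C}_t$ starting at time $t = 1$ (from $x_1$ to $x_T$). Similarly, the backward LSTM updates its memory cell state $\overleftarrow{C}_t$ starting at time $t = T$ (from $x_T$ to $x_1$). Formally, for a raw image patch $I_t$, forward memory cell state $\overrightarrow{C}_t$ and backward memory cell state $\overleftarrow{C}_t$, the encoding is performed as

$$X_t = \mathcal{C}(I_t; \theta_{\mathcal{C}})$$
$$\overrightarrow{C}_t = \overrightarrow{\mathcal{T}}(X_t, \overrightarrow{C}_{t-1}; \theta_{\overrightarrow{\mathcal{T}}})$$
$$\overleftarrow{C}_t = \overleftarrow{\mathcal{T}}(X_t, \overleftarrow{C}_{t+1}; \theta_{\overleftarrow{\mathcal{T}}})$$

where $\mathcal{C}$, $\overrightarrow{\mathcal{T}}$ and $\overleftarrow{\mathcal{T}}$ represent the CNN, the forward LSTM and the backward LSTM, respectively, and $\theta_{\mathcal{C}}$, $\theta_{\overrightarrow{\mathcal{T}}}$ and $\theta_{\overleftarrow{\mathcal{T}}}$ are their corresponding weights. $X_t$ is the spatial feature of a single frame extracted by the CNN. $\mathcal{M}$ denotes the multi-layer LSTM and $\theta_{\mathcal{M}}$ its weights.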
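The bidirectional encoding can be sketched as a wrapper around any recurrent step function: one pass runs from $x_1$ to $x_T$, another from $x_T$ to $x_1$, and the backward results are flipped back into time order. Separate step functions are passed for the two directions because they have their own weights. Concatenating the two hidden states at each step is an assumption (the text does not state how the directions are merged), and `run_direction`, `bidirectional` and the toy `running_sum` cell are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
State = Tuple[Vector, Vector]          # (hidden state h, cell state C)
Step = Callable[[Vector, State], State]


def run_direction(step: Step, xs: Sequence[Vector], init: State) -> List[Vector]:
    """Run a recurrent cell over xs in the given order; return h at every step."""
    state, hs = init, []
    for x_t in xs:
        state = step(x_t, state)
        hs.append(state[0])
    return hs


def bidirectional(fwd: Step, bwd: Step, xs: Sequence[Vector], init: State) -> List[Vector]:
    """Concatenate forward (x_1..x_T) and backward (x_T..x_1) hidden states per time step."""
    forward = run_direction(fwd, xs, init)
    # Process the reversed sequence, then flip the results back into time order.
    backward = run_direction(bwd, list(reversed(xs)), init)[::-1]
    return [f + b for f, b in zip(forward, backward)]


def running_sum(x_t: Vector, state: State) -> State:
    """Hypothetical toy cell (stand-in for an LSTM): h and C both hold a running sum."""
    s = [state[0][0] + x_t[0]]
    return s, s


print(bidirectional(running_sum, running_sum, [[1.0], [2.0], [3.0]], ([0.0], [0.0])))
```

With inputs 1, 2, 3, the forward states are 1, 3, 6 and the time-aligned backward states are 6, 5, 3, so the output at every step summarizes the whole sequence from both sides.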

D. ATTENTION MECHANISM
For a long image patch sequence, different frames generally provide unequal amounts of valuable information. We employ an attention network to adaptively focus on discriminative frames with a soft attention mechanism that automatically measures the importance of different frames.
As mentioned previously, for $T$ consecutive frames, the multi-layer bidirectional LSTM learns spatiotemporal information and outputs the fire-smoke-related representation $O = \{O_1, O_2, \ldots, O_T\}$. The temporal attention network is illustrated in Figure 5. At each time step $t$, a score $s_t$ indicating the importance of frame $t$ is obtained jointly from the spatial feature and the Bi-LSTM output:

$$s_t = U_s \tanh(W_{xs} X_t + W_{os} O_t + b_s) + b_{us}$$

where $U_s$, $W_{xs}$ and $W_{os}$ are weight matrices learned by the network and $b_s$, $b_{us}$ are bias vectors. $X_t$ is the spatial feature extracted by the CNN, and $O_t$ is the spatiotemporal information extracted by the Bi-LSTM. For the $k$th frame, the importance value is computed as

$$\alpha_k = \frac{\exp(s_k)}{\sum_{t=1}^{T} \exp(s_t)}$$

which is a normalization of the scores. The larger the score, the more important the frame is for determining the class of the sequence. We regard the importance values as attention weights. Instead of assigning equal importance to all the spatiotemporal information $O_t$, the output of the attention network at each time step is modulated to

$$\hat{O}_t = \alpha_t O_t$$

Finally, we concatenate the attention network outputs of all time steps and add a softmax layer on top of the model for classification.
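The scoring, normalization and reweighting steps above can be sketched in plain Python. This sketch assumes that each frame receives a scalar score (so $U_s$ acts as a vector), that the normalization is a softmax over the $T$ frames, and that each output $O_t$ is scaled element-wise by its weight before concatenation; `temporal_attention` and its helpers are hypothetical names, not the authors' TensorFlow code.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def softmax(scores):
    """Numerically stable softmax over the T frame scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(X, O, U_s, W_xs, W_os, b_s, b_us):
    """Score each frame, normalize over frames, reweight O_t, and concatenate.

    X: T spatial features X_t; O: T Bi-LSTM outputs O_t.
    Score (assumed scalar): s_t = U_s . tanh(W_xs X_t + W_os O_t + b_s) + b_us.
    """
    scores = []
    for x_t, o_t in zip(X, O):
        hidden = [math.tanh(a + b + c)
                  for a, b, c in zip(matvec(W_xs, x_t), matvec(W_os, o_t), b_s)]
        scores.append(dot(U_s, hidden) + b_us)
    alpha = softmax(scores)
    weighted = [[a * v for v in o_t] for a, o_t in zip(alpha, O)]
    # concatenate all attended time steps as input to the softmax classifier
    flat = [v for o_t in weighted for v in o_t]
    return alpha, flat
```

Subtracting the maximum score before exponentiating leaves the softmax unchanged but avoids overflow for large scores.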

IV. EXPERIMENTS
In this section, we will introduce the experimental setting in detail.Then we design several groups of experiments to measure the performance of proposed ABi-LSTM.Finally, we test the computational efficiency of the proposed framework.

A. DATASET
There is currently no large-scale forest fire smoke dataset for training and testing algorithms. We built a large-scale forest fire smoke video dataset with Nanjing Enbo Technology Company Ltd. We collected a large number of real early forest fire videos to create our dataset; all videos were captured by a forest fire monitoring system at an image size of 1920 × 1080.
Because dynamic features are among the essential features of smoke, this paper uses a foreground detection algorithm for candidate patch proposal. After comparing the performance and stability of several foreground detection algorithms, we selected the ViBe [32] background subtraction method to detect candidate patches. When the number of pixels of an individual foreground target exceeds a threshold (50 in this paper), the area in which the foreground target is located is considered a suspected target. The 299 × 299 image sequence centered on the moving target is fed to ABi-LSTM. The top half of Figure 6 shows the raw video sequence, and the bottom half shows the foreground map obtained by ViBe.
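The candidate proposal step can be sketched as follows, starting from a binary foreground mask (ViBe itself is not implemented here). Grouping foreground pixels into 8-connected components, centering the 299 × 299 window on each component's centroid, and clamping the window to the frame are assumptions made for illustration; only the 50-pixel threshold and the 299 × 299 patch size come from the text, and the function names are hypothetical.

```python
from collections import deque


def connected_components(mask):
    """Group foreground pixels (value 1) into 8-connected components via BFS."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, pixels = deque([(y, x)]), []
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                components.append(pixels)
    return components


def propose_patches(mask, min_pixels=50, patch=299):
    """Return (top, left) corners of patch x patch windows centered on large foreground blobs."""
    h, w = len(mask), len(mask[0])
    boxes = []
    for pixels in connected_components(mask):
        if len(pixels) <= min_pixels:      # a target must EXCEED the threshold
            continue
        cy = sum(p[0] for p in pixels) // len(pixels)
        cx = sum(p[1] for p in pixels) // len(pixels)
        # clamp so the window stays inside the frame (e.g. 1920 x 1080)
        top = min(max(cy - patch // 2, 0), max(h - patch, 0))
        left = min(max(cx - patch // 2, 0), max(w - patch, 0))
        boxes.append((top, left))
    return boxes
```

In a video pipeline, the same window would then be cropped from each of the 20 frames of the sequence to form one candidate patch sequence.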
Sequences are sampled at 5 frames per second, and each sequence is 20 frames long (4 s). The dataset contains 2000 sequences in total: 1000 smoke sequences and 1000 non-smoke sequences. For training and testing, the dataset is split into training and test sets with an 80-20 split (1600 and 400 sequences, respectively). The details of the dataset are described in Table 1.
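The sampling and split described above can be sketched as follows. The source frame rate of the videos is not stated, so `src_fps` is a hypothetical parameter, and splitting each class separately (so that the test set stays balanced) is an assumption rather than the authors' documented procedure; both function names are illustrative.

```python
import random


def sample_frame_indices(src_fps, target_fps=5, length=20, start=0):
    """Indices of `length` frames taken at `target_fps` from a video recorded at `src_fps`."""
    step = src_fps / target_fps
    return [start + round(k * step) for k in range(length)]


def stratified_split(smoke_ids, non_smoke_ids, train_frac=0.8, seed=0):
    """80/20 split applied to each class separately (ASSUMED stratification)."""
    rng = random.Random(seed)
    train, test = [], []
    for label, ids in ((1, list(smoke_ids)), (0, list(non_smoke_ids))):
        rng.shuffle(ids)
        cut = int(len(ids) * train_frac)
        train += [(i, label) for i in ids[:cut]]
        test += [(i, label) for i in ids[cut:]]
    return train, test
```

For a hypothetical 25 fps source, every fifth frame is kept, so a 20-frame sequence spans 100 source frames.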

B. IMPLEMENTATION DETAILS
Experiments were conducted on a personal computer with an Intel Core i5-6500 CPU and an NVIDIA GTX 1080 GPU. The proposed ABi-LSTM architecture is implemented in the TensorFlow framework.
In most of the literature, researchers compute accuracy at the patch level because little test data is available. In contrast, we evaluate accuracy at the sequence level, that is, as the smoke and non-smoke sequence classification accuracy. The proposed ABi-LSTM framework is trained in two stages.

C. RESULTS AND COMPARISONS
In this section, we first introduce the evaluation protocol, including the statistical measures. We then compare the performance of our ABi-LSTM method with other methods. Finally, we perform ablation experiments on each sub-model of the proposed ABi-LSTM.
• False Negative (FN): Incorrectly classified as a non-smoke sequence.
The performance of a binary classifier is usually evaluated by the following widely used statistical measures: true positive rate (TPR), true negative rate (TNR) and accuracy rate (AR). The number of TPs relative to the overall number of positives is called the true positive rate (TPR), also known as sensitivity. The true negative rate (TNR) measures the proportion of actual negatives that are correctly identified as such. The accuracy rate (AR) is an overall measure of the relative number of correct classifications of both positives and negatives, and it can be used to compare the overall performance of different algorithms. Mathematically, these statistical measures can be expressed as:

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{TNR} = \frac{TN}{TN + FP}, \qquad \mathrm{AR} = \frac{TP + TN}{TP + TN + FP + FN}$$

In our ABi-LSTM model, the network ends with a cross-entropy loss layer that takes two inputs: the predicted probability $q_i$ and the true label $p_i$. For each sequence $x$, the predicted probability of the output $y = 1$ is $q_{y=1} = \hat{y}$; similarly, the probability of the output $y = 0$ is $q_{y=0} = 1 - \hat{y}$. The true probabilities can be expressed similarly as $p_{y=1} = y$ and $p_{y=0} = 1 - y$. The loss function for one example is formulated as:

$$L = -\sum_{i} p_i \log q_i = -y \log \hat{y} - (1 - y) \log(1 - \hat{y})$$

2) EXPERIMENTAL RESULTS
We first used the test set to find the optimal settings of our approach, such as the learning rate, the number of LSTM layers and the number of LSTM hidden units.
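The evaluation measures and the per-sequence cross-entropy loss defined in the evaluation protocol can be sketched in plain Python as follows. The clipping constant `eps` is an assumption added to avoid log(0), and the function names are hypothetical.

```python
import math


def rates(tp, tn, fp, fn):
    """TPR (sensitivity), TNR (specificity) and accuracy rate AR from confusion counts."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    ar = (tp + tn) / (tp + tn + fp + fn)
    return tpr, tnr, ar


def binary_cross_entropy(y, y_hat, eps=1e-12):
    """L = -[y log y_hat + (1 - y) log(1 - y_hat)] for one sequence."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)   # clip to avoid log(0)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))
```

Because AR pools both classes, it coincides with the mean of TPR and TNR only when the test set contains equal numbers of positives and negatives.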
To show the superiority of the proposed ABi-LSTM, we compare our method with Inception V3 [23], CNN+MLP, 3DCNN [20], TSN [35], ECO [36] and three common smoke and fire recognition methods [29], [30], [31]. Table 2 shows the comparison results of different methods and parameter settings on our dataset. To analyze the influence of the parameters on the accuracy and complexity of the model, we compare the experimental results of CNN+MLP and ABi-LSTM under different parameters. The x in CNN-MLP-x indicates the number of hidden cells in the MLP; similarly, the x in ABi-LSTM-x indicates the number of bidirectional LSTM cells. All other models take chronological patch sequences as input, except Inception V3, whose input is single patches. Table 2 also reports the average runtime required to process a 20-frame sequence.
As shown in Table 2, the ABi-LSTM framework achieves a total accuracy of 97.8%, with a true positive rate of 97.5% and a true negative rate of 98.0%. Table 2 also shows that the proposed ABi-LSTM outperforms the image-based Inception V3 model by 4.4%. These comparison results indicate that ABi-LSTM performs best among the compared methods for sequence-based forest fire smoke recognition.
For clarity, the confusion matrix of ABi-LSTM is shown in Table 3. Furthermore, we conduct an ablation study to evaluate the contribution of each sub-model of the proposed ABi-LSTM. In this study, we compare the following models:
• Inception V3 is a single-frame image model whose experimental results are reported in Table 2; it is used as the baseline in the ablation experiments.
• Uni-directional LSTM-x is a single-direction LSTM, where x is the number of hidden units. The uni-directional LSTM consists of two sub-models: the spatial feature extraction network and the uni-directional LSTM network.
• Bi-LSTM consists of two sub-models: the spatial feature extraction network and the bidirectional LSTM network. The input patches are fed into the Bi-LSTM network one by one.
• ABi-LSTM consists of all three sub-models: the spatial feature extraction network, the bidirectional LSTM network, and the temporal attention subnetwork. The input patches are fed into the ABi-LSTM network one by one.
As shown in Table 4, the Bi-LSTM network improves on the accuracy of the image-based Inception V3 model by 2.1%, and the temporal attention subnetwork improves on the accuracy of the Bi-LSTM model by a further 2.3%. These ablation experiments support our initial design. The proposed model can be deployed easily to recognize suspected smoke patches in practical applications. Figure 9 shows the recognition results of the proposed model.

V. CONCLUSION
In this paper, we propose an attention enhanced bidirectional LSTM network (ABi-LSTM) for early forest fire smoke recognition. Specifically, the proposed approach consists of three parts: a) an Inception V3 network that extracts spatial features from each smoke candidate patch; b) a Bi-LSTM model that takes the spatial feature of each patch and extracts temporal features in both the forward and backward order; and c) an attention network that optimizes the classification process with a soft attention mechanism that automatically measures the importance of different frames. Extensive experimental results show that the proposed ABi-LSTM framework obtains higher accuracy in early forest fire smoke recognition than the compared methods. Moreover, an ablation study is conducted to evaluate the performance of each sub-model in ABi-LSTM.
The proposed ABi-LSTM was inspired by the attention mechanism in neural machine translation and can adaptively focus on discriminative frames. As a result, this framework may be well suited to early forest fire smoke detection. An interesting open question is whether an attention mechanism can be used within a single frame to enable the model to learn more discriminative spatial information. This will be investigated in future work.

FIGURE 1. A pan-tilt-zoom (PTZ) long-range camera for forest fire detection and a snapshot of a typical forest fire smoke at the initial stage captured by a forest watch tower.

FIGURE 2. Framework of the proposed ABi-LSTM for forest fire smoke recognition, which consists of the spatial features extraction network, the bidirectional LSTM network, and the temporal attention subnetwork. The input images are fed into the ABi-LSTM network one by one.
(2) Factorize n × n convolutions into a combination of 1 × n and n × 1 convolutions. (3) Expand the filter bank outputs to remove the representational bottleneck. (4) Combine additional regularization with batch-normalized auxiliary classifiers and label smoothing.

FIGURE 4. A single-layer bidirectional LSTM. We feed spatial features in both forward (red arrows) and backward (blue arrows) order, which allows our model to learn both past and future context information over time.

FIGURE 5. The graphical illustration of the attention model.

FIGURE 6. Using a background subtraction technique to obtain the moving target region. Positive sequences are highlighted with red boxes, and negative sequences with yellow boxes.
In the first stage, we use the Adam optimizer to train the Inception V3 network. Instead of randomly initializing the weights, we fine-tune the Inception V3 model pre-trained on ImageNet, with a learning rate of 0.00001, a batch size of 32, an input size of 3 × 299 × 299, and 30 training epochs. Since our forest fire smoke recognition task differs from ImageNet classification, we define a new top-level classifier on top of the Inception V3 network by adding a fully connected layer. The newly stacked fully connected layer uses ReLU as its activation function, and softmax is used for classification. In the training phase, we train only the top two inception blocks and the newly stacked layer, and freeze the other 172 layers. In the second stage, the output of the ''avg_pool'' layer of Inception V3 is extracted as the spatial feature of each frame. The learned spatial features and sequence labels are used to train the subsequent model. We use the RMSprop optimizer to train the ABi-LSTM network, with a learning rate of 0.00001, a batch size of 32, an input size of 2048 × 20, and 50 training epochs.
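The two training stages can be summarized as configuration objects. The numeric values are those reported in the text; the `StageConfig` structure, its field names, the reading of 2048 × 20 as feature dimension × sequence length, and the `check_stage2_batch` helper are hypothetical illustrations rather than the authors' TensorFlow code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """Hyperparameters of one training stage (hypothetical structure)."""
    name: str
    optimizer: str
    learning_rate: float
    batch_size: int
    input_shape: tuple
    epochs: int


# Values as reported in the text; the structure itself is only an illustration.
STAGE1 = StageConfig("finetune_inception_v3", "adam", 1e-5, 32, (3, 299, 299), 30)
STAGE2 = StageConfig("train_abi_lstm", "rmsprop", 1e-5, 32, (2048, 20), 50)


def check_stage2_batch(batch, cfg=STAGE2):
    """Check that each sequence has seq_len frames of feat_dim features.

    `batch` is a list of sequences; each sequence is a list of per-frame feature vectors.
    ASSUMPTION: input_shape is read as (feat_dim, seq_len).
    """
    feat_dim, seq_len = cfg.input_shape
    return all(len(seq) == seq_len and all(len(f) == feat_dim for f in seq) for seq in batch)
```

Keeping the stage-2 inputs as precomputed 2048-dimensional features means the Inception V3 forward pass runs once per frame rather than once per training epoch.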
1) EVALUATION PROTOCOL
For binary classification of image patch sequences, each sequence falls into one of four groups, true positive (TP), false positive (FP), true negative (TN) and false negative (FN), according to the combination of its true class and its predicted class. The predicted class is the output of the ABi-LSTM. The groups are defined as follows:
• True Positive (TP): Correctly classified as a smoke sequence.
• True Negative (TN): Correctly classified as a non-smoke sequence.
• False Positive (FP): Incorrectly classified as a smoke sequence.

FIGURE 7. Smoke sequences used in our method.

FIGURE 8. Non-smoke sequences used in our method.

FIGURE 9. Results of the forest fire smoke monitoring system. The detected smoke was marked in red boxes.

TABLE 3. Confusion matrix for the classification of smoke and non-smoke based on the ABi-LSTM-64.

TABLE 2. Comparison with other methods on our dataset.

TABLE 4. The ablation analysis of the ABi-LSTM.

TABLE 5. Average complexity comparisons of an image.