Deep Learning-Based Approach for Civil Aircraft Hazard Identification and Prediction

Safety is a perennial concern in civil aviation transportation. Once a civil aviation accident occurs, it causes heavy casualties and economic losses. To ensure civil aviation safety, the hazards of civil aircraft must be identified and predicted effectively and accurately. Civil aircraft use the Aircraft Communications Addressing and Reporting System (ACARS) to interact with the ground during flight. The data generated by ACARS have a simple structure and strong timeliness. In view of these advantages, a hazard identification and prediction method based on a support vector machine optimized by particle swarm optimization (PSO-SVM) and a long short-term memory (LSTM) neural network, which uses ACARS reports as analysis data, is proposed. First, to reduce the time cost of identification and prediction, the SVM-based recursive feature elimination method with cross-validation (SVM-RFECV) is used to select the characteristic parameters. Then, the SVM optimized by PSO is used to identify hazards based on the selected parameters. According to the identification results, the LSTM is used to predict the trend of the selected parameters to realize hazard prediction. An A13 report of the APU generated by ACARS is selected as the analysis data for hazard identification and prediction in this paper. The analysis results show that the proposed identification method based on PSO-SVM and SVM-RFECV achieves high identification speed and accuracy, and the proposed LSTM-based prediction method achieves the best prediction performance. The proposed method can effectively identify hazards and accurately predict parameter trends to improve aircraft safety.


I. INTRODUCTION
Hazard identification is the basis of aircraft risk management. The Civil Aviation Administration of China (CAAC) states in ''Requirements for the Air Operator's Safety Management System'' (AC-121/135-FS-2008-26) that the risk management process must include hazard identification. The Federal Aviation Administration (FAA) has proposed the safety ''monitoring/data analysis'' (MSAD) process in [1]. In this process, the hazard standards for identification are established by the FAA. According to these standards, aviation safety organizations can automatically or manually select hazard events from an event report system to achieve hazard identification. Identifying hazards through such hazard standards is easy to implement. However, because there is no unified standard for the wording of event reports, the same event may be described in many different ways, which undermines the effectiveness of filtering by hazard standards. The Society of Automotive Engineers (SAE) analyzes the collected data and compares the analytical values with expected values to achieve hazard identification [2]. However, because hazard events are complex, threshold monitoring of only a single parameter results in low hazard identification accuracy. In short, various aviation organizations have paid attention to hazard identification, but the methods they use are relatively simple and the identification accuracy is not high. It is also important to note that these organizations have mentioned the concept of hazard trend prediction in the above documents.
The International Civil Aviation Organization (ICAO) emphasizes in the safety management system (SMS) that the identification of hazards should be active and forward-looking [3]. ICAO also proposes that it is necessary to actively identify the hazards that have not occurred. However, aviation organizations have not yet provided specific implementation methods for hazard prediction. In order to improve the accuracy of hazard identification and achieve active hazard prediction, a complete hazard identification and prediction framework based on machine learning and deep learning is proposed.
In the proposed framework, ACARS report data are used as the analysis object. ACARS realizes real-time data communication between the ground system and the aircraft [4]. The in-flight data of the aircraft are sent to the ground in the form of reports by ACARS [5]. ACARS data are highly targeted and small in volume. Many scholars and airlines have used quick access recorder (QAR) data as safety analysis data [6]–[8], but the huge amount of QAR data makes the analysis process time consuming. In this paper, the use of ACARS data compensates for the poor timeliness of QAR-based analysis.
Generally, airline engineers use ACARS data for single-parameter value alarms and data trend observation. In this case, the relevant parameters are considered separately when identifying system hazards, and the state of the system is determined by expert experience. Moreover, data trend observation is a qualitative prediction method: the engineers cannot give a definite value or range. Value alarms and data trend observation are prone to identification and prediction errors when engineers lack expert knowledge or are fatigued.
To address these problems, we propose a novel hazard identification and prediction method using PSO-SVM and LSTM. The proposed method takes multiple parameters as its analysis objects. The parameters used for analysis are selected according to their importance using the RFECV method. The PSO-SVM identifies hazards by considering a combination of multiple parameters, and the deep learning method LSTM learns from historical data to quantitatively predict the trends of the selected parameters. The proposed method reduces the dependence on expert experience in hazard identification and prediction. In addition, it offers fast hazard identification and high identification and prediction accuracy.
Recently, many models, including artificial neural networks (ANN), decision trees, and K-nearest neighbors (KNN), have been used to identify hazards in civil aviation. These models require a large amount of input data and are prone to falling into local extrema. Compared with these models, the SVM performs prominently on small samples [9]. SVM is an efficient machine learning technique derived from statistical learning theory by Vapnik [10]. SVM has been widely used in fault diagnosis of various aircraft structures and systems. Vieira et al. used SVM to detect anomalous behavior that could characterize an incipient fault of the APU [11]. Jiang et al. used SVM to identify abnormal states of the engine lubrication system [12]. Lou and Zhang designed an infrared thermal imaging circuit board fault diagnosis system using LIBSVM, which can accurately and quickly locate faults while saving time and cost [13]. Zhang et al. used SVM to detect and estimate damage severity in a simulated multi-damage scenario [14]. The SVM has high reliability and accuracy in aircraft fault diagnosis. However, the application of SVM to civil aircraft hazard identification is rarely reported. To ensure the reliability and accuracy of hazard identification, a hazard identification model based on SVM is proposed in this paper.
When a large amount of data is input into the SVM, its time cost becomes large. To reduce the input data and improve the identification speed, an SVM-based feature selection model, SVM-RFE, is used in this paper. While the SVM model is built, the features with the lowest weights are removed by SVM-RFE iteratively [15]. The removal sequence of the features represents the feature importance ranking [16]. SVM-RFE has been applied in many fields, such as signal processing, fault diagnosis, and genomics [17]–[19]. In this paper, we use SVM-RFE to select the characteristic parameters of the ACARS data. Because the RFE algorithm only ranks the importance of all parameters, it cannot determine the number of optimally selected parameters. Therefore, a cross-validation (CV) algorithm is added to the RFE algorithm to calculate the identification accuracy of the SVM at each iteration, and the number of optimally selected parameters is determined by the identification accuracy.
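As an illustrative sketch (not the paper's exact pipeline), scikit-learn's `RFECV` combines SVM-based recursive feature elimination with cross-validation in exactly this way; the synthetic data below merely stands in for the ACARS parameters:

```python
# Sketch of SVM-RFECV: iteratively drop the lowest-weight feature and use
# cross-validated accuracy to pick the optimal number of features.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.svm import SVC

# Synthetic stand-in for the 10-parameter labelled ACARS samples.
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           random_state=0)

# A linear kernel is required so per-feature weights (coef_) are available.
selector = RFECV(SVC(kernel="linear"), step=1, cv=5)
selector.fit(X, y)
print("optimal number of features:", selector.n_features_)
print("feature ranking (1 = selected):", selector.ranking_)
```

The ranking mirrors the removal order described above: features eliminated first receive the highest rank numbers.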
The optimization of the SVM parameters c and g is critical for improving the identification accuracy of the SVM. The grid-search algorithm and the genetic algorithm (GA) are two common parameter optimization algorithms. However, the grid-search algorithm is time consuming [20], and the crossover and mutation process in GA is complex [21]. Compared with these two algorithms, the PSO algorithm has excellent global search ability; its parameter requirements are very low and it is easy to implement [22]. In this paper, the PSO algorithm is used to optimize the SVM to improve the identification accuracy.
With the advent of the era of big data, deep learning has been widely used in industry. Many scholars have applied deep learning in the field of civil aviation and achieved remarkable success. Dong applied deep neural networks to aircraft parameter identification to detect and characterize aircraft icing conditions [23]. Omar Alkhamisi and Mehmood integrated machine learning and deep learning algorithms to improve risk prediction in aviation systems [24]. Zhang and Mahadevan trained two different types of deep learning models to predict flight paths from different angles [25]. In this paper, a hazard prediction method based on deep learning with a long short-term memory (LSTM) recurrent neural network (RNN) is proposed for the case where the identification model judges that there is no hazard in the system. RNNs have been successfully applied to various sequence prediction tasks [26]–[28]. Unfortunately, an RNN is a very deep feed-forward network in which all the layers share the same weights, which makes it difficult for the RNN to learn to store information for a long time [29]. LSTM is an important branch of RNN, proposed by Hochreiter and Schmidhuber, which can learn long-term dependence [30]. As a kind of RNN, LSTM is particularly suitable for predicting time series data. In the aviation field, LSTM has been successfully used for remaining life prediction, engine vibration prediction, and aircraft landing speed prediction [31]–[33]. However, the application of LSTM to hazard prediction is still rarely reported. In this paper, the LSTM is applied to hazard prediction using ACARS data. The trend of the data over a long period is fully considered by the LSTM to ensure the accuracy of hazard prediction.

II. CONTRIBUTIONS
Specific contributions of this paper include: (1) A complete civil aircraft hazard identification and prediction framework based on machine learning and deep learning is proposed. Compared with traditional manual identification, the proposed framework improves the accuracy and initiative of aircraft hazard identification. (2) Different from using QAR data as the analysis data, we use ACARS report data, which are small in volume and strongly system-specific. Through ACARS, the aircraft status can be obtained in real time to realize real-time hazard identification and prediction. (3) We combine PSO-SVM with SVM-RFECV to improve the accuracy and speed of the identification model. Based on the optimized identification model, fast and accurate civil aircraft hazard identification is realized. (4) In addition to considering existing hazards, we also propose a deep learning-based hazard prediction model, which can predict potential hazards that have not yet occurred by predicting target parameters. (5) Compared with other neural network algorithms, the deep learning method of LSTM can remember and select long-term states. The proposed LSTM-based hazard prediction model predicts the target parameter value by learning the latent patterns in the historical report data of the target parameter to improve the prediction accuracy.

III. FRAMEWORK
During the flight, the aircraft sends ACARS data to the ground server through the ground/air network. While the real-time ACARS data are sent to the database for storage, they are also used for hazard identification and prediction. First, the real-time reports are input into the PSO-SVM for hazard identification. When the identification result shows that there is a hazard in the analyzed system, hazard alarm information is sent to engineers for further risk analysis and assessment. Decisions made by the airline are fed back to the pilot through the ground/air network. When the identification result shows that there is no hazard in the analyzed system, the important parameters of the analyzed system are predicted using the LSTM. The parameter prediction results are sent back to the PSO-SVM-based hazard identification model to realize hazard prediction. When the hazard prediction result shows that there is a potential hazard in the analyzed system, hazard warning information is sent to engineers for further risk analysis and assessment.

When the hazard prediction result shows that there is no potential hazard in the analyzed system, the prediction result is stored in the database. Note that the hazard identification and prediction models are trained on historical ACARS data before real-time data are input into them. Likewise, the parameter selection based on SVM-RFECV is completed using historical data. The framework for hazard identification and prediction using the ACARS data is shown in Fig. 1.

IV. HAZARD IDENTIFICATION BASED ON PSO-SVM AND SVM-RFECV
A. DATA PREPROCESSING
The ACARS collects and processes QAR data from various aircraft systems in real time by using the data management unit (DMU). The customized reports are compiled and generated in the DMU. ACARS sends the customized reports in real time to the ground base station through the air-ground data link to achieve online monitoring of the aircraft. Compared with QAR data, ACARS report data have the following two advantages.
(1) The data acquisition is flexible. By setting the parameter acquisition conditions, the data of the same parameter in different states can be obtained. Collecting multi-stage data for hazard analysis is consistent with the complexity of hazard events. In addition, targeted data collection greatly reduces the difficulty of data preprocessing. (2) The data acquisition is real time. ACARS reports are transmitted to the ground base station in real time. ACARS reports triggered by the relevant logic can be used directly for fault diagnosis and hazard identification.
In this paper, we extract the A13 report of the APU from ACARS as the analysis object. The A13 report is the MES/IDLE report of the APU. The APU MES/idle reports record averaged sets of APU-related parameters during the startup of each main engine and during the APU idle state. Fig. 2 shows the general format of the A13 report. During the flight, the ACARS generates an A13 report with actual values in the format shown in Fig. 2. The description of the A13 report used in this paper comes from the aircraft maintenance manual (AMM) provided by Airbus. The A13 report consists of four parts: the header, APU history information, APU operating parameters when starting the main engines, and APU self-starting parameters.
The header is recorded from CC to CE and includes the flight information of the aircraft, the flight phase for report generation, the total temperature, and the status and opening angle of the bleed valve. The history information of the APU is recorded in E1, which includes the APU serial number and the number of operating hours and cycles. The values of the APU operating parameters are recorded from N1 to S3 when the main engines of the aircraft are started. Among them, N1 and S1 record the values of the APU operating parameters when the first main engine is started; N2 and S2 record the values when the second main engine is started; and N3 and S3 record the values when the main engines have been started and the APU is idle. V1 records the values of the APU self-starting parameters. Detailed descriptions of the APU-related parameters are given in Table 1.
The data recorded in S1, N1, S2, and N2 of the A13 report are the APU data corresponding to the startup of the two main engines. The time at which these data are recorded is independent of the APU and related only to the main engine state. The recording time of the data from S1 to N2 therefore differs when the same APU is paired with different main engines, so the data from S1 to N2 cannot be used to analyze APU performance. Some parameters in the report are constant in the APU idle state, such as speed, air flow, generator load, and IGV. In addition, ACW1 and ACW2 are control signals that cannot indicate the APU status. Hence, these parameters are manually removed to reduce the training data dimension. Finally, the parameters of the report used for hazard identification are shown in the first row of Table 2. To distinguish the two OTA parameters, the oil temperature at APU startup is denoted OT.
In this paper, the A13 report data and APU maintenance records of four A320 aircraft of an airline from 2015 to 2017 are preprocessed to generate sample data for training the identification model. The aircraft number is b22xx and the APU number is 2097. According to the maintenance times in the APU maintenance records, the 10 sets of data in the A13 reports before each maintenance time point are labeled as fault data, and the 20 sets of data after each maintenance time point are labeled as normal data. Finally, 1244 sets of labeled data with 10 parameters are generated, as shown in Table 2. Fault data are labeled 1 and normal data are labeled 0.
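The labelling rule described above can be sketched as follows; the function name, window sizes, and indices are illustrative, not part of the paper's pipeline:

```python
# Hypothetical sketch: the 10 reports before a maintenance time point are
# labelled fault (1), the 20 reports after it normal (0); others are unused.
def label_reports(n_reports, maintenance_idx, n_before=10, n_after=20):
    """Return one label per report: 1 = fault, 0 = normal, None = unused."""
    labels = [None] * n_reports
    for m in maintenance_idx:
        for i in range(max(0, m - n_before), m):
            labels[i] = 1          # fault window just before maintenance
        for i in range(m, min(n_reports, m + n_after)):
            labels[i] = 0          # normal window just after maintenance
    return labels

labels = label_reports(100, maintenance_idx=[40])
print(labels[30:60])
```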

B. HAZARD IDENTIFICATION MODEL
1) SVM
The training data of the SVM can be represented by (1):

$$D = \{(x_i, y_i)\}_{i=1}^{N}, \quad x_i \in \mathbb{R}^p,\ y_i \in \{-1, +1\} \tag{1}$$

where $x_i$ is the given input data, each $x_i$ has $p$ features $f_1, f_2, \dots, f_p$, and $y_i$ is the learning target. The SVM searches for the hyperplane that maximizes the distance from the hyperplane to the nearest samples of each class. The hyperplane can be written as $w \cdot \varphi(x_i) + b = 0$, where $w$ and $b$ are the classification model parameters and $\varphi$ is a mapping to a higher-dimensional space in which the $x_i$ can be linearly separated. The training task of the SVM model can be formalized as the optimization task $\min_{w} \|w\|^2/2$ subject to $y_i\left(w \cdot \varphi(x_i) + b\right) \geq 1$. This optimization problem can be rewritten as the Lagrange function (2), from which the dual form (3) is derived; (4) gives the constraint conditions of (2) and (3):

$$L(w, b, \lambda) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{N} \lambda_i \left[ y_i \left( w \cdot \varphi(x_i) + b \right) - 1 \right] \tag{2}$$

$$\max_{\lambda} \sum_{i=1}^{N} \lambda_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_i \lambda_j y_i y_j K(x_i, x_j) \tag{3}$$

$$\sum_{i=1}^{N} \lambda_i y_i = 0, \quad \lambda_i \geq 0 \tag{4}$$

where $\lambda_i$ is a Lagrange multiplier and $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$ is a kernel function. The model parameters $w$ and $b$ can be determined by (5):

$$w = \sum_{i=1}^{N} \lambda_i y_i \varphi(x_i), \qquad b = y_j - \sum_{i=1}^{N} \lambda_i y_i K(x_i, x_j)\ \text{for any support vector } x_j \tag{5}$$

Once all the multipliers and model parameters are determined, a newly input test sample $x_{new}$ can be classified according to the side of the hyperplane on which it falls. The label of $x_{new}$ is obtained by (6):

$$y_{new} = \operatorname{sign}\left( \sum_{i=1}^{N} \lambda_i y_i K(x_i, x_{new}) + b \right) \tag{6}$$
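This classification step can be sketched with scikit-learn's `SVC`, whose `decision_function` returns the signed distance used in (6); the toy data below stand in for the labelled ACARS samples:

```python
import numpy as np
from sklearn.svm import SVC

# Two clearly separated toy classes standing in for normal/fault samples.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [2., 2.], [2., 3.], [3., 2.]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

# A new sample is classified by the side of the hyperplane it falls on,
# i.e. the sign of the kernelised decision function.
x_new = np.array([[2.5, 2.5]])
print(clf.decision_function(x_new), clf.predict(x_new))
```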

2) RESULTS AND DISCUSSION
The SVM model is compared with other identification models to further illustrate its superiority. The data in Table 2 are input into each model for training and testing. To further evaluate the performance of each model, the receiver operating characteristic (ROC) analysis method is used. ROC analysis evaluates the performance of a model mainly by comparing the area under the ROC curve (AUC). The ROC curve is drawn on a two-dimensional plane whose abscissa is the false positive rate (FPR) and whose ordinate is the true positive rate (TPR). The FPR and TPR are calculated by (7) and (8):

$$FPR = \frac{FP}{FP + TN} \tag{7}$$

$$TPR = \frac{TP}{TP + FN} \tag{8}$$
FP is the false positive count, which indicates the number of negative samples identified as positive. TN is the true negative count, which indicates the number of negative samples identified as negative. TP is the true positive count, which indicates the number of positive samples identified as positive, and FN is the false negative count, which indicates the number of positive samples identified as negative. The better the performance of the identification model, the larger the AUC value. The ROC curve of each identification model and its corresponding AUC value are shown in Fig. 3. The SVM model has the highest AUC value of 0.92. Therefore, the SVM model, with the highest identification performance, is selected as the hazard identification model in this paper.
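The FPR, TPR, and AUC computations above can be made concrete with a small numeric sketch (the scores are illustrative, not the paper's results); the AUC is computed here via its pairwise-ranking interpretation:

```python
import numpy as np

# Toy labels and decision scores standing in for a model's test-set output.
y_true  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.3, 0.35, 0.8, 0.4, 0.6, 0.7, 0.9])

thr = 0.5
pred = (y_score >= thr).astype(int)
TP = np.sum((pred == 1) & (y_true == 1))
FP = np.sum((pred == 1) & (y_true == 0))
TN = np.sum((pred == 0) & (y_true == 0))
FN = np.sum((pred == 0) & (y_true == 1))
FPR = FP / (FP + TN)          # Eq. (7) at this threshold
TPR = TP / (TP + FN)          # Eq. (8) at this threshold

# AUC = probability that a random positive scores above a random negative.
pos, neg = y_score[y_true == 1], y_score[y_true == 0]
auc = np.mean([p > n for p in pos for n in neg])
print(FPR, TPR, auc)
```

Sweeping the threshold `thr` over all score values traces out the full ROC curve described in the text.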

C. PARAMETER SELECTION BASED ON SVM-RFECV
Excessive dimensionality of the input data leads to a high time cost of hazard identification, and aircraft hazard identification requires high timeliness. It is therefore necessary to reduce the dimensionality of the input data to improve the identification speed. In this paper, the data in Table 2 are input into the SVM-RFECV model to select the parameters.
SVM-RFE is a sequential backward selection algorithm based on the maximum-margin principle of the SVM model. SVM-RFE trains the SVM on the samples to score each feature and removes the feature with the lowest score; the remaining features are then used to train the SVM model again for the next iteration. The ranking criterion of SVM-RFE is based on the change of the objective function when a feature is removed. The objective function can be expressed as (9):

$$J = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_i \lambda_j y_i y_j K(x_i, x_j) = \frac{1}{2} \lambda^{T} H \lambda \tag{9}$$

where $H_{ij} = y_i y_j K(x_i, x_j)$. The change of the objective function caused by removing the k-th feature is used to compare the contribution of each feature to the minimization of the objective function. The ranking score of the k-th feature can be calculated by (10):

$$c_k = \left| J - J(-k) \right| = \frac{1}{2} \left| \lambda^{T} H \lambda - \lambda^{T} H(-k) \lambda \right| \tag{10}$$

where $H(-k)$ is the matrix $H$ computed with the k-th feature removed; for a linear kernel this criterion reduces to $c_k = w_k^2$. The SVM-RFE model only ranks the importance of all features and cannot determine the number of optimally selected features. Hence, the CV algorithm is added to the RFE algorithm to calculate the identification accuracy of the SVM-RFE model in this paper. The number of optimally selected parameters is determined by the identification accuracy. The pseudo-code of the SVM-RFECV model is shown in Table 4.
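For a linear kernel, the classical Guyon-style ranking criterion reduces to the squared weight of each feature; the following sketch (with illustrative synthetic data, not the paper's) computes one elimination step from a fitted linear SVM:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic stand-in for a labelled sample matrix with 6 features.
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=1)
svm = LinearSVC(C=1.0, dual=False).fit(X, y)

# RFE ranking criterion for the linear kernel: score_k = w_k**2;
# the feature with the smallest score is eliminated in this iteration.
scores = svm.coef_.ravel() ** 2
worst = int(np.argmin(scores))
print("scores:", np.round(scores, 3), "-> eliminate feature", worst)
```

Repeating this fit-score-eliminate loop until one feature remains yields the full importance ranking used by SVM-RFE.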
The ranking result of the parameters and the identification accuracy corresponding to the number of selected parameters are shown in Fig. 4 and Fig. 5. Fig. 4 shows that the identification accuracy is highest when the number of selected parameters is four. Therefore, the four parameters with the highest scores in Fig. 5 are selected. The final selected parameters are EGTA, P2A, STA, and OT. The data corresponding to the parameters EGTA, P2A, STA, OT, and Label in Table 2 are selected; the selected data are shown in Table 5. The new sample data are input into the SVM model, and the training time, identification speed, and identification accuracy before and after parameter selection are compared. The comparison results are shown in Table 6. It can be concluded from Table 6 that the training time of the SVM model decreases from 0.48 s to 0.34 s after parameter selection, a reduction of 29.2%. The identification speed increases from 9600 obs/s to 15000 obs/s, an increase of 56.25%. Overall, the training time and the identification speed are greatly improved. However, the identification accuracy increases only from 88.3% to 88.5%, an improvement of just 0.2 percentage points. Since the improvement in identification accuracy of the SVM model is not obvious, further optimization of the SVM model is required.

D. SVM OPTIMIZATION BASED ON PSO
There are two important parameters in the SVM. The first is the penalty parameter c, which indicates the tolerance of the relative error. If c is set too large, errors are not tolerated and the SVM model tends to overfit; if c is set too small, the SVM model tends to underfit. The second is the kernel function parameter g. If g is set too large, the model classifies unknown samples very poorly, which may lead to high training accuracy but low test accuracy. If g is set too small, the smoothing effect is too strong: a particularly high accuracy cannot be obtained on the training data, and the accuracy on the testing data is also affected. Appropriate values of c and g can improve the identification accuracy of the SVM. To improve the identification accuracy, PSO is used to select the c and g of the SVM.
The PSO algorithm assumes that m particles form a particle swarm in an n-dimensional target search space. The position of the i-th particle at the k-th iteration is the n-dimensional vector $x_k^i = (x_1^i, x_2^i, \dots, x_n^i)$; each position is a potential solution of the PSO. Substituting $x_k^i$ into an objective function $f(x^i)$ yields the fitness value $f_k^i$, by which the quality of $x_k^i$ is evaluated. The flight speed of the i-th particle is the n-dimensional vector $v_k^i = (v_1^i, v_2^i, \dots, v_n^i)$. The current optimal position of the i-th particle is $P_k^i = (p_1^i, p_2^i, \dots, p_n^i)$ and the current optimal position of the particle swarm is $P_k^g = (p_1^g, p_2^g, \dots, p_n^g)$. At the beginning of the PSO algorithm, an initial particle swarm is randomly generated and each particle is given a random speed. Each particle then updates its speed and position by (11) and (12):

$$v_{k+1}^i = w v_k^i + c_1 r_1 \left( P_k^i - x_k^i \right) + c_2 r_2 \left( P_k^g - x_k^i \right) \tag{11}$$

$$x_{k+1}^i = x_k^i + v_{k+1}^i \tag{12}$$

where $w$ is the non-negative inertia factor, $v_k^i$ is the flight speed vector of particle $i$ at the k-th iteration, $c_1$ and $c_2$ are learning factors, $r_1$ and $r_2$ are random numbers in $[0, 1]$, $P_k^i$ is the optimal position of particle $i$ at the k-th iteration, $x_k^i$ is the flight position vector of particle $i$ at the k-th iteration, and $P_k^g$ is the optimal position of the particle swarm after the k-th iteration.
The PSO algorithm updates itself by calculating two extrema. One is the individual extremum P_best, which represents the optimal solution found by the particle; the other is the global extremum G_best, which represents the optimal solution found by the particle swarm. The pseudo-code of the PSO algorithm is shown in Table 7.
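The update rules above can be sketched as a minimal PSO loop. In the paper the fitness is the SVM identification accuracy over (c, g); here, as an assumption for illustration, a simple quadratic bowl stands in for that fitness surface:

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_minimize(f, dim=2, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, lo=-5.0, hi=5.0):
    """Minimal PSO sketch: minimize f over an n-dimensional box."""
    x = rng.uniform(lo, hi, (n_particles, dim))        # particle positions
    v = np.zeros_like(x)                               # particle velocities
    pbest = x.copy()                                   # individual extrema
    pbest_f = np.apply_along_axis(f, 1, x)
    g = pbest[np.argmin(pbest_f)]                      # global extremum
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)  # speed update
        x = x + v                                              # position update
        fx = np.apply_along_axis(f, 1, x)
        better = fx < pbest_f
        pbest[better], pbest_f[better] = x[better], fx[better]
        g = pbest[np.argmin(pbest_f)]
    return g, pbest_f.min()

# Hypothetical fitness surface standing in for SVM accuracy over (c, g).
best, best_f = pso_minimize(lambda z: np.sum((z - 1.0) ** 2))
print(best, best_f)
```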
The data in Table 5 are used as the input data of the PSO-SVM: 1124 sample sets are used as training data, and 100 sample sets of the selected data are randomly chosen as testing data. The SVM identification accuracy in each iteration is taken as the fitness value. Fig. 6 shows the fitness curve of the PSO, and Fig. 7 shows the identification result on the testing data after optimization.
The PSO-SVM is compared with GA-SVM and SVM on the same training and testing data. The comparison results are shown in Table 8. Compared with the other two methods, the PSO-SVM has the highest identification accuracy. The optimized values of parameters c and g are 63 and 53, respectively. The identification accuracies on the training data and testing data are 90.99% and 93%, respectively. It can be summarized from Table 3, Table 6, and Table 8 that the proposed PSO-SVM and SVM-RFECV methods for hazard identification improve both the identification speed and accuracy: the optimized hazard identification model reaches an identification speed of 15000 obs/s and an average identification accuracy of 90.99%.

V. HAZARD PREDICTION BASED ON LSTM
A. DATA PROCESSING
When the hazard identification model finds that the APU is not in a hazard state, parameter prediction on the report is required. The parameter prediction result is used to judge whether there is a potential hazard in the APU. To realize potential hazard identification for the APU, a hazard prediction model based on LSTM is proposed. The LSTM can predict the target parameter value at time t by learning the historical data of the target parameter before time t. The four parameters EGTA, P2A, STA, and OT were selected by SVM-RFECV to represent the performance of the APU; therefore, these four parameters are selected as the prediction objects.
The 707 sets of A13 report data with continuous time from January 2015 to March 2017 are used for hazard prediction in this paper. The original data of the four parameters EGTA, P2A, STA, and OT are shown in Fig. 8. In Fig. 8, two sets of data corresponding to the parameters EGTA, P2A, and OT are zero because the acquisition equipment failed during data collection and the acquisition was unsuccessful. Therefore, these two sets of data are deleted, and 705 sets of data are finally selected for hazard prediction. The 705 sets of A13 report data are divided into three parts: training data, validating data, and testing data. The training data are used to train the prediction model, the validating data are used to select the prediction model with the best performance, and the testing data are used to evaluate the selected prediction model.
The look-back time is the number of previous time steps selected as input for time series prediction. Different look-back times {10, 20, 30} are set in this paper. When the look-back time is 10, the first 10 data points are input into the network to train the prediction model, and the 11th data point is used as the target value for supervised learning. Then the window is shifted one step along the timeline: the next 10 data points are input into the network with the 12th data point as the target value. Training is completed by iterating until all the training data have been input into the model. To predict the value at time t, the 10 consecutive historical data points starting from time (t-10) are input into the model, and the predicted value at time t is calculated from the trained weights.
When the look-back time is n, the data from the 1st set to the n-th set form the first set of input data, the data from the 2nd set to the (n+1)-th set form the second set of input data, and so on, so that a total of 705-n sets of n-dimensional input data are generated. The results of dividing the 705-n sets of input data according to the look-back time are shown in Table 9.
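The sliding-window construction described above can be sketched as follows; the synthetic ramp series merely stands in for one of the ACARS parameters:

```python
import numpy as np

def make_windows(series, look_back):
    """Slide a window of length look_back over the series: each window is an
    input sample and the value immediately after it is the supervised target."""
    X, y = [], []
    for t in range(len(series) - look_back):
        X.append(series[t:t + look_back])
        y.append(series[t + look_back])
    return np.array(X), np.array(y)

series = np.arange(705, dtype=float)   # stands in for one ACARS parameter
X, y = make_windows(series, look_back=10)
print(X.shape, y.shape)                # 705 - n = 695 samples, as in the text
```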

B. HAZARD PREDICTION MODEL
1) RNN
RNN is a multi-layer perceptron network comprising an input layer, a hidden layer, and an output layer. The RNN performs the same task on each input in the sequence, and its output depends on the previous calculations. The RNN can map the entire history of previous inputs to each output. The complete RNN network is shown in Fig. 9.
After the input $x_t$ is fed into the RNN at time t, the value of the hidden layer is $h_t$ and the output value is $o_t$. The value of $h_t$ depends on $x_t$ and $h_{t-1}$. The forward calculation process of the RNN is represented by (13) and (14):

$$o_t = g(V h_t) \tag{13}$$

$$h_t = f(U x_t + W h_{t-1}) \tag{14}$$
The output layer is calculated by (13), where $V$ is the weight matrix of the output layer and $g$ is the activation function. The hidden layer, a cyclic layer, is calculated by (14), where $U$ is the weight matrix of the input $x_t$, $W$ is the weight matrix of the previous value $h_{t-1}$, and $f$ is the activation function. A back-propagation through time (BPTT) algorithm is used to train the RNN backwards through time. The pseudo-code of BPTT is given in Table 10.
The BPTT algorithm propagates the error term $\delta_t^l$ of the l-th layer at time t in two directions. One direction passes it to the upper layer to obtain $\delta_t^{l-1}$; this part is related to the weight matrix $U$. The other direction passes it backwards along the timeline to the initial time $t = 1$ to obtain $\delta_1$; this part is related to the weight matrix $W$. The error $\delta_t^{l-1}$ passed to the upper layer and the error $\delta_k$ at any earlier time k can be calculated by (15) and (16):

$$\delta_t^{l-1} = \left( \delta_t^l \right)^{T} U \, \mathrm{diag}\!\left[ f'^{\,l-1}\!\left( net_t^{l-1} \right) \right] \tag{15}$$

$$\delta_k^{T} = \delta_t^{T} \prod_{i=k}^{t-1} W \, \mathrm{diag}\!\left[ f'(net_i) \right] \tag{16}$$
where $\mathrm{diag}[A]$ denotes the diagonal matrix created from the vector $A$, $net_t^l$ is the weighted input of the neurons in the l-th layer at time t, $f^l$ is the activation function of the l-th layer, and $net_t$ is the weighted input of the neurons at time t.
The gradient $\nabla_{W_t} E$ of the weight matrix at time t can be calculated by (17), and the gradient $\nabla_{W} E$ of the cyclic layer weight matrix $W$ is the sum over time, as in (18):

$$\nabla_{W_t} E = \delta_t \, h_{t-1}^{T} \tag{17}$$

$$\nabla_{W} E = \sum_{k=1}^{t} \nabla_{W_k} E \tag{18}$$
Unfortunately, the RNN does not handle long sequences well in practice. One of the main reasons is that the RNN is prone to gradient explosion and gradient vanishing during training. As a result, the gradient cannot be propagated all the way back through a long sequence, so the RNN cannot capture long-distance dependencies. The derivation of the gradient explosion and gradient vanishing problem of the RNN is as follows,
where β_f and β_W are the upper bounds of the norms of the diagonal matrix diag[f′(net_i)] and the matrix W, respectively. When t − k is large, the bound in (19) becomes extremely large (when β_f β_W > 1) or extremely small (when β_f β_W < 1). It follows from (17) and (18) that gradient explosion occurs in the RNN when the result of (19) is extremely large, and gradient vanishing occurs when it is extremely small.
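The bound in (19) grows or decays geometrically in t − k, which a few lines of arithmetic make concrete. The values of β_f β_W below are illustrative, not taken from the paper.

```python
# The error transported over t-k steps is bounded by (beta_f * beta_W)^(t-k):
# it explodes when beta_f * beta_W > 1 and vanishes when beta_f * beta_W < 1.
for beta in (0.9, 1.1):          # beta stands for beta_f * beta_W
    for gap in (10, 50, 100):    # gap stands for t - k
        print(f"beta_f*beta_W = {beta}, t-k = {gap}: bound ~ {beta ** gap:.3e}")
```

Even a factor close to 1 (0.9 or 1.1) drives the bound below 1e-4 or above 1e4 within 100 steps, which is why long look back times are problematic for the plain RNN.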

2) LSTM
The LSTM network solves the vanishing-gradient problem of the RNN. The hidden layer of the RNN has only one state h; the LSTM preserves long-term information by adding a cell state c. The structure of the LSTM is shown in Fig. 10. It can be seen from Fig. 10 that the LSTM has three inputs at time t: the current input value x_t, the previous output value h_{t−1}, and the previous cell state c_{t−1}. The LSTM has two outputs at time t: the current output value h_t and the current cell state c_t. Three gates are used by the LSTM to control the cell state c. The first is the forget gate f_t, which determines how much information of the previous cell state c_{t−1} is retained in the current cell state c_t. The second is the input gate i_t, which controls how much information of the current input x_t is saved into the cell state c_t. The third is the output gate o_t, which controls how much information of the cell state c_t is output to the current output h_t. The forward calculation equations of the LSTM are as follows,
In (26) and (27), the weight matrix W_* is in fact composed of two matrices, where * stands for f, i, c and o: W_{*x}, which corresponds to the input x_t, and W_{*h}, which corresponds to the previous output h_{t−1}. During back-propagation, W_{*x} is used to calculate the error δ_t^{l−1} and W_{*h} is used to calculate the error δ_k^T. Therefore, the weight matrices W_f, W_i, W_c and W_o are all written as the separate matrices W_{fx}, W_{fh}, W_{ix}, W_{ih}, W_{cx}, W_{ch}, W_{ox} and W_{oh} in (26) and (27),
where net_t^l is the weighted input of the neurons in the l-th layer at time t, and δ_{o,t}^T, δ_{f,t}^T, δ_{i,t}^T and δ_{c,t}^T denote the errors of o, f, i and c that back-propagate along the timeline at time t, respectively; they can be calculated from (28) to (31).
The gradients of the weight matrix and the bias can be calculated from (32) to (43).

3) PARAMETER SETTING
To validate the prediction performance of the LSTM, the LSTM algorithm is compared with three benchmark algorithms: k-nearest neighbor (KNN) regression, RNN and the classic back-propagation neural network (BPNN).
In addition, to allow a fair comparison of the prediction algorithms, similar parameters are set in all models. The parameter settings are shown in Table 11, where 10 days, 20 days and 30 days are the look back times. All experiments are carried out on a desktop PC with a 2.81 GHz Intel i7 processor and 8 GB of memory. The operating-environment parameters are shown in Table 12.

4) EVALUATION INDEX
In order to evaluate the performance of prediction model, four general error criteria are used.
(1) Mean absolute error (MAE) is the average of the absolute errors; it directly reflects the actual magnitude of the error of the predicted values.
(2) Mean absolute percentage error (MAPE) measures the prediction error between the predicted values and the actual values. MAPE considers not only the error between the predicted value and the actual value, but also the ratio of that error to the actual value.
(3) Root mean square error (RMSE) is the square root of the average squared difference between the predicted values and the actual values. It measures the deviation between the predicted value and the actual value. RMSE is often used as a standard for measuring the prediction results of machine learning models.
(4) The values of the above criteria differ greatly across units. R-squared (R^2) is obtained by normalizing the residual sum of squares, so it reflects the generality of the model. For models whose input data are measured in different units, R^2 gives a unified evaluation of the prediction effect. R^2 lies in the interval (0, 1); the closer R^2 is to 1, the better the prediction effect of the model.
In this paper, four parameters with quite different units are predicted. Therefore, R^2 is selected to evaluate the prediction effect of the LSTM model on parameters with different units. R^2 is defined by (47). In (44) to (47), A_t is the actual value, Ā is the average of the actual values, F_t is the predicted value, and n is the number of samples.
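The four criteria in (44) to (47) are straightforward to compute. Below is a numpy sketch of the standard definitions described above; the sample values are illustrative, not the paper's data.

```python
import numpy as np

def metrics(actual, pred):
    """MAE, MAPE (%), RMSE and R^2 as in (44)-(47)."""
    actual, pred = np.asarray(actual, float), np.asarray(pred, float)
    err = pred - actual
    mae = np.mean(np.abs(err))                     # (44) mean absolute error
    mape = np.mean(np.abs(err / actual)) * 100.0   # (45) error relative to actual
    rmse = np.sqrt(np.mean(err ** 2))              # (46) root mean square error
    # (47): 1 minus residual sum of squares over total sum of squares
    r2 = 1.0 - np.sum(err ** 2) / np.sum((actual - actual.mean()) ** 2)
    return mae, mape, rmse, r2

mae, mape, rmse, r2 = metrics([100, 200, 300], [110, 190, 305])
print(round(mae, 3), round(mape, 3), round(rmse, 3), round(r2, 5))
```

Note how MAE and RMSE keep the units of the underlying parameter, whereas MAPE and R^2 are dimensionless, which is why R^2 is used to compare parameters measured in different units.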
The promoting percentage of an error criterion is used to compare the performance of two prediction models. It is defined by (48),
where P_E is the improvement percentage of the error criterion, E stands for an error criterion such as MAE, MAPE or RMSE, and E_ben and E_imp are the errors of the benchmark model and the improved model, respectively.
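Equation (48) itself is not reproduced in the extracted text; from the definitions above it is the relative error reduction, sketched here under that assumption.

```python
# Promoting percentage as described for (48): the relative reduction of an
# error criterion (MAE, MAPE or RMSE) from the benchmark to the improved model.
def promoting_percentage(e_benchmark, e_improved):
    return (e_benchmark - e_improved) / e_benchmark * 100.0

# e.g. a benchmark MAE of 10.0 improved to 8.0 is a 20% reduction
print(promoting_percentage(10.0, 8.0))
```

A positive P_E means the improved model reduces the error relative to the benchmark; this is the quantity reported in Tables 17 to 19.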

5) RESULTS AND DISCUSSION
The training performances of the four models for different look back times are shown in Table 13 to Table 16 and Fig. 11. It can be observed from Table 13 to Table 16 and Fig. 11 that when the look back time is 10 days or 20 days, the prediction errors of the RNN and the LSTM are similar; the prediction performance of the RNN is occasionally better than that of the LSTM for short look back times. When the look back time is 30 days, the prediction error of the RNN is significantly greater than that of the LSTM. That is because gradient explosion occurs in the RNN for long look back times, which leads to a larger prediction error. Different from the RNN, the LSTM reduces the possibility of gradient explosion by adding a forget gate to filter long-term data. Compared with BPNN, KNN and RNN, the LSTM has the best prediction performance in most cases. In addition, the LSTM model has the highest R^2 value, closest to 1, and thus the best fit between the predicted trend and the actual trend.
To further illustrate the superiority of the LSTM, the EGTA parameter is selected for detailed analysis. The prediction results of the four trained models on EGTA are shown in Fig. 12, where the black line represents the actual values and the four colored lines represent the prediction results of the four models. The prediction results for look back times of 10 days, 20 days and 30 days are shown in the first, second and third rows, respectively. It can be seen from Fig. 12 that the blue line representing the LSTM follows the actual values most closely.
In order to quantitatively compare the prediction effects of LSTM and other three models, the promoting percentage of error criterion is used. The quantitative comparison results of the LSTM model with other models are shown from Table 17 to Table 19.
Table 17 gives the promoting percentages of the LSTM over KNN. It can be concluded from Table 18 that, compared with BPNN, the MAE of the LSTM is reduced by 1.81%, 10.68% and 33.79% for the three look back times, respectively; the MAPE by 1.74%, 10.88% and 33.65%; and the RMSE by 3.54%, 11.69% and 28.91%. For long-term sequences, the LSTM with its deep structure has better prediction performance than the BPNN with its simple structure.
According to Table 19, compared with RNN, the MAE of the LSTM is reduced by 1.91%, 4.17% and 22.92%, respectively, as the look back time increases from 10 days to 30 days; the MAPE by 1.74%, 5.15% and 23.08%; and the RMSE by 2.95%, 2.3% and 14.63%. When the look back time is small, such as 10 days, the difference between the prediction performance of the LSTM and that of the RNN is very small. As the look back time increases, the advantage of the LSTM's forget gate becomes obvious for long-term sequence prediction.
In summary, the prediction performance of LSTM is better than that of the other three algorithms. In addition, it can be obtained from Table 17 to Table 19 that when the look back time is 30 days, the prediction performance of the LSTM on the time series is better than that when the look back times are 10 days and 20 days. Therefore, the LSTM with the 30 days look back time is selected as the hazard prediction model.
The testing data are input into the trained LSTM model with a 30-day look back time to predict the four target parameters. The prediction results of the four parameters are shown in Fig. 13 and Table 20. In Fig. 13, the error value is the absolute difference between the predicted value and the actual value at each point.
It can be seen from Fig. 13 that the LSTM prediction curves of the four parameters are consistent with the actual value curves. When the data changes abruptly, the LSTM model adjusts quickly and still makes accurate predictions. It can also be found from Table 20 that, compared with the other models, the evaluation index values obtained by the LSTM for the different parameters are the smallest. However, the RMSE values of the four parameters differ considerably in Table 20, because the actual value ranges of the four parameters differ considerably. It can be seen from Fig. 13 that the actual value range of EGTA is 400 to 500, that of P2A is 950 to 1000, that of OT is 50 to 100, and that of STA is 40 to 70. According to the definition of RMSE, the range of the data has a great impact on the RMSE value. In addition, Fig. 13 also shows that the fluctuation frequencies of EGTA and P2A are higher than those of OT and STA, which also causes the errors of EGTA and P2A to be larger than those of OT and STA. Nevertheless, comparing Table 13 to Table 16 with Table 20, the RMSE value and the other errors of the LSTM model are lower than those of the other models, which shows that the LSTM has the best prediction effect. Furthermore, R^2 gives a unified evaluation for data in different units. In Table 20, the R^2 values of the LSTM for the four parameter testing sets are 0.9792, 0.9767, 0.9935 and 0.9434, respectively, all close to 1. Therefore, the prediction results of the LSTM on the four testing sets are consistent with the actual values. In summary, the LSTM has good robustness and prediction performance and can be used for hazard prediction of aircraft.

VI. CONCLUSION
In this paper, a PSO-SVM and LSTM based method for hazard identification and prediction is proposed. The proposed method uses ACARS data to identify and predict hazards in aircraft, with the A13 report of the APU generated by ACARS selected as the analysis data. The analysis results show that the proposed SVM-based hazard identification method achieves higher accuracy than the other identification methods. In addition, after reducing the dimension with RFECV and optimizing with PSO, both the identification accuracy and the identification speed of the SVM are improved: the optimized identification speed of the hazard identification model reaches 15,000 obs/s, and the optimized average identification accuracy reaches 90.99%. The analysis results also show that the proposed LSTM-based hazard prediction method achieves high prediction accuracy and the best fit for the four parameters selected by SVM-RFECV. Compared with the three benchmark algorithms KNN, RNN and BPNN under different look back times, the LSTM with a 30-day look back time has the best prediction performance and the lowest prediction error; its R^2 values on the four parameter testing sets are all close to 1, so the LSTM is robust across different parameters. In summary, the proposed method for civil aircraft hazard identification and prediction based on PSO-SVM and the deep learning method LSTM offers fast identification speed, high identification accuracy and low prediction error.

From 1983 to 1996, he worked as an Associate Professor with the School of Mechanics and Electronics, Nanjing University of Aeronautics and Astronautics, Nanjing, China. Since 1996, he has been a Professor with the College of Civil Aviation, Nanjing University of Aeronautics and Astronautics. His research interests include mechanical condition monitoring, fault diagnosis, and lubricating oil wear debris monitoring.
HAN WANG received the B.E. degree in mechanical engineering from Henan Polytechnic University. He is currently pursuing the Ph.D. degree in vehicle operation engineering with the Nanjing University of Aeronautics and Astronautics, Nanjing, China. His research interests include mechanical condition monitoring, fault diagnosis, and lubricating oil wear debris monitoring.
HONGSHENG YAN received the B.S. degree in traffic engineering from Yangzhou University, Yangzhou, China, in 2014, and the M.S. degree in vehicle utilization engineering from the Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 2016, where he is currently pursuing the Ph.D. degree with the College of Civil Aviation. His research interests include condition monitoring, diagnosis, and health management of aircraft based on flight data.