A Novel Data-Driven Tropical Cyclone Track Prediction Model Based on CNN and GRU With Multi-Dimensional Feature Selection

Strong tropical cyclones have made a drastic effect on human life and natural environment. As large amounts of meteorological data and monitoring data continue to accumulate, traditional methods for predicting tropical cyclone tracks face numerous challenges regarding their prediction efficiency and accuracy. Deep learning methods recently have been proven to be able to learn both spatial and temporal features from a large amount of dataset and be extremely efficient and accurate for forecasting data in complex structures. In this paper, we propose a novel data-driven deep learning model to predict tropical cyclone tracks using the spatial locations and multiple meteorological factors. This model comprises a multi-dimensional feature selection layer, a CNN layer and a GRU layer. The proposed model was trained using a dataset of real-world tropical cyclones from the years 1945 to 2017. Through the comparison experiments, the results verify that the proposed model outperforms the traditional forecasting methods, including a climatologically aware forecasting technique, the Sanders Barotropic technique and a numerical weather prediction (NWP) model. In addition, the proposed model has better accuracy than some deep learning methods, including RNN, GRU, CNN, AE-RNN, CNN-RNN, and CNN-GRU without the proposed feature selection layer.


I. INTRODUCTION
Tropical cyclones are a type of mesoscale or synoptic warm cyclone generated on the ocean surface in tropical and subtropical regions. A strong tropical cyclone may become a typhoon (hurricane), an extremely destructive and unpredictable natural disaster that is responsible for the loss of life and property each year. For example, in the year 2017 alone, there were 8 typhoons landing in China, which affected 5.879 millions of people and caused about 5 billions of dollars financial damages [1]. These days, the frequency of severe typhoons has increased, which makes the prediction of tropical cyclone tracks become even more important [4], allowing citizens and governments to be better prepared when faced with such a disaster. However, when a tropical cyclone forms, it can be affected by numerous factors such as the The associate editor coordinating the review of this manuscript and approving it for publication was Jenny Mahoney. meteorological environment, thermodynamics, and kinetics of the tropical cyclone system. After a tropical cyclone lands, its track will be affected by complex marine, the shoreline of the coastal areas, and the topography of the inland areas [2]. All of these complex problems make the tropical cyclone track prediction being a significant challenge. Therefore, given the impact of tropical cyclones on society and the complexity involved in their prediction, it is important to explore and apply new tropical cyclone track prediction techniques.
Traditional track prediction methods are mainly based on a numerical method and statistical formulas [3]. Although these methods have been widely used along with the development of computer technology and monitoring methods, such an approach continues to suffer from high complexity and relatively low prediction accuracy [6]. And these traditional methods are limited by their own characteristics, such that they cannot analyze the implicit data features, and rely too much on the rules summarized by the historical data without considering the complex correlation of meteorological variables and the tropical cyclone tracks on both attribute and temporal dimensions. In addition, as meteorological satellites, ocean observation stations and ground stations are gradually being established, the amount of data available is accumulating, which is changing the prediction of tropical cyclone tracks into a big data problem. Recently, deep learning models have been applied to image processing, natural language processing, and object detection successfully [7], [8], [49]. They have shown great advantages in the processing of large amounts of complex time series data in two ways. First, deep learning models can extract implicit features from a dataset within multiple variables and improve the generalization capability, e.g., through the use of a convolutional neural network (CNN). Second, a deep learning model is more efficient in processing large-volume datasets as a time series problem, e.g., using a Gated recurrent unit (GRU) network to extract temporal correlation from a time series dataset [9]. For example, in [5], a CNN model is used to understand the spatial relationship of typhoon features and typhoon formation. In [69], a GRU model is used to examine the temporal sequence of relations in typhoon progression. Therefore, a deep learning method is an appropriate approach for our research into track prediction.
In this study, we proposed a novel fusion tropical cyclones track prediction method based on CNN and GRU models. The method is composed of three layers. In the first layer, a multidimensional feature selection process is proposed, which can select the meteorological variables that make the biggest influence on the tropical cyclone tracks based on Granger Causality analysis, and select the most correlated time range based on the autocorrelation analysis and partial autocorrelation analysis. In the second layer, the selected meteorological variables with tropical cyclone tracks in the correlated time range are processed as a series of two-dimensional matrices. It means the two-dimensional matrix will be generated for each timestamp by a sliding window technique. Then, the matrix of each timestamp is used as input to the CNN model. In the CNN model, a customized convolutional kernel can extract implicit features of the input matrix and learn the correlation from the meteorological variables and the tropical cyclone tracks in the correlated time range. In the third layer, a GRU-based prediction model is proposed. It can take the extracted features from the second layer in time series format, and learn the temporal features from it. The GRU model is a variant of the RNN model that can establish connections between neurons within layers. Not only that, the GRU model can solve the long-term dependency problems in RNN by adding gate structures and memory cells [10]. Then, the learned temporal features are used to predict the center location of the target tropical cyclone at the next timestamp. Generally speaking, our proposed model can effectively select the most correlated meteorological variables and time range that affect the tropical cyclones tracks from the multi-dimensional feature selection layer, which can eliminate unimportant data and improve the prediction efficiency. The CNN layer and the GRU layer can learn the temporal features, and make an accurate prediction of the cyclone track.
Therefore, our study aims at providing a fusion deep learning model that can deal with the big data problem of traditional prediction methods and improve the accuracy of tropical cyclone track prediction. Hereinafter we unfold our main contributions as follows: (1) A multi-dimensional feature selection layer is proposed in our method to choose the most correlated meteorological variables and time range to tropical cyclone tracks from the perspective of attribute correlation analysis and temporal correlation analysis. This layer is essential for evaluating the impacts of meteorological variables on tropical cyclone track forecasting in this paper.
(2) The deep feature extraction is considered in our work, where a CNN layer of our proposed model tends to learn and extract implicit features of the meteorological variables and tropical cyclone tracks in a correlated time range, and a GRU layer, taking output of the CNN layer as input, tends to mine the deep temporal features.
(3) The proposed model was validated using a real-world tropical cyclones dataset, and the experimental results suggest that our proposed model can achieve a more accurate prediction result than some existing traditional tropical cyclone forecasting methods and deep learning methods.

II. RELATED WORK
The four traditional methods to predict typhoon tracks contain numerical model, statistical model, dynamic model and integrated model [11]. The numerical model needs a massive computing power to deal with the complex dynamic formula [12]. With this model, a grid system needs to be generated to model the interior structure of the tropical cyclone and simulate it in real-time. However, the numerical model is computationally intensive. Compared to a numerical model, the computations of a statistical model are lightweight, requiring only the tropical cyclone behavior patterns to be calculated from the historical data [13]. But the prediction accuracy is relatively low. The dynamical model can use massive variables in a meteorological dataset as a series of prediction factors to improve accuracy [3]. In addition, the integrated model can combine multiple models, physical parameters and initial conditions into a single prediction model, which can achieve a better prediction result than a single model [14]. However, with an increasing number of weather satellites, ocean stations, and ground stations, the amount of meteorological data is increasing. Using the traditional models to find non-linear typhoon patterns from the big spatio-temporal dataset is difficult.
The rapid development of deep learning opened up another way in climate field including precipitation estimation, extreme climate detection, weather classification, tropical cyclone intensity estimation and track prediction [15]- [17], [41]- [44], [54]. A Recurrent Neural Network (RNN) is one of the most widely used deep learning methods and it can be VOLUME 8, 2020 used to simulate human memory cells [18]. It has shown great promise in tackling sequential data, including image segmentation [19], sound recognition [20], genes engineering [21] and stock prediction [22]. RNN models have been applied to typhoon and hurricane predictions as well. For example, Xu et al. [23] generated a novel method based on an RNN and an attention mechanism to predict typhoon tracks. The results of this model indicate that the deep learning method can be used to improve the prediction accuracy to a certain extent. However, naive RNNs face a technological challenge, namely, a long-term dependencies problem, in which as the time interval increases, the learned information cannot be connected to far-reaching information, which will lead to vanishing gradients. Therefore, the Long Short-Term Memory (LSTM) [24] was proposed to solve this problem and can represent dynamic temporal behavior for sequential data, which is similar to an RNN although the hidden layer in RNN is replaced by a long short-term memory (LSTM) cell. However, to reduce the training parameters in LSTM, a GRU model that is a variant of LSTM [10] was developed. There are only two gates in a GRU model, namely an update gate and a reset gate, making the GRU model being a simpler structure than an LSTM network. Meanwhile, a CNN model help to capture spatial features from weather phenomena and meteorological variables [37], which has been widely used in image recognition, natural language processing and so on [38]- [40].
Previously, many experts and scholars have studied the trajectory tracking by videos or images with the help of object tracking [55]- [57], [59]- [62], where a CNN structure is used in position tracking based on image data. In view of the temporal dimension of tropical cyclones, the tropical cyclone track prediction is a task of time sequence prediction. Specifically, Gao et al. [25] proposed a typhoon prediction model based on an LSTM model and trained the model by using a typhoon dataset from the years 1949-2012, which can predict 6 to 24 hours of typhoon tracks. Shen et al. [57] proposed a fused neural network that contained one neural network using past trajectory data and one convolutional neural network using reanalysis atmospheric wind fields images. The network was trained to forecast 6-h longitude and latitude of hurricanes. The results suggested that the average error distance was significantly lower than mainstream approaches. Kim et al. [26] proposed a ConvLSTM-based tracking and forecasting model to predict hurricane trajectories by using satellite images. Additionally, Su [31] proposed a variant of GRU model that including a convolutional structure to predict cloud movement. In this model, both the input and output were spatio-temporal sequences that consisted of ordered satellite images. Compared to ConvLSTM model, their proposed model had fewer parameters, and the experiment results indicated that this model performed well on GOES satellite data. However, most of the convolutionalbased deep learning models took two-dimensional images as model input, and the LSTM-based models needed to deal with the excessive parameters in the training process. There still exist some difficulties in learning the implicit features from time series data of tropical cyclone trajectories and meteorological variables efficiently and accurately.
Although there were many encouraging results obtained by deep learning methods, most of the methods neglected a key factor in prediction, namely, the feature selection process that was the process of selecting the most correlated features to a typhoon track. The feature selection is important to improve the performance of deep learning models. The first reason is that it can select the most important variables to train the deep learning model, which can increase the model accuracy. Second, it reduces the data dimensions, which can improve the training speed. Carta et al. [32] performed the feature selection process in wind speed prediction by using correlation analysis in meteorological data. Seo et al. [33] used feature selection for short-term heavy rainfall prediction based on evolutionary computation. The results of these papers proved that the prediction accuracy was significantly improved by feature selection. Granger causality test was a technique to determine whether a time series is useful in prediction, which was originally proposed to analyze causality relationships among economic variables in 1969s by Granger [45]. Recently, Granger causality test has been used in meteorological areas to analyze the complexity of the causality between meteorological factors [34]. For example, Feng et al. [35] proposed a multi-model methodology for short-term wind forecasting, the Granger causality test was implemented for feature selection before model training. Wang and Song [36] applied the Granger causality test to conduct dynamic effect analysis among meteorological conditions on air pollution. However, currently, there is limited work that analyzing both the most correlated meteorological variables that affect the tropical cyclone tracks, and the most correlated time range to the next timestamp before training the deep learning model in order to make an accurate track prediction.

A. PROBLEM FORMULATION
Given a series of tropical cyclones X = {tc 1 , tc 2 , . . . , tc n }, where each tropical cyclone tc i ∈ X has a track L i = l i,t1 , l i,t2 , . . . , l i,tm that is characterized by a series of spatial coordinates indicating the center locations of tropical cyclone tc i from timestamp t1 to timestamp tm, where each l i,tj can be defined by a set of spatial coordinates lat i,tj , lon i,tj , tj ∈ T = {t1, t2, . . . , tm}, where t1 is the beginning timestamp and tm is the end timestamp. Therefore, l i,tj can represent the center location lat i,tj , lon i,tj of the tropical cyclone tc i at a corresponding timestamp tj. Also, each tropical cyclone tc i has a number of meteorological factors that affect the tropical cyclone track at timestamp tj, denoted by a vector F i,tj = f i,tj,1 , f i,tj,2 , . . . , f i,tj,c , where c represents the number of meteorological factors. Therefore, each tropical cyclone tc i can be represented by a two-dimensional matrix.
where the row in tc i represents the latitude, longitude and meteorological factors, and the column represents an ordered set of timestamps T = {t1, t2, . . . , tm}. Given a series of tropical cyclones X = {tc 1 , tc 2 , . . . , tc n }, our main objective is to construct a data-driven prediction method to predict tropical cyclone track based on deep learning networks. In order to achieve this objective, the first step is to analyze the attribute relationship between different meteorological variables and tropical cyclone tracks based on Granger causality, and then analyze the temporal relationship based on autocorrelation analysis and partial autocorrelation analysis. This step can select the most correlated F meteorological variables and the most correlated k time range to track prediction. Then, these variables will be chosen as the input dataset to our proposed deep learning model. The deep learning model is composed of a Convolutional Neural Network (CNN) layer that can extract feature vectors from the meteorological data and track data, and a Gated recurrent unit (GRU) layer that can learn the temporal features from the k timestamps and predict the center location of the tropical cyclone at the next timestamp.

B. THE PROPOSED METHOD
The structure of the proposed method consists of three parts, shown in Figure 1. Given the tropical cyclone tracks and meteorological variables, the first part is to select the most correlated meteorological variables and time range by a multi-dimensional feature selection layer. From the attribute dimension, the Granger causality test is conducted to choose the variables that make a big effect on tropical cyclone tracks, following by the autocorrelation analysis and partial autocorrelation analysis to choose the correlated time range from the temporal dimension. Then, the tracks and selected meteorological variables are normalized and processed as a series of two-dimensional matrices using a sliding window technique. In the second part, these matrices are used as input to a CNN layer for compressing and extracting implicit features between meteorological variables and tropical cyclone tracks. In the third part, these extracted features vectors in time series from the CNN layer are used as input to a GRU forecasting layer that can learn the temporal features and predict the tropical cyclone center location at the next timestamp.

1) THE MULTI-DIMENSIONAL FEATURE SELECTION LAYER
The tropical cyclone tracks are affected by some complicated atmospheric environments, such as maximum sustained wind speed, central pressure. And the performance of a datadriven model highly depends on its input dataset. Therefore, the multi-dimensional feature selection layer is proposed as the first layer of our model. From the attribute dimension, this layer can select the most correlated variables to the tropical cyclone track. From the temporal dimension, this layer can analyze the most correlated time range to the target timestamp. The objective of this layer is to reduce the dimension of input variables and remove irrelevant noise data, which can improve the prediction accuracy and efficiency in the following layers.
The process of the multi-dimensional feature selection layer is illustrated in Algorithm 1, where first, the Granger Cointegration = Johansen cointegration test 5: if Stationarity is True then 6: Granger causality test Granger causality test 12: end if 13: Autocorrelation and partial autocorrelation analysis causality test is performed to select the most correlated meteorological variables from the attribute dimension, and then the autocorrelation analysis and partial autocorrelation analysis are performed to find the most correlated time range from the temporal dimension.
The traditional correlation analysis methods, such as covariance and Maximum Information Coefficient (MIC) can only analyze the linear correlation or non-linear correlation between variables, and then remove the features that are irrelevant to the target variable. However, not all of the relevant variables analyzed by covariance and MIC are useful for track forecasting. In order to further explore the correlations between the meteorological variables and tropical cyclone tracks, the Granger causality analysis was used in our model. The Granger Causality Test (GCT) between variables X and Y is defined as following: if the prediction result using the jointing past information of variables X and Y is superior to use variable Y only, namely, variable X helps to explain the change in the future, then X is the granger cause of Y . The process of GCT is shown on lines 2-12 of Algorithm 1. First, the prerequisite for Granger causality analysis is that the time series data must be stationary, otherwise false regression problems may occur. Therefore, the unit root test should be conducted for the stationarity of each time series variables. In this study, the Augmented Dickey-Fuller unit root test (ADF) [20] is used for the stationary test. The null hypothesis of the ADF means that if the unit root exists, then the time series variable is not stable. If the statistics obtained by ADF test are significantly less than the critical statistics of 3 confidence levels (1%, 5%, 10%) and the p-value is very close, it indicates the null hypothesis is rejected and time series are stable. Otherwise, the co-integration test needs to be implemented, shown on line 4. If there is a co-integration relationship between the two sequences, then the GCT can be conducted. The specific implementation process of GCT is to estimate the unconstrained regression model using Equation (2), then estimate the constrained regression model using Equation (3).
Through GCT, the meteorological variables that affect the tropical cyclone tracks most are selected. However, the meteorological variables at one timestamp will make an influence on tropical cyclone track in the following timestamps. Therefore, determining the best correlated time range can improve the prediction accuracy. In this study, autocorrelation function (ACF) and partial autocorrelation function (PACF) are conducted to determine the best time range. In statistics, the autocorrelation of a random process is the Pearson correlation (PCC) between values of the process at different times. The PCC of the first N − k values L 1 = l t1 , l t2 , . . . , l t(N −k) and the last N − k values L 2 = l tk , l t(k+1) . . . , l tN is calculated by Equation (4).
whereL 1 andL 2 are mean of the first N −k values and the last N − k values, respectively. If ignoring the difference between L 1 andL 2 , the formula of autocorrelation coefficient at time lag k can be shown in Equation (5).
Given the tropical cyclone tracks L = {l tl , l t2 , . . . , l tm }, the partial autocorrelation coefficient of time lag k refers to the influence of l t−k on l t after removing k − 1 random variables. The math formula of PACF is shown in Equation (6): . And the larger values of ACF and PACF indicate a more relevant relationship between the temporal sequence of l t−k on l t . After implementing the multi-dimensional feature selection layer, the original features F = f 1 , f 2 , . . . , f c will become F = {f 1 , f 2 , . . . , f d }, where (d ≤ c), and the most correlated time range k is selected as well.

2) THE CNN-GRU LAYER
In the second layer, the tropical cyclone tracks and selected meteorological variables are used as input to the CNN model for feature extraction. The input data is two-dimensional matrices in time series format, shown in Equation (7), where the row represents the latitude, longitude and selected meteorological factors F = f 1, f 2 , . . . , f d , (d ≤ c) at a timestamp, and the column represents an ordered set of timestamps tm ∈ T = {t1, t2, . . . , tm}. Moreover, to improve the training efficiency and accuracy, data normalization is used to process the input data before training the CNN model. The detailed formulas of normalization are shown in Equations (8)(9)(10).
where x i is the input value of location or meteorological variables, n refers to the dataset size, µ d is the mean of one variable, σ 2 d is the variance of one variable and x ı is normalized data. Then, these normalized matrices are used as input to the CNN layer.
CNN is a kind of feedforward neural network, in which neurons can respond to a portion of the surrounding cells. The difference between CNN and normal neural networks is that CNN uses the concept of weight sharing, which means using the same convolutional kernel of each layer [46]. Therefore, CNN is relatively easy to train and can extract deeper implicit features. The CNN model contains convolution layer and pooling layer. The convolution layer is to extract deep features from the input data and generate a feature map, which will be transferred to the pooling layer for feature selection and information filtering. Two-dimensional convolution kernels are used in this study. When the training process is completed, the deep features between meteorological variables and tropical cyclone tracks of a correlated time range can be extracted by the two convolution layers and two pooling layers of the CNN model. Namely, the trained convolution layer and pooling layer are used to generate feature vectors {F tk , F tk+1 , . . . , F tm } for training the GRU layer.
After extracting feature vectors from the CNN layer, the GRU layer is used to output the tropical cyclone location at the next timestamp. The GRU layer consists of two gates: one is an update gate that combines the input gate with the forget gate of LSTM, and the other is a reset gate. The input gate of LSTM determines how much information of the network input at the current time is saved to the cell state.
And the forget gate determines how much information of the cell state at the previous moment is retained to the current cell state. The update gate is used to control how much of the history state is retained in the output state of the current moment. The other gate is a reset gate, which acts directly on the hidden state of the previous moment and it determines how the new input information is combined with the previous memory. Therefore, these two gates can hold information in a long-term sequence. Based on this, GRU is introduced for time series prediction. The input of the GRU layer are feature vectors {F tk , F tk+1 , . . . , F tm } that generated from the CNN layer. The output is the predicted latitude and longitude of the target tropical cyclone at the next timestamp.
In summary, to predict the tropical cyclone track based on historical track data and meteorological data, first, the attribute correlation and temporal correlation are analyzed in the multi-dimensional feature selection layer. Then, the selected variables along with the tracks are used as input data to extract deep features in the CNN layer. The extracted feature vectors in time series format are used as input to the GRU layer, which can learn temporal features and predict the center location of a tropical cyclone at the next timestamp.

C. TRAINING OF THE CNN-GRU MODEL
As shown in Figure 1, our proposed tropical cyclone track prediction model has three layers. Since the multidimensional feature selection layer need not be trained, the training process is divided into two stages, the CNN layer training and the global model training.

1) THE CNN LAYER TRAINING
The CNN layer training process is shown in Figure 2. For timestamp 1, the input is a two-dimension matrix that contains tropical cyclone tracks and meteorological variables from timestamp 1 to timestamp k, where k is the correlated time range determined from the multi-dimensional feature selection layer. For the next timestamp, the input data is from timestamp 2 to timestamp k + 1 based on the sliding window mechanism. The rest process is analogous until the training is completed. When the training is completed, the layer can extract deep features between meteorological variables and tropical cyclone tracks at each timestamp.
After taking the two-dimension matrix as input, firstly the convolution layer can extract features from the twodimensional matrix, where the convolution operation can maintain the sequencial relationship of variables. The mathematical formula of the convolution operation in layer l is shown in Equation (11). The matrix obtained by the convolution operation is called • Feature Map.
where K l j (j = 1, 2, . . . , F 1 ) are the convolution kernel and bias of j − th convolution kernel, respectively. And f is the activation function. F 1 is the convolution kernel size of this VOLUME 8, 2020 convolution layer. F 1 convolution kernels can produce F 1 feature maps. Then, the feature maps are compressed by the pooling layer.
The pooling layer, also named subsampling layer, does not change the depth of the generated feature map matrixes, but it can reduce the size of the matrix. In image processing, the pooling operation can be considered as converting a highresolution image into a low-resolution image. Through the pooling layer, the number of nodes in the following fully connected layer can be reduced. Therefore, the parameters of the entire neural network can be reduced, which can speed up the computation and prevent the overfitting problem. The common pooling methods contain maximum pooling and the average pooling, where the maximum pooling is used in this study. The pooling process in layer l is shown in Equation (12).
where β j (j = 1, 2, . . . , F 1 ) is the multiplicative bias of j − th pooling, down() is the subsampling operation and b l j is the bias. By establishing multiple convolution layers and pooling layers, the complex feature matrixes which represent information at each timestamp will be extracted for prediction. Finally, the feature matrix will be flattened into feature vectors, which will be the input of the fully connected layer. In our proposed model, the CNN model contains two convolution layers and two pooling layers. And only the feature vectors are used as input to the following layer.
The loss function of the CNN model uses the mean squared error with L 2 normal, shown in Equation (13), where L i,tk = Lon i,tk , Lat i,tk andL i,tk = Lon i,tk , Lat i,tk are observed values and fitted values of CNN at timestamp tk of tropical cyclone i respectively. M is the amount of training data. And L 2 regularization constraint is added to prevent the overfitting problem. λ is the regularization coefficient and ω is a set of parameters including weights and biases of convolutional kernel and neuron. The back-propagation algorithm and minimizing loss function are used to update weights and optimize network prediction performance. As a result, the two convolution layers and two pooling layers of the CNN model can transform the input data into feature vectors for each timestamp accurately. When the network meets the expectation, the first stage of the network training is finished, and the next stage begins.

2) THE GLOBAL MODEL TRAINING
In the second training stage, the global CNN-GRU model needs to be trained, which is shown in Figure 3. As we can see, the feature vectors {F tk , F tk+1 , . . . , F t2 k } that generated from the fully connected layer of CNN are input to the GRU layer. The output is the center location of the tropical cyclone at timestamp t2k + 1, where k is the time range calculated by the multi-dimensional feature selection layer. The training process of the GRU layer is demonstrated in the following steps. i. As mentioned above, the GRU layer consists of two gates that one is an update gate. The formula of the update gate is shown in Equation (14). In our work, each tropical cyclone track is assumed to be independent, therefore, the appropriate hidden state will be reset for each tropical cyclone.
97120 VOLUME 8, 2020 In Equation (14), σ is a sigmoid function calculated by Equation (15). F t denotes the feature vector at timestamp t, h t−1 denotes the output vector from the last GRU cell that stores the information at timestamp t − 1, and W , U , b are parameter matrices and vectors, which contribute to making linear transformations of F t and h t−1 . The update gate adds up this linearly transformed information and puts it into the sigmoid activation function, which can compress the result between 0 and 1.
ii. The other gate is the reset gate, which is used to determine how much the candidate state at the current moment will inherit from the previous moment, the formula is shown in Equation (16).
where σ is sigmoid function calculated by Equation (15) as well, F t denotes the feature vector at timestamp t, h t−1 denotes the output vector from last GRU cell. And W r , U r , b r are parameters matrices and vectors of reset gate.
iii. After getting the output of the update gate z t , which is then multiplied by the historical state h t−1 , (1z t ) is multiplied by candidate stateh t , and then added them up. By doing this calculation, we can get the final memory information of the current timestamp t. The mathematical formula is shown in Equation (17).
whereh t denotes the candidate hidden state, calculated by Equation (18). In this equation, make a Hadamard product of the reset gate output at timestamp t and the hidden state at timestamp t − 1. Then, the product is concatenated with the input at timestamp t. Finally,h t is calculated by the fully connected layer of the tanh activation function to make the range of all elements to [−1,1]. iv. At last, the predicted location at the next timestamp of this particular tropical cyclone is the output by using the fully connected layer. To reach the goal of parameter optimization, the GRU model uses back-propagation algorithm to adjust the parameters during the training process. Since the CNN layer has been trained to achieve the best performance before training the global CNN-GRU model, the parameters of CNN are retained in the global training. And the loss function is also mean squared error with L 2 regularization constraint, which is similar to Equation (13), where ω is a set of parameters including weights and bias in GRU network. When the training process completes, the model can be used to predict tropical cyclone track.
Then, taking the historical data of meteorological variables and tropical cyclone tracks as input, the CNN-GRU model repeats the training and tuning process, and then optimizes the performance of the model gradually. During the training process, the CNN layer can realize the function of extracting implicit features between meteorological variables and tropical cyclone tracks. And the GRU layer adds the time series prediction function to the model and avoids the complexity and unnecessary calculations since the input of the GRU layer is one-dimensional vectors. In general, the model can consider temporal features comprehensively and fully utilize the historical dataset to produce a more accurate prediction value.

IV. EXPERIMENT
To demonstrate the accuracy of our proposed model, we implemented some experiments to predict the tropical cyclones tracks. The experiments were performed on a workstation with Intel(R) Core(TM) i7-6498DU CPU@2.50GHz and 256GB of main memory. And all algorithms were implemented with python 3 and open-source machine learning packages, including Scikit-learn 0.20.3, keras 2.2.4, and TensorFlow 1.13.1.

A. DATASET
The proposed model was evaluated on the Western North Pacific (WNP) Ocean Best Track Data provided by Joint Typhoon Warning Center (JTWC) [47]. The dataset contained 61089 records of 2194 tropical cyclones from the year 1945 to 2017. Each record contained the 6-hourly center locations and some meteorological attributes, that including maximum sustained wind speed (VMAX), minimum sea level pressure (MSLP), Wind intensity (kts) for the radii defined in this record (RAD), radius of specified wind intensity (RAD1-RAD4), pressure in millibars of the last closed isobar (RADP), radius of the last closed isobar (RRP), radius of max winds (MRD), eye diameter (EYE) and gusts (GUSTS). The dataset was divided into three sets, where 80% of the total tropical cyclones records were used for training, 10% were used for validation, and 10% were used for testing. In other words, the training set contained 48871 tropical cyclones data tuples of 1758 tropical cyclones. The number of validation set and test set were both 6108 tuples. The experimental dataset in detail is shown in Table 1. To evaluate the accuracy of our proposed model in a realworld dataset, some super typhoons, including Typhoon Meranti, Haiyan, Tip, and Dujuan were selected in the testing set to visualize the predicted typhoon track results. Among them, Typhoon Haiyan was one of the strongest typhoons in 2013, which was also the strongest dependent tropical cyclone based on the records in the WNP dataset. This typhoon caused 16232 fatalities and up to 7.1 million financial damages, affecting people in the Philippines, mainland China, Taiwan  Province and Vietnam [48]. Figure 4 shows 80 historical records of Typhoon Haiyan in detail. Each row represents a record of the typhoon landing longitude, latitude, corresponding timestamp, and the meteorological attributes.

B. MULTI-DIMENSIONAL FEATURE SELECTION
In the dataset, there were eleven meteorological factors to be analyzed by Granger causal analysis to determine the causality between each meteorological factor and the tropical cyclone track. Since there are many missing values before the year 2000 in the JTWC dataset, the data used in the multidimensional feature selection layer was from 2000-2017 in order to fully consider the impact of each variable on the trajectory. First, ADF was applied to test stability. If the variable was stable, Granger causal analysis was implemented directly. Otherwise, the Johansen cointegration test (JTC) was conducted to determine whether the linear combination of a group of non-stationary series had a stable equilibrium relationship. If the relation existed, the Granger causal test can be implemented, otherwise, it cannot. The test results of ADF are shown in Table 2, where the critical value of the unit root test at 1% significance level is −3.430659. And under the null hypothesis that variable has a unit root, t-test statistics of the unit roots of all variables were less than −3.430659. Therefore, the null hypothesis was rejected. Then, these meteorological factors had no unit root and they were stationary series.
In the following, GCT was conducted to judge causality between the meteorological factors and tracks. In this study, the lag periods were set as 2. And the null hypothesis means the meteorological factor does not Granger cause latitude and longitude. The critical value of F at 95% confidence level was 3.0078 [34]. Assuming that the F-statistic of a variable was greater than 5, the null hypothesis was rejected in this study. The F-statistic results between meteorological factors and track are shown in Table 3, where the values within box mean that these corresponding meteorological factors are discarded, and the rest factors combining with tropical cyclone track are used as input to the deep learning model, which can predict the longitude and latitude of the tropical cyclone center respectively. As we can see, meteorological factors VMAX, RAD, RAD1-RAD4, RRP, MRD, GUSTS and EYE were causalities of latitude, and meteorological factors VMAX, RAD, RAD1-RAD4, RADP RRP and MRD were causalities of longitude.  At last, ACF and PACF were implemented, which were employed to determine the temporal dependence of meteorological variables. The results are shown in Figure 5 and Figure 6. The ACF results demonstrated that meteorological variables had obvious correlation on both the latitude and longitude from lag 1 to lag 10. And the PACF results showed that after removing the values of one time lag and two time lags after timestamp t, where one time lag was 6 hours, the PACF values were still bigger. That is to say, latitude and longitude of lag 1 and lag 2 had a high correlation with the meteorological values at the current timestamp. Therefore, when time lag was set to 2, namely time step was 2, the model can get the most accurate forecasting results. Thus, we implemented the experiments in the following processes by using the previous 12 hours data.

C. EXPERIMENT SETTING
To compare our proposed model with traditional tropical cyclone track prediction methods and deep neural network models, three traditional methods and six deep neural network models were chosen for the experiments. The traditional methods consist of a statistical forecasting method, dynamical and numerical cyclone forecasting technique, and a numerical weather prediction (NWP) model [63]. The statistical forecasting method developed by JTWC, relies on a climatologically aware forecasting technique, which is a climatology forecasting technique that produces 24-, 48and 72-h forecasts through an unweighted averaging of the motion of previous cyclones in the same basin [64]. The Sanders Barotropic technique is a typical dynamical and numerical cyclone forecasting technique that can produce 12-, 24-, 48-and 72-h forecasts [65]. The best results from Goerss et al. [66] research on the predictive effects of various NWP models over the Western North Pacific Ocean since 1992 were chosen as the NWP model results.
The deep learning models consist of RNN, GRU, CNN, AE-GRU, CNN-RNN, CNN-GRU without a multidimensional feature selection layer and our proposed model. RNN is a naive recurrent neural network [18], in which nodes are oriented to form a loop. The internal state of the network can show dynamic temporal behaviors. The CNN model and GRU model have the same structure as the corresponding part of our proposed model. The AE-GRU model is a neural network model based on Auto-Encoder (AE) and GRU, in which an AE is used for extracting features between tropical cyclone tracks and meteorological variables, and a GRU is used for forecasting time series data. The CNN-RNN model is a combination of CNN and RNN, in which GRU is replaced to RNN for comparing the performance of RNN-based model and GRU-based model. The structure of CNN and RNN in CNN-RNN model are the same as the single CNN and single RNN models. In addition, the pure CNN-GRU model refers to our proposed model but without the multi-dimensional feature selection layer. All of these models used the same input dataset as our proposed model.
The aim of the experiment is to predict the tropical cyclone track using the previous 12 hours data. The input data of each model contained longitude, latitude and the meteorological features. The output was the center location of the target tropical cyclone at the next timestamp. The experiment of each model was run 10 times, and the maximum epoch was 100. In order to get the optimal prediction results, multi-layer VOLUME 8, 2020 GRU and RNN were tested respectively, the number of neurons and learning rate were adjusted correspondingly. Ultimately, the best parameters setting for all models were listed in Table 4, where the batch-size, learning rate, training method (for prediction model) and loss function were the same as the parameters setting in our proposed model.

D. RESULTS
The Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE) were used as assessment index between the normalized prediction value and the observation value. MAE denotes the mean absolute error between the predicted track and the observation track, the mathematical formula is shown in Equation (19).
where n denotes the size of the testing data, P i is the predicted value and O i is the observation value. RMSE is the average of the root mean of squared error between the predicted track and the observation track, which is calculated by Equation (20): MAPE denotes the average error ratio of the correct values, which considers not only the error between the predicted value and the observation value, but also the ratio between the error and the observation value. The formula is in Equation (21).
In general, the smaller value of these measurements indicates a more accurate prediction result. To compare the performance of three traditional methods and our proposed model, the performance measurement results are listed in Table 5, which indicates the average APE of 12-, 24-, 48-and 72-h tropical cyclone forecasts in the northwestern Pacific. The average absolute position error (APE) was used since traditional methods [64]- [66] used distance in kilometers to measure the error between the predicted location and the real location. The calculation of APE is shown in Equation (22), where R is the earth radius, Lat real , Lon real indicate the coordinates of real tropical cyclone position, and Lat pred , Lon pred is the predicted location. A larger value of APE indicates a more significant prediction error. According to the result of the multi-dimensional feature selection layer, the trajectories and meteorological variables data of previous 12 hours is used for 6-h forecasting. Using the same method, the previous 24-h, 36-h, 60-h, and 80-h data were used to predict the location at the next 12-h, 24-h, 48-h and 72-h, respectively. And there was no record of the 12-h forecast of the Climatologically-aware forecasting technique. From Table 5, it was obviously that our model outperformed the three traditional methods on 24-h, 48-h and 72-h forecast. Moreover, with an increase in the forecast time, the performances of the traditional methods decreased rapidly. In comparison, our proposed method showed a relatively stable performance for a longer forecast, the largest error was no more than 210 km, where the error of Sanders Barotropic technique had exceed to 700 km. APE = R × Arccos sin (Lat real ) × sin Lat pred × cos Lon real − Lon pred + cos (Lat real ) × cos Lat pred × π/180 (22)   To compare the performances of RNN, GRU, CNN, AE-GRU, CNN-RNN, CNN-GRU without multidimensional feature selection layer and our proposed model, we list the performance measurements results in Table 6, which contains the training results, validation results and testing results to predict the 6-hour longitude value and latitude value, respectively. The predicted results were based on the previous 12 hours tropical cyclones records, which means the time step was 2 in the experiment. The best performance of the testing results were emphasized in bold fonts. From Table 6, it was obvious that our proposed model can achieve the best performance of almost all measurements, the predicted longitude and latitude were most close to the real values. The single CNN model had the worst performance because it had no capability of extracting temporal features in time series data. And when comparing the GRU-based models with the RNN-based models, it was evident that the GRU-based model can outperform the RNNbased model, which proved that the RNN model was poor in dealing with long-term dependency of tropical cyclone tracks. Additionally, comparing the GRU model with the CNN-GRU model without feature selection, the measurement results showed that the CNN-GRU model without feature selection had a better result than the single GRU model. VOLUME 8, 2020 This indicated that a CNN layer was significant in tropical cyclone track prediction because the CNN model can help to extract implicit features between meteorological variables and trajectories. And comparing AE-GRU model with the pure CNN-GRU model (without multi-dimensional feature selection layer), the CNN-GRU model performed slightly better, which demonstrated that a CNN layer contributed more than an AE layer in tropical cyclone track prediction. At last, comparing the pure CNN-GRU model (without multidimensional feature selection layer) with our proposed model (with multi-dimensional feature selection layer), we can see that our proposed model had less error, which proved that the multi-dimensional feature selection layer that chose the most correlated meteorological variables and time range can make a positive effect in tropical cyclone track prediction.  Finally, in order to display the prediction effect of our proposed model better, we chose Typhoon Meranti, Haiyan, Tip, and Dujuan for visualization. The results of using previous 12 hours data to predict the tracks of Typhoon Tip, Meranti, Dujuan, and Haiyan were shown in Figure 7, 8, 9, and 10 respectively. The red color represented the real tropical cyclone track and the blue color represented the predicted track. The X-axis was longitude and the Y-axis was latitude. Table 7 showed the average results of MAE, RMSE, and MAPE between the normalized predicted value and observation value of the target typhoons. We can see that MAE and RMSE of the four typhoons were up to 0.001 level for both longitude and latitude. However, the model had a relatively larger error in predicting the longitude of Typhoon Haiyan, which was because that Typhoon Haiyan had a wider range of longitude motion in its life.

V. CONCLUSION
In this paper, we proposed a novel deep learning model based on CNN and GRU models to predict the tropical cyclone's tracks using historical tropical cyclones data and multiple meteorological variables. Moreover, for eliminating data redundancy and reducing the complexity of the prediction model, a multi-dimensional feature selection layer based on the Granger Causality test, ACF, and PACF was conducted. The Granger Causality test can analyze the causality between meteorological factors and tracks and get rid of the factors that have lower correlation with the track. The ACF and PACF can determine the most correlated time range for the sequential model. Then, the CNN layer was implemented to model and extract deep features from the selected meteorological variables and tropical cyclone locations. At last, the GRU layer was introduced to learn the nonlinearity and complexity of time-series information from the feature vectors, and then predict the center location of the target tropical cyclone at the next timestamp. Our proposed model can effectively handle the time series data and improve the generalization capability. The experimental results that using the JWTC dataset to predict the tropical cyclone tracks reveal that our proposed model can achieve a better result than traditional deep learning models, where the average MAE of longitude and latitude were only around 0.002 and 0.003, respectively.
In conclusion, the proposed model is suitable for processing tropical cyclone tracks with meteorological features. But there are still some limitations. First, in order to achieve the uniformity of dataset and experiment settings, some earlier prediction methods were used in comparison experiment. Second, some tropical cyclones happened on a close time range, they may influence each others moving track. Additionally, one tropical cyclone may split into multiple cyclones. And the tropical cyclones with different intensities may have different influences on predicting the center location of the next timestamp. In the future, we will compare our proposed method with more recent numeric and statistical methods and improve our model by considering multiple tropical cyclones and taking tropical cyclone intensities into consideration.