Dual-Branched Spatio-temporal Fusion Network for Multi-horizon Tropical Cyclone Track Forecast

A tropical cyclone (TC) is an extreme tropical weather system whose trajectory can be described by a variety of spatio-temporal data. Effective mining of these data is the key to accurate TC track forecasting. However, existing methods either have excessive model complexity or struggle to extract features efficiently from multi-modal data. In this paper, we propose the Dual-Branched spatio-temporal Fusion Network (DBF-Net), a novel multi-horizon tropical cyclone track forecasting model that fuses multi-modal features efficiently. DBF-Net contains a TC features branch that extracts temporal features from the 1D inherent features of TCs and a pressure field branch that extracts spatio-temporal features from the reanalysis 2D pressure field. Through its encoder-decoder-based architecture and efficient feature fusion, DBF-Net can fully mine the information in the two types of data and achieve good TC track prediction results. Extensive experiments on historical TC track data in the Northwest Pacific show that DBF-Net achieves a significant improvement over existing statistical and deep learning TC track forecast methods.


INTRODUCTION
Tropical cyclones (TCs) are low-pressure vortices occurring over tropical or subtropical oceans and are one of the major meteorological disasters facing mankind. Depending on the region of occurrence, they are often referred to as typhoons or hurricanes. Accurate forecasting of TC trajectories can greatly reduce the damage to people and property caused by TCs.
Research on TC track forecasting has gone through four stages since the 1960s: empirical methods, statistical methods, numerical methods and deep learning methods. Early TC track forecasting was limited by observation techniques and computational devices and could rely only on subjective experience. Thus, traditional methods such as extrapolation and similar-path methods were developed. From the 1980s, with the rapid development of statistical models, forecasting models based on statistical regression, such as Climatology and Persistence (CLIPER) [1], were proposed one after another. However, their poor representation capability and manual feature selection make it difficult to produce accurate forecasts. Since the 1990s, owing to the continuous improvement of observation techniques and computer performance, Numerical Weather Prediction (NWP) systems (e.g. the American National Hurricane Center Track and Intensity Model) have gradually become the mainstream choice of official meteorological forecasting agencies. NWP produces forecasts by solving complex partial differential equations of weather dynamics, which is computationally very expensive and requires supercomputer platforms. In recent years, machine learning, especially deep learning, has developed rapidly. Various deep neural networks (DNNs) have shown outstanding performance in tasks such as computer vision [2], natural language processing [3] and time series forecasting [4]. Since the computational cost of DNNs is much smaller than that of traditional NWP models, researchers have proposed many different DNNs to predict TC tracks [5,6,7,8,9,10,11,12], which is also the subject of this paper.
A TC is a complex weather system, and its trajectory is affected by various physical quantities in the atmosphere, such as the pressure field and the wind field. Therefore, unlike the data used in traditional image recognition or natural language processing tasks, the data used to describe a TC track naturally consist of multi-modal, temporally aligned data. In this paper, we divide these data into three categories: inherent features of TCs, remote sensing images and meteorological fields. The key to deep-learning-based forecasting methods is, to some extent, the full exploitation of these different types of data.
The inherent features of a TC at a particular time are usually represented by a vector or tensor that contains information such as the latitude, longitude and intensity of the TC center at that time. Early deep-learning-based TC track forecast models mainly use the historical inherent features of TCs to predict their future locations. The classical multi-layer perceptron (MLP) [5,6] and various time series prediction models, such as Recurrent Neural Networks (RNNs) [7,8], the Long Short-Term Memory (LSTM) model [9] and the Bidirectional Gated Recurrent Unit model [12], are used to learn the time series patterns of the data. However, time series models based only on the inherent features of TCs usually have low track forecast accuracy because they do not consider the factors that affect TC trajectories.
Compared with the one-dimensional (1D) inherent feature vector of TCs, 2D remote sensing images and meteorological fields can describe the relevant information around the TCs. TC track forecasting with remote sensing images can be seen as a special kind of video frame prediction task. M. Rüttgers et al. use a generative adversarial network (GAN) to predict TC track images and the corresponding location of the TC center [13]. Wu et al. propose a multitask machine learning framework based on an improved GAN to predict the track and intensity of TCs simultaneously [14]. These track forecast methods take full advantage of the powerful performance of GANs in computer vision. However, the remote sensing images used in such methods need to be acquired from geostationary satellites to ensure high temporal resolution, and the images cannot represent the physical factors affecting TC trajectories.
Meteorological fields, such as pressure fields and wind fields, are the main factors affecting the trajectory of TCs. In 2017, M. Mudigonda et al. [15] proposed the CNN-LSTM model for segmenting and tracking TCs and verified the direct, high correlation between TC tracks and meteorological fields. S. Kim proposes a ConvLSTM-based spatio-temporal model that predicts the trajectory map from the density map sequence generated from the wind velocity and precipitation fields [10]. However, the predicted trajectory map cannot reflect the exact location of the TCs. Therefore, how to efficiently fuse meteorological field data into the TC track forecast model to improve forecast accuracy has gradually become the mainstream research direction in recent years [16,11]. Because the distributions of different meteorological fields vary widely, S. Giffard-Roisin et al. [11] use different CNN models to encode the reanalysis data of the wind and pressure fields separately and fuse them with the past track data of TCs. However, multiple CNN models increase the number of parameters and the computational complexity of the forecast model, and the model is difficult to train. In addition, excessive use of meteorological field data weakens the role of the inherent features of TCs, so the model does not adequately learn the time-series features of the data. Therefore, how to efficiently utilize the meteorological field data while fully exploiting the intrinsic time-series information of the inherent features of TCs still needs further research.
To solve the above problems, this paper fully exploits the temporal information in the inherent features of TCs and the spatio-temporal information in the reanalysis 2D pressure field data, and proposes a Dual-Branched spatio-temporal Fusion Network (DBF-Net) for multi-horizon tropical cyclone track forecasting (i.e. predicting the TC track at multiple future time steps). Specifically, as shown in Fig. 1, in the TC features branch, an LSTM-based encoder-decoder network is used to capture the high-level temporal features of the input TC features and provide the multi-horizon TC trajectory forecasts. In the pressure field branch, a 2D-CNN-based encoder-decoder network is used to extract spatio-temporal features from the geopotential height (GPH) around TCs; by predicting the GPH at multiple future time steps, these features complement the track forecasting information. In addition, the high-level spatio-temporal feature obtained from the pressure field branch is fused into the LSTM decoder in the TC features branch through a fully connected layer.
Through efficient spatio-temporal feature extraction and fusion of the two types of data, the 24h forecast error of DBF-Net on historical TC track data in the Northwest Pacific (WNP) is 119 km, which is much better than that of other deep-learning-based methods [11,12]. In addition, we compare our work with traditional methods, such as extrapolation, the CLIPER model and NWP methods. Finally, we present the forecast results for several individual TC events for further analysis and verification.

METHODOLOGY
In this section, we explain our proposed Dual-Branched spatio-temporal Fusion Network (DBF-Net). The overall architecture of the proposed DBF-Net is shown in Fig. 1. The two branches contained in DBF-Net are split into three sub-modules and introduced separately. Algorithmic details of DBF-Net are given in the last subsection of this section.

Preliminaries
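We formally introduce symbols and notations in this subsection. In DBF-Net, there are two types of input data: $X_t = \{x_i \mid i \in [t-m, t], i \in \mathbb{Z}\}$ represents the input historical inherent features of TCs, and $G_t = \{GPH_i \mid i \in [t-m, t], i \in \mathbb{Z}\}$ represents the input historical reanalysis 2D geopotential height data. Given the initial forecast time $t$ and the corresponding input data $X_t$ and $G_t$, the output multi-horizon TC track prediction can be computed by:

$$Y_t = M(X_t, G_t)$$

where $M(\cdot)$ denotes the end-to-end DBF-Net and $Y_t$ is expressed as relative changes in latitude and longitude rather than absolute positions. The reason for this is that the pressure field data are cropped around the TC center, so the local information is better suited to capturing relative changes in TC motion.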
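As a small illustration of the relative-change output, the sketch below turns predicted latitude/longitude changes into absolute positions. It assumes that each change is taken relative to the previous time step (the text does not state whether the reference is the previous step or the initial forecast time); the function name `deltas_to_track` and the sample values are hypothetical.

```python
from itertools import accumulate


def deltas_to_track(lat0, lon0, deltas):
    """Convert per-step (dlat, dlon) changes into absolute (lat, lon) positions.

    Assumption: each delta is relative to the previous time step.
    lat0, lon0 is the known TC center at the initial forecast time t.
    """
    lats = accumulate((d[0] for d in deltas), initial=lat0)
    lons = accumulate((d[1] for d in deltas), initial=lon0)
    # Drop the first element, which is the known position at time t.
    return list(zip(lats, lons))[1:]


# Four 6-hourly changes (hypothetical values) starting from 20.0N, 130.0E.
track = deltas_to_track(20.0, 130.0, [(0.5, -1.0), (0.5, -1.0), (1.0, -0.5), (1.0, 0.0)])
```

If the changes were instead defined relative to the initial forecast time, each position would simply be the initial position plus the corresponding change, with no accumulation.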

TC Features Encoder Module
The TC features encoder module in the first branch of DBF-Net encodes the inherent features of TCs at multiple historical times. At each time $t$, the feature vector is $x_t = (x_t^1, x_t^2, x_t^3, x_t^4, x_t^5, x_t^6)$, whose components are, respectively, the latitude at time $t$, the longitude at time $t$, the maximum near-surface wind speed near the center at time $t$, the latitude difference between time $t$ and $t-1$, the longitude difference between time $t$ and $t-1$, and the wind speed difference between time $t$ and $t-4$. These features are also the classical persistence factors in statistical forecasting methods. The purpose of the TC features encoder module is therefore to encode the time series features of the persistence factors.
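The six persistence factors can be assembled from consecutive best-track records as in the following sketch. It assumes the records are stored as `(lat, lon, wind)` tuples at 6-hour intervals; the function name `tc_features` and the sample records are hypothetical.

```python
def tc_features(records, t):
    """Build (lat, lon, wind, dlat, dlon, dwind) for the record at index t.

    records: list of (lat, lon, wind) tuples at 6-hour intervals.
    The position differences use t - 1 and the wind difference uses t - 4,
    as described in the text.
    """
    if t < 4:
        raise ValueError("need at least four earlier records")
    lat, lon, wind = records[t]
    prev_lat, prev_lon, _ = records[t - 1]
    _, _, wind_4 = records[t - 4]
    return (lat, lon, wind, lat - prev_lat, lon - prev_lon, wind - wind_4)


# Hypothetical records: latitude (deg), longitude (deg), wind speed (m/s).
records = [
    (15.0, 140.0, 18.0),
    (15.5, 139.0, 20.0),
    (16.0, 138.0, 23.0),
    (16.5, 137.0, 25.0),
    (17.5, 136.0, 30.0),
]
features = tc_features(records, 4)
```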
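As shown in Fig. 2, the TC features encoder module consists of a two-layer stacked LSTM encoder. Each $x_t$ in the sequence $X_t$ is passed sequentially into the two-layer LSTM encoding module and produces the latent variable $h_t$. The output of each LSTM encoding layer is fed into the corresponding layer at the next time step. The specific operation in each LSTM encoding layer is as follows:

$$i_t = \sigma(W_{ii} x_t + W_{hi} h_{t-1} + b_i)$$
$$f_t = \sigma(W_{if} x_t + W_{hf} h_{t-1} + b_f)$$
$$g_t = \tanh(W_{ig} x_t + W_{hg} h_{t-1} + b_g)$$
$$o_t = \sigma(W_{io} x_t + W_{ho} h_{t-1} + b_o)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot g_t$$
$$h_t = o_t \odot \tanh(c_t)$$

where $i_t$, $f_t$ and $o_t$ are the outputs of the input, forget and output gates of the LSTM model, and $g_t$ is the candidate cell state. $W_{ii}$ ($W_{hi}$), $W_{if}$ ($W_{hf}$), $W_{ig}$ ($W_{hg}$) and $W_{io}$ ($W_{ho}$) are the corresponding weight matrices related to $x_t$ ($h_{t-1}$), and $b_i$, $b_f$, $b_g$ and $b_o$ are bias terms. $c_t$ is the cell state that is fed into the next time step together with the latent variable $h_t$. $\sigma(\cdot)$ is the sigmoid function and $\odot$ denotes element-wise multiplication.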
Given the input sequence $X_t$ of length $m+1$ and the corresponding latent variable sequence $\{h_{t-m}, h_{t-m+1}, \cdots, h_t\}$, where $t$ is the initial forecast time, the encoder produces the final time series code of the TC features, $E_{TC}$.
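To make the gate computations concrete, the following is a minimal sketch of a single LSTM time step in plain Python, using the standard LSTM formulation. It is not the trained DBF-Net encoder: the parameter layout (a dictionary mapping each gate name to a hypothetical `(W_x, W_h, b)` triple) and the zero-valued weights in the example are assumptions made only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W_x, x, W_h, h, b):
    """Return W_x @ x + W_h @ h + b for matrices stored as lists of rows."""
    return [
        sum(wx * xi for wx, xi in zip(row_x, x))
        + sum(wh * hi for wh, hi in zip(row_h, h))
        + bi
        for row_x, row_h, bi in zip(W_x, W_h, b)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; params maps 'i', 'f', 'g', 'o' to (W_x, W_h, b)."""

    def gate(name, activation):
        W_x, W_h, b = params[name]
        return [activation(z) for z in affine(W_x, x, W_h, h_prev, b)]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    g = gate("g", math.tanh)  # candidate cell state
    o = gate("o", sigmoid)    # output gate
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c


# Example with a 2-dimensional input, a 1-dimensional hidden state and
# all-zero weights (hypothetical values chosen so the result is easy to check).
zero = ([[0.0, 0.0]], [[0.0]], [0.0])
params = {name: zero for name in "ifgo"}
h, c = lstm_step([1.0, 2.0], [0.0], [1.0], params)
```

With zero weights every sigmoid gate equals 0.5 and the candidate state is 0, so the new cell state is half of the previous one. A two-layer stacked encoder would feed `h` from the first layer as the input `x` of the second layer at each time step.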

Pressure Field Branch
To use the meteorological fields in the vicinity of TCs efficiently and improve the forecast accuracy, a 2D-CNN-based encoder-decoder network is used to generate high-level spatio-temporal features from the reanalysis 2D geopotential height (GPH) data. As shown in Fig. 3 and Table 1, the encoder of the pressure field branch contains three convolutional layers, the first two of which are 2D-CNN layers with kernel size 3 × 3 × 3. To ensure that the GPH field data cover the full spatial extent that may affect the TC track, the window size of $GPH_t$ at each historical time step of the input 2D field $G_t$ is set to 51 × 51 values, which corresponds to a radius of approximately 1400 km (the resolution of the reanalysis GPH data is 0.5 degrees). We choose LeakyReLU as the activation function of the encoder to enhance the nonlinear representation of the model. The output high-level spatio-temporal features can be computed by:

$$E_{GPH} = \mathrm{FC}(\mathrm{flatten}(\mathrm{Enc}_{CNN}(G_t)))$$

where $\mathrm{Enc}_{CNN}(\cdot)$ denotes the 2D-CNN encoder, $\mathrm{FC}(\cdot)$ is a fully connected layer, and $\mathrm{flatten}(\cdot)$ is the operation that flattens the output feature map of the 2D-CNN encoder.
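The stated radius of roughly 1400 km can be checked with a few lines of arithmetic. The sketch below assumes a spherical Earth with R ≈ 6371 km and measures the half-width of the window along a meridian; the variable names are illustrative.

```python
import math

GRID_POINTS = 51       # window size per side, from the text
RESOLUTION_DEG = 0.5   # grid spacing in degrees, from the text
EARTH_RADIUS_KM = 6371.0

# Length of one degree of latitude on a sphere of radius R.
km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0

half_width_cells = (GRID_POINTS - 1) // 2            # cells on each side of the center
half_width_deg = half_width_cells * RESOLUTION_DEG   # angular half-width
half_width_km = half_width_deg * km_per_degree       # half-width in kilometres
```

The half-width is 25 cells, i.e. 12.5 degrees, which is about 1390 km along a meridian. The east-west extent in kilometres shrinks with the cosine of the latitude, so the figure is an upper bound away from the equator.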
The decoder in the pressure field branch is symmetrical to the 2D-CNN encoder and uses transposed convolutions to recover the spatio-temporal information from the high-level features. It predicts the GPH at the next $m+1$ time steps, which is the same length of time as the input $G_t$. The loss function of the pressure field branch is computed by:

$$L_{GPH} = \sum_{i} \| \widehat{GPH}_i - TGPH_i \|_1$$

where the sum runs over the $m+1$ predicted time steps, $\widehat{GPH}_i$ is the predicted GPH field, $TGPH_i$ is the target value of the future GPH data, and $\|\cdot\|_1$ is the $l_1$-norm.
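A minimal sketch of an l1 loss between predicted and target GPH fields is shown below. It averages the absolute differences over all grid points, which is an assumption (a plain l1-norm would sum them; the two differ only by a constant factor for a fixed field size). The function name and the tiny 2 × 2 fields are hypothetical.

```python
def l1_loss(pred, target):
    """Mean absolute difference between two sequences of 2D fields.

    pred, target: lists of 2D fields, each field a list of rows.
    Assumption: the loss is averaged over all grid points and time steps.
    """
    total, count = 0.0, 0
    for p_field, t_field in zip(pred, target):
        for p_row, t_row in zip(p_field, t_field):
            for p, t in zip(p_row, t_row):
                total += abs(p - t)
                count += 1
    return total / count


# One predicted time step on a 2 x 2 grid (hypothetical values).
pred = [[[1.0, 2.0], [3.0, 4.0]]]
target = [[[1.0, 1.0], [5.0, 4.0]]]
loss = l1_loss(pred, target)
```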

Dual-Branched Features Fusion Decoder Module
Based on the LSTM-based encoder in the TC features branch and the pressure field branch described above, three types of intermediate variables from the two branches are fed into the LSTM-based decoder module: the final time series code of the TC features $E_{TC}$, the high-level spatio-temporal reanalysis 2D GPH features $E_{GPH}$, and the output features of the LSTM encoder layers $E_1$ and $E_2$. In this subsection, we introduce the LSTM-based decoder module, which efficiently fuses the dual-branched multi-modal features and generates the multi-horizon TC track forecasts. Fig. 4 illustrates the proposed LSTM-based decoder module, which consists of a two-layer stacked LSTM decoder followed by two fully connected layers. In the decoding and feature fusion procedure, the initial states $h_t$ and $c_t$ are the elements of $E_1$ and $E_2$, which are the final states of the LSTM encoder layers. The initial input of the LSTM decoder $Y_t$ is set to zero. The loss function of the LSTM-based decoder is also the $l_1$-norm, that is:

$$L_{TC} = \sum_{i} \| Y_i - T_i \|_1$$

where the sum runs over the predicted time steps, $Y_i$ is the predicted change of latitude and longitude and $T_i$ is the ground-truth change of latitude and longitude. With the operations above, the features from both the inherent TC features and the reanalysis 2D pressure field can be fused effectively, and we obtain the multi-horizon TC track forecasts $Y_t$ based on the fused multi-modal features.

Algorithmic Details
We train our proposed DBF-Net in three stages. First, we train only the TC features encoder module, adding a fully connected layer that directly predicts the target value of the TC track, to obtain the pre-trained LSTM encoder of the TC features branch. Then, we use the reanalysis 2D pressure field (GPH) data to train the pressure field branch of DBF-Net and learn the temporal dynamics of the GPH data. Finally, we add the LSTM decoder module to the training pipeline and train DBF-Net in an end-to-end manner. The loss function at this final step combines the track loss of the LSTM-based decoder, the loss of the pressure field branch and a regularization term $L_2$ with $l_2$ penalty, weighted by the hyper-parameters $\alpha$ and $\beta$. The details of the training schedule are discussed in Section 3.3.
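One plausible form of the final-stage objective is sketched below. It assumes that α scales the pressure field loss and β scales the l2 penalty on the weights; this assignment is an assumption, since the exact equation is not reproduced here. The default values 1.2 and 0.00001 are those reported in the implementation details, while the function name and the sample numbers are hypothetical.

```python
def total_loss(track_loss, gph_loss, weights, alpha=1.2, beta=1e-5):
    """Combine the track loss, the GPH loss and an l2 weight penalty.

    Assumption: alpha weights the pressure field loss and beta weights
    the l2 regularization term.
    """
    l2_penalty = sum(w * w for w in weights)
    return track_loss + alpha * gph_loss + beta * l2_penalty


# Hypothetical loss values and a toy weight vector.
loss = total_loss(track_loss=1.0, gph_loss=0.5, weights=[1.0, 2.0])
```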

EXPERIMENT
In this section, we evaluate our proposed DBF-Net on the best track data of TCs in the Northwest Pacific (WNP). The forecasting performance of DBF-Net is verified by comparison with other deep-learning-based and traditional TC track forecast methods. We also analyze the forecast results for several individual TC events and the specific forecasting characteristics of DBF-Net.

Dataset
Best track dataset (CMA-BST). The inherent features data of TCs are extracted from the Best Track (BST) data released by the China Meteorological Administration (CMA) [17]. The dataset includes the location and intensity of TCs in the Northwest Pacific (WNP) Ocean at six-hour intervals from 1949 to 2018. Examples of the CMA-BST data are shown in Table 2.

Geopotential height dataset (CFSR-GPH). The reanalysis 2D pressure field data are taken from the Climate Forecast System Reanalysis (CFSR) [18]. CFSR-GPH is gridded data with a spatial resolution of 0.5°, and its temporal resolution is aligned with CMA-BST from 1979 to the present. TCs in the WNP are mostly generated at the southern edge of the subtropical high and move along its periphery. Therefore, the 500 hPa geopotential height data are chosen as the background pressure field to describe the activity of TCs.

Dataset Split. Based on the CMA-BST and CFSR-GPH datasets described above, we choose the overlap of the two datasets, i.e. TCs from 1979 to 2018. We keep only the TCs with a life cycle longer than four days to ensure persistence, which leaves 940 TCs in the dataset. We build more than 17,000 samples for model training, validation and testing using a sliding window whose length is the input sequence length plus the prediction length (as shown in Table 3).
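The sliding-window sample construction can be sketched as follows. The function name `make_samples` is hypothetical, a stride of one time step is assumed, and integers stand in for the per-step records of a single TC.

```python
def make_samples(track, input_len, pred_len):
    """Cut one TC track into (input, target) pairs with a sliding window.

    The window length is input_len + pred_len and the stride is assumed
    to be one time step.
    """
    window = input_len + pred_len
    return [
        (track[s:s + input_len], track[s + input_len:s + window])
        for s in range(len(track) - window + 1)
    ]


# A track with 12 six-hourly records, input length 5 and prediction length 4.
samples = make_samples(list(range(12)), input_len=5, pred_len=4)
```

A track with fewer records than the window length yields no samples, which is consistent with keeping only long-lived TCs.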

Metrics
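To evaluate the TC track forecast results, we use the Mean Distance Error (MDE), a common metric that measures the average great-circle distance between the model prediction and the ground truth. MDE can be computed by:

$$\mathrm{MDE} = \frac{1}{N} \sum_{k=1}^{N} 2R \arcsin \sqrt{\sin^2\frac{\phi_{pre}^{k} - \phi_{gt}^{k}}{2} + \cos\phi_{pre}^{k} \cos\phi_{gt}^{k} \sin^2\frac{\lambda_{pre}^{k} - \lambda_{gt}^{k}}{2}}$$

where $N$ is the number of samples and $R \approx 6371$ km is the radius of the Earth. $\phi_{pre}$ and $\phi_{gt}$ are the latitudes of the prediction and the ground truth, and $\lambda_{pre}$ and $\lambda_{gt}$ are the longitudes of the prediction and the ground truth.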
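A minimal sketch of the distance computation is given below, assuming the great-circle distance is evaluated with the haversine formula on a sphere of radius 6371 km. The function names are hypothetical, and positions are `(lat, lon)` pairs in degrees.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_distance_error(preds, truths):
    """Average great-circle distance between predicted and true positions."""
    dists = [great_circle_km(*p, *g) for p, g in zip(preds, truths)]
    return sum(dists) / len(dists)


# Two samples: one exact hit and one miss of one degree of latitude.
mde = mean_distance_error([(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (1.0, 0.0)])
```

One degree of latitude corresponds to about 111.19 km on this sphere, so the example averages to roughly half of that.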
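In addition, the skill score is used as an index of the practical usability of a method:

$$\mathrm{Skill} = \frac{e_A - e_B}{e_A} \times 100\%$$

where $e_A$ is the prediction error of the CLIPER method and $e_B$ is the error of the proposed method. A positive skill score means that the proposed method has a smaller error than CLIPER.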
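The skill score relative to a baseline can be computed as in the sketch below, assuming the usual relative-improvement form in percent. The function name and the error values in the example are hypothetical and are not results from this paper.

```python
def skill_score(e_baseline, e_model):
    """Relative improvement over the baseline error, in percent.

    Positive values mean the model error is smaller than the baseline error.
    """
    return (e_baseline - e_model) / e_baseline * 100.0


# Hypothetical errors in km: a baseline error of 150 and a model error of 120.
score = skill_score(150.0, 120.0)
```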

Implementation Details
We train our proposed DBF-Net in three stages with the PyTorch framework, as discussed in Section 2.5. We use the RMSProp optimizer and set the initial learning rate to 0.001. The batch size of the training set is 64. The hyper-parameters α and β in equation (19) are set to 1.2 and 0.00001 respectively. For multi-horizon forecasting (i.e. predicting the TC track at multiple future time steps), the output prediction sequence length of DBF-Net is 4 and the input sequence length is 5. That is, we predict the 6h, 12h, 18h and 24h TC tracks based on the historical data from time t − 5 (30h prior) to time t (the current time). We train DBF-Net on a single NVIDIA GeForce RTX 3090 GPU.

Comparison with Statistical/Deep Learning Forecast Methods
We first compare our proposed DBF-Net with other statistical and deep-learning-based TC track forecast methods, including the extrapolation method, the CLIPER method, the feature fusion network (FFN) [11] and the recent BiGRU-attn [12]. Extrapolation is a simple traditional TC track forecast method. It assumes that the direction and speed of TC movement do not change much, and predicts the track from the movement direction and velocity at previous times. CLIPER can be treated as the benchmark for other track forecast methods. It uses correlation analysis to screen climatology and persistence factors and constructs multivariate linear regression models. In this paper, we replace the multivariate linear model with a back propagation (BP) neural network, which enhances the nonlinear representation of the CLIPER model. We select 20 strongly correlated factors from 46 climatology and persistence factors by Pearson correlation analysis and feed them into the BP neural network. As shown in Table 4, our proposed DBF-Net outperforms previous works. Specifically, compared with the benchmark method CLIPER, DBF-Net achieves better MDEs at all forecast time steps. That is, the skill score with respect to CLIPER is positive, which demonstrates the practical usability of our method. In addition, DBF-Net also outperforms previous deep-learning-based methods. Compared with FFN [11], which uses the wind and pressure fields simultaneously, DBF-Net achieves better results based on the pressure field alone. This also shows that our method can better encode the effective features of the input data.
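The extrapolation baseline described above can be sketched as a constant-velocity forecast. The version below assumes that only the displacement over the last 6-hour interval is used and that it is applied unchanged in latitude/longitude space; the function name and sample positions are hypothetical.

```python
def extrapolate(track, steps):
    """Constant-velocity extrapolation from the last two (lat, lon) positions.

    Assumption: the most recent 6-hour displacement persists for all
    forecast steps.
    """
    (lat1, lon1), (lat2, lon2) = track[-2], track[-1]
    dlat, dlon = lat2 - lat1, lon2 - lon1
    return [(lat2 + k * dlat, lon2 + k * dlon) for k in range(1, steps + 1)]


# Forecast the next four 6-hourly positions (6h, 12h, 18h and 24h).
forecast = extrapolate([(15.0, 135.0), (15.5, 134.0)], steps=4)
```

Because the displacement never changes, such a baseline cannot follow a recurving track, which is one reason learned models are expected to do better at longer horizons.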

Comparison with NWP Forecast Methods
We further compare DBF-Net with the Numerical Weather Prediction (NWP) systems that are commonly used in operational forecasting. The global models T213/T639 and the Shanghai typhoon regional model (SHTP) are chosen for comparison. Compared with our deep-learning-based method, NWP methods need a large amount of computational resources, and their inference time increases rapidly as the input data resolution increases. However, NWP methods can still achieve better forecast accuracy than deep-learning-based methods. As shown in Table 5, DBF-Net achieves performance comparable to the global models T213/T639; in 2014 and 2015 in particular, the 24h MDE of DBF-Net is much better than that of T213/T639. However, compared with the regional model SHTP, the forecasts of DBF-Net still show a certain gap. The strong performance of SHTP may be due to its high-resolution, multi-layer nested grid input data and its large consumption of computational resources. In contrast, our proposed DBF-Net achieves relatively high prediction accuracy with low-resolution input (1° spatial resolution for GPH data) and low computational cost. We also believe that training with higher-resolution data could further improve the forecast accuracy of DBF-Net, which can be studied in future work.

Ablation Study

Table 6 reports the impact of the different branches in DBF-Net. The full model with the two-branch encoder-decoder networks and the feature fusion module, i.e. the DBF-Net variant that includes the 2D-CNN decoder, achieves the best forecast MDE at every horizon except 6h, where the plain "DBF-Net" variant is best; this verifies the consistency of our method. In addition, the relatively poor performance of the LSTM-based and 2D-CNN-based encoder-decoder architectures alone, denoted as "TC-features-only" and "Pressure-fields-only", demonstrates that it is hard to obtain good forecasts from inherent features or meteorological fields alone. It therefore makes sense to fuse these two types of data for better forecast accuracy. The results in Table 6 also show the effectiveness of the 2D-CNN decoder module in enhancing the learning of the temporal dynamics of the GPH data. Once DBF-Net is trained, inference only passes through the TC features branch and the 2D-CNN encoder module. Therefore, the 2D-CNN decoder module does not increase the memory or computational cost of DBF-Net inference.

Case Study
In this subsection, we select three individual TC events, namely Typhoon Trami 1824, Typhoon Hagibis 1919 and Typhoon Fengshen 1925. Based on the forecast results, the validity of DBF-Net is further verified and its forecast characteristics are analyzed. Fig. 5 and Tables 7∼9 show the TC track forecast results for the three cases. In Fig. 5, the blue line represents the ground truth track of the TCs and the red line is the forecast of the proposed DBF-Net. The intersection of the blue and red lines is the location at the initial forecast time. The 4 points along the red line represent the forecast path in the next 24 hours (at 6-hour intervals) from the initial forecast time. For Typhoon Trami 1824, the path generally moves first westward and then northward. As shown in Fig. 5(a), the error between the forecast and the ground truth track is relatively small at the inflection point from west to north. This shows that the model has learned the underlying features of TC track movement. For Typhoon Hagibis 1919, there is also an inflection point at time "100718" (as shown in Fig. 5(b) and Table 8). Although the forecast result there is relatively poor (177.07 km for 24h forecasting), DBF-Net can correct the prediction once the observations from the next time step are available (24.93 km for 12h forecasting at time "100806"). This shows that the historical information closest to the initial forecast time is more important. For Typhoon Fengshen 1925, the trajectory makes a 180° turn, which is unconventional. As shown in Fig. 5(c), DBF-Net correctly predicts the turning trend of the TC. However, the length of the forecast track vector is uniformly smaller than that of the ground truth, which makes the average MDE relatively large (see Table 9).
In Tables 7∼9, we also report the intensity level (INT) at each time step and classify it into 6 categories, namely tropical depression (TD), tropical storm (TS), severe tropical storm (STS), typhoon (TY), severe typhoon (STY) and super typhoon (SuperTY). Comparing the results in Tables 7∼9 shows that DBF-Net has a relatively lower MDE when the intensity level is higher. In particular, for TCs with the intensity level "SuperTY", the predicted MDE is comparable to that of the NWP model. This phenomenon also explains the relatively poor forecast results for Typhoon Fengshen 1925.
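For illustration, the six intensity levels can be assigned from the maximum wind speed near the center as in the sketch below. The thresholds (in m/s) are those commonly quoted from the Chinese national standard for TC grades and are an assumption here, since the text does not list them; the function name is hypothetical.

```python
# Lower bounds in m/s, checked from the strongest class downward.
# Assumption: thresholds follow the commonly quoted CMA grading standard.
CMA_CLASSES = [
    (51.0, "SuperTY"),
    (41.5, "STY"),
    (32.7, "TY"),
    (24.5, "STS"),
    (17.2, "TS"),
    (10.8, "TD"),
]


def intensity_level(wind_ms):
    """Return the intensity class for a maximum wind speed in m/s."""
    for lower_bound, name in CMA_CLASSES:
        if wind_ms >= lower_bound:
            return name
    return None  # weaker than a tropical depression


levels = [intensity_level(w) for w in (12.0, 20.0, 35.0, 55.0)]
```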

CONCLUSION
In this paper, we explore how to forecast the TC track at multiple future time steps by proposing a novel deep-learning-based model, named DBF-Net, which makes full use of both the inherent features of TCs and the reanalysis 2D pressure field data and fuses the multi-modal features efficiently. DBF-Net contains two branches with encoder-decoder architectures and can be split into three parts. The first part is the LSTM-based TC features encoder module, which captures the high-level temporal features from the historical inherent features of TCs. The second part is the pressure field branch, which extracts the spatio-temporal features by learning the temporal dynamics of the GPH data with a 2D-CNN-based encoder-decoder network. The last part is an LSTM-based decoder module that fuses the multi-modal high-level features from the different branches and produces the multi-horizon prediction. Experiments on the TC track dataset in the Northwest Pacific Ocean verify the effectiveness of our proposed DBF-Net.

ACKNOWLEDGMENT
This paper is supported by.

Fig. 1. The overall structure of the proposed DBF-Net for multi-horizon TC track forecast.

$Lat_t$ and $Lon_t$ are the latitude and longitude of the TC center at time $t$. $M(\cdot)$ represents the end-to-end DBF-Net. Note that we use the relative change in latitude and longitude $Y_i$ as the output of DBF-Net instead of the absolute location.

Fig. 4. The LSTM-based decoder module in the TC features branch with dual-branched feature fusion.

Fig. 5. Example of TC track forecast results. The blue line represents the ground truth track. The red line represents the forecast results.

Table 1. The

Table 2. Examples of CMA-BST data, which are recorded at six-hour intervals from 1949 to 2018. I stands for the intensity level of TCs. LAT and LON are the latitude and longitude.

Table 4. Comparison of the TC track forecasting results of statistical and deep learning methods.

Table 5. Comparison of the TC track forecasting results of NWP methods. The results of the NWP methods are released by [xx,xx,xx,xx].

Table 6. The impact of different branches in DBF-Net. The marked variant represents the model with the 2D-CNN decoder in the pressure branch, which enhances the learning of the temporal dynamics of the GPH data.

Table 7. The track forecast result of DBF-Net for Typhoon Trami 1824. INT stands for the intensity level of the TC. AVG stands for the average prediction MDE.

Table 8. The track forecast result of DBF-Net for Typhoon Hagibis 1919. INT stands for the intensity level of the TC. AVG stands for the average prediction MDE.

Table 9. The track forecast result of DBF-Net for Typhoon Fengshen 1925. INT stands for the intensity level of the TC. AVG stands for the average prediction MDE.