A Generalized Approach to Aircraft Trajectory Prediction via Supervised Deep Learning

As research advances diverse forms and missions of aircraft, the National Airspace System (NAS) will become increasingly crowded, straining the ability of the current aviation spectrum to accommodate future air operations. To tackle this challenge, the concept of intelligent spectrum management has been proposed for autonomous and dynamic resource allocation, where accurate aircraft position estimation is one of the crucial tasks. However, current research on flight trajectory prediction has been limited in scope, frequently utilizing a single flight route, architecture, set of weather data, and date range. In this paper, we propose a generalized hybrid-recurrent predictive model for flight trajectory prediction. Our generalizable deep learning approach not only improves trajectory prediction accuracy but can also be contextualized by exploring a large amount of data. Experimental results illustrate a tradeoff between horizontal and vertical errors as flight data are generalized across dates and routes.


I. INTRODUCTION
Over the next decade, the National Airspace System (NAS) will become increasingly crowded due to the emergence of new vehicles such as supersonic aircraft and unmanned aircraft systems. The increased airspace density will worsen the aviation spectrum scarcity issue, and there is a pressing need to develop new approaches for intelligent spectrum management. To this end, the National Aeronautics and Space Administration (NASA) is investigating techniques for the autonomous allocation of the aviation spectrum [1], such that the NAS will assure the desired quality-of-service with improved spectrum utilization efficiency.
Current approaches consider the use of artificial intelligence, specifically deep reinforcement learning solutions, which require the knowledge of the physical environment [2].
In particular, accurate position estimation of en route flights is needed for automatic and dynamic spectrum allocation. Given the estimated positions of the flights, direct inferences can be made toward localizing an aircraft to a particular sector (and associated frequency), as well as estimating the path loss between the aircraft and ground stations.
To ensure flight safety, air traffic control (ATC) is responsible for regulating and managing all air traffic activities. Before takeoff, the trajectory of an aircraft is planned by ATC, and the aircraft is expected to follow this designated trajectory. However, the trajectory of an en route flight is subject to environmental conditions. In particular, convective weather is one of the main causes of flight trajectory changes, and ATC will reroute an en route flight that runs into convective weather to avoid its negative impact. The trajectory changes can be made in latitude, longitude, and altitude. Thus, the arrival time of an aircraft at a given position also changes. As a result, the trajectory of an aircraft should be estimated in a spatial-temporal space [3].
In recent research, weather information has been utilized for trajectory prediction, but it remains a challenge to find an effective approach that can make use of the weather data while achieving accurate prediction. Recently, artificial intelligence has been introduced in trajectory prediction, where Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and deep learning are adopted to estimate the positions of aircraft [4], [5], [6]. However, these efforts are limited in that they frequently utilize a single flight route and learning architecture, and explore specific weather data and date ranges.
The aforementioned limitations hinder both prediction accuracy and the usage of data. To solve these problems, we propose a generalized approach for aircraft trajectory prediction through deep learning. We design a generalized deep learning model that can improve prediction accuracy. In addition, a large amount of data is explored for model training, so the resulting model is contextualized and can remain effective in other related research. This paper revisits the challenge of 4D trajectory prediction (latitude, longitude, altitude, and time) for commercial aircraft, with experiments critically examining the selected data and architectural designs in potential deep learning approaches.
The initial result of this research was presented in [7] and has since been expanded upon. Specifically, four learning models are adopted with the same sets of hyperparameters and trained over 500 epochs. CNN, Self-Attention (SA), LSTM, and Gated Recurrent Unit (GRU) components are combined to formulate the learning models: CNN-LSTM, CNN-GRU, SA-LSTM, and SA-GRU. Each model adopts an Independently Recurrent Neural Network (IndRNN). While SA-GRU was not considered in our previous work, it has been included here after evaluating each component individually, as it offers notable performance improvements. The training is conducted with four-fold cross-validation to reduce the potential bias in data and variance from random initialization. After model training is completed, the models are evaluated in two aspects: their performance on flight route variety generalization, and their performance specifically on flight route period generalization.
The main contributions of this complete set of research include:
• A comprehensive comparison of the efficacy of weather products used in the literature, and combinations thereof, for trajectory prediction via a hybrid-recurrent neural network. These results indicate the presence of bias when training over a single flight route, while affirming the Echo Top (ET) feature as the best-suited holistic weather product for trajectory prediction.
• A thorough comparison of current deep learning mechanisms that are intuitively suited to the challenges of trajectory prediction. Specifically, the use of CNN, SA, LSTM, and GRU is considered. These results indicate the potential for improved accuracy using SA layers, while LSTM may not be suited to the complexity of decision-making in this task.
• A generalized dataset consisting of 21 routes (10,716 flights) over a time period of about 100 days.
The experimental results show significant performance improvement, demonstrating the need for our generalizable architectures and the generalized dataset.
The rest of the paper is organized as follows. Section II presents a complete survey of related work. Section III introduces the data collection and pre-processing of generalized flight data and weather data. Section IV describes the trajectory prediction procedure and our deep learning network architecture. Section V discusses the experimental results, including hybrid-recurrent network parameters, performance evaluation and comparison, and a discussion of data input sufficiency. Finally, the conclusion and future work are summarized in Section VI.

II. RELATED WORK
In this section, we summarize recent studies on trajectory prediction. First, prediction accuracy is affected by the prediction approach: advanced methods can provide accurate trajectory prediction [6], [8], [9]. Training data sources are also important for trajectory prediction accuracy. Specifically, there are two types of training data: flight datasets and weather datasets. The flight trajectory is planned before takeoff but may change due to convective weather. Weather forecasting information is therefore essential for predicting the real-time trajectory of in-flight aircraft in en route airspace. Thus, trajectory prediction accuracy increases with more detailed and accurate weather forecasting information [5], [10].
To achieve high-accuracy 4D trajectory estimation, Tang et al. proposed a method for extracting nominal flight data and revising airspace meteorological forecasts [11]. A Dynamic Space Warping (DSW) algorithm is applied to measure the distance between two flight altitude profiles to extract flight intention from historical Aircraft Meteorological Data Relay (AMDAR) data. Cressman interpolation is utilized to revise the original forecast from GRIdded Binary (GRIB) data. Zhou et al. proposed a trajectory prediction approach that combines an aircraft motion model with the prediction method of grey theory [12]. It realizes real-time online trajectory prediction and improves prediction accuracy.
However, traditional trajectory prediction methods cannot meet the requirements of high-precision, multi-dimensional, and real-time prediction. Recently, machine learning algorithms have been developed and applied in multiple areas, such as pattern recognition [13], Natural Language Processing (NLP) [14], and speech recognition [15]. Inspired by these successes, learning-based solutions have been proposed and applied in trajectory prediction to improve performance [16], [17], [18]. Multiple networks are adopted for prediction, including Gaussian Mixture Models (GMM) and Hidden Markov Models (HMM).
In [5], the authors developed an HMM-based approach that utilizes correlated historical flight and surrounding weather data to predict the flight trajectory. Specifically, the National Oceanic and Atmospheric Administration (NOAA) Rapid Refresh (RAP) dataset records the weather data, from which the weather features associated with the flight points are extracted. Using the defined trajectory as observed emissions and regions of RAP coordinates as hidden states, the HMM is trained. A total of 594 4D trajectories are collected for one flight (DAL2173) with identical arrival and departure locations (ATL to MIA). Ayhan et al. formulated a complete trajectory prediction model and proposed the notion of surrounding weather data (feature cubes), a concept that has widely informed later model developments. However, the efficacy of this approach has yet to be matched. It is unclear if the reported accuracy is a result of the learning approach or the heavy constraints placed on flight data collection.
A deep generative network is proposed by Liu and Hansen for trajectory prediction [10]. This framework generates GMMs using a sequence-to-sequence LSTM paradigm. The predictions of these models are then filtered using a variety of techniques, including the Adaptive Kalman Filter, Beam Search, and the Rauch-Tung-Striebel Smoother. 3D flight plans and 4D flight trajectories are recorded for 1,679 flights with identical arrival and departure airports (IAH to BOS). Weather data are collected from the NOAA North American Mesoscale (NAM) database, specifically Westerly Wind Speed (U Wind), Southerly Wind Speed (V Wind), Air Temperature (TMP), and Convective Weather. However, the reported performance is inferior to that of [5]. Several reasons for these poorer results may be inferred: (i) the NAM database is limited in resolution, as each data point is inter-spaced at 12 km and refreshed every 6 hours; (ii) the selected flight is infrequent, and collecting the number of flights in this paper may have required a significant range of seasons and consequent weather patterns; (iii) the model itself may have been unnecessarily complicated by relying on the repeated generation and sampling of GMMs.
A back-propagation (BP) neural network is designed to formulate a 4D trajectory prediction model in [19]. Hierarchical clustering and k-means clustering algorithms are adopted to analyze the total flight time. A cubic spline interpolation scheme is used to interpolate the flight position and extract the main trajectory feature.
A more robust learning approach is formulated in [20] and [21]. Taking inspiration from the previous papers, Pang et al. proposed a convolutional-LSTM hybrid network to predict aircraft trajectories [20]. This model presents a basis for hybrid-recurrent networks: weather features are extracted and represented through a series of convolutional and dense layers, supplemented with the aircraft location prior to the provided cube. The combination of abstracted weather data and prior aircraft position is fed into an LSTM layer to predict the aircraft's position. LSTM layers are selected for recurrence to mitigate the vanishing gradient challenge of training traditional Recurrent Neural Networks (RNNs). Feature cubes are generated from ET measurements, and flight data are collected for a total of 2,528 flights with identical arrival and departure points (JFK to LAX) over the dates November 1st, 2018 through February 5th, 2019. Initial research focused on 3D trajectory predictions (ignoring altitude) and reported efficacy in terms of improved error (described by Euclidean norms) over that of the flight plan. Pang et al. reported that their predictive model improved upon 47% of all flight plans, by 12.3% on average. While this efficacy appears satisfying, this relative metric is not directly comparable to other research.
Pang et al. expanded their research and considered 4D trajectory prediction in [21], reporting error in standard units for the task (degrees latitude, degrees longitude, and feet altitude). Horizontal errors generally appeared to be within 1 degree of latitude and longitude, approximately 60 nautical miles (nmi), and within 100 feet (ft) of elevation. These two papers reported efficacy in terms of horizontal and vertical errors (in units of nmi and ft), with no reference to the error of the related flight plans. As a result, this work should be more critically contextualized within other research.
Ma and Tian discuss the prediction of aircraft trajectories based solely on prior Automatic Dependent Surveillance-Broadcast (ADS-B) data [6]. The research considers single- and multi-point forecasting of the 4D trajectory using sequences of prior 4D data, as well as ground speed and heading information. Three models are presented for this task: one convolutional (CNN), one recurrent (LSTM), and one CNN-LSTM hybrid. The results reinforce the usefulness and importance of prior design choices in hybrid-recurrent networks, while also offering some qualitative understanding and intuition. Data are collected for approximately 397,000 flights from Qingdao to Beijing, providing the largest dataset of all considered in existing research. However, again, the metrics reported in this paper are not directly comparable to those described in prior research; those presented are standard error metrics within deep learning (Mean Absolute Percentage Error, Mean-Squared Error, etc.), but are not well suited to describing the usefulness of a trajectory prediction model. Despite the progress in these studies, the proposed models and datasets are limited to specific scenarios, biasing model training toward a particular route and date range. Moreover, the proposed architectures may run into memory constraints, causing large errors when dealing with unknown routes. Based on these studies and our previously published paper [7], we propose a generalized model and a generalized dataset to improve trajectory prediction accuracy.

III. DATA COLLECTION AND PREPROCESSING

A. GENERALIZED FLIGHT DATA
Within the continental United States (CONUS), flight plans and flight trajectory information are collected via the Federal Aviation Administration (FAA) Air Route Traffic Control Centers (ARTCC), and researchers typically access these data in aggregated locations such as the NASA Sherlock Data Warehouse [22]. While the flight track provides complete 4D information, flight plan messages only contain a cruising altitude and a string of waypoints guiding the aircraft en route. Flight plan messages are only provided as communications occur to modify these items. As a result, the last-filed flight plan (before departure) must be interpreted by selecting initial messages and cross-referencing databases such as OpenNav. For prediction, the 4D coordinates (latitude, longitude, altitude, and time) are directly collected from flight trajectory information, and inferred from flight plan information based on the filed waypoints and cruising altitude.
We create a generalized training/validation dataset. Twenty-one routes over a period of about 100 days (November 1st, 2018 to February 5th, 2019) are selected for training and validation, as summarized in Table 1. Figure 1 illustrates an example of the training and validation flight routes. Routes are selected to vary the general heading, duration, and coverage of flights within the continental United States. Several routes are also selected to match those used in prior research. Finally, the flights listed consist only of those whose duration is no more than 30 minutes shorter than the listed nonstop flight duration of the reported route. This is done to prevent the use of incomplete flight information.

B. WEATHER DATA
Table 2 summarizes the weather data sources and popular products. Each source provides gridded data at a regular update interval, offering sufficient weather information to support flight prediction. To improve prediction accuracy, weather data are collected as current (actual) measurements. The weather dataset includes the Corridor Integrated Weather Service (CIWS), North American Mesoscale (NAM), and NOAA's High-Resolution Rapid Refresh (HRRR) and Rapid Refresh (RAP).
In this paper, we analyze several candidate weather products to identify the most effective information for trajectory prediction: (i) the ET and Vertically Integrated Liquid (VIL) features generated by CIWS, and (ii) the atmospheric TMP and U/V Wind from NOAA HRRR. We conduct correlation analysis of the products of interest and supervised training of an identical neural network using varied weather products.
First, a limited cross-correlation of weather products is performed; the results determine which combinations of weather products supplement one another sufficiently to justify the computational cost of providing additional data to the models. The individual weather products and selected combinations are then used for prediction. Figure 2 shows the histograms of the cross-correlation coefficients in the range (0, 0.5). For ET vs. other products and VIL vs. other products, the majority of coefficients have a magnitude of less than 10^−3. This non-correlation is due to the sparsity of both products, as nonzero measurements only appear for rare weather occurrences (i.e., strong convective weather and precipitation). Thus, both products can provide significantly unique information relative to any other considered product. By contrast, TMP and U/V Wind are all moderately correlated with one another, with average coefficients for each resting just above 0.25. As a result, combinations predominantly considered only one of the three HRRR products. Therefore, the model training adopts several combinations of weather datasets.
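As a concrete illustration of this analysis, the following minimal sketch computes per-timestamp Pearson coefficients between two co-registered product grids and histograms their magnitudes; the grid shapes, variable names, and random placeholder data are assumptions, not the paper's pipeline:

```python
import numpy as np

# Placeholder grids standing in for co-registered weather products at
# matching timestamps (real inputs would be CIWS/HRRR rasters).
rng = np.random.default_rng(0)
et_grids = [rng.random((100, 100)) for _ in range(10)]
tmp_grids = [rng.random((100, 100)) for _ in range(10)]

def cross_corr(a, b):
    """Pearson correlation coefficient between two co-registered 2D grids."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return float(a @ b / denom) if denom > 0 else 0.0

# Histogram of coefficient magnitudes, mirroring Figure 2's analysis
# (e.g., |r| < 1e-3 for the sparse ET and VIL products vs. others).
coeffs = np.abs([cross_corr(et, tmp) for et, tmp in zip(et_grids, tmp_grids)])
hist, edges = np.histogram(coeffs, bins=50, range=(0.0, 0.5))
```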
Second, the total set of weather products and product combinations are trained and evaluated, and the results are summarized in Table 3. The horizontal error e_h and vertical error e_v statistics are given, where µ_h (resp. µ_v) and σ_h (resp. σ_v) represent the mean and standard deviation of e_h (resp. e_v). Combinations of products are accomplished by incorporating an additional dimension into the convolutional layers of the deep learning network. These results include percent improvements drawn against ET, which is treated as the default due to its use in [21]. The observations are summarized as follows:
• Without considering combinations of multiple products, ET provides the lowest horizontal error. This is expected, as the measurement is both sparse and a useful holistic approximation of weather severity.
• VIL performs notably worse than most products, despite its sparsity and correlation to convective weather. This reflects how VIL represents only the presence of liquid, not whether it is indicative of humidity, precipitation, type of precipitation, etc.
• While no NOAA data product could provide improved horizontal accuracy, each of the three provided some degree of improvement to vertical accuracy predictions, likely because these data are altitude-varying.
• Notably, V Wind provided the greatest improvement in vertical accuracy. It is believed this reflects a skew in the flight data, where a notable percentage of arrivals and departures occurred with a significant North/South (N/S) wind speed (parallel to V Wind), while en route flights experienced mostly East/West (E/W) wind speeds (parallel to U Wind).
No combination of products could provide sufficient improvements in accuracy to justify the additional data processing and model complexity. In particular, prior experiments without normalizing TMP data yielded horizontal accuracies much closer to those of ET, making TMP a viable choice for predictive modeling. ET remains a useful product that can result in high prediction accuracy; in this paper, ET data are adopted in the preprocessing and generation of feature cubes as the model input.

IV. TRAJECTORY PREDICTION NETWORK ARCHITECTURE
In this paper, trajectory prediction is treated as a sequence-to-sequence translation problem. A complete sequence of flight plan information and feature cubes is fed into the hybrid-recurrent network to predict the complete flight trajectory.
After the flight data and weather data collection described in Section III, the 3D feature cubes are generated by Algorithm 1. Figure 3 illustrates how feature cubes are generated along each point of the flight plans. Weather data are collected one level above and below the current altitude, so a 20 × 20 × 3 feature cube can be generated for each flight plan point. All data are linearly interpolated at a one-minute interval. This restricts the data to a useful format for recurrent neural networks, where entries are regularly dispersed and the total sequence length is within the experimental memory constraints of LSTM/GRU [23].
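To make the cube-collection step concrete, here is a minimal sketch of Algorithm 1's core loop, assuming an axis-aligned grid lookup (the actual algorithm orients each cube perpendicular to the flight direction); all names and the toy data are illustrative:

```python
import numpy as np

def collect_feature_cubes(plan_points, weather, alt_levels, cube_size=20):
    """Collect a (cube_size x cube_size x 3) cube at each interpolated
    flight-plan point: one weather level below, at, and above the current
    altitude. Axis-aligned for brevity, whereas Algorithm 1 also orients
    each cube along the flight direction; all names are illustrative."""
    half = cube_size // 2
    cubes = []
    for lat_idx, lon_idx, alt_ft in plan_points:
        level = int(np.argmin(np.abs(alt_levels - alt_ft)))  # nearest level
        lo, hi = max(level - 1, 0), min(level + 1, len(alt_levels) - 1)
        cube = weather[[lo, level, hi],
                       lat_idx - half:lat_idx + half,
                       lon_idx - half:lon_idx + half]
        cubes.append(np.moveaxis(cube, 0, -1))               # (20, 20, 3)
    return np.stack(cubes)                                   # (T, 20, 20, 3)

# Toy usage with random data standing in for a real ET mosaic.
alt_levels = np.arange(0.0, 45000.0, 1000.0)                 # ft
weather = np.random.rand(len(alt_levels), 200, 200)
cubes = collect_feature_cubes([(100, 100, 33000.0)], weather, alt_levels)
```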
Figure 4 illustrates the general architecture of the network for trajectory prediction. Let t represent the time step, Cube_t denote the feature cube at time step t, and X_t denote the anticipated position information from the flight plan. The input of the hybrid-recurrent network is a sequence of feature cubes, simplified as Cube_{t−1}, Cube_t, Cube_{t+1} in Figure 4. Feature cubes are fed into a sequence of extraction layers (convolution or SA) and two dense layers, which output the extracted features. The anticipated positions from the flight plans (X_{t−1}, X_t, X_{t+1}) are concatenated with the extracted features and then fed into a recurrent network, where LSTM, GRU, and IndRNN are applied in this paper. The output of the recurrent network is treated as an estimate (Ŷ) of the actual trajectory (Y), which can be used to estimate the loss and update model parameters.
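For concreteness, a compact PyTorch sketch of the CNN-GRU variant of this architecture follows; the layer sizes are placeholders (the tuned values appear in Table 4 and Section V), and the class is an illustration rather than the exact implementation in [27] and [28]:

```python
import torch
import torch.nn as nn

class HybridRecurrentNet(nn.Module):
    """Illustrative CNN-GRU hybrid: convolutional extraction of each
    feature cube, two dense layers, concatenation with the flight-plan
    position X_t, then a recurrent pass over the whole sequence."""
    def __init__(self, k0=28, k1=22, rec_in=6, hidden=650, pos_dim=3):
        super().__init__()
        self.extract = nn.Sequential(              # three extraction layers
            nn.Conv2d(3, k0, 3, padding=1), nn.ReLU(),
            nn.Conv2d(k0, k0, 3, padding=1), nn.ReLU(),
            nn.Conv2d(k0, k1, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.dense = nn.Sequential(                # two dense layers
            nn.Linear(k1, k1), nn.ReLU(),
            nn.Linear(k1, rec_in), nn.ReLU(),
        )
        self.rnn = nn.GRU(rec_in + pos_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, pos_dim)     # predicted (lat, lon, alt)

    def forward(self, cubes, plan):
        # cubes: (B, T, 3, 20, 20); plan: (B, T, 3) anticipated positions X_t
        B, T = cubes.shape[:2]
        feats = self.dense(self.extract(cubes.reshape(B * T, 3, 20, 20)))
        out, _ = self.rnn(torch.cat([feats.reshape(B, T, -1), plan], dim=-1))
        return self.head(out)                      # Y_hat: (B, T, 3)

net = HybridRecurrentNet()
y_hat = net(torch.rand(2, 120, 3, 20, 20), torch.rand(2, 120, 3))
```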
Several crucial network components have the potential to improve the hybrid-recurrent framework, including GRU, IndRNN, and SA mechanisms. GRU is similar to LSTM, as both are gated RNN techniques: each mitigates the issue of vanishing gradients by adding gating mechanisms to regulate input and retain information. However, GRU accomplishes this task with fewer parameters and hidden states than LSTM, theoretically allowing for faster training and improved performance. The two cells perform similarly on a variety of tasks, often with no clearly optimal choice [24].
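The parameter savings are easy to verify in PyTorch: with the hidden state size of 100 used in this paper and an assumed input size of 9, GRU carries roughly three quarters of LSTM's parameters (three gates versus four):

```python
import torch.nn as nn

# Same-sized cells: GRU has three gates to LSTM's four, so it carries
# roughly 3/4 of the parameters and hidden-state bookkeeping.
lstm = nn.LSTM(input_size=9, hidden_size=100)
gru = nn.GRU(input_size=9, hidden_size=100)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(lstm), count(gru))  # 44400 vs. 33300
```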
IndRNN is another RNN technique, which resolves the vanishing gradient problem without the addition of gating mechanisms, resulting in a network with fewer parameters and potentially better memory [23]. In typical recurrent networks, the hidden states associated with a recurrent layer are connected to each cell within the same layer; by contrast, IndRNN restricts the connections such that each cell in a layer is connected only to its own hidden state, as shown in Figure 5. This approach prevents hidden states from being harshly penalized and neglected, solving the vanishing gradient problem so long as a sufficiently small optimizer learning rate is selected.
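A minimal sketch of the IndRNN recurrence may clarify the restriction: the recurrent weight is a per-neuron vector applied elementwise, so no cell reads another cell's hidden state. This is an illustrative cell, not the implementation from [23]:

```python
import torch
import torch.nn as nn

class IndRNNCell(nn.Module):
    """Minimal IndRNN cell sketch in the spirit of [23]: the recurrent
    weight u is a per-neuron vector applied elementwise, so each cell
    is connected only to its own hidden state."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.w_in = nn.Linear(input_size, hidden_size)
        self.u = nn.Parameter(torch.ones(hidden_size))  # one weight per neuron
        self.act = nn.ReLU()

    def forward(self, x_t, h_prev):
        # h_t = act(W x_t + u * h_{t-1}); no cross-neuron recurrent mixing
        return self.act(self.w_in(x_t) + self.u * h_prev)

cell = IndRNNCell(9, 100)
h = cell(torch.rand(4, 9), torch.zeros(4, 100))  # one recurrent step
```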
The attention mechanism was proposed to improve performance in natural language processing and has recently been studied in other fields, such as computer vision. Findings indicate that SA is a capable supplement following convolutional layers when extracting patterns from 2D data [25]. Figure 6 describes the structure of the soft SA layer. The input feature and the output of the SA layer are denoted by X and O. The key, query, and value are denoted by K, Q, and V, respectively. The SA layer calculations are given by equations (1)-(5) [26].
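The typeset equations did not survive reproduction here; the following is a standard reconstruction of soft self-attention consistent with the description below, with the caveat that the exact forms of (1)-(5) in [26] may include scaling, learned output transforms, or residual terms omitted in this sketch:

```latex
% Reconstruction of (1)-(5): linear transforms, attention map, output.
\begin{align}
K &= W_K X \tag{1}\\
Q &= W_Q X \tag{2}\\
V &= W_V X \tag{3}\\
A &= \operatorname{softmax}\!\left(K^{\top} Q\right) \tag{4}\\
O &= V A \tag{5}
\end{align}
```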
A complete data sequence is received and transformed into key, query, and value datasets. In the SA layer, a sense of locality is embedded in the data via the softmax layer, whose input is a matrix multiplication of the transformed key and query datasets.

V. EXPERIMENTS
The hybrid-recurrent framework for trajectory prediction is intentionally broad, allowing both its data inputs and its learning models to be evaluated against the general challenge.
Compared to [7], where flight and weather data are limited to a single route (KJFK-KLAX) over a 2-week period, in this paper we improve the variety and size of the flight dataset, which consists of 21 flight routes over a 100-day period, as shown in Table 1. For all experiments, the setup consists of:
• A workstation using a Ryzen 1950X processor and dual Nvidia RTX 2080 graphics cards, with Ubuntu 20.04 LTS serving as its operating system.
• Data preprocessing and deep learning models implemented as Python projects [27], [28].
• Flight and ET data collection accomplished via access to NASA's Sherlock Data Warehouse, which maintains a repository of data for air traffic management [22].
• Additional weather data collected directly from NOAA's HRRR cloud portals or in coordination with the Massachusetts Institute of Technology Lincoln Laboratory via their Corridor Integrated Weather Service (CIWS) [29].

A. HYBRID-RECURRENT NETWORK PARAMETERS
Initially, three hyperparameters related to overfitting are adjusted: the dropout rate p, the weight regularization L2 penalty (referred to as weight decay in PyTorch libraries), and the use of batch normalization. Each combination of parameters is tested in isolation with a CNN-LSTM architecture following the parameters in Table 4, training over 200 epochs. While the use of batch normalization between each layer could be tested discretely, regularization penalties and dropout rates required a specified set of rates to test. Specifically, dropout rates include 0%, 0.01%, 0.1%, 1%, 5%, 10%, and 20%. Regularization penalties include 0, 10^−8, 10^−6, 10^−5, 10^−4, 10^−3, 10^−2, and 10^−1. It should be noted that dropout layers are selectively incorporated into the model, specifically before the final extraction layer, before the first recurrent layer, and after each subsequent recurrent layer, as in Figure 4. Results from this portion of testing indicated the importance of minimal incorporation of dropout and regularization (with selections of 0.01% and 10^−8), while batch normalization wholly impaired model training.
Figure 4 describes the general form of the hybrid-recurrent network architecture. Several different weather features and network components define the hybrid-recurrent network variants; the weather features and network components are described in Sections III and IV, respectively. The layout and parameters of the hybrid-recurrent network are summarized in Table 4. Extraction mechanisms are defined at a depth of three layers and include a purely convolutional design, a purely SA design, and a convolutional design with SA serving as a final layer. Recurrence techniques include LSTM, GRU, and IndRNN layers, always with a hidden state size of 100. Finally, the depth of recurrence is limited to 1 or 2 layers. For IndRNN cells, an additional layer is included to allow for information sharing between neurons.
Weather extraction hyperparameters strictly considered the size of feature retention in the hidden (k_0) and output (k_1) extraction layers, which are tested on both CNN-LSTM and SA-LSTM models with recurrent input sizes of both 6 and 10. It is assumed these parameters transfer meaningfully to GRU recurrent layers. While this size parameter can be directly treated as the number of filters in convolutional layers, attention layers treat this parameter analogously: the size represents a multiple of the total output features of the layer, an attempt to retain an equivalent number of features in each model. For each model, 200 sample models are constructed with layer sizes randomly selected between 1 and 32; each sample is trained over 50 epochs before all samples are compared for common trends. Based on this test, convolutional models performed best with larger hidden and output sizes, while attention models benefit from large hidden sizes but significantly reduced output sizes; all models performed better with the smaller recurrent input size of 6. Convolution sizes of (28, 22) and attention sizes of (31, 8) are selected.
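A sketch of this random search, with a stub standing in for the actual 50-epoch training run, is given below; the helper name and its return value are hypothetical:

```python
import random

def train_for_epochs(k0: int, k1: int, epochs: int) -> float:
    """Hypothetical stand-in for a 50-epoch training run; returns a
    final validation loss. Not the paper's actual training code."""
    return random.random()

# Random search: 200 samples with hidden/output sizes drawn from [1, 32].
samples = [(random.randint(1, 32), random.randint(1, 32)) for _ in range(200)]
trials = sorted((train_for_epochs(k0, k1, epochs=50), k0, k1)
                for k0, k1 in samples)
best_loss, best_k0, best_k1 = trials[0]
```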
Expansion of the recurrent portion of the model was considered to increase the number of features and the potential for compression/retention of flight behaviors and corresponding changes in trajectory. Recurrent hyperparameters considered both the total recurrent depth (n) and the hidden size of each layer (h). For this experiment, each set of hyperparameters must be tested individually for the CNN-LSTM, CNN-GRU, SA-LSTM, and SA-GRU models, as each model is likely to have learned different classes of weather features and each recurrent type requires different parameters to retain said features. For completeness, a grid search is conducted: recurrent depths of up to 4 layers are considered, while hidden sizes vary up to 1000 cells in increments of 50. Because recurrent networks require the largest amount of hardware resources (and therefore the longest training time), the experiment is limited to 20 epochs. Based on the results of this experiment, both attention models are found to require an additional recurrent depth (2) for optimal performance, with a notably larger hidden size (600). Neither convolution model significantly benefited from the added recurrent depth (and its consequent computational cost), though both still notably improved from an increased hidden size (LSTM: 1000, GRU: 650).
Finally, the selection of an appropriate optimization approach is considered. Optimizer choice played a particularly important role due to concerns of generalization; research so far has relied heavily on adaptive optimizers, which have been found to generalize poorly for some tasks [30].
We conducted two experiments to determine the best combination of optimizer and schedule. For the first experiment, a total of 8 optimizer options are considered for CNN-LSTM models, assuming transferability. These optimizers include Stochastic Gradient Descent (SGD), SGD with momentum, SGD with Nesterov momentum updating, Adam, Adam with default parameters (no weight regularization), Adadelta, Adagrad, and RMSProp. With the exception of the SGD variants, each optimizer utilizes its default learning rate in the PyTorch library. For each SGD variant, the optimal learning rate and momentum (if applicable) are determined by grid search in a process similar to that of the prior subsection. A CNN-LSTM model is trained over 250 epochs with 5 random initializations.
The results of this initial attempt are illustrated in Figure 7, with the average of each model's training sessions plotted. While it is clear that SGD variants will not achieve the same degree of accuracy quickly enough to be practical for this problem, several challenges came with the final training of the models, particularly ensuring the convergence of GRU models.
For the second experiment, the optimizer schedule is considered as a simple learning rate decay. Three decay rates (0.1, 0.5, 0.9) and step sizes (10, 30, 50 epochs) are tested, while each optimizer (SGD, Adam, RMSProp) is initialized with a learning rate of 0.01. Each model is initialized randomly 3 times, and the average final training and validation losses are compared.
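In PyTorch terms, the schedule under test corresponds to a StepLR decay attached to the chosen optimizer; the sketch below shows one combination, with a stand-in model and the dropout/weight-decay selections from the earlier experiment assumed:

```python
import torch

model = torch.nn.Linear(10, 3)  # stand-in for any of the four models
# Optimizer with the weight-decay selection from Section V-A and the
# lr = 0.01 initialization used in the schedule experiment.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-8)
# One of the nine tested combinations: decay rate 0.5 every 30 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)

for epoch in range(250):
    # ... one training pass over the data loader would go here ...
    scheduler.step()  # apply the step decay once per epoch
```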

B. PERFORMANCE EVALUATION OF GENERALIZED DATA AND LEARNING MODELS
For a thorough performance evaluation and comparison, we consider the following three cases: (i) the total dataset (21 flight routes over a 100-day period) for both model training and cross-validation; (ii) the total dataset for training and the benchmark single route KJFK-KLAX for validation; (iii) the KJFK-KLAX dataset for both model training and validation [7]. We utilize the four learning models discussed in Section IV: CNN-LSTM, CNN-GRU, SA-LSTM, and SA-GRU. As an example, Figure 8 illustrates our flight trajectory prediction. Specifically, the learning models take the flight plan (purple line) and weather features as inputs to predict the flight trajectory (green line). The horizontal and vertical errors are calculated by measuring the difference between the actual (red line) and predicted routes (a computation sketch follows this list). The results are provided in Table 7 in terms of horizontal error e_h and vertical error e_v, where µ_h (resp. µ_v) and σ_h (resp. σ_v) represent the mean and standard deviation of e_h (resp. e_v). For case 1, we adopt four-fold cross-validation and average the validation results. From Table 7, our observations are summarized as follows:
• Comparing cases 2 and 3, there is an obvious tradeoff between horizontal and vertical errors. As the training data are generalized from the single route to the diverse 21 routes, the mean of the horizontal error decreases at the cost of an increase in the mean of the vertical error. Considering that generality is usually achieved at the expense of performance, the horizontal error reduction is noteworthy. On the other hand, since the benchmark route KJFK-KLAX is one of the longest routes within the CONUS, generalizing the set of flight routes incorporates shorter routes, where a smaller portion of the flight duration is represented by the en route phase with its relatively constant cruising altitude. As a result, significant vertical error is likely tied to the climb and descent phases, which may not be predictable using weather information.
• The cross-validation results in case 1 are similar to (marginally better than) the results in case 2 in terms of horizontal and vertical prediction accuracy, which further corroborates the effectiveness of our generalized approach. It must be emphasized that, in our autonomous spectrum study [1], the horizontal trajectory of an aircraft is much more important than its altitude because the sector association of a flight is usually determined by its horizontal coordinates. As a result, compared with traditional flight trajectory predictions that require model training for each specific flight route, our generalized approach not only enjoys the benefit of one-time training for all flight trajectory predictions but also improves the prediction accuracy substantially.
• For our generalized approach (cases 1 and 2), the four models achieve similar error statistics. For case 3, the apparently most promising model, SA-LSTM, was discussed in [7]. However, the SA-LSTM model requires significantly more computational time to conduct an effective trajectory prediction than the other models, while SA-GRU presents the best performance in cases 1 and 2. This suggests that SA-GRU learns a more general solution, which could better support predictions along new routes. Compared with previous studies, our successfully trained learning models can considerably improve flight trajectory prediction performance. For example, in [10] the authors studied trajectory prediction for the flight route from Houston (IAH) to Boston (BOS), which is one of the longest and least-frequented routes in our generalized dataset, accounting for only 0.3% of all flights. Compared to the results in [10], our SA-GRU model with the generalized training dataset reduces the average horizontal and vertical errors by 35.08% and 15.28%, respectively. On the other hand, aircraft trajectory prediction for a short flight route was studied in [5], where data were collected for a specific flight (DAL2173) from Atlanta (ATL) to Miami (MIA), a route shorter than the average length of our generalized flight routes. Generally, a large vertical error variance is expected for short flight routes due to the lack of predictability in the takeoff and landing phases. Even though the ATL-MIA route is not included in our generalized dataset, compared to the results in [5], our generalized approach with the SA-GRU model reduces the standard deviation of the vertical prediction error by 52.46%. These results clearly show the performance improvements resulting from data generalization and model tuning.
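For reference, the error statistics reported above can be computed as in the following sketch; the paper does not spell out its exact horizontal distance formula, so the haversine great-circle distance used here is an assumption, and the trajectory arrays are placeholders:

```python
import numpy as np

EARTH_RADIUS_NMI = 3440.065  # mean Earth radius in nautical miles

def horizontal_error_nmi(lat_p, lon_p, lat_t, lon_t):
    """Great-circle (haversine) distance between predicted and actual
    positions, in nautical miles; inputs in degrees."""
    p1, p2 = np.radians(lat_p), np.radians(lat_t)
    a = (np.sin((p2 - p1) / 2) ** 2
         + np.cos(p1) * np.cos(p2) * np.sin(np.radians(lon_t - lon_p) / 2) ** 2)
    return 2 * EARTH_RADIUS_NMI * np.arcsin(np.sqrt(a))

# Placeholder (lat, lon, alt_ft) trajectories; real ones come from Sherlock.
pred = np.array([[40.6, -73.8, 34000.0], [40.7, -74.0, 34500.0]])
true = np.array([[40.6, -73.9, 33800.0], [40.8, -74.1, 34900.0]])

e_h = horizontal_error_nmi(pred[:, 0], pred[:, 1], true[:, 0], true[:, 1])
e_v = np.abs(pred[:, 2] - true[:, 2])            # vertical error in ft
mu_h, sigma_h = e_h.mean(), e_h.std()            # Table 7 statistics
mu_v, sigma_v = e_v.mean(), e_v.std()
```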

C. DATA INPUT SUFFICIENCY
One observation in these trajectory prediction efforts is the increase in vertical error due to data generalization. This is particularly true for flight routes of short duration, where the takeoff and landing phases take up a significant portion of the entire flight.
The original flight plans are processed solely by linear interpolation using the available cruising altitude, navigation aids, and waypoints from the Sherlock Data Warehouse. This provides a sufficient approximation of horizontal position to collect an appropriate set of feature cubes; however, it may have a negative influence on altitude prediction. Figure 9 gives an example of KIAH-KBOS altitude and climb rate, one of the routes in our generalized data. The interpolated climb rates are often instantaneous, contiguous, and unrealistic for aircraft behavior. By comparison, most flights approach a cruising altitude gradually over time (20-30 minutes), and require one or multiple altitude changes from the apparent cruising altitude, especially during descent phases. To illustrate this point, a collection of flight trajectory altitudes and their approximate climb rates are provided in Figure 10.
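The spike pattern is easy to reproduce: differencing a one-minute interpolated altitude series yields a climb rate that jumps to cruise in a single step, as in this illustrative sketch (values are invented, not from Figure 9):

```python
import numpy as np

# Altitude (ft) sampled at one-minute intervals; invented values that
# mimic a linearly interpolated flight plan jumping straight to cruise.
alt_ft = np.array([0.0, 33000.0, 33000.0, 33000.0, 33000.0, 0.0])
climb_rate = np.diff(alt_ft)  # ft/min between consecutive samples
# -> [33000, 0, 0, 0, -33000]: instantaneous, unrealistic climb/descent,
# whereas real trajectories spread the climb over 20-30 minutes.
```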
In order to handle air traffic congestion near airports, it is common for aircraft to be held above or below their filed altitude during takeoff and landing. These altitude holds cannot be predicted from weather conditions alone and would require additional NAS information to accurately estimate aircraft altitudes close to airports and other centers. Consequently, weather-based 4D trajectory prediction is not effective in predicting altitude during the climb and descent phases. Additional work may identify supplemental data, such as traffic density, to improve this prediction, but it is likely that a different paradigm is needed during these phases.

VI. CONCLUSION AND FUTURE WORK
This paper studied the impact of generalized data and deep learning models on the task of flight trajectory prediction. Our research includes input data generalization and pre-processing, hybrid-recurrent network model tuning, experimental evaluation, and performance comparison with existing methods. Numerical results show that our generalized approach can significantly reduce the horizontal prediction error, which is the primary concern of our autonomous and dynamic spectrum allocation research. On the other hand, reducing vertical error remains a challenge due to insufficient data for recognizing altitude changes during the ascent and descent flight phases. Therefore, future research should consider additional modalities of airspace data for sufficient trajectory prediction.

FIGURE 1. An example of training/validation flight routes.

FIGURE 3. Visualization of collecting a single feature cube and a set of feature cubes along a filed flight plan.

Algorithm 1 Weather Cube Generation
1: Input: FlightPlan, Weather Dataset, CubicSize
2: Output: Weather feature (e.g., Cube_t)
3: Linearly interpolate the flight plan points and weather data at a one-minute interval;
4: for each point in the FlightPlan do
5:   Find the flight direction;
6:   Move the current perpendicular surface to the next point, parallel to the previous surface, at a distance of stepsize;


FIGURE 5. The structure of traditional RNN and IndRNN.

FIGURE 6. Functional diagram of SA layer.

FIGURE 7. Training and validation losses of optimizers with default parameters for the CNN-LSTM model.

FIGURE 8. An example of the trajectory prediction.

FIGURE 9. The altitudes and climb rates of original flight plans (KIAH-KBOS).

TABLE 2. Summary of weather datasets.

TABLE 3. Summary of weather product performance.

TABLE 5. Default parameters of tested optimizers.

TABLE 6. Selected hyperparameters for each model.

TABLE 7. Summary of data generalization model performance.