Chebyshev Transform-Based Robust Trajectory Prediction Using Recurrent Neural Network

Trajectory prediction is gaining attention as a form of situational awareness because it is an essential component of the support system of autonomous driving, particularly in urban areas. A promising application is cooperative driving automation, where the traffic scene is monitored by roadside sensors with undisrupted views. A critical problem is that these sensors are adversely affected by inclement weather, including drenching rain or large amounts of snow, in which case the reliability of the prediction results can be significantly compromised. To address these problems, this study proposes a framework for robust vehicle-trajectory prediction based on the Chebyshev transform. In the proposed framework, the original trajectory snippets (partial trajectories) are Chebyshev-transformed, and the resulting coefficients form new snippets. A long short-term memory (LSTM) encoder-decoder structure was trained and tested using these new coefficient snippets, which were extracted from a public vehicle trajectory dataset. The performance and robustness of the proposed framework were verified by emulating sensor data that were incomplete as a result of environmental factors. The proposed framework provides stable and accurate long-term trajectory prediction because the Chebyshev transform is robust to incomplete sensor data by virtue of its uniform nature.


I. INTRODUCTION
Autonomous driving is being raised to the next level by technical progress in the relevant software and hardware. However, the barriers are also becoming higher as the operational domain extends to urban areas. Removing these barriers is the key to advancing autonomous driving. One of these barriers is knowledge of the future trajectories of surrounding vehicles. In particular, it is necessary to accurately predict the future trajectories of non-autonomous vehicles because mixed traffic, consisting of both autonomous and non-autonomous vehicles, can be expected to persist for the foreseeable future until society becomes fully autonomous.

The associate editor coordinating the review of this manuscript and approving it for publication was Gerardo Flores.
Trajectory prediction can be performed using information received from either the on-board sensors installed in autonomous vehicles or from roadside sensors for cooperative driving automation. Among them, trajectory prediction with the aid of roadside sensors is currently attracting considerable attention because roadside sensors usually have omniscient views; thus, the traffic scene can be monitored with less obstruction. Previous studies [1], [2], [3], [4] investigated this possibility in the next stage of autonomous driving.
The problem with traffic scene monitoring based on roadside sensors is that these sensors are continually exposed to the environment. Thus, their reliable operation is negatively affected by adverse weather, such as heavy snowfall, torrential rain, and strong wind, with camera sensors being particularly vulnerable to these conditions. Objects that should have been detected may be obstructed or may even remain undetected owing to raindrops or frost covering the sensor lens, as discussed in [5]. Accordingly, trajectory prediction would be negatively influenced by these conditions, degrading the reliability of autonomous driving.

VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Research on vehicle trajectory prediction has been gaining momentum, as this technology is required for a higher level of autonomous driving, and the widespread use of neural network-based approaches is now accelerating this trend. Altché and De la Fortelle [6] applied a long short-term memory (LSTM) network to vehicle trajectory prediction for a highway situation and validated the performance of the proposed model with the public NGSIM [7] dataset. Messaoud et al. [8] and Luo et al. [9] proposed attention mechanisms that combine different networks for trajectory prediction. Zyner et al. [10] proposed a mixture density network (MDN)-based framework for multimodal prediction in roundabouts. Ding and Shen [11] supplemented context information, such as construction sites and speed regulations, to improve the prediction accuracy. Their model consists of two levels of networks: the upper level classifies the driving policy, such as proceeding straight ahead, yielding, or turning, and the lower level generates the trajectory prediction based on an optimization method. The method proposed by Raipuria et al. [2] was also based on an LSTM encoder-decoder structure [12] and utilized a geometrical curvilinear coordinate system in which the road curvature was featured to improve the prediction accuracy. Similarly, Yu et al. [13] considered road geometries to improve the prediction accuracy in various road environments. Jiang et al. [14] focused on the temporal accuracy of trajectory prediction. Other researchers [15], [16], [17], [18] considered the surrounding nearby vehicles for an improved understanding of future vehicle trajectories. Deo and Trivedi [15] suggested a maneuver classification and trajectory prediction model using intervehicle interactions with an LSTM network. In addition, SCALE-NET, proposed by Jeon et al. 
[16], used an edge-enhanced graph convolutional neural network (EGCN) and LSTM, which features efficient computational performance regardless of the number of vehicles in the region of interest. Inter-vehicular interaction has also been applied to conflicting vehicles at an intersection [17], where the model generates multiple trajectory hypotheses based on maneuver reasoning results. Deo and Trivedi [18] proposed a framework for vehicle trajectory prediction based on social-LSTM, which has been widely used for pedestrian trajectory prediction [19], [20]. Bock et al. [3] proposed an LSTM-based self-learning trajectory prediction framework that incorporates additional data from new measurements.
These vehicle trajectory prediction methods can be categorized into two groups based on how the traffic scene is monitored. In [2], [3], and [16], multiple roadside sensors were installed on roadside units, and trajectory predictions were provided to support cooperative driving automation. In contrast, in [9] and [15], trajectory prediction was conducted on the side of the autonomous vehicle with on-board sensors, and the predicted information was utilized in the decision process for autonomous driving. In addition to neural-network-based methods, various filter-based or stochastic methods have been proposed [4], [21], [22], [23].
However, previous studies on trajectory prediction have typically focused on improving the prediction performance, including its accuracy and multimodality, as discussed in the literature survey. Problems arising from sensor degradation have not yet been addressed, although, in practice, they constitute a critical issue. As discussed later, the conventional approach is highly vulnerable to incomplete sensor data. A comprehensive treatment of sensor degradation is therefore required when solving the trajectory prediction problem; accordingly, our study aims to provide robust long-term vehicle trajectory prediction, even with incomplete sensor data. Our findings are expected to make an important contribution to the realization of cooperative driving automation at a higher level. Table 1 compares the related studies and summarizes the model type and objective of each.
This study was motivated by the work of Wiest et al. [24] and is based on the Chebyshev transform [25], which converts time-sequential physical data into coefficients of a Chebyshev polynomial fit. In the proposed framework, the original trajectory snippets, that is, partial trajectories, are Chebyshev-transformed, and the resulting Chebyshev coefficients form new coefficient snippets, referred to as Chebyshev coefficient snippets (CCSs), for training and prediction. A strong advantage of the Chebyshev transform is that the resulting coefficients are uniform for a variable number of samples within a fixed time interval [26]. Fig. 1 illustrates this aspect and suggests that the prediction results could be robust to partially incomplete sensor data. Another strong advantage is that the prediction can take the form of a coefficient set, referred to as a snapshot, rather than a series of individually predicted future positions, as in the conventional approach. This aspect could be beneficial in vehicle-to-everything (V2X) communication applications with limited bandwidth: the only information transmitted is a few Chebyshev coefficients rather than numerous time-sequential position values. The efficacy of this aspect increases with the number of vehicles whose trajectories must be predicted.
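As a minimal sketch of this uniformity property, the following snippet fits cubic Chebyshev coefficients to the same fixed observation window sampled densely and sparsely, using NumPy's Chebyshev utilities. The position profile `x_of_t` is a hypothetical example, not from the paper.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Hypothetical x-position profile over a fixed observation window,
# with time already normalized to the Chebyshev domain [-1, 1].
def x_of_t(t):
    return 3.0 + 2.5 * t - 1.2 * t**2 + 0.4 * t**3

t_dense = np.linspace(-1.0, 1.0, 50)   # e.g., every frame observed
t_sparse = t_dense[::4]                # same window, most frames dropped

# Degree-3 (cubic) least-squares Chebyshev fits -> 4 coefficients each.
c_dense = C.chebfit(t_dense, x_of_t(t_dense), deg=3)
c_sparse = C.chebfit(t_sparse, x_of_t(t_sparse), deg=3)

# The coefficients agree although the sample counts differ -- the
# uniformity that makes the representation robust to missing data.
print(np.allclose(c_dense, c_sparse, atol=1e-8))   # -> True
```

Because the coefficients depend only on the underlying curve over the window, not on how many samples happened to be observed, dropped frames leave the feature representation essentially unchanged.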
The LSTM encoder-decoder was selected as a baseline structure because the family of recurrent neural networks (RNN) was shown to deliver good performance in the field of trajectory prediction as presented in the literature survey.
The model was trained and tested with Chebyshev coefficient snippets in the proposed framework, and the prediction results were compared with those of the conventional approach, which is trained with time-sequential physical values in a conventional manner. The main contributions of this study are summarized as follows.
• A trajectory prediction framework based on a neural network, which is robust to partially incomplete data, is proposed. The proposed framework integrates the special feature engineering functionality of the Chebyshev transform that encodes the original trajectory snippets to CCSs. The experimental results, produced with data from the public vehicle trajectory dataset inD [27], verified the robust trajectory prediction performance, even with incomplete sensor data. This achievement is crucial for the realization of cooperative driving automation even under harsh conditions such as during adverse weather.
• A novel method based on feature engineering is proposed for trajectory prediction in the form of a snapshot, rather than in the form of a series of time-sequential values. The last output sequence is configured to be the set of Chebyshev coefficients that encapsulates a predicted trajectory of the entire prediction horizon in the proposed method. In other words, the only information transmitted via the communication line is a few coefficient values instead of a large number of timesequential values.
• The architecture of the model can be simplified because the number of sequences for the RNN is reduced and additional processes, such as sensor data imputation, are not required; these aspects may further reduce the training time.

This study focuses on robustness to sensor degradation and does not consider interactions between traffic participants or other contextual information.

The remainder of this paper is organized as follows. Section II summarizes the basics of the Chebyshev transform, and Section III introduces the proposed trajectory prediction framework, which is based on the Chebyshev transform and the LSTM encoder-decoder structure [12]. This section also briefly describes the LSTM encoder-decoder structure and explains how feature engineering is used to create a snapshot of the predicted trajectory for the entire prediction horizon. In Section IV, the proposed framework is verified with the inD [27] dataset by injecting sensor data faults in various patterns. Section V concludes the paper.

II. CHEBYSHEV TRANSFORM
The proposed trajectory prediction framework is based on the Chebyshev transform, from which the objective function is approximated. This approximation starts from the definition of the Chebyshev polynomial, which is defined by the following trigonometric function [25]:

T_v(t) = cos(v * arccos(t)),  (1)

where v denotes the degree of the Chebyshev polynomial, and both the value of this polynomial and the domain variable t are bounded in [-1, 1]. The Chebyshev polynomial of an arbitrary degree is obtained from the recurrence relation

T_0(t) = 1, T_1(t) = t, T_{v+1}(t) = 2t * T_v(t) - T_{v-1}(t).  (2)

From the orthogonality of these polynomials, the objective function f(t) of the feature values can be approximated, if the degree of approximation d is sufficiently large, as

f(t) ≈ g(t) = Σ_{v=0}^{d-1} c_v T_v(t),  (3)

which is the truncated approximation for d ≤ N_d, where N_d is the number of data sample points. Note that the interval for the approximated function is normalized to [-1, 1]; upon restoration of the original physical values using the inverse transform, the interval can be rescaled arbitrarily to the designed horizon. The Chebyshev coefficient c_v in (3) can be obtained by the following discrete Chebyshev transform:

c_v = (p_v / N_d) Σ_{k=1}^{N_d} f(t_k) T_v(t_k),  with p_0 = 1 and p_v = 2 for v ≥ 1,  (4)

where

t_k = cos( π(2k - 1) / (2 N_d) ),  k = 1, ..., N_d,  (5)

are the Chebyshev nodes. The transform in (4) is denoted for the set of coefficients as

c = T(x_{1:N_d}),  (6)

where x_{1:N_d} denotes N_d data samples, and N_f denotes the number of features in x. Namely, the objective function for each feature is transformed, and the results are aggregated into one vector, as in (6). In this study, the degree of approximation is set to d = 4, that is, four coefficients per feature fitting a degree-3 polynomial, which is known as the cubic Chebyshev transform. As the transformed coefficients in (6) form the feature data in the proposed framework, the degree of approximation determines the number of features; thus, d is chosen by trading off model complexity against approximation quality. For the cubic Chebyshev transform, the approximated function for a single feature is represented as

g(t) = c_0 T_0(t) + c_1 T_1(t) + c_2 T_2(t) + c_3 T_3(t),  (7)

where g is the approximated function of f.
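The transform and its evaluation can be sketched directly from the definitions above. The helper names below are illustrative, and the normalization factor (p_0 = 1, p_v = 2 otherwise) is one common convention for the discrete transform at Chebyshev nodes, stated here as an assumption.

```python
import numpy as np

def chebyshev_transform(f, n_d, d=4):
    """Discrete Chebyshev transform of a scalar feature function.

    Samples f at the n_d Chebyshev nodes on [-1, 1] and returns the
    first d coefficients c_0, ..., c_{d-1} (d = 4: cubic transform).
    """
    k = np.arange(1, n_d + 1)
    t_k = np.cos(np.pi * (2 * k - 1) / (2 * n_d))   # Chebyshev nodes
    f_k = f(t_k)
    c = np.empty(d)
    for v in range(d):
        T_v = np.cos(v * np.arccos(t_k))            # T_v(t) = cos(v arccos t)
        p_v = 1.0 if v == 0 else 2.0                # assumed normalization
        c[v] = (p_v / n_d) * np.sum(f_k * T_v)
    return c

def chebyshev_eval(c, t):
    """Evaluate g(t) = sum_v c_v T_v(t) for coefficient vector c."""
    return sum(c_v * np.cos(v * np.arccos(t)) for v, c_v in enumerate(c))
```

By the discrete orthogonality of the Chebyshev polynomials at these nodes, transforming a function that is already a cubic Chebyshev series recovers its coefficients exactly.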

III. TRAJECTORY PREDICTION FRAMEWORK
A. PROPOSED FRAMEWORK
The proposed framework is based on an LSTM encoder-decoder structure. Fig. 2 presents the architecture of the proposed trajectory prediction framework. The LSTM encoder-decoder, also known as seq2seq, is popular in the field of time-series prediction because it supports a flexible model structure with an arbitrary number of sequences on both the input and output sides. The conventional prediction model based on physical values is represented as

x_o^{1:l} = F( x_u^{1:z} ),  (8)

where x_u^{1:z} and x_o^{1:l} denote the input and predicted sequences, respectively, and x_u^{1:z} is obtained from observations. In the proposed framework, the original trajectory snippet for the input sequences, x_u^{t_h:t_o}, is divided into m sub-trajectory snippets, where t_h and t_o denote the history horizon and the current time, respectively, and the original physical data x are composed of two features, the x- and y-axis coordinates:

x = [x, y]^T.  (9)

That is, x_u^{t_h:t_o} represents the past positions of the vehicle in the interval from t_h to t_o. The sub-trajectory snippet for each feature in (9) is cubic-Chebyshev-transformed, and the resulting coefficients are aggregated into one vector, as in (6):

c = [c_0^x, c_1^x, c_2^x, c_3^x, c_0^y, c_1^y, c_2^y, c_3^y]^T,  (10)

where c represents the new feature data and is termed the Chebyshev coefficient snippet (CCS). In (10), the first four and the next four elements are from the sub-trajectory snippets for the x- and y-positions, respectively. Note that the number of sequences is reduced to one for each sub-trajectory snippet. Consequently, there is a total of m input CCS sequences, denoted c_u^{1:m}, which are fed to the encoder. For the output, a total of n CCS sequences are predicted by the proposed framework and denoted c_o^{1:n}.
Finally, the prediction model is transformed into the proposed framework as follows:

c_o^{1:n} = F( c_u^{1:m} ),  (11)

c_u^{1:m} = T( x_u^{t_h:t_o} ),  (12)

where T(·) denotes the transform for each sub-trajectory snippet and c_u^{1:m} denotes the set of input CCSs for equally divided time slots. With k and T denoting the number of samples in the sub-time span and the sample time, respectively, the history horizon t_h can be represented as

t_h = t_o - m k T.  (13)

For the output sequences, the time span for a sequence is increased in multiples of the time span for a unit CCS up to the prediction horizon t_p, as depicted on the right in Fig. 3, where t_p can be represented as

t_p = t_o + n k T.  (14)

In Fig. 3, c_o^i denotes the predicted output sequence of the i-th index, and the output has a total of n sequences. Note that the last output sequence c_o^n covers the entire time span up to the prediction horizon; thus, it becomes the final prediction result in this configuration. This means that the prediction result exists in the form of a coefficient set, referred to as a snapshot, rather than a series of numerous time-sequential values. This configuration, which ensures the best prediction results, was determined empirically through extensive validation of various configurations.
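The encoding of a history snippet into the m input CCSs of (10) can be sketched as follows. This is a simplified illustration: `trajectory_to_ccs` is a hypothetical helper, and NumPy's least-squares `chebfit` stands in for the per-sub-snippet transform.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def trajectory_to_ccs(xy, m):
    """Encode a history snippet into m Chebyshev coefficient snippets.

    xy : (N, 2) array of x/y positions over [t_h, t_o], N divisible by m.
    Returns an (m, 8) array: per sub-snippet, 4 cubic Chebyshev
    coefficients for x followed by 4 for y, as in Eq. (10).
    """
    n, _ = xy.shape
    k = n // m                              # samples per sub-snippet
    ccs = np.empty((m, 8))
    for i in range(m):
        sub = xy[i * k:(i + 1) * k]
        t = np.linspace(-1.0, 1.0, k)       # sub-span normalized to [-1, 1]
        ccs[i, :4] = C.chebfit(t, sub[:, 0], deg=3)   # x-coefficients
        ccs[i, 4:] = C.chebfit(t, sub[:, 1], deg=3)   # y-coefficients
    return ccs
```

For straight-line motion, for example, only the constant and linear coefficients of each sub-snippet are nonzero, so the 8-dimensional CCS compresses the sub-trajectory with no loss.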
The predicted trajectory can be restored from the last sequence c_o^n in the form of predicted positions using the inverse transform

x̂(t) = Σ_{v=0}^{3} c_v^x T_v(τ(t)),  ŷ(t) = Σ_{v=0}^{3} c_v^y T_v(τ(t)),  (15)

where τ(t) maps the physical time in [t_o, t_p] onto [-1, 1]. A great advantage of this restoration process is that the sample time for the restoration can be set independently of the original sample time of the data. This feature can be useful for path planning on the side of the receiving vehicle. Moreover, because noise is filtered out during the transform, higher-order physical values, such as the velocity and acceleration, can be restored without the effect of noise.
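A sketch of this restoration step, assuming the last CCS snapshot covers [t_o, t_p], using NumPy's `chebval` and `chebder`; the helper name and interface are illustrative.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def restore_trajectory(c_last, t_o, t_p, dt):
    """Restore predicted positions and velocities from the last CCS.

    c_last : length-8 snapshot [c_x0..c_x3, c_y0..c_y3] covering [t_o, t_p].
    dt     : restoration sample time, chosen freely by the receiver.
    """
    t_phys = np.arange(t_o, t_p + 1e-9, dt)
    # Map physical time back onto the Chebyshev domain [-1, 1].
    tau = 2.0 * (t_phys - t_o) / (t_p - t_o) - 1.0
    cx, cy = c_last[:4], c_last[4:]
    x, y = C.chebval(tau, cx), C.chebval(tau, cy)
    # Differentiating the fitted series yields noise-free velocities;
    # the chain rule contributes d(tau)/dt = 2 / (t_p - t_o).
    scale = 2.0 / (t_p - t_o)
    vx = C.chebval(tau, C.chebder(cx)) * scale
    vy = C.chebval(tau, C.chebder(cy)) * scale
    return t_phys, x, y, vx, vy
```

Because `dt` is a free parameter of the receiver, the same 8-value snapshot can be resampled at whatever rate the downstream planner requires.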

IV. EXPERIMENT
A. DATASET
The proposed framework was verified using the inD [27] dataset, which is a large-scale dataset of naturalistic vehicle trajectories at urban intersections. The data were collected at 25 Hz using camera-equipped drones, and a typical position error of less than 10 cm was guaranteed. The experiment was conducted with data collected at Neuköllner Strasse, Aachen, Germany. In particular, 143 vehicle trajectories for left turns (depicted in Fig. 4) were utilized in our experiment because they feature highly dynamic motion, which enabled the verification to be conducted under severe conditions. As shown in Fig. 4, the vehicle trajectories for the experiment are widely distributed, which increases the uncertainty of the prediction problem. In this configuration, a total of 23,419 trajectory snippet pairs for the input and target sequences were generated, and they were transformed into CCSs according to the discussion in Section III. The data were randomly split into training and test sets in an 80:20 ratio.
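The generation of input/target snippet pairs can be sketched as a sliding window over each trajectory. The window lengths and stride below are illustrative assumptions; the paper does not specify the stride.

```python
import numpy as np

def make_snippet_pairs(track, hist_len, pred_len, stride=1):
    """Slide a window over one trajectory to generate (history, future)
    snippet pairs; lengths are in samples (25 Hz in the inD dataset).
    """
    pairs = []
    for s in range(0, len(track) - hist_len - pred_len + 1, stride):
        hist = track[s:s + hist_len]
        fut = track[s + hist_len:s + hist_len + pred_len]
        pairs.append((hist, fut))
    return pairs
```

Applying this to every left-turn trajectory, and then Chebyshev-transforming each snippet, yields the CCS training pairs described in Section III.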

B. EXPERIMENTAL SET-UP
The experiment was set up on the PyTorch platform using the hyperparameters specified in Table 2. An NVIDIA GeForce RTX 2060 GPU, 16 GB of RAM, and an Intel Core i7-8750 CPU @ 2.21 GHz were used in the experiments. The mean squared error (MSE) was used as the loss function; this metric evaluates the average squared Euclidean distance between the target coefficient vector, as in (10), and the predicted coefficient vector:

Loss = (1/N_b) Σ_{j=1}^{N_b} Σ_{i=1}^{n} w_i Σ_{k=0}^{3} [ (e_{ijk}^x)^2 + (e_{ijk}^y)^2 ],  (16)

where e_{ijk}^x = c_{ijk}^x - ĉ_{ijk}^x and e_{ijk}^y = c_{ijk}^y - ĉ_{ijk}^y denote the prediction errors for each element in (10), and N_b denotes the number of data points in a batch. As denoted in (16), the loss function is customized with the weights w_i to focus learning on specific output sequences, because the final prediction results are obtained from the final output sequence. In this regard, the weight w_i was assigned a value of 1 for sequences corresponding to the time horizon from 3 to 5 s and 0 for short-term sequences. The Adam optimizer was used with a weight decay of 0.00001.
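The sequence-weighting logic of (16) can be sketched in NumPy as follows. The actual model was trained in PyTorch; the hypothetical function below only illustrates how the weights suppress the short-term output sequences.

```python
import numpy as np

def weighted_ccs_mse(c_pred, c_true, w):
    """Sequence-weighted MSE over CCS outputs, a sketch of Eq. (16).

    c_pred, c_true : (N_b, n, 8) batches of predicted / target CCSs.
    w              : length-n weights; 0 for short-term sequences and
                     1 for those covering the 3-5 s horizon.
    """
    e = c_pred - c_true                   # elementwise coefficient errors
    per_seq = np.sum(e ** 2, axis=-1)     # squared Euclidean distance per CCS
    return np.mean(per_seq * w)           # weights broadcast over sequences
```

With w = 0 on the early sequences, errors there contribute nothing to the gradient, concentrating the training signal on the snapshot that forms the final prediction.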
The objective of this study is to construct a trajectory prediction framework that is robust to sensor degradation. One representative and critical type of incomplete sensor data is missing data, for which various factors, including adverse weather and hardware failures, can be responsible. Thus, incomplete sensor data were emulated by periodically injecting missing data: the original physical input data were zero-padded at frequencies of 1, 2.5, and 5 Hz, with 1, 2, or 3 consecutive missing samples, resulting in a total of nine missing-data patterns. For a comparative analysis between the proposed and conventional approaches, and between various CCS configurations within the proposed approach, the following alternatives were examined:

• Baseline: The LSTM encoder-decoder model without any feature engineering. The input and output feature data consist of time-sequential position values, as in the conventional approach.
• M2M1: The LSTM encoder-decoder model with multiple input and output CCS sequences, as shown in Fig. 3. The time span for input unit CCS l s in Fig. 3 was set to 0.5 s. The time span for the output sequences is increased in multiples of l s to the prediction horizon t p in Table 2, as described in the previous section.
• M2M2: Same as the M2M1 model, except that the time span for unit CCS l s was set to 1.0 s, which was longer than that of M2M1.
• M2O: The LSTM encoder-decoder model with multiple input CCS sequences and only one output CCS sequence. The l s for the input sequence is set to 0.5 s, and only the output sequence covers the entire time span, up to the prediction horizon t p in Table 2. Fig. 5(a) shows the M2O configuration.
• O2O: The LSTM encoder-decoder model with one input and output CCS sequence. For this model, the input sequence from the observations covers the entire time span down to the historical horizon t h in Table 2, and the output sequence covers the entire time span up to the prediction horizon t p in Table 2. Fig. 5(b) shows the O2O configuration.
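The periodic missing-data injection described at the start of this subsection can be sketched as follows. `inject_missing` is an illustrative helper; the 25 Hz sampling rate follows the inD dataset, and aligning every dropout burst to the period start is an assumption.

```python
import numpy as np

def inject_missing(xy, fs=25.0, drop_hz=2.5, burst=2):
    """Emulate incomplete sensor data by periodic zero-padding.

    Every 1/drop_hz seconds, `burst` consecutive samples are zeroed,
    mimicking frames lost to rain, frost, or hardware faults.
    """
    out = xy.copy()
    period = int(round(fs / drop_hz))      # samples between dropouts
    for start in range(0, len(out), period):
        out[start:start + burst] = 0.0
    return out

# The 3 x 3 = 9 fault patterns examined in the experiment.
patterns = [(hz, b) for hz in (1.0, 2.5, 5.0) for b in (1, 2, 3)]
```

Each of the nine (frequency, burst-length) pairs is applied to the test snippets before they are Chebyshev-transformed or, for the baseline, fed directly to the network.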
A comparative study shows that the proposed framework with Chebyshev transform-based feature engineering renders trajectory prediction considerably more robust than the conventional approach. Moreover, a comparison between the various CCS configurations identifies the most suitable configuration for trajectory prediction with the proposed framework. For each CCS configuration of the models presented in Fig. 3 and Fig. 5, the input CCS sequences are fed to the proposed framework, as shown in Fig. 2, which outputs the predicted CCS sequences according to the configurations described above.

C. PREDICTION RESULTS-CHEBYSHEV COEFFICIENTS
The original trajectory snippets were transformed into CCSs, which were subsequently used to train the model. Thus, the prediction performance was first verified in the form of Chebyshev coefficients. The proposed framework was trained with the losses monitored on both the training and test data, but the prediction accuracy was calculated at a specific instance of the trajectory during the left turn. This instance was set to t_l + 2 s, where t_l indicates the time at which the vehicle crossed the imaginary start line L_s. This configuration generates verification data for the most dynamic instances during the turns. Recall that the final prediction results are obtained from the last output CCS sequence. The prediction accuracy was measured as the average Euclidean distance between the target coefficient vector, as in (10), and the predicted coefficient vector from the last output CCS sequence:

Accuracy = (1/N_v) Σ_{j=1}^{N_v} || c_j^* - ĉ_j^* ||,  (17)

where N_v denotes the number of verification data points, and the asterisk indicates that the value is from the last output sequence. Table 3 presents the experimental results of the trajectory prediction for the M2M1 and M2M2 models in the form of Chebyshev coefficients, with the values calculated from the metric in (17). As shown in Table 3, the lowest error was recorded for the M2M1 model with the shorter CCS time span, and it appears that the higher the number of input sequences, the more accurate the results become. However, the M2M2 model, with the longer CCS time span, produced superior prediction results when the input data were periodically missing. Moreover, the results for the M2M1 model deteriorated when data were missing more frequently, whereas the results for M2M2 were robust to the frequency of missing data. This can be attributed to the longer CCS time span, which renders the results insensitive to missing data owing to the uniform nature of the Chebyshev transform.
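The coefficient-space metric of (17) reduces to a mean of Euclidean norms over the verification set; a minimal sketch (the helper name is hypothetical):

```python
import numpy as np

def ccs_accuracy(c_pred_last, c_true_last):
    """Eq. (17) sketch: average Euclidean distance between the predicted
    and target coefficient vectors from the last output CCS sequence.

    c_pred_last, c_true_last : (N_v, 8) arrays of snapshot coefficients.
    """
    return np.mean(np.linalg.norm(c_pred_last - c_true_last, axis=1))
```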

D. PREDICTION RESULTS-POSITIONS
The last output CCS sequence was inverse-transformed by (15) for all verification data, and the accuracy was calculated using the mean absolute error (MAE):

MAE = (1/(N_v N_p)) Σ_{i=1}^{N_v} Σ_{j=1}^{N_p} sqrt( (x_{ij} - x_{ij}^*)^2 + (y_{ij} - y_{ij}^*)^2 ),  (18)

where x_{ij}^* and y_{ij}^* denote the predicted x- and y-positions inverse-transformed from the last output CCS sequence, and N_p denotes the number of predicted position samples, with N_p = t_p / T. The error for each predicted point is thus the Euclidean distance between the predicted and true positions. Table 4 presents the comparative experimental results of the trajectory prediction in the form of the inverse-transformed positions. To verify the efficacy of the proposed framework, the results of the baseline model of the conventional approach are also presented in Table 4. The input data of the conventional approach feature the x- and y-position, velocity, and heading, whereas the target data feature only the x- and y-position. The conventional approach produced a total of 76 and 125 sequences for the time horizons in Table 2, and the other hyperparameters were assigned the same values as in the proposed method. Note that the number of sequences was larger than that of the proposed framework; thus, the training time was much longer in the case of the conventional approach, as discussed later.
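In position space, the MAE of (18) is simply the mean per-point Euclidean distance over all verification samples and prediction steps, e.g. (hypothetical helper name):

```python
import numpy as np

def trajectory_mae(xy_pred, xy_true):
    """Eq. (18) sketch: mean Euclidean distance between predicted and
    true positions over verification samples and prediction steps.

    xy_pred, xy_true : (N_v, N_p, 2) restored position sequences.
    """
    d = np.linalg.norm(xy_pred - xy_true, axis=-1)   # per-point distance
    return d.mean()
```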
In the case of the proposed framework, the results were generally consistent with those in Table 3. The prediction accuracy was higher in the case of the shorter CCS time span (M2M1), but the results for the longer CCS time span (M2M2) were more robust to the various patterns of missing data. Fig. 6 presents the trajectory prediction results from three samples with the M2M1 model. As shown in the figure, the prediction results became more stable as the sequence progressed. This aspect is one of the advantages of the proposed framework because the last predicted sequence in Fig. 6(c), (f), and (i) are the final prediction results for the prediction horizon. This appears to be a remedy for the chronic problem of LSTM, namely the inconsistency in the early sequences, as reported in [28].
The differences in the prediction accuracy between the models were not significant when no values were missing. However, in the case of the conventional approach, the prediction accuracy dropped dramatically even for the pattern in which missing data occurred most rarely. The experimental results in Table 4 verify that the proposed framework is significantly robust to incomplete sensor data.
Moreover, the number of variables in the final prediction results for the proposed framework was only 2d = 8, whereas that for the conventional approach was 2 × 125 = 250. As mentioned previously, this aspect is likely to be beneficial for V2X-based applications in cooperative driving automation.
Furthermore, Table 5 presents the trajectory prediction results for the other CCS configurations. All the listed configurations were robust to the various missing-data patterns, but the best results were obtained with the M2M model depicted in Fig. 3. These results indicate that the RNN structure is highly appropriate for trajectory prediction problems. Although not listed in Table 5, numerous configurations were tested to determine the optimal configuration for the proposed framework.

E. TRAINING COST
Under the experimental set-up of this study, the training costs of the M2M1 and M2M2 models were 1 h 18 m 40 s over 574 epochs and 1 h 4 m 6 s over 626 epochs, respectively. These results compare favorably with those of the baseline, namely 3 h 5 m 10 s over 705 epochs; the reduction is attributed to the significant decrease in the number of sequences owing to the proposed Chebyshev transform-based feature engineering. For example, the sequence lengths were merely 6 and 10 for the input and output, respectively, in the M2M1 configuration, whereas they were 76 and 125 for the baseline.

V. CONCLUSION
Cooperative driving automation at a higher level of autonomy requires situational awareness that is robust to harsh environments, including adverse weather, because roadside sensors are exposed to the external environment. This study proposed a robust trajectory prediction framework based on a recurrent neural network. The robustness of the framework was demonstrated by analyzing experimental data from a public vehicle trajectory dataset. Moreover, the proposed framework establishes the possibility of efficient communication between connected vehicles and roadside units: only a few coefficients, rather than numerous time-sequential physical values, need to be transmitted. This aspect is expected to play an important role in the realization of cooperative driving automation.
The present study has the limitation that the feature data include only the trajectory of the ego vehicle, for simplicity. Moreover, the fault injection was limited to missing data, a representative form of sensor degradation. Because the future trajectory can also be affected by the behavior of surrounding vehicles, future work will address the effect of various forms of incompleteness in the information about the ego and surrounding vehicles on trajectory prediction, as well as the relevant countermeasures. Notably, the feature engineering proposed in this study can be integrated with other models.