A Two-Block RNN-based Trajectory Prediction from Incomplete Trajectory

Trajectory prediction has gained great attention, and significant progress has been made in recent years. However, most works rely on a key assumption that each video is successfully preprocessed by detection and tracking algorithms and that the complete observed trajectory is always available. In complex real-world environments, we often encounter miss-detection of target agents (e.g., pedestrians, vehicles) caused by bad image conditions, such as occlusion by other agents. In this paper, we address the problem of trajectory prediction from an incomplete observed trajectory due to miss-detection, where the observed trajectory includes several missing data points. We introduce a two-block RNN model that approximates the inference steps of the Bayesian filtering framework and seeks the optimal estimation of the hidden state when miss-detection occurs. The model uses two RNNs depending on the detection result. One RNN approximates the inference step of the Bayesian filter with the new measurement when the detection succeeds, while the other does the approximation when the detection fails. Our experiments show that the proposed model improves the prediction accuracy compared to three baseline imputation methods on publicly available datasets: ETH and UCY ($9\%$ and $7\%$ improvement on the ADE and FDE metrics). We also show that our proposed method can achieve better prediction compared to the baselines when there is no miss-detection.


I. INTRODUCTION
Predicting future trajectories from video data is an indispensable technology for developing navigation systems that can be used in several scenarios, such as self-driving vehicles, social robots, and navigation systems for blind people. High-quality predictions guide the user to the appropriate path and avoid dangerous situations (e.g., collisions). The most common setting of trajectory prediction is the surveillance setting from a fixed camera, where the position of the agent (e.g., pedestrians, vehicles) is often treated as a single point [1], [2].
Many approaches to trajectory prediction forecast the future state of the target agent conditioned on the history of past states [7]-[10]. The state of the agent is often represented as its position. To obtain the position history of the agents, we need to detect where the agents are in the current and past time steps (e.g., object detection) and to establish object correspondences between time steps (e.g., object tracking). Therefore, the trajectory prediction task is a downstream task of object detection and tracking. Most trajectory prediction works rely on a key assumption that each sequence is successfully preprocessed by object detection and tracking algorithms, and that they can always access the complete trajectory for trajectory prediction. In other words, the detection algorithm has to successfully detect all of the agents in the scene, and the tracking algorithm has to successfully track all of the detected agents. In detail, most trajectory prediction research from bird's-eye view images assumes that the positions of all pedestrians can be obtained during 6.4 or 8 seconds: observing the trajectory for 3.2 seconds and predicting the future trajectory for 3.2 or 4.8 seconds [1], [2]. However, this is not feasible for real-world applications. Cases often emerge wherein an object is miss-detected due to bad image conditions such as motion blur, illumination changes, occlusion by other agents, and cluttered backgrounds [11], as shown in Fig. 2. When the target agent is not detected in several frames during the observation time, the future trajectory cannot be predicted from the incomplete observed trajectory (Fig. 1).

Figure: Examples of miss-detection annotated by Faster R-CNN [3] pretrained with the MS-COCO dataset [4] from the ETH [5] and UCY [6] datasets. The red points indicate the ground-truth positions attached to each dataset and the green boxes indicate the predictions. Existing works on trajectory prediction often ignore miss-detected agents.
The common solution for dealing with incomplete observed trajectory is to ignore these cases from the dataset as outliers. The agent who disappears even in one frame in a sequence is excluded from the dataset to handle incomplete data [1], [2], [12]. However, in real-world situations, an intelligent system must be able to continuously predict the future trajectory of all agents. This exclusion could cause the system to ignore the interaction between the excluded target agent and other agents. Furthermore, the system cannot consider the possibility of collision with the excluded agents and it could cause dangerous accidents. To avoid the exclusion of an undetected agent, we could attempt to impute the missing state of the agent. One possible solution is to impute a previously observed state when the miss-detection occurs [13], [14]. This can prevent the data exclusion of the agent whose state is not available. However, this can cause the model to interpret the data as the person keeping the previous state. Thus, compensating the states with an assumed value can lead to high error. In this paper, we investigate the problem of trajectory prediction from incomplete observations particularly due to miss-detection.
Bayesian filter-based methods [15] (particularly, the Kalman filter [16]) are among the traditional approaches for trajectory prediction [17], and they are often used as baseline methods for comparison [1]. Bayesian filters recursively update the posterior distribution of predictions with the arrival of new data. The filtering process can be described as a cycle of two steps, the prediction step and the update step. Due to their simple structure, they often do not perform well on long-term prediction [18]. With the ability to learn and produce long sequences, Recurrent Neural Networks (RNNs), and in particular Long Short-Term Memory (LSTM) networks [19], have recently become a widely popular modeling approach for predicting human motion [1], [20]-[28]. Recently, the connection between Bayesian filters and RNNs has been studied, and it has been observed that Bayesian filters are a special type of RNN [29]. Inspired by this connection, we hypothesized that using an assumed state (e.g., the last observed state) for the missing time step causes the RNNs to fail to update their hidden state correctly.
Using this intuition, to avoid wrong updates of the hidden state, we propose a simple two-block modification in which we add a new RNN block for the missing time step. We evaluate our two-block mechanism on an existing trajectory prediction model from a bird's-eye view against three baseline methods and show noticeable improvement in trajectory prediction from incomplete trajectories. Our proposed method does not affect trajectory prediction results from complete trajectories, even if the model is trained with incomplete data. Our contributions are as follows:
• We propose a two-block RNN that learns the inference step of Bayesian filters for trajectory prediction from incomplete observed trajectories due to miss-detection.
• We show that our model can outperform three baselines and that it performs better than the best baseline by 12% (ADE) and 4% (FDE) on ETH [5] and UCY [6]. Training our model with incomplete trajectories does not affect the trajectory prediction result from the complete data.

II. RELATED WORK
The problem of trajectory prediction has received significant attention in recent years across various applications, such as self-driving vehicles, service robots, and advanced surveillance systems. A large body of research has addressed this problem. Many approaches define an explicit dynamical model based on Newton's laws of motion and use it as a building block of a Bayesian filter (particularly, the Kalman filter) [8], [18], [38], [39]. Many other works have investigated how to incorporate the concept of a rational agent when modeling human motions. Recently, a number of works have explored approaches to approximate motion dynamics from training data using deep neural networks [1], [2], [20]-[28], [40]-[44]. In particular, Recurrent Neural Networks (RNNs) have recently become a widely popular approach for modeling the dynamics [1], [20]-[28]. These sequential data-driven approaches assume $N$th-order Markov models in which a limited state (e.g., position, velocity) history of $N$ time steps is a sufficient representation of the entire state history. In this section, we only review RNN-based trajectory prediction literature. For a more extensive review, we refer the reader to the article in [45].

TABLE 1.
Method               Application
[...]                Trajectory prediction
Gupta et al. [2]     Trajectory prediction
Styles et al. [12]   Trajectory prediction
Yao et al. [13]      Traffic accident detection
Malla et al. [14]    Trajectory prediction
Lipton et al. [30]   Medical data analysis
Kim et al. [31]      Medical data analysis
Che et al. [32]      Medical data analysis
Cao et al. [33]      General
Tian et al. [34]     Traffic flow prediction
Kim et al. [35]      Medical data analysis
Luo et al. [36]      General
Luo et al. [37]      General
Ours                 Trajectory prediction

A. RNNS FOR TRAJECTORY PREDICTION
Trajectory prediction has been studied extensively in surveillance settings from bird's-eye views. Many studies have developed models to account for social interactions and social conventions among agents, in addition to scene semantics that may affect the trajectory. Social-LSTM [1] is a pioneering work that models pedestrian trajectories as well as their interactions in continuous space. It introduces the social pooling layer, which allows the LSTMs to share the hidden states of the agents that are nearby. Many methods [2], [23], [42], [46] follow the problem formulation used in [1], including the assumption that each scene is first processed to obtain the spatial coordinates of all people at different time instants.
Recently, to achieve multi-modality in the prediction output, many works combine RNNs and Generative Adversarial Networks (GANs) [2], [23], [43]. Gupta et al. [2] proposed Social GAN with a new pooling mechanism that takes into account social interactions between all people in the scene and a variety loss that encourages the network to produce diverse predictions. SoPhie [23] is an LSTM-based GAN with two attention mechanisms. Kosaraju et al. [43] proposed Social-BiGAT, which utilizes a GAN with graph attention networks (GAT) that capture the social interactions.

B. HANDLING MISSING DATA FOR RNN
In order to use RNNs for time-series prediction tasks, the observed data must be encoded by the RNNs before the prediction can be made. However, missing entries in the observed data are unsuitable for encoding. Many approaches have been developed to address this issue (see Table 1). If the task involves multiple time series (e.g., for trajectory prediction), a simple approach is to drop the missing data and perform the prediction using only the observed data [1], [2], [12]. However, this strategy leads to loss of information and may not work if the missing rate is high or when there are individual time series to process (e.g., in medical applications). Another strategy is to impute the missing data with some default values and to perform the prediction over the imputed data. For example, the last observed values [13], [14], [30], [32] or zero [30] can be used to fill in the values of missing entries. Still, the filled values may cause the system to learn from them as if they are observed values, which could lead to lower performance. More recent works propose to use learning-based techniques to perform imputation, e.g., using RNNs [31], [33]-[35] or GANs [36], [37] to estimate the missing entries, and they generally also propose to modify the internal structure of the RNNs, for example, to include a decay mechanism [32], [34], [35] that puts different weights on data from different time steps. However, it may not be straightforward to apply the modification of one architecture to different architectures, as this may involve different formulations or complicated implementations. Unlike previous works that impute missing entries or modify the internal structure of RNNs, in this work, we first look at the relation between the Bayesian filter and RNNs, then derive an algorithm from the relation when there is a missing observation. This results in a simple-to-implement method that does not modify any internal structure of RNNs and also does not require an explicit form of imputation. Instead, our approach modifies RNNs externally by using two RNNs instead of one. This external modification allows our approach to be used with any RNN model.

VOLUME 4, 2016

Figure: Relationship between the Bayesian filter and RNNs. Given a sequence of measurements $z_t$, the Bayesian filter recursively estimates the optimal hidden states $x_t$ through two steps: the prediction and update steps. Similarly, given a sequence of inputs $z_t$, an RNN updates the hidden state $s_t$ at every time step.

C. RNNS AND BAYESIAN FILTER
Recently, the relationship between Bayesian filter and RNNs has been discussed. Gu et al. [29] show Bayesian filter is a special type of RNN and propose an RNN-based model for a face landmark localization task in videos. Lim et al. [47] introduced Recurrent Neural Filter (RNF), which aligns network modules with the inference steps of the Bayesian filter. RNF uses separate neural network components to directly model the Bayesian filtering steps. In this paper, we align RNN-based trajectory prediction models with the Bayesian filtering steps and explore the architecture that is suitable for trajectory prediction from incomplete observed trajectory.

III. PRELIMINARIES
In this section, we briefly review the Bayesian filter [15] and its connection with RNNs. The Bayesian filter has been used in a wide range of applications, including target tracking [48], robotics [49], and economics [50]. The goal of the Bayesian filter is to find the posterior distribution of the hidden state $x_t$ at time instant $t$, given all of the measurements $z_{1:t}$, which is characterized by $p(x_t \mid z_{1:t})$. We use $z_{1:t}$ to denote the sequence of measurements up to time instant $t$. Suppose there is a dynamical system represented by the following equations:

$$x_t = f(x_{t-1}, u_t), \tag{1}$$
$$z_t = h(x_t, v_t), \tag{2}$$

where $u_t$ and $v_t$ are the system process noise and observation noise, respectively. Both are assumed to have known probability distributions. The functions $f$ and $h$ are the state transition function and the observation function, which can be represented in a probabilistic form as $p(x_t \mid x_{t-1})$ and $p(z_t \mid x_t)$, respectively. When the state transition and the observation functions are linear and the process and measurement noises are Gaussian, the Bayesian filter becomes the Kalman filter. The Bayesian filter consists of recursive prediction and update steps. In the prediction step, it predicts a prior distribution of the current state $x_t$ based on an old estimate using the state transition function, which is characterized by $p(x_t \mid z_{1:t-1})$. In the update step, it updates the prior state distribution to obtain a posterior estimate with newly arrived measurements, which is characterized by $p(x_t \mid z_{1:t})$.

Prediction step: The prior distribution $p(x_t \mid z_{1:t-1})$ of the state is obtained using the Chapman-Kolmogorov equation and the state transition function,

$$p(x_t \mid z_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{1:t-1})\, dx_{t-1}. \tag{3}$$

Update step: When the new observation $z_t$ is obtained, the prior distribution $p(x_t \mid z_{1:t-1})$ is updated according to Bayes' rule to estimate the posterior distribution $p(x_t \mid z_{1:t})$,

$$p(x_t \mid z_{1:t}) = \frac{p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1})}{p(z_t \mid z_{1:t-1})}, \tag{4}$$

where $p(z_t \mid z_{1:t-1})$ can be expressed as

$$p(z_t \mid z_{1:t-1}) = \int p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1})\, dx_t. \tag{5}$$

RNNs bear resemblance to the Bayesian filters (see Fig. 3).
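As an illustration of the prediction and update steps, the following minimal sketch runs one filtering cycle on a discrete state space, where the integrals reduce to sums. The three-cell corridor, the transition matrix, and the sensor likelihood are hypothetical values chosen only for this example, and the function names `predict` and `update` are not from the paper.

```python
def predict(belief, transition):
    """Prediction step: prior[j] = sum_i p(x_t=j | x_{t-1}=i) * belief[i]."""
    n = len(belief)
    return [sum(transition[i][j] * belief[i] for i in range(n)) for j in range(n)]


def update(prior, likelihood):
    """Update step (Bayes' rule): posterior[j] is proportional to likelihood[j] * prior[j]."""
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    evidence = sum(unnormalized)  # plays the role of p(z_t | z_{1:t-1})
    return [u / evidence for u in unnormalized]


# Hypothetical 3-cell corridor in which the agent tends to move one cell to the right.
TRANSITION = [
    [0.2, 0.8, 0.0],
    [0.0, 0.2, 0.8],
    [0.0, 0.0, 1.0],
]
# Hypothetical sensor model: p(z_t = "seen in cell 1" | x_t = j) for j = 0, 1, 2.
LIKELIHOOD = [0.1, 0.8, 0.1]

posterior_prev = [1.0, 0.0, 0.0]             # p(x_{t-1} | z_{1:t-1}): agent is in cell 0
prior = predict(posterior_prev, TRANSITION)  # p(x_t | z_{1:t-1})
posterior = update(prior, LIKELIHOOD)        # p(x_t | z_{1:t})
```

The measurement sharpens the prior: most of the probability mass ends up in cell 1, where both the motion model and the sensor agree.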
Given a sequence of measurements, a Bayesian filter recursively estimates the optimal hidden states through two steps, the prediction and update steps, and optionally produces the target output at every time step. Similarly, given a sequence of inputs, an RNN updates the hidden state via a recurrent formula and optionally produces the output at every time step.
The computation of RNNs is represented by the following equations:

$$s_t = \phi_s(W_{ss} s_{t-1} + W_{zs} z_t + b_s), \tag{6}$$
$$y_t = \phi_o(W_{sy} s_t + b_o), \tag{7}$$

where $s_t$ represents the hidden state at time step $t$, $\phi_s$ and $\phi_o$ are activation functions, $W_{ss}$ is the hidden-to-hidden transformation matrix, $W_{zs}$ is the input-to-hidden transformation matrix, $y_t$ is the output, $W_{sy}$ is the hidden-to-output transformation matrix, and $b_s$ and $b_o$ are the bias terms. Gu et al. [29] show that Bayesian filters are a special type of RNN with adaptive weights. While a Bayesian filter adapts its estimation model over time by changing its weights, an RNN uses fixed weights after training. Following this study, we assume that an RNN, in updating the hidden state in (6), performs the same procedure as the two steps of the Bayesian filter.
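The recurrent update described above can be sketched in a few lines. This is only an illustration under assumptions: `tanh` is used for $\phi_s$, the identity for $\phi_o$, and the small weight matrices are hypothetical rather than learned.

```python
import math


def matvec(W, v):
    """Matrix-vector product for nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rnn_step(s_prev, z_t, W_ss, W_zs, b_s):
    """Hidden-state update: s_t = tanh(W_ss s_{t-1} + W_zs z_t + b_s)."""
    pre = [a + b + c for a, b, c in zip(matvec(W_ss, s_prev), matvec(W_zs, z_t), b_s)]
    return [math.tanh(p) for p in pre]


def rnn_output(s_t, W_sy, b_o):
    """Output: y_t = W_sy s_t + b_o (identity output activation, an assumption)."""
    return [a + b for a, b in zip(matvec(W_sy, s_t), b_o)]


# Hypothetical weights: 2-D hidden state, 2-D input (a position), 2-D output.
W_ss = [[0.5, 0.0], [0.0, 0.5]]
W_zs = [[1.0, 0.0], [0.0, 1.0]]
W_sy = [[2.0, 0.0], [0.0, 2.0]]
b_s = [0.0, 0.0]
b_o = [0.0, 0.0]

s = [0.0, 0.0]
for z in [[0.1, 0.2], [0.2, 0.4]]:  # the same fixed weights are reused at every step
    s = rnn_step(s, z, W_ss, W_zs, b_s)
y = rnn_output(s, W_sy, b_o)
```

Unlike the Bayesian filter, nothing in this loop changes the weights between time steps; only the hidden state `s` carries information forward.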
The simple solution for dealing with incomplete measurements in the Bayesian filter is imputation. If we can impute missing data, we can use the Bayesian filtering steps in the same way. However, in the update step, the prior distribution is updated with the imputed value in (4), and this might have a bad influence on the estimation of the hidden state. Furthermore, a model whose parameters are learned with imputed data will affect the update step with actual measurements. Therefore, we explore a method that does not rely on imputation.

Figure: When miss-detection does not occur, we use $g_c$ as an encoder, and when miss-detection occurs, we use $g_i$ as an encoder. (Legend: successfully detected, miss-detected.)

A. PROBLEM DEFINITION
We assume that the state of an agent is represented by its position and that each scene is preprocessed by detection and tracking algorithms to obtain the position of each agent at each time instant. However, due to miss-detection, the obtained observed trajectory is sometimes incomplete. This includes the cases in which an agent is continuously detected in a sequence of frames but is not detected at a frame in the middle. The position at time step $t$ is denoted as $z_t$, e.g., its 2D image coordinate in a bird's-eye view. When miss-detection does not occur, we receive the complete trajectory $z_1, \ldots, z_{T_{obs}}$. On the other hand, when miss-detection occurs, we can only access the incomplete observed trajectory. Our goal is to predict the probable future trajectory $z_{T_{obs}+1}, \ldots, z_{T_{pred}}$ even from an incomplete observed trajectory. $T_{obs}$ and $T_{pred}$ denote the last observation and the last prediction time instants, respectively.

B. BAYESIAN FILTER WITH MISS-DETECTION
In the Bayesian filter, the future trajectory of the agent can be predicted with the appropriate hidden state and the observation function. Thus, we can formulate the trajectory prediction task as the estimation of the hidden state $x_t$, from which an output can be optionally derived, from all the measurements $z_{1:t-1} = [z_1, \ldots, z_{t-1}]$, which is represented in probabilistic form as $p(x_t \mid z_{1:t-1})$. However, in the case of miss-detection, we do not have access to all of $z_t$, $t = 1, \ldots, T_{obs}$. To prevent ambiguity in the notation, let us define $\tilde{z}_{1:T_{obs}}$ to be the list of observed measurements until time step $T_{obs}$. Note that $\tilde{z}_{1:T_{obs}}$ differs from $z_{1:T_{obs}}$, since $z_{1:T_{obs}}$ assumes all $z_t$ up to time $T_{obs}$ to be observed, while for $\tilde{z}_{1:T_{obs}}$ some $z_t$ may be missing. With this notation, our goal becomes the estimation of the hidden state $x_t$ from the incomplete data $\tilde{z}_{1:t-1}$, which is represented in probabilistic form as $p(x_t \mid \tilde{z}_{1:t-1})$ and can be used to predict the future trajectory.
In this case, one cycle of the prediction and update steps can be represented by the following derivation, where in (9) we split the observed $z_{t-1}$ from $\tilde{z}_{1:t-1}$ in (8), and then apply Bayes' rule to obtain (10).
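The derivation itself can be written out as follows. This is a reconstruction from the standard prediction and update steps of Section III and from the description above, not a verbatim copy; $\tilde{z}$ denotes the incomplete measurement sequence (the decoration is an assumed notation), and the exact form of (11), in particular the expanded normalizer, is an assumption.

```latex
\begin{align}
p(x_t \mid \tilde{z}_{1:t-1})
 &= \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid \tilde{z}_{1:t-1})\, dx_{t-1} \tag{8}\\
 &= \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid \tilde{z}_{1:t-2}, z_{t-1})\, dx_{t-1} \tag{9}\\
 &= \int p(x_t \mid x_{t-1})\,
    \frac{p(z_{t-1} \mid x_{t-1})\, p(x_{t-1} \mid \tilde{z}_{1:t-2})}
         {p(z_{t-1} \mid \tilde{z}_{1:t-2})}\, dx_{t-1} \tag{10}\\
 &= \frac{\int p(x_t \mid x_{t-1})\, p(z_{t-1} \mid x_{t-1})\, p(x_{t-1} \mid \tilde{z}_{1:t-2})\, dx_{t-1}}
         {\int p(z_{t-1} \mid x_{t-1})\, p(x_{t-1} \mid \tilde{z}_{1:t-2})\, dx_{t-1}} \tag{11}
\end{align}
```

In this form, the right-hand side of the last line depends only on the previous prior $p(x_{t-1} \mid \tilde{z}_{1:t-2})$ and the new measurement $z_{t-1}$, since the transition and observation models are fixed.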
Here, the prior distribution $p(x_{t-1} \mid \tilde{z}_{1:t-2})$ at time step $t-1$ is updated with the new observation $z_{t-1}$ to obtain the posterior distribution $p(x_{t-1} \mid \tilde{z}_{1:t-1})$ at time step $t-1$. We can then estimate the prior distribution $p(x_t \mid \tilde{z}_{1:t-1})$ at time step $t$ from this posterior distribution. Notice that (11) gives us a recurrent relation for computing $p(x_t \mid \tilde{z}_{1:t-1})$ as a function of $p(x_{t-1} \mid \tilde{z}_{1:t-2})$ and $z_{t-1}$, which we will use as the foundation of our two-block model in Section IV-C.
To utilize the above mechanism when miss-detection occurs, i.e., the new measurement $z_{t-1}$ is not available, we conventionally need to synthetically generate a measurement by imputation for the update step [13], [14]. However, updating the prior distribution with a synthetically generated measurement might have a bad influence on the estimation of the hidden state $x_t$. To avoid the wrong update, we estimate the hidden state $x_t$ from the measurements $\tilde{z}_{1:t-2}$ instead of compensating for the missing data with synthetically generated data,

\begin{align}
p(x_t \mid \tilde{z}_{1:t-1}) &= p(x_t \mid (\tilde{z}_{1:t-2},\ z_{t-1}\ \text{not observed})), \tag{12}\\
 &= p(x_t \mid \tilde{z}_{1:t-2}), \tag{13}\\
 &= \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid \tilde{z}_{1:t-2})\, dx_{t-1}. \tag{14}
\end{align}

We can see that the prior distribution $p(x_t \mid \tilde{z}_{1:t-1})$ at time step $t$ can be directly estimated from the prior distribution $p(x_{t-1} \mid \tilde{z}_{1:t-2})$ at time step $t-1$ without the update step. This elimination of the update step prevents the model from accumulating the error that is caused by a wrong update at every time step. Notice again that (14) provides a recurrent relation to compute $p(x_t \mid \tilde{z}_{1:t-1})$ as a function of $p(x_{t-1} \mid \tilde{z}_{1:t-2})$, but without the observation $z_{t-1}$. Using these recurrent relations, we can develop an algorithm for handling incomplete trajectories, as described in the next section.
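The two recurrent relations can be illustrated on a discrete state space: when a measurement is available the prior is updated and then propagated, and when it is missing the update is simply skipped. The sketch below is an assumption-laden toy (hypothetical three-cell corridor, hypothetical sensor likelihood, function names not from the paper), not the authors' implementation.

```python
def predict(belief, transition):
    """Prediction step: propagate the belief through the transition model."""
    n = len(belief)
    return [sum(transition[i][j] * belief[i] for i in range(n)) for j in range(n)]


def update(prior, likelihood):
    """Update step: reweight the prior by the measurement likelihood and normalize."""
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized]


def run_filter(prior, transition, likelihoods):
    """Advance the prior once per entry of `likelihoods`.

    Each entry is a likelihood vector for that step's measurement,
    or None if the measurement was miss-detected.
    """
    for likelihood in likelihoods:
        if likelihood is not None:
            prior = update(prior, likelihood)  # measurement available: update first
        prior = predict(prior, transition)     # prediction is always performed
    return prior


# Hypothetical corridor and sensor model.
TRANSITION = [
    [0.2, 0.8, 0.0],
    [0.0, 0.2, 0.8],
    [0.0, 0.0, 1.0],
]
SEEN_IN_CELL_1 = [0.1, 0.8, 0.1]

start = [0.5, 0.5, 0.0]
with_gap = run_filter(start, TRANSITION, [None, SEEN_IN_CELL_1])
```

No placeholder measurement is ever constructed for the missing step, so no imputed value can distort the belief.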

C. TWO-BLOCK MODEL FOR ENCODING MISS-DETECTION
Inspired by the connection between RNNs and the Bayesian filter [29], we realize the two recurrent relations above with RNNs. As in the Bayesian filter, we use two RNNs depending on the detection result (see Fig. 4). One is used when the new measurement is available, and the other is used when miss-detection occurs and the new measurement is not available, in order to avoid the wrong update. Fig. 5 shows a diagram of our two-block RNN model. When the agent is successfully detected and a new measurement is available, we assume that one RNN works as a function approximating (11), which estimates the prior distribution at time step $t$ from the prior distribution at time step $t-1$ and the new measurement,

$$s_t = g_c(s_{t-1}, z_{t-1}), \tag{15}$$

where $g_c$, implemented as an RNN, is a function that approximates the update and prediction steps with a new measurement in (11). Note the similarity between (11) and (15): the former is a recurrent relation between $p(x_t \mid \tilde{z}_{1:t-1})$, $p(x_{t-1} \mid \tilde{z}_{1:t-2})$, and $z_{t-1}$, while the latter is a recurrent relation between $s_t$, $s_{t-1}$, and $z_{t-1}$. Thus, we can interpret (15) as representing $p(x_t \mid \tilde{z}_{1:t-1})$ by $s_t$, with $g_c$ being a function that approximates the computation of the expression in (11). When miss-detection occurs and the new measurement is not available, we assume that another RNN works as a function approximating (14), which directly estimates the prior distribution at time step $t$ from the prior distribution at time step $t-1$ without a new measurement:

$$s_t = g_i(s_{t-1}), \tag{16}$$

where $g_i$, implemented as an RNN, is a function that approximates the prediction step without the new measurement and without the update step, as in (14). Again, one can see a similar relationship between (14) and (16), which allows us to draw an interpretation similar to that between (11) and (15).
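A minimal sketch of the two-block encoding is given below. It swaps in a toy tanh recurrent cell with random weights in place of the LSTMs used in the paper, and the names `TanhCell` and `encode_two_block` are hypothetical; the point is only the per-step selection between $g_c$ and $g_i$.

```python
import math
import random


def matvec(W, v):
    """Matrix-vector product for nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class TanhCell:
    """Toy recurrent cell (stand-in for an LSTM): s_t = tanh(W_s s_{t-1} + W_z z + b)."""

    def __init__(self, hidden_size, input_size, rng):
        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.W_s = rand_matrix(hidden_size, hidden_size)
        self.W_z = rand_matrix(hidden_size, input_size)
        self.b = [0.0] * hidden_size

    def __call__(self, s_prev, z=None):
        pre = matvec(self.W_s, s_prev)
        if z is not None:  # the input term is only used when a measurement is passed
            pre = [p + q for p, q in zip(pre, matvec(self.W_z, z))]
        return [math.tanh(p + b) for p, b in zip(pre, self.b)]


def encode_two_block(measurements, g_c, g_i, hidden_size):
    """Encode a trajectory; None entries mark miss-detected time steps."""
    s = [0.0] * hidden_size
    for z in measurements:
        s = g_i(s) if z is None else g_c(s, z)  # select the block per time step
    return s


rng = random.Random(0)
g_c = TanhCell(hidden_size=4, input_size=2, rng=rng)  # block for detected steps
g_i = TanhCell(hidden_size=4, input_size=0, rng=rng)  # block for miss-detected steps
observed = [(0.0, 0.0), (0.1, 0.0), None, None, (0.4, 0.1)]
state = encode_two_block(observed, g_c, g_i, hidden_size=4)
```

Because the selection happens outside the cells, either block could be replaced by any other recurrent cell without touching its internals.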

D. PREDICTION
In the previous section, we encode the incomplete observed trajectory using the two-block RNN depending on the detection result. In order to predict the future trajectory $\hat{z}_{T_{obs}+1:T_{pred}}$ using the encoded information, we use another RNN:

$$s_t = g_p(s_{t-1}, \hat{z}_{t-1}), \tag{17}$$
$$\hat{z}_t = h_p(s_t), \tag{18}$$

where $g_p$ is a function that approximates a cycle of update and prediction steps, and $h_p$ is a function that approximates an observation function that predicts the measurement from the hidden state (implemented as a Multilayer Perceptron (MLP) in our experiments). The pseudocode of the algorithm is as follows:

    for t = 1 to T_obs do
        if the target agent is miss-detected then
            s_t = g_i(s_{t-1});
        else
            s_t = g_c(s_{t-1}, z_{t-1});
        end if
    end for
    for t = T_obs+1 to T_pred do
        s_t = g_p(s_{t-1}, ẑ_{t-1});
        ẑ_t = h_p(s_t);
    end for
    return ẑ_{T_obs+1:T_pred}
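The encode-then-predict procedure can be sketched end to end as below. The scalar functions standing in for $g_c$, $g_i$, $g_p$, and $h_p$ are hypothetical toys rather than learned networks, and the choice of the first decoder input (the position read out from the encoder state) is an assumption that the text does not specify.

```python
def predict_trajectory(observed, g_c, g_i, g_p, h_p, s0, n_pred):
    """Encode `observed` (None marks a miss-detection), then roll out n_pred steps."""
    # Encoding: pick the block according to the detection result.
    s = s0
    for z in observed:
        s = g_i(s) if z is None else g_c(s, z)

    # Assumption: the first decoder input is the position read out from the
    # encoder state; the source does not specify this choice.
    z_hat = h_p(s)
    predictions = []
    for _ in range(n_pred):
        s = g_p(s, z_hat)  # s_t = g_p(s_{t-1}, z_hat_{t-1})
        z_hat = h_p(s)     # z_hat_t = h_p(s_t)
        predictions.append(z_hat)
    return predictions


# Toy scalar stand-ins (hypothetical, not learned networks).
def g_c(s, z):
    return 0.5 * s + 0.5 * z


def g_i(s):
    return s


def g_p(s, z_hat):
    return s + 0.1 * z_hat


def h_p(s):
    return 2.0 * s


future = predict_trajectory([2.0, None, 4.0], g_c, g_i, g_p, h_p, s0=0.0, n_pred=3)
```

The decoder feeds its own previous prediction back in at each step, so the rollout never needs measurements beyond the observation window.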

V. EXPERIMENTS
In this section, we perform experiments to evaluate our two-block RNN model against several imputation baseline approaches. All experiments are performed on a computer with an Intel Core Xeon CPU and an Nvidia Tesla K80 GPU. We use PyTorch [51] for our implementation.

A. EXPERIMENTAL SETTINGS
To evaluate our two-block RNN method, we experiment with an existing trajectory prediction model, Social GAN [2], a method that generates multiple possible future paths for multiple agents, on two publicly available datasets: ETH [5] and UCY [6]. We follow the same settings as Social GAN. For our approach, we only modify the encoder of Social GAN, which encodes the observed measurements of each agent independently, by replacing it with our two-block RNN.
Baselines. We compare the performance of the proposed method with three baselines. Incomplete data due to miss-detection cannot be directly input into Social GAN. We first impute the incomplete data to generate synthetic complete data and then input the data to the model. We compare against the following imputation methods:
• Last filling: Impute the missing measurement with the last detected measurement.
• Zero filling: Impute the missing measurement with zero.
• Linear filling: Interpolate the missing measurement linearly using the previous and next observed measurements. When the miss-detection happens at the end of the observation time, we extrapolate the missing measurement linearly.
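The three imputation baselines can be sketched as follows for a trajectory given as a list of 2D points with `None` at miss-detected steps. The handling of gaps at the very start of the observation window (back-filling for last filling, backward extrapolation for linear filling) is an assumption, since the text only describes gaps in the middle and at the end.

```python
def last_fill(traj):
    """Fill each gap with the last detected position (leading gaps: first detection)."""
    first = next(p for p in traj if p is not None)
    out, last = [], None
    for p in traj:
        if p is not None:
            last = p
        out.append(last if last is not None else first)
    return out


def zero_fill(traj):
    """Fill each gap with the origin."""
    return [p if p is not None else (0.0, 0.0) for p in traj]


def linear_fill(traj):
    """Interpolate gaps linearly; extrapolate linearly at the ends."""
    idx = [i for i, p in enumerate(traj) if p is not None]
    if len(idx) == 1:  # a single detection: nothing to interpolate between
        return [traj[idx[0]]] * len(traj)
    out = list(traj)
    for t, p in enumerate(traj):
        if p is not None:
            continue
        prev = [i for i in idx if i < t]
        nxt = [i for i in idx if i > t]
        if prev and nxt:
            a, b = prev[-1], nxt[0]    # interpolate between neighbours
        elif prev:
            a, b = prev[-2], prev[-1]  # gap at the end: extrapolate forward
        else:
            a, b = nxt[0], nxt[1]      # gap at the start: extrapolate backward
        w = (t - a) / (b - a)
        out[t] = tuple(pa + w * (pb - pa) for pa, pb in zip(traj[a], traj[b]))
    return out


traj = [(0.0, 0.0), None, (2.0, 2.0), None]
filled_last = last_fill(traj)
filled_zero = zero_fill(traj)
filled_linear = linear_fill(traj)
```

For this constant-velocity toy trajectory, only linear filling recovers the true positions, whereas last filling implies that the agent stopped and zero filling teleports it to the origin.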
Implementation details. Social GAN uses LSTMs as the RNNs in its encoders and sets the dimension of the encoder hidden state to 32. We also use LSTMs in our two-block RNN model. Since our two-block model consists of two RNNs, we halved the size of the hidden state for a fair comparison in terms of the number of parameters. For other settings, we followed the original Social GAN.
As for the representation of measurement, Social GAN utilizes relative coordinates for translation invariance. However, when miss-detection occurs and the position at a time step is not available, we cannot compute both the previous and next relative coordinates around the time step. Therefore, we use absolute coordinates instead of relative coordinates. We normalize the pedestrian positions by subtracting the mean and dividing by the standard deviation of the training set.
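The following sketch illustrates why relative coordinates break down under miss-detection and how absolute positions can be standardized instead. The normalization statistics are hypothetical placeholders for values that would be computed on the training set.

```python
def relative_coords(traj):
    """Displacements between consecutive positions; None where an endpoint is missing."""
    rel = []
    for prev, cur in zip(traj, traj[1:]):
        if prev is None or cur is None:
            rel.append(None)
        else:
            rel.append((cur[0] - prev[0], cur[1] - prev[1]))
    return rel


def normalize(traj, mean, std):
    """Standardize absolute positions with training-set statistics; gaps stay None."""
    return [
        None if p is None else ((p[0] - mean[0]) / std[0], (p[1] - mean[1]) / std[1])
        for p in traj
    ]


traj = [(0.0, 0.0), (1.0, 0.0), None, (3.0, 0.0)]  # position at step 2 is missing
rel = relative_coords(traj)                        # one gap removes two displacements
norm = normalize(traj, mean=(1.0, 0.0), std=(2.0, 1.0))  # hypothetical statistics
```

A single missing position invalidates both the displacement into it and the displacement out of it, while the normalized absolute representation loses only the one missing entry.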
Datasets. We evaluate our two-block RNN model on two datasets, ETH [5] and UCY [6], which are commonly used for the trajectory prediction task. The ETH dataset contains two scenes named ETH and HOTEL. The UCY dataset includes three scenes named ZARA-01, ZARA-02, and UCY. They are collected from bird's-eye views, contain thousands of real-world pedestrian trajectories, and cover numerous challenging situations. We observe trajectories for 3.2 seconds (8 frames) and predict for 3.2 seconds (8 frames) or 4.8 seconds (12 frames), and use a leave-one-out approach, i.e., we train on four sets and test on the remaining set for evaluation.
Unfortunately, ETH and UCY do not provide miss-detection labels. Therefore, we synthetically generate miss-detection masks. To generate the miss-detection masks, we randomly choose a miss-detection ratio from [0.2, 0.8] for every sequence both in training and evaluation.
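One possible way to generate such masks is sketched below. The text only states that a ratio is drawn from [0.2, 0.8] per sequence; how the ratio is turned into a mask is an assumption here (a rounded number of time steps is dropped uniformly at random, and at least one step is always kept). The function name `sample_mask` is hypothetical.

```python
import random


def sample_mask(t_obs, rng, low=0.2, high=0.8):
    """Return (mask, ratio); mask[t] is True if step t is detected."""
    ratio = rng.uniform(low, high)
    # Assumption: drop round(ratio * t_obs) steps, but always keep at least one.
    n_missing = min(round(ratio * t_obs), t_obs - 1)
    missing = set(rng.sample(range(t_obs), n_missing))
    return [t not in missing for t in range(t_obs)], ratio


rng = random.Random(0)
mask, ratio = sample_mask(8, rng)  # 8 observed frames, as in the experimental setting
```

An alternative reading would drop each step independently with probability equal to the ratio; the sketch above fixes the count instead so that the realized ratio stays close to the sampled one.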
Evaluation metrics. We follow prior works [1], [2] for the evaluation metrics. We use the Average Displacement Error (ADE) and the Final Displacement Error (FDE) metrics. ADE is the average L2 distance between the predictions and the ground truth over all predicted time steps, and FDE is the L2 distance between the prediction and the ground truth at the final time step.
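The two metrics can be computed as in the short sketch below for a single predicted trajectory; the toy coordinates are hypothetical.

```python
import math


def ade(pred, gt):
    """Average Displacement Error: mean L2 distance over all predicted time steps."""
    dists = [math.hypot(p[0] - g[0], p[1] - g[1]) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists)


def fde(pred, gt):
    """Final Displacement Error: L2 distance at the last predicted time step."""
    return math.hypot(pred[-1][0] - gt[-1][0], pred[-1][1] - gt[-1][1])


gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade_value = ade(pred, gt)  # (0 + 1 + 2) / 3
fde_value = fde(pred, gt)  # distance at the final step
```

Because the error in this example grows with the horizon, the FDE is larger than the ADE, which is the typical pattern for drifting predictions.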

B. EXPERIMENTAL RESULTS
Main Results. In Table 2, we evaluate our model against all baseline models. We see that the linear imputation baseline, which linearly estimates the missing state, outperforms the other baselines, which do not estimate the missing state. Our two-block RNN model outperforms the baselines on almost all of the datasets on both metrics. The best result among the baselines on the ADE metric is linear imputation with an error of 1.46 ($T_{pred} = 8$) and 1.95 ($T_{pred} = 12$). Our model has an ADE of 1.30 ($T_{pred} = 8$) and 1.82 ($T_{pred} = 12$), which is 11% and 6% less than the best baseline, respectively, and the effect is most prominent on ZARA2 (33% for $T_{pred} = 8$ and 25% for $T_{pred} = 12$). In terms of the FDE metric, we also obtain an improvement of 12% ($T_{pred} = 8$) and 2% ($T_{pred} = 12$) compared to the best baseline.
Changing miss-detection ratio. We compare the robustness of the two-block RNN model and the baselines when the miss-detection ratio changes. We test the model, which is trained with a random miss-detection ratio chosen from [0.2, 0.8], under various fixed miss-detection ratios. In Fig. 6, we vary the miss-detection ratio from 0 (complete data) to 0.7. Overall, our two-block model obtains better results than the baselines in every miss-detection ratio setting.
The performance becomes better as the miss-detection ratio becomes lower. When no miss-detection occurs, i.e., the miss-detection ratio is 0, our model achieves the best performance compared to the baseline model on both ADE and FDE metrics, in both 8 and 12 prediction time-step settings.
In the baselines, a single RNN is trained to encode both real data and imputed data. As imputed data may be confused with measurements, this could cause the model to learn incorrect values, leading to high-error predictions even when there is no miss-detection at test time. In the 8-time-step prediction, our model outperforms the best baseline model, linear filling, by 15% (ADE) and 15% (FDE). In the 12-time-step prediction, our model achieves superior performance against the best baseline model, linear filling, by 12% (ADE) and 8% (FDE). On the other hand, in our two-block RNN model, one RNN, namely $g_c$, is trained to encode the real data and another RNN, namely $g_i$, is trained to update the [...]
In Fig. 7, we report the displacement error at each prediction time step for all methods. Our model performs better at every time step in both the 8 and 12 prediction time-step settings. This result suggests better prediction quality over all time instants.
Changing prediction length. To show the stability in predicting longer temporal horizons, we present the ADE and FDE for the prediction of 12, 16, and 20 future time steps in Table ??. Here, we use the same models trained to predict 8 and 12 time steps as in the main results, and simply extend their prediction time steps. The performance becomes worse as the prediction length becomes longer. Still, our model has a consistent advantage at every prediction time-step setting.
Aligning the hidden state channels. To provide a fair comparison, in the main results we halve the number of channels of the hidden state of our two-block model, which consists of two RNNs (each having 16 channels), so that the total number of channels of our two-block model is the same as that of the baselines (32 channels). In this section, we run an additional experiment in which we align and vary the number of channels of both our model and the baseline RNNs, so that all RNNs, both those of our model and those of the baselines, have the same number of channels. This confirms that the performance of our model is not caused by the difference in the number of channels. The results are shown in Table ??. We can see that our model still outperforms the baselines. Hence, the performance of our model is not caused by the difference in the number of hidden state channels.
Computation time. Table 4 shows the comparison of the computation time of our method and the baseline methods. The total time is broken down into the time for filling missing values, encoding the trajectories, and making predictions. Note that the baselines require the filling time to fill the missing values, while our method does not. In terms of total time, last filling and zero filling are the fastest, followed by our method and then by linear filling. Looking at the breakdown, we can see that all methods require roughly the same time for prediction, since they all use the same implementation. On the other hand, linear filling requires more time because it takes longer to compute the imputed values. Our method requires more time than the baselines do for encoding the trajectories. This is because our method requires selecting one of the two LSTM blocks to pass the data to in each time step, and thus it does not receive the speed-up benefit from the optimized LSTM implementation used by the baselines. However, since the selection only requires an additional if statement, an optimized implementation of our two-block method should be able to achieve almost the same computation time as the baselines.
Qualitative evaluation. As a demonstration, we visualize the prediction results of our two-block RNN and those of the baseline methods in Fig. 8. Here, we draw the average predicted trajectory of 100 samples. In Fig. 8(a), we can observe that the prediction of our two-block RNN is more accurate than those of the baselines. Moreover, even when the complete trajectories are available (Fig. 8(b)), our two-block RNN can still predict trajectories that are closer to the ground truth.

VI. DISCUSSION
In this section, we discuss some limitations. To begin with, we used absolute coordinates instead of relative coordinates. To compute the relative coordinates, we need the current position and the next position. Thus, when the position at one time step is missing, neither the previous nor the next relative coordinate around that time step can be computed. However, using absolute coordinates has a bad influence on overall performance. In particular, for trajectory prediction from complete data, all models trained with absolute coordinates perform much worse than those trained with relative coordinates on all datasets (64% worse for $T_{pred} = 8$ and 74% worse for $T_{pred} = 12$ on the ADE metric). If we could use relative coordinates for trajectory prediction with incomplete data, the performance might improve. In future work, we will work on this problem.
Next, we discuss other problems of trajectory prediction in real-world environments. We focus on predicting the future trajectory from an incomplete observed trajectory due to miss-detection. Our model does not handle wrong detections, and tracking errors are out of scope; however, if we can classify the detection or tracking result as valid or not, we can apply our model to these situations.

VII. CONCLUSION
In this paper, we address the problem of trajectory prediction from incomplete observed trajectories caused by miss-detection. We proposed a two-block RNN-based model, which takes advantage of the connection between the Bayesian filter and RNNs. Extensive experimental results on standard datasets, including ETH [5] and UCY [6], show that our approach outperforms commonly used baselines. Since we do not use imputed data for training, our model does not affect the performance of trajectory prediction from the complete observed trajectory.