A Hierarchical Forecasting Model of Pedestrian Crossing Behavior for Autonomous Vehicle

Simulating pedestrians in shared spaces poses a significant challenge for the virtual testing of autonomous driving, because the simulated pedestrian model needs to respond to changes in autonomous vehicle behaviour. We present HFPM, a Hierarchical Forecasting Pedestrian Model that imitates pedestrian behaviour. The model has three layers: the dynamics model layer, the path planning layer, and the decision layer. In the dynamics model layer, an improved force model that includes the heading direction of the pedestrian is developed from the Social Force Model to capture pedestrian-pedestrian interaction. In the path planning layer, an Artificial Potential Field model is modified to plan a feasible path to each pedestrian's goal. The planning layer contains a prediction module that forecasts the trajectories of vehicles on the road so that a collision-free route can be chosen. The decision layer is a finite state machine with five states: the pedestrian can approach, walk, wait, run, and reach the goal. Qualitative and quantitative comparisons with a baseline pedestrian model, using the CITR data set, demonstrate that HFPM produces more accurate simulation results than previously developed policy-based models.


I. INTRODUCTION
Compared with car and motorcycle users, pedestrians are the most vulnerable road users, and many traffic accidents can occur when pedestrians interact with autonomous vehicles. One way to improve the safety of autonomous driving is simulation, especially for scenarios that are too dangerous to test in the real world [1]. For this to work, the pedestrian agents in the simulation need to act as they do in the real world.
Many methods exist for simulating pedestrian and vehicle behaviour. Traditional model-based forecasting methods [2], [3], [4] produce motion predictions from physical pedestrian models and handcrafted rules, which cannot accurately model multi-agent interaction. Model-free approaches based on deep neural networks [5], [6], [7] can predict possible future trajectories and improve prediction accuracy. However, most existing approaches model and forecast pedestrian movement without representing the environment in detail. Many factors can affect pedestrian trajectories, including vehicles, other pedestrians, static obstacles, traffic lights, and traffic signs. Modelling pedestrian movement becomes considerably more complex when Pedestrian-Vehicle Interaction (PVI), Pedestrian-Pedestrian Interaction (PPI), and Pedestrian-Obstacle Interaction (POI) occur at the same time.
The associate editor coordinating the review of this manuscript and approving it for publication was Mouquan Shen.
This paper proposes HFPM, a novel hierarchical forecasting model that runs as a closed-loop simulation of the traffic environment to forecast multi-class interactions, including PVI and PPI, and uncertainty-aware decisions of pedestrian agents. HFPM combines a headed pedestrian physical model, a risk possibility map, and a data-driven SVM decision-making model, and its theoretical assumptions are grounded in real-world data. Our simulation environment focuses on application and verification in shared spaces [8], [9], using the open data set of [10], an interaction-focused data set also used by [11], [12], [13], and [14]. The model also extends readily to unprotected crossroads and low-speed streets, which are common in urban areas with traffic-calming measures [15]. Shared spaces are common in the UK, France, Italy, India, and many other places around the world [8], and shared spaces with a high density of pedestrians are even more difficult to simulate [16], [17]. Although HFPM is developed for such low-speed settings, its structure is in principle agnostic to vehicle speed. Likewise, although it employs machine learning for parameter tuning (the decision model and vehicle trajectory prediction), its structure is informed by an understanding of PVI and PPI principles. Hence, the model can be used both by an autonomous driving system and in virtual model-in-the-loop (MIL) simulation. Compared with some existing work, the simulation results indicate a significant improvement in simulation accuracy.
FIGURE 1. The layout of the CITR data set experiment [10] with multiple pedestrians in a shared space.
This work provides the following contributions:
• The pedestrian model covers acceleration, deceleration, running to cross, and slowing down or stopping to let the car pass first. It also allows the pedestrian to step back or to stay in a group.
• HFPM simulates pedestrian movement close to real pedestrian behaviour, owing to the low-level headed dynamics model and the risk possibility planner. An LSTM predictor helps pedestrians avoid vehicles in advance, as real pedestrians do.
• The data-driven SVM decision model chooses the desired velocity of a pedestrian and makes the pedestrian wait if crossing the road is too dangerous.
• The HFPM model has been validated on open-source real-world data and compared with a benchmark model [10].
The rest of the paper is structured as follows. Section II introduces related work on pedestrian behaviour forecasting. Section III explains the HFPM model, and Section IV describes the experimental setup. Section V presents the performance and simulation results of HFPM compared with the baseline model. Section VI concludes the paper and outlines future work.

II. BACKGROUND AND RELATED WORK
Detecting, understanding, and forecasting pedestrian behaviour is key to ensuring that Autonomous Vehicles (AVs) can interact safely with pedestrians in shared spaces, so there is a substantial body of work on pedestrian movement forecasting. A survey by Camara et al. [18] summarizes research on interaction and forecasting models. Fanta et al. divided the existing research into four categories: (i) Physical Models, (ii) Cellular-based Models, (iii) Dynamic Graphical Models, and (iv) Deep Learning Methods.
Physical Models [7], [19], [20], [21], [22] predict pedestrian movement by modelling the environment with physical equations. The best known is the Social Force Model (SFM) [23], [24], in which pedestrians act in a force field like charged particles in an electric field. The authors of [23] and [24] applied the SFM to simulate a group of people escaping from a room in an emergency. Since the SFM represents people's movements only roughly, several researchers have modified it. Larter and Scott [25] presented a pedestrian behaviour model based on a modified SFM that considers intention, allowing pedestrians to move towards sub-goals. In [24], the social forces are split into individual and group forces, which can effectively simulate multi-group pedestrians. Helbing and Molnár [2] introduced a model combining the SFM with decision models for conflict resolution. They observed that most pedestrians ignored the state of the signal in their crossing decision-making and relied instead on gap acceptance. Other researchers estimated optimal SFM parameters via maximum log-likelihood estimation [26], also considering how factors such as traffic lights affect pedestrians.
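As a point of reference for the physical models above, a basic SFM can be sketched as a driving term that relaxes the velocity towards a desired velocity, plus an exponentially decaying repulsion between pedestrians. This is a minimal illustration, not the modified model developed in this paper: it assumes unit mass (forces are accelerations), and the function name and parameter values (v0, tau, A, B, radius) are illustrative assumptions.

```python
import math


def social_force_step(pos, vel, goal, others, dt=0.1,
                      v0=1.3, tau=0.5, A=2.0, B=0.3, radius=0.3):
    """One semi-implicit Euler step of a basic Social Force Model (unit mass).

    pos, vel, goal: (x, y) tuples for the simulated pedestrian.
    others: list of (x, y) positions of neighbouring pedestrians.
    All parameter values are illustrative assumptions.
    """
    # Driving force: relax the current velocity towards v0 along the goal direction.
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(gx, gy) or 1e-9
    ex, ey = gx / dist, gy / dist
    fx = (v0 * ex - vel[0]) / tau
    fy = (v0 * ey - vel[1]) / tau
    # Repulsion from each neighbour, exponential in the gap between body surfaces.
    for ox, oy in others:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy) or 1e-9
        mag = A * math.exp((2 * radius - d) / B)
        fx += mag * dx / d
        fy += mag * dy / d
    # Update velocity first, then position with the new velocity.
    new_vel = (vel[0] + fx * dt, vel[1] + fy * dt)
    new_pos = (pos[0] + new_vel[0] * dt, pos[1] + new_vel[1] * dt)
    return new_pos, new_vel
```

With no neighbours, a pedestrian at rest simply accelerates towards the goal; a neighbour on one side pushes the velocity away from it.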
There are also hierarchical approaches. In [16], a hybrid model combines the SFM with a rule-based decision model that decides whether the pedestrian should cross the road or wait until a vehicle passes. However, the decision model is hand-crafted, and all results rely on manually tuned parameters. To avoid these shortcomings, we replace that decision model with a purely data-driven one. In [17], Anvari et al. modified the SFM with the constraints of a physical model; their model includes a path planning layer that finds an optimal shortest-distance path and a rule-based layer that defines the constraints of the pedestrian model. Tian et al. [27] created a gap acceptance model based on visual looming cues and the binary choice logit method to predict pedestrian crossing decisions.
Dynamic Graphical Models (DGMs) [21], [28], [29] use Markov Decision Process Models (MDPMs) that incorporate a probability distribution over decisions into trajectory prediction. Broz et al. [30] used Partially Observable Markov Decision Processes (POMDPs) to model time-dependent Human-Robot Interaction and verified their POMDP model on a driving-task dataset. The method in [4] used Markov Decision Processes (MDPs) for long-term pedestrian trajectory prediction. In [31], Vasquez proposed a cost-to-go MDP planner to represent the uncertainty of pedestrian movement instead of the conventional MDP value function. In [32], an extended Kalman filter was used to forecast pedestrian trajectories and to estimate another key value, the time-to-collision range (TTCR) of crossing pedestrians. However, most of these works cannot predict specific velocity and position values, since they require too many simplifications of environmental factors.
Cellular-based Models divide the environment into a square grid in which each cell is either free or occupied, and each pedestrian occupies one cell at each time step. Based on this formulation, researchers have considered various methods. The first is the Artificial Potential Field (APF) [33], which can plan a path for a pedestrian in an unknown scenario. The potential has two parts: an attractive potential generated by the destination and a repulsive potential generated by obstacles. In [34], Liu et al. proposed a time-consumption potential field model that handles cases where a pedestrian waits too long (e.g. when trapped in a local minimum) by forcing the pedestrian to keep walking, which brings the model closer to real pedestrian movement characteristics.
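The two potential components can be illustrated with the classic quadratic attractive term and a Khatib-style repulsive term with a finite influence distance [47]. This is a sketch under assumed forms and gains (k_att, eta, rho0 and the function names are hypothetical); it is not the potential field used in HFPM.

```python
import math


def attractive_potential(p, goal, k_att=1.0):
    """Quadratic attractive potential 0.5 * k_att * ||p - goal||^2 towards the goal."""
    return 0.5 * k_att * ((p[0] - goal[0]) ** 2 + (p[1] - goal[1]) ** 2)


def repulsive_potential(p, obstacle, eta=1.0, rho0=1.2):
    """Repulsive potential 0.5 * eta * (1/rho - 1/rho0)^2, zero beyond rho0 metres."""
    rho = math.hypot(p[0] - obstacle[0], p[1] - obstacle[1])
    if rho >= rho0:
        return 0.0
    rho = max(rho, 1e-6)  # avoid division by zero at the obstacle centre
    return 0.5 * eta * (1.0 / rho - 1.0 / rho0) ** 2


def total_potential(p, goal, obstacles):
    """Superpose the attractive term and one repulsive term per obstacle."""
    return attractive_potential(p, goal) + sum(
        repulsive_potential(p, ob) for ob in obstacles)
```

Because the repulsive term vanishes beyond rho0, distant obstacles have no effect, which keeps the field local around each obstacle.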
Another aspect of crossing behaviour is that people tend to spend the least energy and time, which has made trajectory optimization popular. Game theory is widely used in the field of emergency evacuation [35], [36], [37]. Camara et al. [38] used game theory to model a pedestrian's possible movements resulting from changes in the AV's movement, and game-theoretic methods are a popular way to model interactions between autonomous vehicles and pedestrians.
Machine Learning and Deep Learning Methods. To predict complex crossing behaviour, more and more researchers use Machine Learning and Deep Learning as state-of-the-art tools for forecasting and decision-making. Deep Learning Methods are widely used for pedestrian prediction: the approaches of [39], [40], and [41] predict the future trajectory from existing pedestrian data. Cheng et al. [42] proposed a CNN-based lane-change decision-making method using a dynamic motion image representation. Recurrent Neural Network (RNN) models are popular because they can capture long time series of positions [43]. However, a critical disadvantage of RNNs is that vanishing or exploding gradients can occur as sequences grow longer, so the Long Short-Term Memory (LSTM) model is widely used to overcome this drawback. Alahi et al. [39] introduced the Social-LSTM prediction model, which adds Social Pooling to model pedestrian-pedestrian interaction. Because of the multiple possible routes, High Definition Maps can be used to improve prediction accuracy [44]. Ivanovic et al. [41] introduced a prediction model that identifies road users by their moving speed and then uses their own network to predict each type of road user. In terms of design choices for our model, machine learning can improve system performance, especially by reshaping the potential field. Vallon et al. [45] combined a data-driven approach with Model Predictive Control (MPC) to create the decision logic of a lane-change algorithm. Jayaraman et al. [46] used a multimodal pedestrian decision model to predict pedestrian crossing behaviours.
The pedestrian is an intelligent agent in the simulation, and a hierarchical forecasting model can achieve better prediction and simulation results by dividing the system requirements into different layers. For example, the SFM can be used to reproduce pedestrian trajectories because its state is continuous. Some fully autonomous driving systems use Cellular-based Models, such as Artificial Potential Fields, to plan a feasible path; however, a pedestrian model based on an Artificial Potential Field alone has no dynamics, so its position and speed are not continuous. Dynamic Graphical Models focus on strategies for crossing the road, but they require the complexity of the environment to be limited. Therefore, combining these methods, with each focused on one key issue, helps achieve better overall performance. The system can be divided into three layers: decision-making, path planning, and action execution.
Our work proposes a hierarchical pedestrian model that includes a low-level physical model. The low-level physical model implements a pedestrian heading mechanism that can replicate turning-around behaviour, while the higher decision-making level switches between behaviour modes, such as being more aggressive or more conservative. Figure 2 shows the structure of HFPM, which has three main layers. The following section explains each layer and how they work together.

III. HIERARCHICAL FORECASTING MODEL
A. PROBLEM ASSUMPTION
We aim to predict the movement of pedestrians under PVI and PPI, i.e. in the presence of both vehicles and other pedestrians. We made the following assumptions to make the problem tractable.
First, it is assumed that pedestrians only need to interact with the closest vehicle instead of all vehicles. The pedestrians cross a one-way road on which vehicles do not overtake, so the closest vehicle does not change.
Second, it is assumed that all pedestrians can cross the road and reach the goal, with two choices overall: waiting until the closest vehicle passes or crossing the road directly.Therefore, a decision-making model is necessary.
Third, there is no traffic light or traffic sign that can influence the pedestrian's decision.
To simplify the model and the subsequent simulation, the environment is assumed to have a globally observed Vehicle-to-Everything (V2X) system, so the positions and velocities of pedestrians and vehicles can be observed in real time. The crucial work therefore lies in forecasting and prediction. All observations and perceptions available to the pedestrians are assumed to be an unbiased bird's-eye view. Although the ego-vehicle could be a fully autonomous vehicle or a human-driven vehicle with ADAS, it is regarded as a fully autonomous vehicle in the simulation model.

B. DATA SET UNDERLYING THE MODELLING AND TESTING
The model, specifically the data-driven decision model, uses the CITR data set [10]. As shown in Figure 1, the CITR experiment took place in a car park with a vehicle and pedestrians. The data were recorded with a DJI Phantom 3 SE drone carrying a downward-facing camera on a gimbal, and post-processed to minimise perception error. The vehicle's speed is below 20 km/h in most cases, and the pedestrians in the experiment interact with this low-speed vehicle. We chose the CITR data set because it focuses on fundamental vehicle-pedestrian and pedestrian-pedestrian interaction in shared spaces. The pedestrian and vehicle information was extracted from video, and the data set provides filtered position and speed information for both. The sampling frequency of the data is $1/T_s = 30$ Hz.
The model derived from the CITR data set is a hybrid model, combining a data-driven component with an inertial, physics-informed element. For instance, a binary SVM model can classify and predict the pedestrian state from all the logged position and speed information. To label the data set, clear definitions of pedestrian states are required; these are given in the next section and are informed by the different scenarios of the data set. The data set contains four scenarios with different pedestrian walking directions: (a) bidirectional crossing (e.g. Figure 1), in which pedestrians stand on both sides of the road and try to cross as a vehicle approaches, interacting with the vehicle as well as with other pedestrians; (b) unidirectional crossing; (c) pedestrians and vehicle moving along the same line in opposite directions; and (d) pedestrians and vehicle moving along the same line in the same direction.

C. DECISION-MAKING LAYER
In this section, we will implement the decision-making layer, which is tested with an appropriate input to execute, given = 0 m/s. The start point is when the pedestrian begins to slow down and the deceleration lasts at least 1 s. The end point is when the pedestrian begins to speed up and the acceleration lasts at least 1 s.
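The start and end rules for the waiting state can be sketched as a search over a speed profile for a deceleration run followed by an acceleration run, each lasting at least 1 s (30 samples at 30 Hz). The function names and the noise threshold eps are hypothetical; real CITR speed profiles are noisy and would likely need smoothing first.

```python
def find_runs(signs, target, min_len):
    """Start indices of runs where signs == target for at least min_len samples."""
    starts, i, n = [], 0, len(signs)
    while i < n:
        if signs[i] != target:
            i += 1
            continue
        j = i
        while j < n and signs[j] == target:
            j += 1
        if j - i >= min_len:
            starts.append(i)
        i = j
    return starts


def waiting_intervals(speeds, fs=30.0, min_duration=1.0, eps=1e-3):
    """Return (start, end) sample indices of waiting phases in a speed profile.

    Start: first sample of a deceleration run lasting >= min_duration.
    End: first sample of the next acceleration run lasting >= min_duration.
    """
    min_len = int(round(min_duration * fs))  # 30 samples for 1 s at 30 Hz
    diffs = [b - a for a, b in zip(speeds, speeds[1:])]
    # Ternary sign with a dead band so that tiny fluctuations are ignored.
    signs = [-1 if d < -eps else (1 if d > eps else 0) for d in diffs]
    decel = find_runs(signs, -1, min_len)
    accel = find_runs(signs, 1, min_len)
    intervals = []
    for s in decel:
        later = [a for a in accel if a > s]
        # Pair each deceleration with the next acceleration; skip overlaps.
        if later and (not intervals or s >= intervals[-1][1]):
            intervals.append((s, later[0]))
    return intervals
```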

c: WALKING STATE
This state represents a pedestrian who has decided to cross the road at a predictable and stable pace. Once the distance between the pedestrian and the destination is less than 0.5 m, the pedestrian switches to the reached state with $v^{ref}_i = 0$ m/s: the speed becomes zero, and the pedestrian waits until the simulation ends.
Because the preconditions of some transitions are unclear, certain state transitions, such as those between the waiting, walking, and running states, are computed by an SVM classifier. Other transitions, such as entering the reached state, are defined by rules expressed as equations. The different types of transition are shown in Figure 3 with different arrows.
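A minimal sketch of the mixed rule/classifier transition logic follows, assuming the classifier is supplied as a callable that returns one of the waiting, walking, or running states. The enum, function, and callable are hypothetical names; the approach state and its transitions are omitted because their rules are not described here.

```python
import math
from enum import Enum


class PedState(Enum):
    WAIT = "wait"
    WALK = "walk"
    RUN = "run"
    REACHED = "reached"


REACH_RADIUS = 0.5  # m: rule-based switch to the reached state


def next_state(state, pos, goal, features, classifier):
    """Advance the decision state machine by one step.

    features: history window for the classifier, or None if still too short.
    classifier: hypothetical callable returning PedState.WAIT/WALK/RUN.
    """
    if state is PedState.REACHED:
        return state  # absorbing: speed stays zero until the simulation ends
    if math.dist(pos, goal) < REACH_RADIUS:
        return PedState.REACHED  # rule-based transition
    if features is None:
        return state  # not enough log data yet: keep the current state
    return classifier(features)  # data-driven transition
```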

2) STATE TRANSITION BASED ON SVM CLASSIFIER
The transition between the waiting, walking, and running states is based on a data-driven SVM classifier trained with the relative position and velocity between pedestrians and the vehicle.Other transitions are all rule-based.
The first input is the relative position between the vehicle and the pedestrian, expressed in a coordinate frame whose origin is the pedestrian's position. Using the relative position makes the decision model applicable regardless of where the encounter takes place. The relative position is
$$p_r(t) = p_v(t) - p_p(t),$$
where $p_r(t)$ is the relative position at time $t$, $p_v(t)$ is the position of the vehicle at time $t$, and $p_p(t)$ is the position of the pedestrian at time $t$.
The relative velocity between the vehicle and the pedestrian is another important factor in evaluating the collision risk: if the relative velocity vector, drawn from the vehicle position, points close to the origin where the pedestrian stands, the risk of collision is higher. The relative velocity is
$$\vec{v}_r(t) = \vec{v}_v(t) - \vec{v}_p(t),$$
where $\vec{v}_r(t)$ is the relative velocity, $\vec{v}_v(t)$ is the velocity of the vehicle, and $\vec{v}_p(t)$ is the velocity of the pedestrian.
The set of training samples SV contains the history of relative positions and velocities for $t = i \times T_s$, $T_s = 33$ ms, $i = 0, 1, 2, \ldots$:
$$SV(t) = \{(p_r(t - n_{decision} T_s), \vec{v}_r(t - n_{decision} T_s)), \ldots, (p_r(t), \vec{v}_r(t))\}.$$
Here, $n_{decision}$ is the number of discrete log-data time steps of the pedestrians that the decision model needs. The sampling frequency of the data set is $1/T_s = 30$ Hz (see Section III-B). A larger $n_{decision}$ means the model needs more frames of log data to make a decision. Consequently, the SVM-based model cannot make a decision at the beginning of the simulation; it makes its first decision only once enough log data has accumulated.
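The relative-state inputs and the history window SV can be sketched as follows, assuming the window spans the samples from t - n_decision·T_s to t. The helper names are hypothetical; the flattened vectors could then be passed to any SVM implementation (for example scikit-learn's SVC), which is not shown here.

```python
def relative_state(p_v, p_p, v_v, v_p):
    """Vehicle position and velocity relative to the pedestrian (pedestrian at origin)."""
    p_r = (p_v[0] - p_p[0], p_v[1] - p_p[1])
    v_r = (v_v[0] - v_p[0], v_v[1] - v_p[1])
    return p_r, v_r


def decision_window(history, n_decision):
    """Flatten the last n_decision + 1 relative states into one feature vector.

    history: list of (p_v, p_p, v_v, v_p) tuples sampled every T_s = 33 ms.
    Returns None while the history is too short, mirroring the fact that the
    SVM cannot decide at the start of a simulation.
    """
    if len(history) < n_decision + 1:
        return None
    features = []
    for p_v, p_p, v_v, v_p in history[-(n_decision + 1):]:
        p_r, v_r = relative_state(p_v, p_p, v_v, v_p)
        features.extend([*p_r, *v_r])
    return features
```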

D. PATH PLANNING LAYER
The path planning layer is based on the personal preferences of pedestrians. We aim to develop a Cellular-based Model that can describe the collision risk and generate a possible path for the pedestrian model. The potential field was introduced by Khatib in 1986 [47], inspired by the concept of an electric field. In traditional methods, the repulsive potential of static obstacles, such as road boundaries, is computed from the distance between the obstacle and the agent. For a specific pedestrian $i$, $U^{ped}_{n,i}$ is the potential of another pedestrian $n$, $U^{veh}_i$ is the potential of the vehicle and its feasible future path, and $U^{path}_i$ is the potential of the predicted path generated by the LSTM predictor. The total potential $U_i$ of pedestrian $i$ is a combination of these components, including the repulsive potentials of static and moving obstacles. However, predicting the motion of many kinds of obstacles is an extremely difficult task, so we only predict the motion of the car. Other obstacles, such as motorcycles, bicycles, and pets, are not included in this work and are left for future work.

1) POTENTIAL FIELD OF LSTM PREDICTOR
A predictor is used to reshape the potential field to improve pedestrian-vehicle interaction when the pedestrian and vehicle move in the same or opposite directions. In this way, the pedestrian can interact with an approaching vehicle, especially one on the same path. If the vehicle is an autonomous vehicle, the system can use its path planning information to produce this potential field. Because the vehicle in the CITR data set is a golf cart driven by a human, no path planning information is available, so a trajectory predictor is used to forecast the vehicle trajectory. This is also more realistic, as a real pedestrian would carry out a similar prediction. LSTMs have proven successful at solving prediction problems in time series [48]. The network has four layers: a sequence input layer, an LSTM layer, a fully connected layer, and a regression layer.

a: THE TRAINING PROCESS
The planning layer uses the historical position data of the vehicle $\{p_v(t - n T_s), p_v(t - (n-1) T_s), \ldots, p_v(t)\}$ to predict the future trajectory of the vehicle over a short time window. The position at time step $t$ consists of the $x$ and $y$ coordinates, $p_v(t) = (x_v(t), y_v(t))$, and the output of the predictor is $(\hat{x}_v(t), \hat{y}_v(t))$. The prediction errors $J_{x_v}$ and $J_{y_v}$ are
$$J_{x_v}(t) = x_v(t) - \hat{x}_v(t), \quad J_{y_v}(t) = y_v(t) - \hat{y}_v(t).$$
The Root Mean Square Error (RMSE) is calculated after training and is used to evaluate prediction performance; over $N$ predicted points it is
$$RMSE_x = \sqrt{\frac{1}{N}\sum_{k=1}^{N} J_{x_v}(t_k)^2}, \quad RMSE_y = \sqrt{\frac{1}{N}\sum_{k=1}^{N} J_{y_v}(t_k)^2}.$$
The loss function combines the RMSE along both axes:
$$J = RMSE_x + RMSE_y.$$
Because the position data can span a wide range of values, they are standardized before training. Using the mean $\mu$ and the standard deviation $\delta$ of the position data, each position value $p$ is mapped to
$$p' = \frac{p - \mu}{\delta},$$
so that the standardized data have a mean of 0 and a standard deviation of 1.
The length of the input sequence is $n = 20$, so the predictor can start working once more than 20 time steps have elapsed. The network predicts one time step ahead and then updates its state; repeating this 50 times yields a predicted trajectory of 50 position points, i.e. $50 \times 33$ ms $\approx 1.65$ s. The prediction horizon is chosen based on the crossing time: at a regular walking pace, a pedestrian can cross the road in less than 1.6 s. In addition, a pedestrian can pass the cart within the prediction horizon, since the average pedestrian speed is 1.24 m/s and the width of the EZ-GO Golf Cart is 1.2 m [10], which takes about $1.2/1.24 \approx 0.97$ s. The predicted trajectory reshapes the potential field in front of the vehicle, and because of the high gradient sensitivity of the potential field, approaching vehicles can influence the pedestrian model from a longer distance. The training and validation losses are below $10^{-4}$.
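The standardization, RMSE, and closed-loop 50-step rollout described above can be sketched as follows. The paper uses an LSTM as the one-step predictor; here a constant-velocity extrapolator stands in for it so the example runs without a deep learning library, and all function names are hypothetical. In practice, x and y would be standardized separately, and the training mean and standard deviation would be reused at prediction time.

```python
import math
from statistics import mean, pstdev

T_S = 0.033             # s, sampling period of the data
HORIZON_S = 50 * T_S    # about 1.65 s for a 50-step rollout


def standardize(values):
    """Return (z-scores, mu, sigma) so the data have mean 0 and std 1."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against constant input
    return [(v - mu) / sigma for v in values], mu, sigma


def rmse(true, pred):
    """Root mean square error along one axis."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))


def rollout(history, step_fn, n_in=20, n_out=50):
    """Closed-loop prediction: predict one step, append it to the input, repeat.

    step_fn: one-step predictor mapping the last n_in (x, y) points to the next.
    """
    window = list(history[-n_in:])
    preds = []
    for _ in range(n_out):
        nxt = step_fn(window)
        preds.append(nxt)
        window = window[1:] + [nxt]
    return preds


def constant_velocity(window):
    """Stand-in predictor: extrapolates the last displacement."""
    (x0, y0), (x1, y1) = window[-2], window[-1]
    return (2 * x1 - x0, 2 * y1 - y0)
```

Feeding each prediction back as input is what lets a one-step model produce the 50-point horizon, at the cost of errors accumulating along the rollout.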

2) THE OVERALL POTENTIAL FIELD
The resultant potential field of a specific pedestrian $i$ is the superposition of all the other fields:
$$U_i = \sum_{n=1, n \neq i}^{P} U^{ped}_{n,i} + U^{veh}_i + U^{path}_i + U^{cross}_i,$$
where $P$ is the number of pedestrians within a given area and $U^{cross}_i$ is the crosswalk boundary field of pedestrian $i$. After all information about pedestrians, vehicles, and obstacles is extracted, the overall potential field of the environment can be drawn. The output of the path planning layer is the normalized negative gradient of $U_i$:
$$\vec{u}^{direction}_i = -\frac{\nabla U_i}{\lVert \nabla U_i \rVert}.$$
The unit vector $\vec{u}^{direction}_i$ points in the direction of fastest decrease of the repulsive potential field. The output of the planning layer can equivalently be expressed as a heading angle $\theta^{ref}_i$ obtained from $\vec{u}^{direction}_i$. A direction vector meets the system requirements because the path planning layer is executed every 33 ms. This unit vector is the input of the headed pedestrian model.
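Computing the planner output from an arbitrary potential can be sketched with central finite differences: normalise the negative gradient to obtain the unit direction vector, then convert it to a heading angle with atan2. The function name and step size h are assumptions; an analytic gradient would be preferable when the potential has a closed form.

```python
import math


def descent_direction(potential, p, h=1e-4):
    """Unit vector along -grad U at p and the corresponding heading angle.

    potential: callable U(x, y). Uses central finite differences with step h.
    Returns (ux, uy, theta_ref); theta_ref is None where the gradient vanishes.
    """
    x, y = p
    gx = (potential(x + h, y) - potential(x - h, y)) / (2.0 * h)
    gy = (potential(x, y + h) - potential(x, y - h)) / (2.0 * h)
    norm = math.hypot(gx, gy)
    if norm < 1e-12:
        return 0.0, 0.0, None  # flat point, e.g. a local minimum
    ux, uy = -gx / norm, -gy / norm
    return ux, uy, math.atan2(uy, ux)
```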
Figure 4 shows the repulsive artificial potential field of the scenario (yellow represents high potential and dark blue low potential). In Figure 4, the vehicle is on the left of the space and the pedestrians are on the right. The yellow straight line in front of the vehicle is the path predicted by the LSTM predictor, which makes a difference when the car and the pedestrians move in opposite directions.
9030 VOLUME 12, 2024 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

E. HEADED PEDESTRIAN DYNAMICS MODEL LAYER
The Headed Pedestrian Dynamics Model is inspired by the differential-drive robot, which has two wheels, one on each side. Previous models, such as the Social Force Model, do not consider the pedestrian's heading direction. However, studies indicate that people do have a preferred moving direction: most pedestrians walk forwards in the street, and their movement can be described by a non-holonomic model [49]. If they want to turn back while walking, they tend to turn around first instead of walking backwards. In this subsection, a Headed Pedestrian Dynamics Model is introduced to make the pedestrian movement simulated by the Social Force Model more realistic; the key improvement is the inclusion of the pedestrian's heading in the dynamics model.
Pedestrians do move laterally, but lateral displacement is limited and rare. For a pedestrian moving in a 2D environment, the velocity can be decoupled into longitudinal linear velocity, lateral linear velocity, and yaw rate. To simplify the pedestrian model, the lateral linear velocity is constrained to a small value. The velocity of a pedestrian therefore comprises only the longitudinal and (small) lateral linear velocities and the yaw rate.
Figure 5 shows the structure of the headed pedestrian dynamics model layer. The dynamics model has two inputs: $F$, the force in the direction of the heading, and $\tau$, the torque about the vertical axis that controls the yaw rate of the pedestrian. The pedestrian has six states: the 2D position and heading $[x_i, y_i, \theta_i]$, the yaw rate $\omega_i$, and the two components of the linear velocity $\vec{v}_i$.
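Figure 6 shows the headed dynamics model. The model pedestrian can move longitudinally as well as turn around. According to Figure 6, the pedestrian motion model can be written as
$$\dot{x}_i = v_{x,i}, \quad \dot{y}_i = v_{y,i}, \quad \dot{\theta}_i = \omega_i,$$
$$m_i \dot{v}_{x,i} = F^c_x + F^s_x, \quad m_i \dot{v}_{y,i} = F^c_y + F^s_y, \quad I_i \dot{\omega}_i = \tau,$$
where $I_i$ is the moment of inertia of walker $i$ and $m_i$ denotes the mass of the pedestrian.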
Because the pedestrian model is dynamic, a PI controller for the pedestrian velocity and a PD controller for the heading angle are needed to drive the pedestrian towards the desired demand values. The controller acts on the headed dynamics model through output feedback. Its outputs, which are also the inputs of the pedestrian model, are the force $\vec{F}$ along the pedestrian's heading and the torque $\tau$ about the vertical axis that controls the yaw rate. The controller has two reference inputs: the direction vector $\vec{u}^{direction}_i$ from the planning layer and the reference speed $v^{ref}_i$. Together, the Social Force Model and the controller described in this subsection determine the force $F$ and torque $\tau$ from these reference inputs.
The Social Force Model is used here for two reasons. First, the system needs a physical constraint to avoid collisions between two or more pedestrians. Second, some people tend to change their walking direction immediately and accelerate when others happen to walk too close to them. In this system, the social force acts only when people get too close (≤ 0.5 m), by passing it through a dead-zone function with a constant parameter $b$. The vectors $\vec{r}_i$ and $\vec{r}_j$ denote the actual positions of pedestrians $i$ and $j$, and $\vec{r}_{ij} = \vec{r}_i - \vec{r}_j$. The social force is limited to an upper bound $F^s_{max}$, which is determined by the maximum acceleration $a_{max}$ of a pedestrian, i.e. $F^s_{max} = m_i a_{max}$. The limited force $\vec{F}^s_{limited}$ is then decomposed into its components $F^s_x$ and $F^s_y$. The social force comes into play when a pedestrian enters one of these collision-risk margins. Note that the social force, when non-zero, has a relatively larger influence than the force generated by the potential field within the working area.
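The following sketch illustrates the described behaviour under an assumed functional form: a repulsive force that is zero beyond 0.5 m, decays exponentially with the parameter b inside that radius, is saturated at F_max = m_i·a_max, and acts along r_ij. The amplitude A, the function name, and the default mass, acceleration, and b values are illustrative assumptions, not values from Table 1.

```python
import math


def dead_zone_social_force(r_i, r_j, m_i=70.0, a_max=2.0, A=200.0, b=0.1, d0=0.5):
    """Repulsive force (N) on pedestrian i from j, active only within d0 metres.

    Assumed form: magnitude A * exp(-d / b) inside the dead-zone radius d0,
    saturated at F_max = m_i * a_max and directed along r_ij = r_i - r_j.
    """
    rx, ry = r_i[0] - r_j[0], r_i[1] - r_j[1]
    d = math.hypot(rx, ry)
    if d > d0 or d == 0.0:
        return 0.0, 0.0  # outside the dead zone (or coincident): no social force
    f_max = m_i * a_max
    mag = min(A * math.exp(-d / b), f_max)
    return mag * rx / d, mag * ry / d  # components F_x^s and F_y^s
```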
The references of the controller are the desired speed $v^{ref}_i$ and the desired heading angle $\theta^{ref}_i$. The input force is calculated from the velocity error between the reference and actual values using a PI law:
$$F = K_p \left( e_v(t) + \frac{1}{T_i} \int_0^t e_v(s)\, ds \right), \quad e_v(t) = v^{ref}_i - v_i(t),$$
with parameters $K_p$ and $T_i$. The longitudinal and lateral velocities depend heavily on this force. A larger $K_p$ can produce a larger velocity overshoot, while integrating the velocity error eliminates the steady-state error. As $\vec{F}$ acts along the heading of the pedestrian, its components are
$$F^c_x = F \cos\theta_i, \quad F^c_y = F \sin\theta_i.$$
The components $F^c_x$ and $F^c_y$ are two of the outputs of the pedestrian model controller and have upper limits to avoid accelerations beyond human limits. $F^s_x$ and $F^s_y$ are the social forces generated by other pedestrians and vehicles. The input $\tau$ is the torque about the vertical axis, which changes the yaw rate of the pedestrian. It is computed as
$$\tau = K_\theta \left(\theta^{ref}_i - \theta_i\right) - K_\omega \omega_i.$$
The parameters $K_\theta$ and $K_\omega$ can be tuned to obtain a stable heading angle for the pedestrian dynamics model. The variable $\omega_i$ is the angular velocity of the pedestrian.
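The PI speed loop and PD heading loop can be sketched on a simplified, non-holonomic version of the headed model in which lateral velocity is neglected, so external forces act only through their component along the heading. The class name, gains, mass, and inertia are illustrative assumptions, and integration uses explicit Euler at the 33 ms planner period.

```python
import math


class HeadedPedestrian:
    """Minimal non-holonomic pedestrian: PI speed loop + PD heading loop.

    Assumptions (illustrative): lateral velocity is neglected, so any external
    (e.g. social) force acts only through its component along the heading;
    all gains and physical parameters are placeholders, not tuned values.
    """

    def __init__(self, m=70.0, inertia=4.0, kp=150.0, ti=0.5,
                 k_theta=30.0, k_omega=10.0, f_max=140.0):
        self.x = self.y = self.theta = 0.0
        self.v = self.omega = 0.0
        self.m, self.inertia = m, inertia
        self.kp, self.ti = kp, ti
        self.k_theta, self.k_omega = k_theta, k_omega
        self.f_max = f_max
        self.err_int = 0.0  # running integral of the speed error

    def step(self, v_ref, theta_ref, dt=0.033, f_ext=(0.0, 0.0)):
        # PI speed control: F = Kp * (e_v + (1/Ti) * integral of e_v).
        err = v_ref - self.v
        self.err_int += err * dt
        f = self.kp * (err + self.err_int / self.ti)
        f = max(-self.f_max, min(self.f_max, f))  # cap to human-like acceleration
        # PD heading control on the wrapped angle error.
        e_theta = math.atan2(math.sin(theta_ref - self.theta),
                             math.cos(theta_ref - self.theta))
        tau = self.k_theta * e_theta - self.k_omega * self.omega
        # Project the external force onto the heading direction.
        f_long = f + f_ext[0] * math.cos(self.theta) + f_ext[1] * math.sin(self.theta)
        # Explicit Euler integration of the unicycle-like dynamics.
        self.x += self.v * math.cos(self.theta) * dt
        self.y += self.v * math.sin(self.theta) * dt
        self.theta += self.omega * dt
        self.v += f_long / self.m * dt
        self.omega += tau / self.inertia * dt
```

As in the text, raising kp speeds up the speed response at the cost of more overshoot, and the integral term removes the steady-state speed error.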

IV. EXPERIMENTATION
A. BASELINE MODEL
Rule-based decision methods are widely used in autonomous driving systems. The baseline model used in our simulation is an improved version inspired by Manon [16].
The decision model takes three important inputs: the time-to-conflict (TTC), the approaching angle of the pedestrian, and the predicted order of arrival at the meeting point. The TTC is the estimated time until a collision between a given vehicle and pedestrian if both keep their current speeds.
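One common constant-velocity way to compute a TTC is to solve for the earliest time at which the centre distance between pedestrian and vehicle shrinks to a collision radius. This is an assumed formulation (the function name and the 2 m default radius are illustrative); the baseline in [16] may define the TTC and the conflict point differently.

```python
import math


def time_to_conflict(p_ped, v_ped, p_veh, v_veh, radius=2.0):
    """Earliest time (s) at which the centre distance drops to `radius`.

    Assumes both agents keep constant velocity; returns 0.0 if they are already
    within `radius` and math.inf if the distance never reaches it.
    """
    dpx, dpy = p_veh[0] - p_ped[0], p_veh[1] - p_ped[1]
    dvx, dvy = v_veh[0] - v_ped[0], v_veh[1] - v_ped[1]
    c = dpx * dpx + dpy * dpy - radius * radius
    if c <= 0.0:
        return 0.0
    a = dvx * dvx + dvy * dvy
    b = 2.0 * (dpx * dvx + dpy * dvy)
    disc = b * b - 4.0 * a * c
    if a == 0.0 or disc < 0.0:
        return math.inf  # no relative motion, or paths never come close enough
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t >= 0.0 else math.inf  # negative root: agents are separating
```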
To complement the TTC, two layers of circular areas for pedestrian-vehicle interaction are defined around the vehicle, as shown in Figure 7. A pedestrian walking towards the vehicle first enters the risk area, where the probability of collision is relatively high and the pedestrian should pay more attention to the vehicle. If the pedestrian keeps going, he/she enters the collision area, where the probability of collision is extremely high. The radius of the collision area equals the length of the vehicle, while the radius of the risk area is an adjustable parameter. More information can be found in [16].
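The two-layer conflict areas can be sketched as concentric circles centred on the vehicle, with the collision radius set to the vehicle length and the risk radius as a free parameter. The function name and the example values used to exercise it are illustrative assumptions.

```python
import math


def conflict_zone(p_ped, p_veh, vehicle_length, risk_radius):
    """Classify a pedestrian position relative to circles centred on the vehicle."""
    d = math.dist(p_ped, p_veh)
    if d <= vehicle_length:
        return "collision"  # inner circle: radius equals the vehicle length
    if d <= risk_radius:
        return "risk"  # outer circle: adjustable radius
    return "safe"
```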

B. TUNING AND CALIBRATION
1) THE DATA-DRIVEN MODEL
The data-driven decision is based on traffic information from the CITR data set. This information includes not only the current position and speed but also a time series of several seconds of vehicle and pedestrian history.

9032 VOLUME 12, 2024. Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

FIGURE 7. The conflict area in the decision model: collision area (red) and risk area (yellow). At the initial time, the pedestrian is in the lower-right corner. After t seconds, the pedestrian enters the risk area.
To calibrate the decision layer, the training data set must be labelled: every time step needs a pedestrian state indicating whether the pedestrian is waiting, walking, or running. A criterion is defined to determine when the pedestrian begins to walk or run. The pedestrian's transient states depend on the steady-state speed.
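A minimal labelling sketch, assuming the criterion compares each sample's speed with fractions of the pedestrian's steady-state walking speed. The paper does not state its thresholds here, so `wait_frac`, `run_frac`, and the function name are hypothetical.

```python
from typing import List, Sequence


def label_states(speeds: Sequence[float], v_steady: float,
                 wait_frac: float = 0.25, run_frac: float = 1.5) -> List[str]:
    """Label each time step as 'wait', 'walk' or 'run' from its speed.

    Thresholds are fractions of the pedestrian's steady-state walking speed
    `v_steady` (m/s); both fractions are hypothetical defaults.
    """
    if v_steady <= 0.0:
        raise ValueError("steady-state speed must be positive")
    labels = []
    for v in speeds:
        if v < wait_frac * v_steady:
            labels.append("wait")
        elif v > run_frac * v_steady:
            labels.append("run")
        else:
            labels.append("walk")
    return labels
```

Scaling the thresholds by each pedestrian's own steady-state speed, rather than using fixed values, keeps a slow walker from being labelled as waiting and a fast walker from being labelled as running.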
The output path of the planning layer should be close to the data set. The parameters of the potential field must therefore be tuned so that pedestrians keep a suitable distance from each other instead of drifting farther and farther apart. The solution is to limit the acting area of other pedestrians' potential fields: if the distance between two pedestrians is larger than 1.2 m, the potential of one pedestrian acting on the other is zero, so they do not interact.
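The 1.2 m limit can be illustrated with a truncated repulsive potential. The paper does not give the functional form at this point, so the sketch assumes the classic Khatib-style repulsive term; the gain `eta` and the function names are hypothetical, and only the cutoff distance comes from the text.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
CUTOFF = 1.2  # m: beyond this distance another pedestrian exerts no potential


def repulsive_potential(p_self: Point, p_other: Point, eta: float = 1.0,
                        cutoff: float = CUTOFF) -> float:
    """Khatib-style repulsive potential, truncated to zero at `cutoff`."""
    d = math.dist(p_self, p_other)
    if d >= cutoff:
        return 0.0
    d = max(d, 1e-6)  # avoid division by zero for coincident positions
    return 0.5 * eta * (1.0 / d - 1.0 / cutoff) ** 2


def repulsive_force(p_self: Point, p_other: Point, eta: float = 1.0,
                    cutoff: float = CUTOFF) -> Point:
    """Negative gradient of the potential above, pointing away from p_other."""
    dx, dy = p_self[0] - p_other[0], p_self[1] - p_other[1]
    d = math.hypot(dx, dy)
    if d >= cutoff or d == 0.0:
        return (0.0, 0.0)
    magnitude = eta * (1.0 / d - 1.0 / cutoff) / d ** 2
    return (magnitude * dx / d, magnitude * dy / d)
```

This form goes to zero smoothly at the cutoff, so truncating it does not introduce a jump in the potential or the force when two pedestrians cross the 1.2 m boundary.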
The dynamics model layer requires data from a real pedestrian, such as the mass, inertia, and maximum acceleration, listed in Table 1. These parameters can be set differently according to the age, sex, and personality of the pedestrian.

A. CROSSING DECISION
1) QUALITATIVE EVALUATION
Both pedestrian decision models (the data-driven and the rule-based model) can make decisions according to the traffic flow. The question, however, is how they perform in the simulation environment. Bidirectional scenarios are considered [50]. The proposed scenario reproduces pedestrians crossing in front of the vehicle: pedestrians can choose to stop until the vehicle passes, or run to the goal before the vehicle arrives. Moreover, the simulation includes the whole hierarchical model in order to obtain accurate speed data. Nevertheless, the decision model still sometimes makes decisions that differ from the data set. A common case is that the decision model forces the pedestrian to run after the waiting state ends, whereas people tended to walk in the same situations during the real-world experiment. One possible reason is that, during the real-world experiments, some pedestrians ran when the vehicle was very close because the vehicle did not stop. The decision model learned from these situations, so when the vehicle is close it is more likely to select the running state.
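The five-state decision layer discussed here can be sketched as a transition function. This is a hypothetical illustration, not the paper's state machine: which transitions are rule-based and which are data-driven is assumed (Figure 3 gives the real split), the desired speeds are made-up constants, and `decision` stands in for the output of the trained classifier.

```python
from enum import Enum, auto


class State(Enum):
    APPROACH = auto()
    WALK = auto()
    WAIT = auto()
    RUN = auto()
    REACHED = auto()


# Hypothetical constant desired speeds (m/s) handed to the dynamics layer.
DESIRED_SPEED = {
    State.APPROACH: 1.3,
    State.WALK: 1.3,
    State.WAIT: 0.0,
    State.RUN: 2.5,
    State.REACHED: 0.0,
}

CROSSING_STATES = (State.WALK, State.WAIT, State.RUN)


def next_state(state: State, at_curb: bool, dist_to_goal: float,
               decision: State, goal_tol: float = 0.3) -> State:
    """One transition of the five-state machine.

    `decision` stands in for a data-driven classifier output and must be
    WALK, WAIT or RUN; the goal and curb checks are simple rules.
    """
    if decision not in CROSSING_STATES:
        raise ValueError("decision must be WALK, WAIT or RUN")
    if state is State.REACHED:
        return state                  # absorbing state
    if dist_to_goal <= goal_tol:
        return State.REACHED          # rule: goal reached
    if state is State.APPROACH and not at_curb:
        return State.APPROACH         # rule: keep approaching the curb
    return decision                   # data-driven: walk, wait or run
```

In this sketch a WAIT followed by a RUN decision goes straight to running, which is the kind of transition the qualitative evaluation found the learned model choosing more often than real pedestrians.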

2) QUANTITATIVE EVALUATION
The accuracy of the decision-making should be evaluated. The environment information is extracted from the CITR data set. In this evaluation, the model's decisions do not affect pedestrian behaviour: the pedestrian trajectories are the experimental data from the CITR data set. In this way, the decision model can be evaluated objectively. The decision accuracy of the data-driven model (83.98%) is slightly higher than that of the rule-based model (79.91%). Note that when these models run in closed-loop simulation, the decision model can easily affect pedestrian behaviour. The decision accuracy would then be lower, because the decision difference grows once the simulated pedestrian position differs from the recorded one. For example, if the pedestrian walks faster in the simulation than in the experiment, he/she might reach the goal directly without any waiting, which can make the simulation result for a specific case very different from the data set. Deriving the data-driven model is clearly more efficient than deriving the rule-based model: the rule-based model is tuned by the user case by case, while the data-driven model follows a data-based optimization approach.
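The open-loop evaluation described above can be sketched as replaying recorded frames through a decision function and counting matches with the labelled states. The function name, the frame format, and the toy rule in the usage note are hypothetical; only the idea that decisions never feed back into the replayed trajectory comes from the text.

```python
from typing import Callable, Hashable, Sequence, TypeVar

F = TypeVar("F")


def open_loop_accuracy(decide: Callable[[F], Hashable],
                       frames: Sequence[F],
                       labels: Sequence[Hashable]) -> float:
    """Fraction of time steps whose decision matches the labelled state.

    The frames are replayed from recorded data, so the decision never feeds
    back into the pedestrian trajectory (open-loop evaluation).
    """
    if not frames or len(frames) != len(labels):
        raise ValueError("frames and labels must be non-empty and equal length")
    hits = sum(decide(frame) == label for frame, label in zip(frames, labels))
    return hits / len(labels)
```

For instance, a toy rule such as `lambda f: "wait" if f["ttc"] < 2.0 else "walk"` can be scored against hand-labelled frames; in closed loop the same function would change the next frame, which is why that setting is not used for this metric.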

B. TRAJECTORY SIMULATION RESULT
In this section, we focus on the overall system performance. To evaluate the proposed model, the CITR data set is used, and the simulation error over different time horizons is considered. The HFPM is used to simulate the first 6 s of the pedestrians' motion. Figures 8, 9, 10, and 11 show simulation results for pedestrians with different starting positions; the prediction trajectories are divided into three sub-plots, each covering a time horizon of 2 s. Our key performance indicators are the position and velocity errors between the simulation result and the data set experiment. The accuracy of the LSTM predictor is shown in Table 2.
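The position and velocity errors at the 2 s, 4 s, and 6 s horizons can be computed as sketched below. This assumes both trajectories start at the same instant and are sampled at a fixed interval `dt`, and it uses backward finite differences for velocity; the function name and these choices are illustrative assumptions, not the paper's evaluation code.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def errors_at_horizons(sim: Sequence[Point], gt: Sequence[Point], dt: float,
                       horizons: Sequence[float] = (2.0, 4.0, 6.0)
                       ) -> Dict[float, Tuple[float, float]]:
    """Position and velocity error at each horizon (seconds after the start).

    Both trajectories start at the same instant and are sampled every `dt`
    seconds; velocities are backward finite differences.
    """
    out = {}
    for h in horizons:
        k = round(h / dt)
        if k < 1 or k >= min(len(sim), len(gt)):
            raise ValueError(f"horizon {h} s is outside the trajectories")
        pos_err = math.dist(sim[k], gt[k])
        v_sim = ((sim[k][0] - sim[k - 1][0]) / dt, (sim[k][1] - sim[k - 1][1]) / dt)
        v_gt = ((gt[k][0] - gt[k - 1][0]) / dt, (gt[k][1] - gt[k - 1][1]) / dt)
        out[h] = (pos_err, math.dist(v_sim, v_gt))
    return out
```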
Figure 12 compares the hierarchical forecasting models with different decision layers, showing the median final displacement error of the two models. Our model produced a smaller position error, particularly in the middle of the simulation period (about 4 s). The reason is that our model has a path planning layer that guides the pedestrian towards the goal, and the SVM-based state machine improves the decision-making process. The resulting displacement errors are smaller than those of the benchmark model, which is fully rule-based. One limitation of this error metric is that it does not account for the multiple possible trajectories of a pedestrian. Two pedestrians crossing the same road might make different decisions, for example, running in different directions, which can cause large simulation position errors if their choices differ from the model's choices. Calibrating each pedestrian before the test is challenging. Figures 8, 9, 10, and 11 also show that the position error increases significantly when the model and the real pedestrian make the same decision but not at the same time.
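The statistics plotted in Figure 12 can be summarised per horizon as sketched below, assuming "inter range" in its caption denotes the interquartile range and that each horizon has one final displacement error per simulated pedestrian. The function name is hypothetical; the quartiles use the standard-library `statistics.quantiles` with linear interpolation (`method="inclusive"`).

```python
import statistics
from typing import Dict, Sequence, Tuple


def median_and_iqr(fde_by_horizon: Dict[float, Sequence[float]]
                   ) -> Dict[float, Tuple[float, float]]:
    """Median final displacement error and interquartile range per horizon.

    Each sequence holds one final displacement error per simulated pedestrian.
    """
    summary = {}
    for h, errors in fde_by_horizon.items():
        if len(errors) < 2:
            raise ValueError("need at least two pedestrians per horizon")
        q1, _, q3 = statistics.quantiles(errors, n=4, method="inclusive")
        summary[h] = (statistics.median(errors), q3 - q1)
    return summary
```

The median and interquartile range are less sensitive than the mean and standard deviation to the few pedestrians whose decisions diverge from the model's, which produce the large outlying errors described above.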

VI. CONCLUSION
We proposed a hierarchical forecasting pedestrian model to predict pedestrians' behaviour. We identified several pedestrian behaviours that the previous fully rule-based baseline decision model did not reproduce accurately.
The proposed model reproduces various behaviours well, such as stopping, running, and walking. The model has been evaluated qualitatively and quantitatively against different methods and the ground truth, i.e. it has been validated on the CITR data set. The model can simulate multi-pedestrian scenarios: in our test, eight pedestrians are simulated crossing a road at the same time, and each pedestrian makes decisions close to the ground-truth data set. The proposed model requires relatively low computational resources, enabling independent forecasting of the behaviour of each pedestrian in the scene at any given time. However, the data-driven decision model could be improved with a larger data set, from which it could learn more decision-making scenarios. The finite states and the constant desired speed in each state also limit the simulation results: the desired speed can only approximate the statistical average walking or running speed instead of fitting every pedestrian. The headed dynamics model could also incorporate stochastic factors to improve robustness.
For future work, we will try to apply the results to other areas involving human traffic. For example, service robots in warehouses and other indoor scenarios, as well as surveillance, are potential fields of application. In addition, we will try to use real-world HD maps and urban scenario datasets in the simulation to create more realistic scenarios.

FIGURE 3. The finite states of the data-driven decision model.The orange line means the transition is rule-based, while the blue line means the transition is data-driven.

FIGURE 4. The simulated potential field with multiple pedestrians.

FIGURE 5. The headed pedestrian dynamics model structure.

FIGURE 6. Headed pedestrian dynamics model. The model has 3 degrees of freedom: translation along the x-axis and y-axis, and rotation about the z-axis. The movement can be transformed into the global coordinate system.

FIGURE 8. Bidirectional crossing scenario simulation (Scenario (a)).Multiple pedestrians cross the road from two sides of the road.Blue circles stand for the simulation result of the pedestrians' position.Red circles stand for the experimental pedestrians' position.The blue/grey block stands for the outline of the vehicle.The shading lines are the historical trajectory for 2 s. (see footnote 1 on page 9026 for video links.)

FIGURE 10. Pedestrian and vehicle in opposite direction simulation (Scenario (c)). All the pedestrians walk from the left side to the right side. The vehicle moves in the opposite direction. This scenario aims to simulate streets with very dense crowds.

FIGURE 11. Vehicle and pedestrian in the same direction simulation (Scenario (d)). All the pedestrians move from the left side to the right side. The vehicle moves in the same direction as the pedestrians at a higher speed.

FIGURE 12. Median final displacement error and interquartile range with different decision models.
The walking state starts after the waiting state ends, and it ends once the state switches to the waiting or reached state.
e: REACHED STATE

TABLE 1. Parameters of the HFPM.

TABLE 2. Results of the LSTM predictor in the planning layer.