Motion Planning for Dynamic Scenario Vehicles in Autonomous-Driving Simulations

Scenario vehicles are an important part of the dynamic environment utilized in autonomous-driving simulations. They are required to meet the demands of traffic-scenario diversity and form a larger coverage scale in the road network. However, the current motion planning of scenario vehicles either adheres to the classical microscopic traffic-flow model or follows a predefined path; thus, interacting with the vehicle under test in a dynamic bidirectional fashion is difficult. This study researches a motion-planning method for a broader category of unmanned vehicles and proposes a motion-planning method for scenario vehicles based on Pontryagin’s minimal principle, used in optimal control theory and the closed-form solution of the minimum snap method. The study reclassifies actions received from the behavior layer according to the boundary conditions and final times and derives an analytical solution for each of them. The analytical solution is then experimentally verified. The proposed method not only accomplishes efficient motion planning but also exhibits variant driving styles, which provides a practical solution for the motion planning of scenario vehicles in simulations.


I. INTRODUCTION
Scenario vehicles are an important part of the dynamic environment in autonomous-driving simulations. Scenario vehicles need to perform dynamic interaction tasks with the vehicle under test with high confidence in a bidirectional manner and have to exhibit the differences between different individuals; these are a recently emerging demand and are becoming increasingly important [1], [2]. Making the behavior of scenario vehicles more realistic and similar to that of a real human driver is an open challenge for traffic simulations.
Previous research has proposed a scenario vehicle decision model for the virtual simulation of autonomous driving, which simulates the mental demand process of a real driver [3]. However, the behavior and actions decided by the model were only slightly improved by an algorithm used for autonomous driving. The motion planning used for the scenario vehicle is still immature: it is either missing and The associate editor coordinating the review of this manuscript and approving it for publication was Giambattista Gruosso . still adheres to the classical microscopic traffic-flow model that passively sets the current motion state in real time, or it follows a few limited paths that are predefined before the simulation and cannot form an effective interaction.
However, the biggest difference between the motion planning of an autonomous-driving vehicle and that of scenario vehicles is that the latter is more time-critical. The motion planning of an autonomous-driving vehicle only needs to determine an optimal trajectory within 100 ms. However, applying the same motion-planning method to hundreds or thousands of scenario vehicles in the test environment causes the computation workload to increase exponentially. With current hardware conditions, which are barely sufficient for self-driving motion planning, such a wide range of computations is almost impossible to achieve.
To the best of the author's knowledge, this paper is the first to identify the abovementioned challenges and propose corresponding solutions. It reclassifies the actions determined by the behavioral layer into four different solutions according to different computational requirements and proposes VOLUME 11, 2023 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ a motion-planning algorithm for dynamic scenario vehicles based on Pontryagin's minimum principle and the minimum snap closed-form solution method. The algorithm improves planning efficiency by calculating the analytical solution of the motion curve, which directly avoids the need for numerical iterative solving or taking turns to judge many sampling points. Additionally, it allows the setting of multiple preferences and forming a variant driving style to meet the needs of diverse traffic scenarios. It is worth noting that the motion planning proposed in this study is not only applicable to the previously proposed decision model [3] but can also be used in other motion-planning methods for autonomous driving, where behavioral planning serves as a basic framework level.
The code of the model was written entirely in C# to be able to associate with the decision model, and its performance was tested and verified using Unity 2020. 3.
The remainder of this article is organized as follows. In Section II, after finding that no suitable solution is currently available, research of recent years on motion planning from the perspective of unmanned vehicles is summarized. Section III presents the relevant theories required for this study, which are Pontryagin's minimum principle and minimum snap closed-form solution. In Section IV, the motion-planning method is presented in detail. Section V presents the outcomes of several actions conducted based on the proposed method and the diversity and performance of the proposed algorithm. Finally, the last section summarizes the full text and presents legacy issues.

II. RELATED WORK
Classical microscopic traffic-flow models are typically used in virtual testing to deliver a dynamic and interactive environment to an autonomous vehicle under test [3]. However, Ni showed that traffic-flow simulation is limited in four aspects: consistency, flexibility, forward-looking capability, and multidimensional scalability [4]. Thus, deploying a microscopic traffic simulation directly in an autonomous-driving test may lead to biased evaluation results [1]. Some studies have attempted to improve the microscopic traffic-flow model, such as that by Sharath and Velaga, which developed a two-dimensional motion model using the intelligent driver model (IDM) [5]; a study by Mullakkal-Babu et al. described the lateral motion and interaction between vehicles by introducing a bicycle-dynamics model [6]. However, these changes are still centered on the IDM car-following model, which is not fundamentally different from the traditional microscopic traffic-flow approach. For the motion planning of moving objects, there are many relatively mature approaches in research related to unmanned vehicles. Although there are some similarities between scenario vehicles and autonomous vehicles, the scope of this study is not limited to the subcategory of autonomous vehicles but is focused on the larger category of unmanned vehicles. Some early studies [7] utilized optimization methods; however, it was not until Mellinger and Kumar used the minimum snap method to generate the trajectory of a quadrotor [8] that the core of unmanned-vehicle motion planning gradually changed from a path-planning problem that utilizes a graph search to an optimization-based trajectory-planning problem [9]. As in the studies by Chen et al. [10] and Lim et al. [11], trajectory planning is generally modeled as a quadratic planning problem; that is, the objective function is (convex) quadratic, and the constraint function is affine. Gao and Shen turned this into a quadratically constrained quadratic program because they restricted the positions in the trajectory to a spherically safe region [12]. Qian et al. [13] argued that for logical constraints, such as traffic rules and obstacles, classical methods are not applicable, and they proposed the use of a mixed-integer quadratic planning method. In other studies [10], [14], [15], [16], the entire problem had to be built as a nonlinear optimization problem, owing to the introduction of complex dynamical models and environmental constraints.
Regardless of the type of optimization problem, the first step in motion planning is to describe the surrounding environment. The main approaches that are used to model the environment are driving/flying corridors, artificial potential fields, and S-T diagrams. A study by the Technical University of Munich used driving corridors to derive a suitable set of constraint equations for collision avoidance. When combined with existing continuous optimization-based motionplanning methods, trajectories can be efficiently planned in arbitrary traffic situations [17]. They also combined driving corridors with a sampling-based motion planner that uses reachability analysis to determine collision-free drivable areas for the subject vehicle. It can continuously adjust its sampling interval according to the rapidly changing environment in the traffic scene, thereby significantly reducing the number of samples and computation time required for planning [18]. A study by the Hong Kong University of Science and Technology proposed a method for the online trajectory generation of quadrotor aircraft in unknown environments. They directly used point-cloud maps to identify collisionfree flight corridors, which ensured the smoothness and safety of the generated trajectories [12]. The study also utilized an efficient octree data structure to generate online flight corridors that covered each other in free space, which bound the trajectories completely for safety and satisfied higher-order dynamics constraints [10].
Khatib proposed the artificial potential-field method in 1986 [19]. This method formed a virtual artificial potential field by combining the repulsive field of an obstacle and attractive field of the target location, and it then determined a collision-free optimal path by searching for the descending direction of the potential function. Zheng et al. introduced potential-field functions to evaluate the collision risk of candidate paths. They applied a new method based on the Frenet coordinate system that overcame the limitation of being restricted to static and straight roads and included more complex driving scenarios, such as curved roads [14]. Heidari and Saska proposed a method for unmanned aerial vehicle (UAV) obstacle avoidance based on an improved artificial potential field. This method overcame the local-minima problem and determined practical trajectories for robot path planning. The experimental results clearly demonstrated the effectiveness of the proposed method for multirotor systems [20].
The S-T diagram method was an important approach to motion planning, which was proposed by Kant and Zucker. They decomposed the trajectory-planning problem into the route-and velocity-planning problems. The latter was projected in a route-time space, where time was explicitly used as the dimension of variation. Thus, the changed velocityplanning problem was reduced to a graph-search problem in this space [21]. This approach was also used in the Apollo Project. The path-velocity approach may not be optimal owing to the emergence of dynamic obstacles. However, because the path and velocity were decoupled, this approach achieved greater flexibility in both path and velocity optimizations [22]. They further optimized this method in 2019 by generating a closed-form driving guide line via a differentiable curve, which was used as an input to output dynamically feasible jerk-and time-optimal trajectories. The improved method is particularly useful for curved roads that require frequent acceleration and deceleration to accommodate the centripetal acceleration limit [23]. Li et al. generated candidate longitudinal trajectories in the S-T space using an improved hybrid A-star algorithm in the operation sequence. They then used a search method to generate lateral trajectories simultaneously in the safety corridor. The method was validated in two cases: static and dynamic avoidance of objects in one direction and lane borrowing considering two-way conditions [24].
Another challenge in motion planning is to find a solution. Although numerical computation is a common solution method in trajectory generation [12], [13], [14], [16], [17], it can directly incorporate the objective cost function and constraints into the trajectory generation. However, trajectory generation for configuration spaces with high degrees of freedom is inefficient and easily falls into local minima; therefore, this method is generally suitable for low-dimensional configuration spaces. Discretization methods [18], [25], [26], [27], [28], [29], such as state lattices, typically cannot meet the demand when driving situations suddenly become more hazardous. In addition, considering the execution efficiency of the computer, the granularity of discretization cannot be too high; otherwise, it may not be possible to discover narrow passages quickly enough to generate feasible motions in a critical situation. For particle-swarm optimization [30], [31], [32], it is unlikely that a random search algorithm that may take tens of seconds can achieve fast real-time trajectory solutions.
This article also adopts the optimal control method. However, to quickly find the solution, the constraints imposed by the dynamics and the environment are simplified as much as possible.

III. BACKGROUND A. OPTIMAL CONTROL
Optimal control is an important branch of dynamic systemoptimization theory. It focuses on choosing a permissible control according to the established mathematical model of the controlled object in the time or frequency domain so that the controlled object operates according to predetermined requirements, and the given performance measure reaches its optimal value [33], [34].
Several varieties of optimal control problems exist based on the performance measure, time domain (continuous or discrete), existence of various types of constraints, and variables that can be freely chosen [35]. The optimal control problem first requires the determination of the state equation of the system under control: where x is an n × 1 state vector, u is an m × 1 control vector, and a (·) is an n×1 dimensional continuous vector function that is continuously differentiable for x (t) and t: J is the performance measure; h (·) and g (·) are continuously differentiable scalar functions; h x t f , t f is the terminal cost function, and The Hamiltonian function is constructed as follows: where the Lagrangian multiplier λ is the covariance vector. The necessary conditions for obtaining the optimal solution are as follows: Different boundary conditions will simplify (6) accordingly.

B. PONTRYAGIN'S MINIMUM PRINCIPLE
If is the set of all feasible control domains, Pontryagin's minimal principle indicates that the optimal control u * must satisfy It provides necessary but insufficient conditions for the opti- and , t is positive definite, then it is sufficient to guarantee that u * (t) causes H to become a local minimum. If the Hamiltonian can be expressed in the form where c is an m × 1 array that does not contain any terms containing u (t); satisfying (8) and ∂ 2 H ∂u 2 > 0 are necessary and sufficient for u * (t) to cause H to be the global minimum.

C. MINIMUM JERK CLOSED-FORM SOLUTION
For each segmented trajectory i, if its performance measure can be expressed as the squared integral of polynomial P i , it can also be expressed as follows: where T i denotes the time of the segment i, p i denotes the vector consisting of the coefficients to be determined for polynomial P i , and matrix Q (T i ) is a positive definite symmetric matrix. From the boundary conditions, where d i is a vector comprising the derivative values of the i th segment's start d 0 and final d T points. A mapping matrix between a polynomial's coefficients and endpoint derivatives is used to set the constraints on the i th segment of a trajectory. By combining the performance measures and constraints of each segment, the following is obtained: By substituting the constraints into the original cost function, the following is obtained: These variables are rearranged so that the free/unspecified derivatives (d P ) are grouped with the fixed/specified derivatives (d F ). This reordering is performed using permutation matrix C, which comprises ones and zeros. Now, where Differentiating J and equating it to zero yields the vector of optimal values for the free derivatives in terms of the fixed derivatives and cost matrix: Individual evaluations of the appropriate constraint equations projecting derivatives back into the space of coefficients can now be utilized to recover the polynomials [36].

IV. MOTION PLANNING
The study assumed that the behavior layer of the scenario vehicle determines the three main types of behaviors B: changing speed B changeSpeed , changing lane B changeLane , and keeping idle (B idle ). Each behavior contains several actions, similar to that in [3]. Among these, the B changeSpeed set includes five actions: maintaining safe headway (b cs headway ), emergent deceleration (b cs emergent ), slowing down and stopping (b cs stop ), complying with the speed limit (b cs limit ), and giving way to others (b cs morality ). The B changeLane set includes four actions, that is, changing lanes to the target lane (b cl destination ), changing lanes to the faster lane (b cl excitement ), changing lanes to a random adjacent lane (b cl random ), and aborting changing lanes (b cl abort ). The B idle set includes two actions: maintaining the current state (b cn normal ) and alerting the front vehicle and maintaining b cn alert .
The motion of the scenario vehicle was simplified to a mass-point model, and the longitudinal and lateral motions were computed independently. However, this does not mean that the non-holonomic constraints on the vehicle are completely ignored. First, the trajectory of the main motion is planned, and then, restrictions are imposed on the concomitant motion in the other direction, ensuring the feasibility of the motion. For two types of behaviors, B changeSpeed and (B idle ), the main motion is longitudinal motion, which is along the reference line, and the concomitant motion is lateral motion, which is perpendicular to the reference line. For the B changeLane behavior, the main motion is perpendicular to the reference line, and the accompanying motion is along the reference line.

A. MAIN MOTION
The final boundary states and times of the main motions of the 11 actions are listed in Table 1. The trajectory planning for end-time free actions is solved using Pontryagin's minimal principle, and the trajectory planning for time-fixed actions is calculated using closed-form solution methods. In addition, the trajectory planning for actions with constrained final states can be transformed into an end-fixed trajectory-planning problem, which is explained in more detail in the corresponding action classification.
where x, v, and a denote displacement, velocity, and acceleration, respectively. Input u is the third-order differentiation of the displacement, that is, jerk. The initial state is given in (19), and x 0 , v 0 , and a 0 are the position, velocity, and acceleration at time t = 0, respectively: The final state is given by (20), where x f , v f , anda f are the position, velocity, and acceleration at time t = t f , respectively.
The performance measure is as follows: where K denotes the preference for different actions, which is described in Section IV.A.5. The Hamiltonian function is constructed as follows: According to (5), where c 1 , c 2 , and c 3 are the constants to be determined. According to Pontryagin's minimal principle, Equation (24) clearly satisfies (9), and ∂ 2 H ∂u 2 = 1 > 0; therefore, u * is the global minimum. VOLUME 11, 2023 where c 4 , c 5 , and c 6 are the constants to be determined. According to boundary conditions (19) and (20), The final position is fixed; hence, δx f = 0. Moreover, the final time is free; thus, δt f = 0. Therefore, according to (6), the following is obtained: In the Frenet coordinate system, there should be no motion in the lateral direction before and after the lane change; therefore, the velocity and acceleration should be zero. Let the lateral displacement at the initial moment be zero.
Substituting (25) , (26), and (29) into (28), the following is obtained: Three valid digits are retained, and time t f is solved as follows: t f = ±3.49 Time can only be a positive real number; therefore, By substituting t f and the assumptions in (29) back into (26), the constants to be determined and target trajectory can be obtained.
b: b cs stop , b cl abort b cs stop is used when the scenario vehicle approaches an unsignalized intersection and needs to stop at the stop line. The action b cl abort is used to abandon the current lane-change behavior and return the vehicle to the original driving lane when a hazard is encountered. These two actions are similar to the previous case; however, the initial speed is nonzero.
Then, (34) is solved for time t f : From the six complex solutions of (35), the smallest real number greater than zero is selected as the solution to the problem, and t f and the assumptions in (32) are substituted back into (26) to obtain the constants to be determined. The target trajectory is then obtained.

2) FINAL STATE CONSTRAINED AND FINAL TIME FREE
The main motions of both actions in this category are along the direction of the road reference line. In a critical situation, action b cs emergent allows the scenario vehicle to stop as soon as possible. When the scenario vehicle needs to accelerate or decelerate, action b cs limit can increase or decrease its speed. It can be observed that the speed and acceleration of the final state are fixed, whereas the displacement is unrestricted and can be treated as a final state-constrained, time-free optimal control problem. However, if the unconstrained displacement is ignored, this problem becomes a final state-fixed, time-free optimal control problem again. The expression for displacement with respect to time can be obtained by integrating over the velocity trajectory. Unlike the previous two categories, the equation of state for the main motion is The boundary conditions are The performance measure is the same as that of (21). The Hamiltonian function is constructed as follows: According to (5), the covariance equation is where c 1 and c 2 are the constants to be determined. According to Pontryagin's minimal principle, Equation (40) clearly satisfies the form of (9) and ∂ 2 H ∂u 2 = 1 > 0; therefore, u * is the global minimum. Substituting (39) into (40) yields the following: where c 3 and c 4 are the constants to be determined. Substituting (42) into (37) yields the following: According to (27), Equations (42) and (43) are substituted into (44), and let a f = 0; this gives Then, time t f is solved: where By substituting t f and the assumption that a f = 0 back into (43), the constants c 1 , c 2 , c 3 , and c 4 and the change in velocity can be obtained. The trajectory of the main motion can be obtained by integrating the velocity polynomial.

3) FINAL STATE FIXED AND FINAL TIME FIXED
Both b cs morality and b cs headway are used to maintain the scenario vehicle at a particular distance from the vehicle in front of it. Assume that the displacement of the main motion satisfies a fifth-order polynomial function: Because the final state and time are both determined, the known conditions in (48) can easily be substituted into (47) for the solution: VOLUME 11, 2023 The polynomial coefficients p can be obtained by solving the following matrix: where

4) FINAL STATE CONSTRAINED AND FINAL TIME FIXED
The tasks of both b cn maitain and b cn alert are to maintain the speed constant for a particular period of time. Setting the final time to a fixed value takes into account the execution mechanism of the upper layer to prevent subsequent determinations from being performed. Similar to the previous scenario, the velocities and accelerations of the initial and final states are known. The change in velocity with respect to time is assumed to satisfy a cubic polynomial relationship as follows: The boundary conditions are known: The polynomial coefficients p can be obtained by solving the following matrix: where

5) ACTION PREFERENCE K
When the main motion is to change lanes, that is, B changelane , where T CL denotes the duration of lane change, which can be considered as T CL ∼ N 3.6, 1.8 2 [37].
When the main motion is to change the vehicle speed, B changeSpeed , K indicates the preference for longitudinal acceleration or deceleration: where α denotes the acceleration or braking performance of the vehicle, and β indicates the driver's preference for comfort. β ∈ [0, 1]; therefore, the closer the value is to 0, the more uncomfortable the motion, but the faster the action is executed. The closer the value is to 1, the more comfortable the driver prefers to drive, and the action is executed for a relatively long time.
t ap denotes the time required for a 0−100 km/h acceleration, and d bp denotes the braking distance from 50 to 0 km/h.

B. CONCOMITANT MOTION
The concomitant motions of the B changeSpeed behavior are solved in the same manner as the morality and keeping-distance actions in the main motion by directly substituting the initial and final states into (51), which is not discussed further here. The trajectory planning of the concomitant motion with a lane change as the main motion is described below. Let the maximum velocity generated by the main motion be v main_max , and the velocity of the concomitant motion at this time be v sub , which corresponds to time t main , t main ∈ t 0 , t f . Assuming that the maximum lateral acceleration should not exceed a lat_max , it follows that where V denotes the combined longitudinal and lateral velocities, and R denotes the turning radius at this point. Let v sub_limit be the maximum value that v sub can obtain. Substituting (57) into (56) gives and where v sub (t) denotes the velocity of the concomitant motion at time t. Because t main is definitely between t 0 and t f , the concomitant motion is divided into two parts. Using the initial condition t 0 and [x 0 , v 0 , a 0 ], final condition t f and [x f , v f , a f ], and velocity v sub (t main ) at t main , the position x sub (t main ) and acceleration a sub (t main ) at t main can be calculated using the closed-form solution method, and the trajectory of the concomitant motion can then be obtained. Assume that both concomitant motions satisfy the following fourth-degree polynomial relationship: Then, the corresponding polynomials of velocity, acceleration, and jerk are as follows: where j i denotes the jerk of the i th segment of the concomitant motion. For easier notation, let Using (12) , where Using (13) , where For (15) , the fixed and free derivatives are as follows: and the reordering matrix C T is: Equation (16) was applied to determine a m . By substituting the equation into (13), the polynomial coefficients of the accompanying motion can be derived.

V. SIMULATION
The computer used for the simulation was a MacBook Pro (16-inch, 2019) with an 8-core Intel i9 CPU and 32 GB 2667 MHz DDR4 memory. The simulation was run on Unity (2020.3.23f1, Unity Technologies, USA). The motionplanning program was written entirely in C#, and the matrix operations used Math.NET, which is an open-source scientific computing library. All seven types of motion were verified on an OpenDrive-formatted map at Site − 1 and Site − 2, as shown in Fig. 1. Site−1 is a multilane road segment, which is used to test actions in which the primary motion is lateral.
Site − 2 is a more curved road segment, which is used to test actions in which the primary motion is longitudinal.

3) STOP
The vehicle gradually decelerates from = 10 m/s to stop 30 m ahead. The braking performance α = 62.0 and comfort preference β = 1. As shown in Fig. 4 a and b, the speed gradually decreases from 10 m/s to zero after 4.65 s. Fig. 4 c illustrates the position and acceleration changes of the vehicle in the Frenet space.

4) EMERGENT
The vehicle decelerates to zero immediately from = 10 m/s with a braking performance α = 62.0 and comfort preference β = 0. Compared with b cs stop , this maneuver reduces the speed to zero in only 2.32 s, and the braking distance is significantly shorter.

5) LIMIT
The vehicle accelerates from zero to 10 m/s with an acceleration performance α = 1.39 and comfort preference β = 0. As shown in Fig. 6 a and b, the entire motion time lasts for 6.00 s. As shown in Fig. 6 c, the acceleration gradually decreases during this process.

6) HEADWAY
Suppose the vehicle reaches 22 m ahead after 2 s and must increase its speed to 10 m/s. As shown in Fig. 7 a and b, the

7) MAINTAIN
The speed of the vehicle is maintained at a constant value of 10 m/s. As shown in Fig. 8 a, position increases linearly. Accordingly, the speed remains constant, as shown in Fig. 8 b.

B. DIVERSITY
Assuming that the car accelerates from zero to 10 m/s, the action b cs limit can exhibit a wide range of variations through changes in acceleration performance and comfort, as shown in Fig. 9.

C. EFFICIENCY
Finally, the execution efficiencies of the seven actions were tested. Each action was executed 10 000 times per round for a total of five rounds, and the average execution time of each action is shown in the last column of Table 2.

D. ANALYSIS
It can be seen first from the separate tests in Section V.A that the motion-planning method proposed in this paper can accomplish the trajectory planning of each action quite well. Although only one action diversity is demonstrated in Section V.B, all other actions can also show different behavioral preferences. Finally, a test of execution efficiency was performed in Section V.C. The planning time of all the actions is in the level of hundreds or even tens of microseconds. The longest planning time of action b cl abort was used to conduct the parallel test, and at most, 75 objects were guaranteed to run at 24 frames/s in real time under the current test conditions. Compared with the motion planning of autonomous driving, in particular, the execution efficiency is improved approximately a thousand times, although both of them employ quadratic planning.

VI. CONCLUSION
Scenario vehicles in autonomous-driving virtual tests lack dynamic and effective interactions with the tested target. Therefore, this study proposes a motion-planning method for dynamic scenario vehicles based on optimal control. Compared with the scenario vehicles generated by the traditional microscopic traffic flow-based method, this method produces more dynamic motion flexibility. Compared with the current motion planning for autonomous driving, this method offers faster computational efficiency. However, the proposed method may still be improved, particularly the behavior B changeLane , for which the main motion is lateral motion. Maintaining the speed of the combined motions at a constant value is difficult because the motions both along and perpendicular to the road reference line are decoupled and planned separately. Moreover, the final state of the lateral direction is determined based on the lane information of the current position, whereas the lane width may change during the movement. In addition, the number of individuals that can be simulated simultaneously is still limited, and further improvements are required. Future research may require modifications to the longitudinal-lateral decoupling approach to better match the realistic operating habits of drivers. In addition, the efficiency improvement may require a more advanced framework structure in code writing.