Safety-Balanced Driving-Style Aware Trajectory Planning in Intersection Scenarios With Uncertain Environment

This paper proposes a two-stage trajectory planning method for self-driving vehicles (SDVs) in intersection scenarios with uncertain social circumstances while considering other traffic participants, which are human-driving vehicles (HDVs) with different driving styles. The mixture-of-experts approach is first utilized to learn from human-driving trajectory data to construct a multimodal motion planner, which uses a Transformer to model the interactions between vehicles by explicitly considering their driving styles to facilitate the integrated network to achieve scene-consistent multimodal trajectory prediction and candidate trajectory generation. Second, based on the generated trajectories for the SDV and the predicted trajectories for the other HDVs, each candidate planning trajectory is evaluated via a safety-balanced value function. After that, the trajectory with the highest value is selected for implementation. Such a method plans a safe and efficient driving trajectory in complex and uncertain scenarios. The experimental results demonstrate the efficiency and effectiveness of the designed method as well as the robustness and reasonableness of the SDVs' maneuver decisions at an intersection considering the behavioral dynamics of HDVs.

. The trajectory planning of SDVs will be influenced by different driving styles and the driving intentions of surrounding HDVs. most common but challenging tasks in daily driving [1]. Sensing and responding to time-varying social situational changes are critical for self-driving vehicles (SDVs) to make safe, effective, and efficient decisions while interacting with surrounding traffic participants. Thus, their social complexity becomes the most challenging but crucial factor.
Learning and analyzing driving styles [2], sensing and predicting new participants' intentions, and self-maneuver-decisionmaking according to situational changes are the three main functions of an SDV when planning a trajectory through an intersection considering that the other participants are human-driving vehicles (HDVs) with consistent driving styles. Fig. 1 shows a typical left-turn trajectory planning scenario with interference from oncoming traffic in an intersection. While driving, the SDV's desired turn-left trajectory planning should also account for traffic safety and avoid congestion. Therefore, if the driving style of the oncoming vehicle is radical with a solid intention to go straight, as shown in Fig. 1(a), the SDV needs to yield and slow down for collision avoidance. However, if the driving style of the oncoming vehicle is conservative with an intention to turn left, as shown in Fig. 1(d Traditional rule-based planning methods set predefined modeling rules for the above-described complex situations. They usually make reasonable planning decisions by comprehensively considering the instant social interactions among traffic participants to guarantee a specific pass rate. However, these predefined rules cannot cover all the possible situations in an intersection. The constantly-occurring rule-uncovered corner cases cause rule-based planning methods to fail all the time. In addition, rule-based SDVs cannot transfer and share common knowledge about different scenarios with one another. This lack of ability to share information combined with the aforementioned constraints results in rule-based methods being unable to utilize the benefits of Big Data effectively and limits their compatibility with popular deep learning methods.
Data-driven trajectory planning methods are quite the opposite and can complementarily use Big Data, deep learning, and reinforcement learning methods to address intersection decision-making challenges. For them, enhancing scenario generalization and modeling uncertain environment are crucial for SDVs to implement safe and rational trajectory planning for different times and scenes. However, the state-of-the-art methods often fail to achieve the above two goals simultaneously. Bansal et al. proposed ChauffeurNet, which learns trajectory planning directly from expert data [3]. Although this method is more scalable than traditional rule-based planning methods, it does not consider the generalization of scenarios and predictions regarding environmental uncertainty. Thus, it cannot guarantee the safety of autonomous driving. To address environmental uncertainty, Diehl et al. proposed the UMBRELLA model [4], which solves the problem of prediction, planning and control of autonomous driving by modeling uncertainty in a partially observable environment. However, the model does not consider the diversity of planning trajectories of autonomous vehicles and lacks the ability of relational reasoning about the environmental entities, resulting in poor scene generalization for planning. Meanwhile, the single trajectory generated by the above method has difficulty ensuring the safety of driving, and the planned route does not necessarily meet the dynamic requirements of the downstream controller. To address this issue, multimodal trajectory generation considering environmental randomness is gaining attention among researchers [5]. However, existing multimodal trajectory planning methods tend to overlook the impact of social factors on the environment, making it difficult to ensure driving safety.
Parallel driving was proposed in the early 2000s to improve the safety, smartness, sensitivity, security, sustainability and efficiency of SDVs by training in a digitalized proving group [6], [7]. Recently, with the development of data-driven machine learning methods, scholars extended the idea to virtual-real interactive learning and testing and further proposed parallel learning [8], [9], [10], [11] and parallel testing methods [12]. At the same time, MIT, Waymo and Intel also noticed the importance of pretraining in virtual space and proposed parallel autonomy [13], data closed-loop platforms [3] and lane-changing models [14], respectively, which verified the effectiveness and recognition of parallel driving.
Our work builds on the research ideas regarding parallel driving and aims to design a parallel reasoning architecture and develop new methods for safe and efficient trajectory planning. We propose a two-stage trajectory planning method for SDVs in intersection scenarios with uncertain social interactions among other traffic participants. Driving style and intention uncertainty are considered to achieve better scenario generalization performance while guaranteeing driving safety. Specifically, we learn from expert-driving trajectory data and construct a mixture-ofexperts (MoE) [5], [45] multimodal trajectory generator that is similar to the architecture described in [5]. A Transformer [15] is used to capture vehicles' interaction behaviors, and a driving-style aware module is incorporated with the network to implement scenario-consistent multimodal trajectory prediction and planning. Second, a safety verification module is designed to evaluate the candidate planning trajectory obtained during the first stage; then, the trajectory with the highest safety score is selected for the downstream controllers.
Trajectory planning at intersections on city roads where new participants continually join has been one of the most common but challenging tasks for SDVs. To this end, we focused on the social complexity of driving in intersections and proposed a safety-balanced driving-style-aware trajectory planning method. The main contributions of this work include the following: 1) A two-stage autonomous driving trajectory planning method is designed, and the method improves the safety, rationality, and robustness of self-driving strategies. 2) A candidate trajectory generation method that considers the driving styles and driving intentions of surrounding vehicles is proposed. The method improves an SDV's ability to predict the multimodal trajectory of surrounding vehicles and thus helps SDVs generate safe and interpretable trajectories in interaction scenarios. 3) A safety verification mechanism based on candidate trajectory prediction is proposed to balance the safety and efficiency of the planned trajectory in response to the surrounding uncertain vehicle behaviors. 4) The experimental results demonstrate that the designed method plans safe and efficient trajectories in intersections with uncertain environments by explicitly considering the driving style of the surrounding vehicles. The remainder of this paper is organized as follows. Section II presents the related works. Section III introduces the methodology of the proposed two-stage planning method. Section IV illustrates how the proposed method is implemented, verified and evaluated and elaborates on the results and explanations of the experiments. Section V concludes the work in this paper and presents our perspective on the development of trajectory planning in SDVs.

II. RELATED WORK
Autonomous vehicles must negotiate with other traffic participants regarding the right to use the road while driving to achieve their goals of safe and efficient driving in complex social traffic scenarios [59]. The safe path planning of SDVs in such complex situations can be formulated by predicting and analyzing the driving styles and intentions of others and making appropriate in-time maneuvering decisions. The main functions include (1) learning and predicting the driving styles and intentions of other traffic participants with mobility capabilities; (2) generating feasible multimodal paths based on road traffic conditions and the styles and intentions of potentially conflicting vehicles; (3) performing safety verification on the generated paths; and (4) selecting the safety-balanced optimal driving path.

A. Driving Style Recognition
Driving style refers to how a driver maneuvers in different driving scenarios under different external factors, such as time and weather [2]. Due to the social complexity of traffic situations and uncertainty about the drivers' states, they may exhibit different driving styles even in the same scenario. Predicting their driving style helps improve the safety and efficiency of SDV trajectory planning.
The current related algorithms are mainly classified into rule-based, model-based, and machine learning-based methods. Early rule-based approaches use a priori knowledge and manage a limited number of variables and data to classify driving styles. Model-based approaches use a set of predefined feature equations to describe driving styles by adjusting parameters to fit the driving style data [16], [17], [18]. These methods can have a high recognition ability in terms of driving styles, but it is more complicated for the model's design.
Machine learning-based methods can be specifically subdivided into three categories: (1) unsupervised training methods that statistically analyze the data with input features to achieve classification [19], [20]; (2) supervised training methods such as the K-nearest neighbors (KNN), neural networks, decision trees, random forest, and Markov models [21], [22], [23], [24], which use known sample features and corresponding styles for training; and (3) unsupervised and supervised hybrid methods that combine the advantages of both to improve the overall driving style recognition performance; one example involves combining K-means and the support vector machine (SVM) [25]. The unsupervised training method has a wide range of applications, but the output lacks interpretability; the supervised training method has fewer model parameters and is easy to understand, but the model's accuracy could be more satisfactory. The combination of unsupervised and supervised methods can make use of the advantages of both to improve the overall driving style recognition performance.

B. Driving Intention Inference
Driving intention inference refers to reasoning about the actions of surrounding vehicles over time in the upcoming time step [26]. Different vehicles in the same scenario have different driving intentions and generate different driving trajectories. Accurately inferring other traffic participants' intentions helps SDVs plan proper and safe paths.
Driving intention inference methods include probabilitybased methods and learning-based methods [27]. Probabilitybased methods model vehicle behavior through dynamics and kinetic functions, and model uncertainty through normal distributions. Learning-based methods infer driving intentions mainly through data-driven vehicle state information regarding the environment. Zyner et al. [28] used long short-term memory networks to identify vehicle intentions at intersections. Lee et al. [29] proposed using convolutional neural networks (CNNs) to predict and infer traffic participants' lane-changing intentions and improve safety and ride comfort through model predictive control. Ding et al. [30] proposed a driving intention prediction method for highway scenes that predicts vehicle intentions by considering vehicles' interaction behavior. The disadvantage of the above research is that it is difficult to use driver behavior data effectively and to fully consider the impact of driving style on driving intention. This paper develops a data-driven trajectory planning method to infer intention by considering the driver's style and predicting a reasonable driving trajectory.

C. Data-Driven Trajectory Prediction
In recent years, researchers have focused on autonomous driving trajectory prediction tasks based on deep learning models due to the increasing complexity of datasets in terms of volume and content [57]. Based on data-driven approaches, a large amount of data can be used to train the model. Most of these approaches consist of an encoder-decoder architecture. Early CNN-based [31] and recurrent neural network (RNN)-based [32] methods extract spatial features and time series information by using a bird's-eye view and historical trajectories, respectively. However, such methods encounter scene information loss and long-distance dependence problems. More importantly, the interaction information between vehicles is missed. Although graph neural networks [33] have been used in later studies to model interaction information by vectorization, these models are "black boxes" and have poor model interpretability and unimodal prediction trajectories. To solve the above problems, the generative adversarial network (GAN) is used to model the uncertainty of environmental information and generate multimodal prediction trajectories [34]. However, the GAN-based approach suffers from the problem of pattern collapse [35], which causes the training model to collapse and prevents it from converging. In addition, the Social-GAN model ignores the driving style and intention changes of surrounding vehicles, which is inconsistent with reality. To solve this problem, the leader-follower game controller model [36] uses a game-theoretic approach to model the driving style and intention classification of surrounding vehicles. The attention-based mechanism approach [37] is a popular strategy that has recently attracted the attention of researchers. This approach can quickly focus on high-value factors, especially the Transformer method [38], which has been adopted from the natural language processing field to autonomous driving. The model can generate highly accurate multimodal trajectories. In this paper, we use a data-driven planning approach and use a bird's-eye view semantic map as input, which consists of a high-definition (HD) map and agents. To account for the uncertainty of HDV driving behavior, the interactions between vehicles are modeled by a Transformer, and the driving style of the HDVs is explicitly modeled. The decoder outputs a multimodal predictive trajectory based on driving style with safety-sensitivity and interpretability.

D. Safety-Sensitive Trajectory Planning
As one of the core modules of the autonomous driving system, the trajectory planning module is used to receive environmental information input from the sensing module and output the trajectory for the downstream control module. The trajectory planning module first predicts the future trajectory using environmental and interaction information and generates a safe and interpretable planning trajectory based on the generated multimodal trajectory. Reasonable path planning is the key to achieving safe autonomous driving. This paper divides the trajectory planning task into data-driven unimodal and safety-sensitive multimodal trajectory planning.
The data-driven unimodal trajectory planning approach is mainly based on imitation learning, where driving strategies are learned from expert data. Early trajectory planning based on imitation learning used behavioral cloning (BC) [4]. However, BC relies on expert data, experiences cascading errors and suffers from low safety in unknown scenarios. Although the data aggregation method solves the problem of cascading errors, the human cost increases. In a subsequent study, the generative adversarial imitation learning (GAIL) model was proposed to solve the cascade error problem in the BC model [39]. GAIL consists of a generator and a discriminator. The generator is trained to generate expert samples, and the discriminator is trained to generate samples that are distinguishable from actual samples. These data-driven approaches focus on applying sequential decision-making techniques to simulate human drivers. This reward-based learning model avoids the drawbacks of short-sighted behavioral choices in the behavioral cloning approaches. GAIL can generate long-term trajectories that are immune to cascading errors. Kuefler et al. applied GAIL to a highway driving scenario [40]. However, the above approach does not consider driving style and driving intentions, which are the main factors that constitute uncertainty. ChauffeurNet [3] takes a bird's-eye view as input and extracts environmental information. A mixture of actual and simulated data is used to train the model. The output predicted trajectory is used as expert data to learn recovery after drifting and avoid dangerous behaviors such as collision, swaying, and lane departure by combining the behavioral changes of experts. UMBRELLA [4] jointly solves the prediction, planning and control problems of SDVs in an interpretable learning manner. This method trains the model from offline data through a model-based method while considering the partial observability of the environment and uncertainty.
The data-driven multimodal trajectory planning approaches incorporate the influence of the SDV on other agents during the planning process. A representative study first predicted the trajectories of surrounding vehicles, generated multimodal trajectory candidates, and then selected the final trajectory via rule-based metrics [5]. Although the data-driven works have high-precision inference for planning, they have not considered the influence of the road agent's driving style. Therefore, it is difficult to ensure the rationality and consistency of trajectory planning. In this paper, we design a driving-style-aware multimodal planning method and propose a safety assessment link based on safety-sensitivity as a trajectory selection method to achieve safe and efficient planning.

III. METHODOLOGY
This section first describes the trajectory planning problem under the interaction between SDVs and HDVs. Then, we propose a two-stage trajectory planning method that contains driving-style-aware candidate trajectory generation and safetyverification-based trajectory selection. The proposed method learns the driving behavior of human drivers in the interaction scenario by explicitly modeling the uncertainty of driving styles and the intentions of surrounding HDVs to generate reasonable driving trajectories. Moreover, we perform safety verification on the candidate trajectories to improve the driving feasibility of SDVs in complex interaction scenarios.

A. Problem Formulation
Guided by a predefined route, the trajectory planner accounts for the multiobjective requirements of the vehicle and the corresponding dynamics constraints and outputs the optimal executable trajectory. We formulate the driving scenario consisting of SDVs and different numbers of interactive traffic participants as a discrete-time system in continuous space [41]. The SDV is denoted as A 0 , and other traffic participants (i.e., HDVs) are denoted as A 1 , . . . , A N . It should be noted that the other traffic participants can be vehicles, bicycles, or pedestrians, which the agent displays. The state of agent i at time t is denoted as s i t , and the corresponding trajectory is τ i t . Apart from the agents, the scenario context, including a vectorized HD map with traffic signals, is also considered. The context is represented as M t . At time t, given the historical state X t = s 0:N t−H:t of the SDV and the HDVs from the previous H steps and with the scene context M t , the goal of trajectory planning is to learn a planner function that generates the future trajectory of the SDV while ensuring that the trajectory meets the requirements of safety and efficiency.

B. Model Framework
In the interaction scenario, the uncertainty of the driving behavior of surrounding agents causes SDV safe planning difficulties. If the SDV intends to plan a safe trajectory, it needs to accurately predict the movements of other vehicles, which are influenced by their drivers' driving styles and driving intentions. Meanwhile, the surrounding vehicle motions will also be influenced by the future trajectory of the SDV. Therefore, the SDV needs to fully consider the environmental uncertainty caused by these factors to ensure safe and efficient planning behavior. To address the difficulty of planning in intersection scenarios with uncertainties and complex social interactions, we propose a two-stage trajectory planning method that contains candidate trajectory generation based on HDV driving style recognition, driving intention judgment, and trajectory selection based on safety verification.
The model architecture is shown in Fig. 2. First, we construct a multimodal joint prediction and planning module considering vehicle driving style during the candidate trajectory generation stage. The module accepts the bird's-eye view semantic graph as input, which includes an HD map and other vehicle agents. The vehicle interaction is encoded through a Transformer. Meanwhile, the driving styles of the surrounding HDVs are explicitly modeled, and the predicted likelihood of the vehicle driving style is used as a conditional constraint to assist the trajectory decoding network in achieving interaction-aware joint prediction and planning. The process produces candidate trajectories for the SDV. During the trajectory selection stage, the safety-sensitive trajectory evaluation function scores each candidate trajectory based on the predicted trajectories of the other vehicles in the first stage as well as other road information. Finally, the trajectory with the safest score is selected for controller execution. We detail the two stages in Sections III-C and III-D.

C. Driving Style-Aware Joint Prediction and Planning
This subsection introduces the proposed joint prediction and planning module considering the HDV driving styles. We first present the inputs and outputs of the module. Then, we design the network structure, and finally, we propose the training method and optimization objectives.

1) Input and Output Representation:
The joint prediction and planning network generates both SDV multimodal trajectory candidates and surrounding HDV multimodal predictive trajectories based on the currently observed vectorized bird's-eye view states. The specific representations of its input and output are as follows: Inspired by recent vectorization representation methods [5], [42], we describe the driving scenario in vector form, and the following data are used as inputs to the joint prediction and planning network: The input contains dynamic entities, static entities, and extra information. The dynamic entities include the SDV's current and past poses, which include the coordinates and yaw angles of the last H steps and the HDV's current and past poses and their types. The static entities include a static HD map with lanes, pedestrian crossings, intersections, and traffic light states. The extra information includes routes that indicate the middle lanes that the SDVs should follow. Each input element is encoded as a reference frame centered on the SDV and contains the element type as an additional feature.
The model's output consists of three components: multimodal trajectory planning, driving style recognition and multimodal trajectory prediction for the surrounding vehicles.
The trajectory planning output consists of K future SDV trajectories and the corresponding probability distributions. Each SDV trajectory is defined as a set of T discrete states where t is the time step in the [1, T ] interval, and x, y, θ denotes the horizontal and vertical coordinates and yaw of the vehicle midpoint relative to the current moment, respectively. These values constitute the vehicle pose. The probability distribution p k = p(τ k |X t , M t ) is defined as the likelihood of the K SDV trajectories given the current state; this likelihood is calculated from the human driving trajectory data and used to model the uncertain driving behavior of human experts in the current state. The computation of the multimodal planning trajectories with the corresponding trajectory probability likelihoods is presented in Section III-C-3. The driving style identification output labels the instantaneous driving styles of the N HDVs in the scene. Each HDV's predicted driving style output is represented as a three-dimensional one-hot encoding, corresponding to radical, moderate, and conservative driving styles.
The model's trajectory prediction output contains the future trajectories of surrounding agents and a probability distribution that is used to evaluate the likelihood that each agent follows the K possible future trajectories. Similar to the planning output, the predicted trajectories of each agent are defined as a discrete set of states Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
Each probability distribution q n , n = 1, . . . , N is defined as K possible future trajectories of agent n. This probability distribution is used to select the most likely trajectories.
2) Model Architecture: The architecture of the joint prediction and planning model based on driving styles is shown in Fig. 2. The overall architecture of each module is constructed using neural networks, and the backbone network design is similar to [5]. The main difference in our network lies in the intermediate layer, which recognizes the driving style of the surrounding vehicles. The designed network, which employs end-to-end joint training, ensures that our network effectively models the diversity of driving styles and the intentions of vehicles in the intersection scenario. The model can better characterize the intersection state and help the planner to generate safe and efficient trajectories.
The overall network can be divided into the encoder and the decoder. The encoder maps the SDV-centered scene vector states to hidden embeddings. We apply elementwise point encoders to extract vectorized representations of the intersection state to effectively model the static and dynamic element relationships in the intersection scene. Specifically, we first consider the local relationships of each entity in the scene and encode the state of each vectorized entity (e.g., static map and agents) in the input scene separately. The point features in each entity state are embedded with the same length by multilayer perceptrons (MLPs), while the location information of the points is encoded by sinusoidal encoding. The point set of each entity embedding together with the corresponding location encoding is fed into the 3-layer PointNet [43], which outputs the feature encoding of each entity. The encoding of all entities has the same dimensions. Then, we consider the global relationship between the entities in the scene. Based on the encoding of each entity, we perform global feature aggregation using the DETR architecture [44], which is an encoder-decoder Transformer. Based on the extracted local features of the vehicles via PointNet, the Transformer, which is equipped with self-attention modules, captures the global scene features. The method involves encoding the relationships among all of the input elements (SDVs, road agents, dynamic maps, and lane lines) using the Transformer encoder and querying the characteristics of the SDV and agents using the transformer decoder. The query of the Transformer decoder is formed using a set of learnable embeddings, which are created by adding the SDV embeddings output by the PointNet network to learnable ensembles to generate distinct queries for each of the predicted future SDV trajectories.
The decoder in our architecture contains two modules. The driving style decoder outputs the driving styles of the surrounding HDVs based on scene state encoding. The trajectory decoder jointly outputs the predicted trajectories of the surrounding vehicles with the trajectory candidates planned by the SDV based on the intermediate layer features of the scene state encoding and the driving style decoder. Specifically, the driving style decoder consists of a scaled dot product attention layer and an MLP layer. Its output predicts the probability of three driving styles (radical, moderate, and conservative) for N agents. The planning trajectory decoder and the prediction trajectory decoder share the scaled dot product attention layer and contain separate MLP layers, which are built on top of the shared layer, for prediction and planning. The input to the shared layer contains the merged embedding of the scene state encoding and the intermediate layer output of the driving style decoder. The trajectory decoding network outputs the predicted multimodal trajectories of the agents with the multimodal planning trajectory and the corresponding probabilistic likelihood of each trajectory.
3) Training Process and Objective Function: Inspired by [5], we use imitation learning to train the joint predictive planning network, and the training objective contains three components: the distance between the predicted SDV trajectory and the human expert trajectory, the distance between the predicted agent trajectory and its actual future trajectory, and the distance between the predicted agent driving style and its driving style label. The driving style labels are obtained by clustering in the offline dataset before the training process.
The joint prediction and planning process is represented as an MoE model to forecast multiple trajectories for the SDV and the surrounding agents. Additionally, the probability distribution for each trajectory set corresponding to expert selection is obtained. To avoid the pattern collapse problem during expert training and expert selection, we use a greedy approach in which the maximum probability trajectory for the loss calculation is chosen. We formulate a matching cost between the predicted and target trajectories and probabilities so that the expert with the lowest cost becomes the winner. The matching cost for each SDV trajectory is calculated, and a trajectory is selected via where and p i is the predicted probability of trajectory τ i . L i reg is the trajectory smoothness regularization term. Then, the loss is minimized The objection trades between imitation loss and multimodal loss for trajectory planning.
The network for driving style recognition is trained in an end-to-end manner with the prediction and planning network. The K-means clustering algorithm [2], [46], [47], [48] is used to generate instantaneous driving style labels based on the vehicle trajectories. Driving style recognition consists of three steps: generating trajectory datasets, generating velocity and acceleration datasets from the trajectory datasets, and identifying the driving style of the surrounding human drivers in each scene. Each scene in the training set creates a dataset by capturing the speed and acceleration of the vehicle trajectory and obtains the driving style label through unsupervised K-means clustering. The input to the K-means clustering consists of vector samples representing the speed or acceleration data of each vehicle in each scene. The obtained label of each car in the scene is encoded into a three-dimensional one-hot embedding, representing the cluster's three driving style labels: radical, moderate, and conservative. Our driving style recognition model identifies the driving styles of the surrounding vehicles based on their current and past states, and the objective function is designed as a cross-entropy function of the predicted driving styleŶ i DS and the label Y i where H c denotes the cross-entropy function.

D. Safety-Balanced Trajectory Selection
The candidate trajectories generated by the joint prediction and planning module during the first stage are used as input for the second stage, and the trajectory with the highest safety value is selected for execution.
We account for the generated SDV's multimodal trajectory and the predicted HDVs' trajectories when performing trajectory selection. A collision check is performed on each future SDV trajectory τ i and the predicted most likely trajectory of the HDVs. Safety verification is performed by calculating the overlapping area between the vehicle bounding boxes. The shortest distance between the bounding boxes is denoted as C(τ i ). Then, the distance between the SDVs from the lane median, which is denoted by F (τ i ) , is also used to evaluate the reasonableness of the SDV trajectory. The above two indicators are used to select a safe and reasonable trajectory. For each candidate trajectory, when the distance between the SDV bounding box and that of surrounding vehicles is large and the distance between the SDV and the centerline is small, the planned trajectory aligns with our expected requirements. The final trajectory value function is obtained by The trajectory chosen based on this value function allows the SDV to maintain a safe distance from other vehicles and follow the lane centerline as much as possible. Thus, the approach improves the safety of the SDV during online inference.

A. Dataset
To train and test our model, we use the Lyft motion prediction dataset [49]. This dataset was collected from real vehicles and contains actual driving samples from complex urban routes in Palo Alto, California. The dataset contains a variety of real-world driving scenarios, such as driving in multiple lanes, turning, and vehicle interactions at intersections. The perception system preprocesses the data to obtain the precise location of the surrounding vehicles over time. In addition, the dataset contains an HD map that provides the location and status of lane lines, crosswalks and traffic lights.

B. Baseline
We compare our proposed algorithm with four strong BCbased planner baselines.
1) Behavioral cloning with image-based scene representation (BC-I). BC-I is a standard BC planner based on the ResNet-50 backbone. The planner's input is an egocentered scene raster image without historical information. 2) Behavioral cloning with image-based scene representation and trajectory perturbations (BC-IP). Similar to Chauffer-Net [3], the method adds synthetic perturbations to the ego trajectory in the dataset during training. 3) Behavioral cloning with vector-based scene representation (BC-V). BC-V is a standard BC planner based on the PointNet backbone. The planner's input is the egocentered scene vector, which is the same as the scene representation in this paper. 4) Behavioral cloning with vector-based scene representation and trajectory perturbations (BC-VP). This method adds synthetic perturbations to the ego trajectory for vectorbased scene representation.

C. Evaluation Indicator
We use the following metrics to comprehensively evaluate the planner performance. These metrics reflect imitation performance and safety. It should be noted that the first four metrics are for closed-

D. Experimental Results
We performed closed-loop and open-loop tests to evaluate the model's performance. During the closed-loop test, we validated the safety of the proposed two-stage trajectory planning approach. During the open-loop test, we focused on the distance between the planned and actual human driving trajectories.

1) Closed-Loop Test:
We compare the planning performance of the proposed model with the four BC planning benchmarks through closed-loop tests. The experimental results in 1000 scenarios show that the proposed method has a lower collision rate than all baselines. In the 1000 scenarios, our method resulted in a total of 62 collisions, which is the lowest for all compared planners. This result indicates that the proposed two-stage method is more safety-sensitive during the planning process. In addition, the trajectories planned by our model are more reasonable. Our planner's number of off-road events is the lowest for all compared planners.  We find that perturbation is an important technique for addressing the covariance shift problem [3] in closed-loop tests. Specifically, BC planners with trajectory perturbations, such as BC-IP, BC-VP, and our proposed method, all show improved planning performance in terms of safety and reasonableness compared to BC planners without trajectory perturbations, such as BC-I and BC-V. Our results show that using trajectory perturbation as a data augmentation method during training can significantly reduce the mismatch between the training and inference-time distributions, improving the generalization of the planner in closed-loop tests. This conclusion holds for both image-based and vector-based planners.
To further validate the reasonableness of our proposed planner, we visualize the planning behavior of the proposed method with the benchmark method for a complex interaction scenario. As shown in Fig. 3, our method can cause the SDV in red to stop at the intersection when it displays the red light and resume movement when it displays the green light while maintaining a safe distance from other vehicles and driving along the route. The planning trajectory is close to the blue reference trajectory of human drivers. In contrast, the vehicle based on the baseline BC-VP planner runs the red light and hits other vehicles while not following the routes.

2) Open-Loop Test:
We also evaluated the imitation performance of the proposed planner and BC-VP planners using open-loop tests. The performance metric reflects the closeness of the planning trajectory by the BC planners to the reference trajectory by human drivers for in-distribution scenarios. The experimental results are shown in Table II. Our planner achieves a lower value than the benchmark on both the ADE and FDE metrics, indicating that our method is more capable of imitating expert driving behavior.
3) Ablation Study: To validate the effectiveness of the different modules in the proposed method, we performed ablation study using closed-loop tests. The ablation module includes driving style recognition (DS) and safety verification (SV).
The closed-loop test results are shown in Table III. The imitation and off-road event metrics indicate that, without considering driving style, the planned vehicle trajectory greatly differs from the actual reference trajectory. This result indicates that the driving styles of other drivers partially influence human driving trajectories. Therefore, considering the driving styles of the surrounding vehicles will help to achieve reasonable planning for the SDV.
The safety verification module has also been proven to be effective. The experimental results show that using safety verification can reduce the number of collisions compared to not using safety verification. This result demonstrates that the combination of multimodal trajectory generation and safety verification can maximize safety during the SDV driving process.

V. CONCLUSION
In this paper, we propose a safety-balanced two-stage trajectory planning method that considers the driving styles of other HDVs to address the difficulties of SDV trajectory planning caused by the behavioral uncertainty of surrounding traffic participants in intersection scenarios. The two-stage planning approach consists of multimodal candidate trajectory generation based on driving style and trajectory selection based on safetybalanced verification. During the first stage, the model predicts the future trajectory of the surrounding HDVs based on driving style recognition and intention inference and simultaneously generates multimodal candidate trajectories that conform to the human driving experience of the autonomous vehicle in the interaction scenario. During the second stage, based on the generated candidate trajectories and the predicted HDV trajectories, we verify and score the trajectories with a safety value according to the rules constructed from experience and select the safest driving path for final execution.
In the proposed two-stage approach, the first stage learns the driving style and intention of the HDVs to better predict the driving behavior of HDVs and generates feasible trajectory candidates. The second stage further improves driving safety in complex interaction scenarios through safety verification.
In the future, we plan to integrate the proposed two-stage approach with the cognitive world model to better generalize SDV trajectory planning. The world model [50] can be used to simulate the prediction of various possible traffic events during the driving process, provide the potential to evaluate planning behavior and ensure safe and reasonable trajectory planning in the actual system. In addition, the world model is updated through continuous learning based on empirical data to ensure the cognitive alignment of the world model with the real world. In this way, we will integrate parallel planning with parallel vision [51] and parallel control [52], [56] to guarantee safe and efficient autonomous driving in complex road environments. We think parallel driving and parallel testing can be incorporated as a closed-loop system and promote autonomous driving to the 6S world, which is safe in the physical world, secure in the cyber world, sustainable in the ecological world, sensitive to individual needs, services for all, and smart in parallel intelligent societies [53], [54], [55], [58], [60].