A Two-Layer Risky Driver Recognition Model With Context Awareness

For autonomous vehicles and intelligent connected vehicles, the real-time recognition of risky drivers can play an important role in traffic accident prevention. However, the external environment substantially impacts driving behavior and driving risk, and information about it is usually costly to acquire. Existing risky driver recognition models often ignore external environment information or assume this information is given. We propose two hierarchical two-layer context-aware machine learning structures. The first layer infers the external context, for example, the traffic state. The second layer recognizes risky drivers based on the contextual information inferred by the first layer. The German Highway Drone Dataset is used to establish risky driver recognition and traffic state recognition models. Rear-end collision risk and side collision risk are evaluated for each vehicle. Drivers with high collision risk are labeled as risky drivers. By analyzing vehicle trajectory data from three traffic states (free-flow, saturated, and congested), we find that traffic states have a significant influence on vehicles' longitudinal speed, lateral speed, longitudinal acceleration/deceleration, and collision risk. Six classifiers (SVM, KNN, RF, AdaBoost, Extra Trees, and XGBoost) are applied to train recognition models. Results show that the proposed structures can significantly improve the models' ability to recognize risky drivers.


I. INTRODUCTION
New Information and Communication Technologies (ICT) are redefining the automobile industry. Autonomous Vehicles (AV) and Intelligent Connected Vehicles (ICV) provide massive driving behavior and driving environment data and demand a deep understanding of human driving behavior. Dangerous driving behavior is the leading cause of traffic accidents in historical data [1]. Identifying an individual's driving style and dangerous driving behavior can enhance the ability of AVs and ICVs to perceive risk from the interaction with surrounding vehicles, adjust driving strategy to driving conditions, and improve road traffic safety.
The associate editor coordinating the review of this manuscript and approving it for publication was Michail Makridis.
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
There is no universally accepted definition of driving style. Sagberg et al. [2] reviewed the definitions of driving style in the literature and summarized that driving style represents a habitual and relatively permanent way of driving that differs across individuals or between groups of individuals. A driver could consciously or subconsciously choose a driving style. By contrast, Martinez et al. [3] stated that driving style is strongly influenced by external driving conditions, such as traffic, road type, time of day, and weather, so the same driver could switch driving style under different driving conditions. The driving style studied in most papers could be either relatively permanent or changing with external driving conditions, depending on the measurement methods of the driving style chosen by researchers. For example, aggressive driving is one of the heavily studied driving styles, and it can be measured by self-report questionnaires and observations of driving behavior. Self-report questionnaires [4] reflect drivers' self-assessment of their driving style, and self-assessment is generally stable. Aggressive driving indicators extracted from driving behavior observations include speed profile, hard braking, lateral control, and tailgating [5]. The measured driving style could be influenced by the driving condition under which the driving behavior data was observed. For many studies that rely on driving behavior data collected from naturalistic driving experiments, driving simulators, and vehicle trajectory data, the external driving condition cannot be ignored if driving styles are to be identified accurately. Many external driving conditions were found to influence driving style.
Ericsson [6] found that traffic, road type, and the driver's demographic factors influence driving patterns, among which road type has the most significant impact. Driving patterns may differ between city roads and freeways. Freeways are designed to have continuous traffic flow, while city roads have turning and traffic control infrastructure [7]. Dixit et al. [8] found that drivers tend to behave more conservatively when the pavement is wet and more aggressively during the morning peak period than during the midday and evening peak periods. Therefore, it is necessary to be aware of the external driving conditions as the contextual information for driving style recognition and for adapting the driving strategy of AVs and ICVs to the specific driving condition.
We classify existing driving style recognition models into three categories, based on the role of contextual information in the model:
• Context-Irrelevant: Although contextual information is essential to driving style recognition, most conventional driving style recognition models do not consider contextual information in the system; they assume that all observations share the same contextual information [9], [10].
• Context-Dependent: Some models consider certain contextual information that sensors can acquire. For example, Yi et al. [11] proposed a personalized driving state recognition model in which each driver has two individual driving state recognition models, one for motorways and another for secondary roads. Before predicting the driving state for a new sample, the driver's identity and the road type must be given. However, collecting all contextual information could be difficult and costly in practice, and some contextual information can only be estimated with uncertainty. This drawback limits the application of context-dependent models.
• Context-Aware: A context-aware system can extract and interpret context information that is important to characterize the situation of an entity, including people, places, devices, and things [12]. Bejani and Ghatee [13] developed a context-aware system that can recognize the context information, including traffic levels, car types, and driving events, with which the driving style is evaluated using a rule-based fuzzy inference model. Suzdaleva and Nagy [14] proposed a two-layer pointer model that estimates driving conditions, namely urban, rural, and highway, in the inner pointer and estimates driving style in the outer pointer. To the authors' knowledge, few studies establish a context-aware driving style recognition model using machine learning. Applying a machine learning recognition model to uncertain contextual information could improve the system's performance on driving style recognition.
Besides context awareness, another challenge of driving style recognition is data labeling. Normal samples and dangerous/aggressive/risky samples can be classified based on illegal driving behavior [15] or accidents [16] in naturalistic driving data. Subjective data labeling methods include expert scoring [17] and driver self-assessment questionnaires [18]. These methods are not suitable for vehicle trajectory data with a large sample size. Xue et al. [19] used collision surrogate measurements, such as Time to Collision (TTC) and Margin to Collision (MTC), to label drivers as being in a dangerous or a normal car-following state. Clustering algorithms [20] and semi-supervised learning [21], [22] are designed for unlabeled or partially labeled data, but their results are often hard to verify. Moreover, vehicle collision risk assessment, which is related to driving style, has been studied using machine learning algorithms [23], [24].
The contributions of this paper are as follows. (1) We propose a risky driver labeling method using vehicle trajectory data based on rear-end collision risk and side collision risk. (2) We analyze vehicle trajectory data in three traffic states (free-flow, saturated, and congested) and confirm their impact on driving behavior and driving risk. (3) We propose two hierarchical two-layer context-aware structures that can recognize both driving style and contextual traffic states. (4) We test the performance of the proposed structures to demonstrate their superiority over conventional context-irrelevant machine learning. The remainder of this paper is organized as follows: Section II describes the vehicle trajectory data used for modeling. Section III introduces the framework of the risky driver recognition modeling and the methodology of each part in the framework. Section IV presents the results. Section V discusses our findings. Section VI concludes.

II. DATA
The vehicle trajectory data used in this paper is from the Highway Drone Dataset (highD) [25]. HighD contains vehicle trajectory data recorded on six German highways using Unmanned Aerial Vehicles (UAV). The trajectories of 110,500 vehicles were extracted from UAV videos by computer vision algorithms. The UAV camera can cover a 420-meter length of highway with a typical vehicle positioning error of less than 10 cm. Each vehicle's information, including position, speed, acceleration, lane-changing, and car-following, was detected and tracked every 0.04 s and smoothed using Bayesian smoothing.
The data used in this paper contains three files, which were recorded on a 6-lane highway at 8:55 AM, 10:12 AM, and 5:21 PM on a working day, respectively. The trajectories of 7950 vehicles were recorded over 55 minutes 46 seconds. The details of the three files are shown in Table 1.
The traffic state of each file is determined by its volume, mean speed, and density. The design capacity of the 6-lane highway is 13200 passenger car units (PCU) per hour. The volume of the first file is close to the design capacity; therefore, the flow state is saturated. The second file has a traffic volume lower than the design capacity and density higher than the density in the saturated flow; therefore, the flow state of the second file is congested. The third file has traffic volume lower than the design capacity and density lower than the density in the saturated flow; therefore, the flow state of the third file is free-flow.
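The decision rule described above can be written as a small function. This is only a sketch: the tolerance that defines "close to the design capacity" (`tol`) and the reference density of saturated flow (`saturated_density`) are assumptions introduced here for illustration, and the numbers in the usage lines are made up rather than taken from Table 1.

```python
def classify_traffic_state(volume, density, capacity, saturated_density, tol=0.1):
    """Rule-of-thumb traffic state from hourly volume (PCU/h) and density.

    tol and saturated_density are illustrative assumptions, not values
    reported in the paper.
    """
    # "Close to the design capacity" is read as within a relative tolerance.
    if abs(volume - capacity) <= tol * capacity:
        return "saturated"
    # Otherwise the density relative to saturated flow decides the state.
    if density > saturated_density:
        return "congested"
    return "free-flow"


# Made-up inputs for a 6-lane highway with a 13200 PCU/h design capacity.
print(classify_traffic_state(13000, 30.0, 13200, 30.0))  # saturated
print(classify_traffic_state(9000, 45.0, 13200, 30.0))   # congested
print(classify_traffic_state(8000, 15.0, 13200, 30.0))   # free-flow
```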

III. METHOD
We summarize the methodology framework as a flowchart in Figure 1. First, vehicle trajectory data is used to calculate each vehicle's rear-end collision risk and side collision risk, based on which each driver is labeled as a risky driver or a normal driver. The collision risk evaluation method and driver labeling are introduced in Sections III-A and III-B.
Once the driving style is labeled, we establish a risky driver recognition model. Section III-C describes how to extract and select features from the vehicle trajectory data. Since the number of risky drivers is often much smaller than the number of normal drivers, we apply SMOTE (Synthetic Minority Oversampling TEchnique) oversampling to relieve the imbalance issue. The proposed two-layer risky driver recognition structures are introduced in Section III-D. Single-layer Risky-driver-recognition without context-awareness (SR) is the conventional machine learning structure that does not consider contextual information, which is the traffic state in this paper. The Two-layer Risky-driver-recognition structure with Context-awareness (TRC) is hierarchical and contains a traffic state recognition model and risky driver recognition models. The Probability-based Two-layer Risky-driver-recognition structure with Context-awareness (PTRC) is an extension of TRC in which the prediction is based on probabilities.
Section III-E covers machine learning classifiers used in recognition models and hyperparameter tuning. Section III-F introduces probability calibration. Section III-G explains how we evaluate model performance and conduct the statistical test.
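To make the SMOTE step mentioned above concrete, the following is a dependency-free sketch of its core idea: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbors. It is not the imblearn implementation used in this work; the function name, `k`, and `seed` are illustrative assumptions.

```python
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples from a list of minority feature vectors.

    Requires at least two minority samples. Illustrative only.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


risky = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote_sketch(risky, n_new=6, k=2)
```

Oversampling of this kind should be applied to the training folds only, so that synthetic samples never leak into the data used for evaluation.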

A. COLLISION RISK EVALUATION
Given the trajectory data of the target vehicle and its surrounding vehicles, we need to calculate the collision risk of the target vehicle through the whole driving process. We consider rear-end collision risk and side collision risk in the measurement of high-risk drivers in this paper. We categorize the target vehicle's moving state into car-following, lane departure, and lane-changing and calculate collision risk based on the following rules: (1) Car-Following: When the gap between the target vehicle and its leading vehicle is smaller than [...] Figure 2(c)), keep calculating the rear-end collision risk between the target vehicle and its leading vehicle and between the target vehicle and its following vehicle until the target vehicle completes lane-changing. The completion of lane-changing is determined when the target vehicle's distance to the lane line is greater than 0.5 m (shown in Figure 2(d)).

1) REAR-END COLLISION RISK
Rear-end collision risk at time t is calculated using the Difference of Space distance and Stopping distance (DSS):
$$\mathrm{DSS} = \left(\frac{V_l^2}{2\mu g} + d\right) - \left(V_f \tau + \frac{V_f^2}{2\mu g}\right)$$
where $V_l$ and $V_f$ are the longitudinal speeds of the leading and following vehicles, respectively; $\mu$ is the friction coefficient, set to 0.7; $g$ is the acceleration of gravity, 9.8 m/s²; $d$ is the longitudinal gap between the leading and following vehicles; $\tau$ is the reaction time of the driver. As suggested by Green [26], when the following vehicle is accelerating, $\tau$ is set to 1.5 s; when the following vehicle is decelerating or idling, $\tau$ is set to 0.7 s.
When DSS ≥ 0, the following vehicle has enough time to decelerate and avoid a collision. When DSS < 0, the following vehicle does not have enough time to react to the leading vehicle's abrupt deceleration and therefore has a collision risk.
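As a worked illustration, the sketch below evaluates DSS for a single time step. It assumes the commonly used form of DSS, the leading vehicle's braking distance plus the gap minus the following vehicle's reaction and braking distance, together with the parameter values stated above. The function name and argument names are hypothetical.

```python
def dss(v_lead, v_follow, gap, following_accelerating, mu=0.7, g=9.8):
    """Difference of Space distance and Stopping distance, in meters.

    Speeds are in m/s and gap in m. Assumes the commonly used DSS form.
    """
    # Reaction time per the text: 1.5 s when accelerating, 0.7 s otherwise.
    tau = 1.5 if following_accelerating else 0.7
    stop_lead = v_lead ** 2 / (2 * mu * g)
    stop_follow = v_follow * tau + v_follow ** 2 / (2 * mu * g)
    return (stop_lead + gap) - stop_follow


# Both vehicles at 30 m/s with a 10 m gap, follower not accelerating:
# the braking distances cancel, leaving 10 - 30 * 0.7 = -11 m (at risk).
print(dss(30.0, 30.0, 10.0, following_accelerating=False))
```

With equal speeds the braking terms cancel, so the sign of DSS depends only on whether the gap exceeds the distance covered during the reaction time.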

2) SIDE COLLISION RISK
To evaluate the side collision risk between two vehicles, we need to calculate the lateral location of the vehicle's corner that is closest to the adjacent vehicle (denoted as $y_R$). The direction parallel to the lane line is the x axis, and the direction perpendicular to the lane line is the y axis. The vehicle's heading direction is determined by its lateral speed $V_y$ and longitudinal speed $V_x$. $y_R$ can be calculated as follows: where $\theta(t)$ is the angle between the heading direction and the x axis; $y_d(t)$ is the lateral distance between the corner point and the center of the vehicle; $w$ and $h$ are the vehicle's length and width, respectively; $y_c(t)$ is the lateral location of the vehicle's center.
In Figure 2(b), the adjacent vehicle is on the left side of the target vehicle. The side collision risk is evaluated using time-to-collision (TTC) in equation (5).
where V Ay is the lateral speed of the adjacent vehicle; y A is the lateral location of the adjacent vehicle's center.
If the adjacent vehicle is on the right side, the side collision risk should be evaluated using equation (6).
The threshold of TTC has been studied heavily in the literature. Brown et al. [27] and Das and Maurya [28] used 3 s as the TTC threshold for rear-end collision warning. Bella and Russo [29] suggested using 2.5 s or 3 s as the threshold for abnormal car-following. Qu et al. [30] and Minderhoud and Bovy [31] also adopted 3 s as the TTC threshold in their traffic safety evaluation research. Therefore, this paper uses 3 s as the TTC threshold for side collision.

B. RISKY DRIVER LABELING
To establish a risky driver recognition model, we need to label all observed vehicles as either normal drivers or risky drivers. The labels are assigned based on the vehicle's overall collision risk and used as the ground truth for machine learning model training. Average collision risk (ACR) can measure the overall collision risk exposed to the target vehicle over the whole trajectory: where T is the observation duration of the target vehicle; t is the time interval of observations; 1 {·} is an indicator function. For example, 1 {DSS(t) < 0} is 1 when DSS(t) < 0 and 0 when DSS(t) ≥ 0.
Risky drivers are labeled based on their ACRs. The ACR threshold is determined using the Interquartile Range (IQR) method [32], which is an outlier detection method proposed by Laurikkala et al. The threshold can be calculated as follows:
$$\mathrm{Threshold} = Q_3 + 1.5 \times (Q_3 - Q_1)$$
where $Q_3$ and $Q_1$ are the upper and lower quartiles of the non-zero ACR distribution, respectively. The IQR method can calculate the threshold of abnormal data under various distributions [33].
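A minimal sketch of this labeling threshold follows, assuming the usual upper fence of the IQR rule (Q3 plus 1.5 times the interquartile range) computed on the non-zero ACR values. The quartile interpolation method is an assumption; different tools compute quartiles slightly differently, so the threshold may shift a little with that choice.

```python
import statistics


def iqr_upper_threshold(acr_values, k=1.5):
    """Upper outlier fence Q3 + k * (Q3 - Q1) over the non-zero ACR values."""
    nonzero = [a for a in acr_values if a > 0]
    # "inclusive" treats the data as the full population; this is an assumption.
    q1, _, q3 = statistics.quantiles(nonzero, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


# Toy ACR values (made up): zeros are ignored, quartiles of 1..5 are 2 and 4.
acr = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
threshold = iqr_upper_threshold(acr)
labels = [int(a > threshold) for a in acr]  # 1 = risky driver
```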

C. FEATURE EXTRACTION AND SELECTION
Features are extracted from the vehicle's longitudinal speed, lateral speed, and gap using the Discrete Fourier Transform (DFT). DFT converts time series of driving parameters to signal amplitude in the frequency domain and has been proved to be an efficient feature extraction method in driving behavior studies [16]. The DFT of a given time series $(x_1, \ldots, x_N)$ is
$$X_k = \sum_{n=1}^{N} x_n \, e^{-i 2\pi (k-1)(n-1)/N}, \quad k = 1, \ldots, N,$$
where $i$ is the imaginary unit. The mean, standard deviation, coefficient of variation, and 20 DFT coefficients of longitudinal speed, lateral speed, and gap are extracted as features. There are 72 features in total.
Recursive Feature Elimination (RFE) [34] is a feature selection algorithm. First, all 72 features are used in model training. Second, features are ranked based on their contribution to model performance. Third, the least important feature is iteratively eliminated as long as the model's performance can be improved.
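The per-signal feature vector can be sketched without external libraries as below. Two points are assumptions made for illustration: that the DFT features are the amplitudes of the first 20 coefficients (the text does not say which coefficients are kept), and that the population standard deviation is used.

```python
import cmath
import math
import statistics


def dft_amplitudes(series, n_coeffs=20):
    """Amplitudes |X_k| of the first n_coeffs DFT coefficients (0-based k)."""
    n = len(series)
    amps = []
    for k in range(n_coeffs):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series)
        )
        amps.append(abs(coeff))
    return amps


def extract_features(series, n_coeffs=20):
    """Mean, standard deviation, coefficient of variation, DFT amplitudes."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    cv = std / mean if mean != 0 else 0.0
    return [mean, std, cv] + dft_amplitudes(series, n_coeffs)


# A perfectly steady 1 m/s signal: all energy sits in the k = 0 coefficient.
features = extract_features([1.0] * 32)
```

In practice a fast Fourier transform routine would replace the direct sum, which costs O(N) per coefficient.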

D. CONTEXT-AWARENESS STRUCTURE
This Section proposes two context-awareness structures and compares them with the commonly used risky-driver-recognition structure without context-awareness. All context recognition and risky driver recognition models discussed in this Section use the same 72 features extracted from the vehicle trajectory data as the inputs. The final output of all risky driver recognition models is a 0-1 indicator showing whether a given driver is a risky driver.

1) RISKY-DRIVER-RECOGNITION WITHOUT CONTEXT-AWARENESS
The most commonly used Single-layer Risky-driver-recognition (SR) structure is shown in Figure 4. The structure is straightforward. Through feature extraction and selection, vehicle trajectory data is transformed into the input of the risky driver recognition model. The recognition model often utilizes supervised machine learning algorithms, such as decision tree, support vector machine, neural network, and ensemble learning. Given an input, the well-trained recognition model can generate a prediction of driving style.
In the whole process, contextual information (traffic state, for instance) is not involved.

2) RISKY-DRIVER-RECOGNITION WITH CONTEXT-AWARENESS
The first proposed 2-layer context-awareness structure is shown in Figure 5, namely Two-layer Risky-driver-recognition with Context-awareness (TRC). TRC includes K + 1 recognition models, where K is the number of context categories. Figure 5 considers three context categories: free traffic flow, saturated traffic flow, and congested traffic flow; therefore, we have one context recognition model and three risky driver recognition models, one for each context category.
The input features are first used to recognize the context (traffic state in this paper) in which the target vehicle is moving, and then the input features are delivered to the corresponding specialized model to recognize risky drivers based on the predicted context category. For example, if a given vehicle is recognized as moving in the context of congested flow, the risky driver recognition model under congested traffic flow is chosen to predict the driver's driving style.
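The routing logic of TRC can be sketched as follows. The models are represented as plain callables, and the toy models and their thresholds are hypothetical stand-ins, not the trained classifiers of this paper.

```python
def trc_predict(features, context_model, risk_models):
    """Route the features to the risk model of the predicted context.

    context_model: callable, features -> context label
    risk_models:   dict, context label -> callable, features -> 0 or 1
    """
    context = context_model(features)
    return context, risk_models[context](features)


# Toy stand-ins with made-up thresholds, for illustration only.
def toy_context_model(f):
    if f["mean_speed"] < 25.0:
        return "congested"
    return "saturated" if f["mean_speed"] < 30.0 else "free-flow"


toy_risk_models = {
    "congested": lambda f: int(f["speed_std"] > 4.0),
    "saturated": lambda f: int(f["speed_std"] > 2.5),
    "free-flow": lambda f: int(f["speed_std"] > 1.5),
}

slow = trc_predict({"mean_speed": 20.0, "speed_std": 3.0},
                   toy_context_model, toy_risk_models)
fast = trc_predict({"mean_speed": 34.0, "speed_std": 3.0},
                   toy_context_model, toy_risk_models)
```

In this toy setup the same speed fluctuation is judged normal in congested flow but risky in free flow, which is the behavior the context layer is meant to enable.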

3) PROBABILITY-BASED RISKY-DRIVER-RECOGNITION WITH CONTEXT-AWARENESS
The second proposed 2-layer context-awareness structure is shown in Figure 6, namely Probability-based Two-layer Risky-driver-recognition with Context-awareness (PTRC). Like TRC, PTRC has K + 1 recognition models. However, instead of outputting predicted labels, the K + 1 recognition models generate predicted probabilities for each label. The probability-based 2-layer risky-driver-recognition model can be expressed as follows:
$$p(y_1) = p(c_1)\,p(y_1 \mid c_1) + p(c_2)\,p(y_1 \mid c_2) + p(c_3)\,p(y_1 \mid c_3)$$
where $p(y_1)$ is the predicted probability of being a risky driver for a given input; $p(c_1)$, $p(c_2)$, $p(c_3)$ are the predicted probabilities of context 1, 2, and 3, respectively, reported by the context recognition model; $p(y_1 \mid c_1)$, $p(y_1 \mid c_2)$, $p(y_1 \mid c_3)$ are the predicted probabilities of being a risky driver, reported by the risky driver recognition models under context 1, 2, and 3, respectively.
If $p(y_1) > 0.5$, then the predicted driving style for the given input is risky; otherwise, the predicted driving style is normal.
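The PTRC combination is a probability-weighted sum over contexts, as the definitions above imply. The sketch below works through one input; the probability values are made up for illustration.

```python
def ptrc_probability(p_context, p_risky_given_context):
    """Sum over contexts of p(c_k) * p(y1 | c_k)."""
    return sum(pc * pr for pc, pr in zip(p_context, p_risky_given_context))


def ptrc_predict(p_context, p_risky_given_context, threshold=0.5):
    """Return 1 (risky) if the combined probability exceeds the threshold."""
    return int(ptrc_probability(p_context, p_risky_given_context) > threshold)


# Made-up outputs: context probabilities from the first layer, and the
# risky-driver probability from each context-specific model.
p_c = [0.2, 0.5, 0.3]
p_y_given_c = [0.1, 0.6, 0.9]
p_risky = ptrc_probability(p_c, p_y_given_c)  # 0.02 + 0.30 + 0.27 = 0.59
label = ptrc_predict(p_c, p_y_given_c)        # 0.59 > 0.5, so risky
```

Unlike TRC, no single context is committed to, so an uncertain context prediction is averaged over rather than propagated as a hard routing error.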

E. CLASSIFIER
Establishing the context recognition model and the risky driver recognition models requires machine learning classification algorithms. The classifier candidates are described below.

1) SUPPORT VECTOR MACHINE
The support vector machine (SVM) algorithm [35] finds a hyperplane in a multi-dimensional space that separates the data points of different classes with the maximum margin. The hyperparameters of SVM are the regularization parameter C and the kernel coefficient gamma.

2) K-NEAREST NEIGHBORS
K-nearest neighbors (KNN) algorithm [36] is a nonparametric supervised machine learning method that is widely used in classification problems. An instance is assigned to the class most common among its k nearest neighbors. KNN has three important hyperparameters: the number of neighbors, power parameter for the Minkowski distance metric, and weight function used in prediction.

3) AdaBoost
AdaBoost [37], short for Adaptive Boosting, fits a classifier on the original dataset and then increases the weights of incorrectly classified instances such that subsequent classifiers focus more on these instances. The hyperparameters of AdaBoost include the number of estimators and the learning rate. The base classifier is set to a decision tree with a maximum depth of 1.

4) RANDOM FOREST
Random Forest (RF) is an ensemble learning method consisting of multiple decision trees trained independently on various sub-samples of the dataset [38]. The class prediction with the most votes from the decision trees becomes the final prediction of the random forest. The hyperparameters of RF include the number of estimators, maximum depth, the minimum number of samples required to split an internal node, maximum features, etc.

5) EXTRA TREE
Extra trees [39] and random forest are two similar ensemble methods. One difference is that random forest uses bootstrap replicas while extra trees use the whole original sample. Another difference is that RF chooses the optimum split for nodes while extra trees choose it randomly. The hyperparameters of extra trees include the number of estimators, maximum depth, the minimum number of samples required to split an internal node, maximum features, etc.

6) XGBoost
XGBoost, short for eXtreme Gradient Boosting, is a gradient tree boosting-based software that is superior in performance, fast in training time, and has an easy-to-use interface [40]. The hyperparameters of XGBoost include the number of estimators, learning rate, maximum depth, subsample ratio, etc.
We apply hyperparameter tuning in the training of each classifier using HyperOpt [41], an open-source Python library for hyperparameter tuning. It automatically searches for the optimal values of the hyperparameters mentioned above using the Tree-structured Parzen Estimator (TPE) algorithm. Compared with manual search and grid search, it is faster, more stable, and more accurate.

F. PROBABILITY CALIBRATION
All classifiers listed in the previous Section can predict a class with a probability score for each class. The probability scores are necessary for the PTRC structure. However, these probability scores are usually biased for two reasons: first, SVM and boosted trees are not trained using a probabilistic framework and thus produce biased probabilities [42]; second, imbalanced data and sampling impact the predicted probabilities for the majority and minority class instances [43].
There are two common probability calibration methods: Platt scaling [44] and isotonic regression [45], [46]. Platt scaling trains a logistic regression to map the original classifier's output to the true class probability. Isotonic regression is a non-parametric approach that fits a piecewise constant non-decreasing function, where predicted probabilities are monotonically increasing over bins. Platt scaling is applied in this paper since isotonic regression may have an overfitting issue when the calibration data sample size is small.
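Platt scaling can be sketched as a one-dimensional logistic regression on the classifier's raw score. The version below fits the two parameters by plain gradient descent on the log loss; the optimizer, the learning rate, and the omission of Platt's target smoothing are simplifications assumed here, and the scores and labels are made up.

```python
import math


def fit_platt(scores, labels, lr=0.1, n_iter=5000):
    """Fit p(y=1 | s) = 1 / (1 + exp(a * s + b)) by gradient descent."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(a * s + b))
            # For this parameterization, d(log loss)/d(a*s + b) = y - p.
            grad_a += (y - p) * s
            grad_b += y - p
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def platt_probability(score, a, b):
    """Map a raw classifier score to a calibrated probability."""
    return 1.0 / (1.0 + math.exp(a * score + b))


# Made-up held-out scores (e.g. SVM decision values) and true labels.
raw_scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
true_labels = [0, 0, 1, 0, 1, 1]
a, b = fit_platt(raw_scores, true_labels)
calibrated = [platt_probability(s, a, b) for s in raw_scores]
```

The calibration map should be fitted on data not used to train the classifier; otherwise the calibrated probabilities inherit the classifier's overconfidence on its training set.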

G. CROSS-VALIDATION AND EVALUATION
This paper uses precision rate, recall rate, F1 score, and Area Under the Precision-Recall Curve (AUPRC) to evaluate the recognition models' performance.
Precision rate is defined as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
where TP is the number of risky drivers correctly identified; FP is the number of normal drivers wrongly identified as risky drivers.
Recall rate is defined as follows:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
where FN is the number of risky drivers wrongly identified as normal drivers. The F1 score is the harmonic mean of precision rate and recall rate:
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
A high F1 score represents high values in both precision rate and recall rate.
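The three metrics can be computed directly from the confusion counts. The sketch below treats label 1 as "risky driver"; the labels and predictions are made up, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (label 1 = risky)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# 3 risky and 5 normal drivers: TP = 2, FP = 1, FN = 1.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```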
The precision-recall curve is a plot of the precision rate and the recall rate for different probability thresholds. Area Under Precision-Recall Curve (AUPRC) is more appropriate than Area Under Receiver Operating Characteristic curve (AUROC) to measure the model's performance when the dataset is imbalanced [47].
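One common way to estimate the area under the precision-recall curve is average precision: the mean of the precision values at the ranks where a risky driver is retrieved. The sketch below uses that estimator, which is an assumption, since the text does not state how the area is computed; ties between scores are not treated specially.

```python
def average_precision(y_true, scores):
    """Average precision, a step-wise estimate of the area under the PR curve."""
    # Rank samples from the highest to the lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            total += tp / rank  # precision at this recall step
    return total / n_pos if n_pos else 0.0


# Positives are ranked 1st and 3rd: (1/1 + 2/3) / 2 = 5/6.
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```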
Machine learning algorithms are commonly evaluated using k-fold cross-validation, and their evaluation metrics, such as mean accuracy scores, are compared directly. Statistical significance tests are designed to test whether the difference between evaluation metrics is statistically significant or the result of a statistical fluke. The null hypothesis is that the metric scores observed from two algorithms were drawn from the same distribution. If this hypothesis is rejected, the difference in metric scores is statistically significant. Otherwise, the two algorithms' performances are statistically equal. The k-fold cross-validated paired Student's t-test is the most commonly used statistical test for comparing machine learning algorithms. However, the calculation of the t-statistic in the test is misleading since the metric scores in each sample are not independent [48]. In k-fold cross-validation, a given observation is used in the training dataset k-1 times. This means that the estimated metric scores are dependent.
Dietterich [48] recommended a resampling method called 5 × 2-fold cross-validation, which involves five repeats of 2-fold cross-validation. Two-fold cross-validation ensures that each observation appears only once in either the training or the test dataset. A paired Student's t-test is applied to the results:
$$t = \frac{\Delta_1^{(1)}}{\sqrt{\frac{1}{5}\sum_{i=1}^{5} s_i^2}}, \qquad s_i^2 = \left(\Delta_i^{(1)} - \mu_i\right)^2 + \left(\Delta_i^{(2)} - \mu_i\right)^2$$
where $\Delta_i^{(1)}$ is the score difference of the two algorithms for the first fold of the i-th 2-fold cross-validation; $\Delta_i^{(2)}$ is the score difference of the two algorithms for the second fold of the i-th 2-fold cross-validation; $\mu_i = \left(\Delta_i^{(1)} + \Delta_i^{(2)}\right)/2$ is the mean score difference for the i-th 2-fold cross-validation.
Under the null hypothesis that the two algorithms are statistically equal, t is assumed to follow a Student's t-distribution with 5 degrees of freedom. If |t| stays below the critical value, which is 2.571 at the 95% confidence level, the null hypothesis cannot be rejected. 5 × 2 cross-validation is used in this paper to compare the algorithms' performance.
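The test statistic can be computed from the ten fold-level score differences, as sketched below following Dietterich's 5 × 2 cv formulation: the numerator is the first-fold difference of the first repetition, and the denominator pools the per-repetition variance estimates. The score differences in the example are made up.

```python
import math


def five_by_two_cv_t(diffs):
    """Dietterich's 5x2cv paired t statistic.

    diffs: five (d1, d2) pairs, the score differences between two
    algorithms on fold 1 and fold 2 of each 2-fold repetition.
    Raises ZeroDivisionError if all variance estimates are zero.
    """
    variances = []
    for d1, d2 in diffs:
        mean = (d1 + d2) / 2
        variances.append((d1 - mean) ** 2 + (d2 - mean) ** 2)
    return diffs[0][0] / math.sqrt(sum(variances) / len(diffs))


# Made-up AUPRC differences between two structures over 5 repetitions.
toy_diffs = [(0.02, 0.04)] * 5
t_stat = five_by_two_cv_t(toy_diffs)
significant = abs(t_stat) > 2.571  # critical value for 5 degrees of freedom
```

Here t equals the square root of 2, about 1.41, which is below 2.571, so the null hypothesis of equal performance would not be rejected for these toy numbers.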
We use stratified 5 × 2-fold cross-validation to evaluate the classification algorithm's performance. The average AUPRC, F1 score, precision rate, and recall rate are used to compare the performance of models. To find the optimal hyperparameters of each classifier, we iterate the hyperparameter optimization algorithm 500 times. The hyperparameter values that can reach the highest average AUPRC are the optimal hyperparameters.

IV. RESULTS
The computation platform is a laptop with an AMD Ryzen 7 8-core CPU. The analysis of collision risk, risky driver labeling, and driving behavior parameters was done in MATLAB 2018b. All machine learning work was done in Python with supporting libraries, including pandas, numpy, xgboost, imblearn, hyperopt, and scikit-learn.
First, we analyze the labeling result in Section IV-A; second, we compare the driving parameters in three traffic states and justify the necessity of recognizing traffic states.

A. COLLISION RISK AND RISKY DRIVERS
Using the method introduced in Section III-A, we calculate the ACR of drivers and plot the distributions in Figures 7-9. Only samples satisfying the following rules are kept in the dataset:
• The vehicle type is private car.
• The vehicle's total duration in car-following state is longer than 10 s.
As shown in Figure 7, in free flow, more than 50% of drivers have a zero or near-zero ACR value, indicating that they are driving in a safe state; 20% of drivers have an ACR between 1.25 s and 2.5 s. About 10% of drivers have an ACR greater than 5 s.
As shown in Figure 8, in saturated flow, 22% of drivers have a zero or near-zero ACR, while the percentage of drivers with ACR between 1.25 s and 2.5 s rises to 31%. About 25% of drivers have an ACR greater than 5 s.
As shown in Figure 9, in congested flow, 22% of drivers have a zero or near-zero ACR, while the percentage of drivers with ACR between 1.25 s and 2.5 s rises to 45%. About 20% of drivers have an ACR greater than 5 s.
Although the distributions of ACR vary across different traffic states, we want to determine universal criteria for all traffic states in order to label risky drivers. Based on the IQR method introduced in Section III-B, we calculate the upper and lower quartiles of all ACRs and find ACR = 0.5 as the universal risky driver threshold for the three traffic states. Drivers with ACR > 0.5 are labeled as risky drivers. The data information after labeling is shown in Table 2. There are 99 risky drivers in the free flow file, 155 risky drivers in the saturated flow file, and 86 risky drivers in the congested flow file. The saturated flow file has the highest percentage of risky drivers, but still, the data is highly imbalanced, with an imbalance ratio of 10.1:1. The imbalance ratio of the free flow file is 14.4:1, and the imbalance ratio of the congested flow file is 22.3:1.
We find that drivers tend to drive more safely in congested flow because of the congestion. For example, the average number of lane changes per vehicle decreases as traffic volume increases. The percentage of risky drivers in congested flow is the lowest among all three flow states.

B. DRIVING BEHAVIOR PARAMETERS IN THREE TRAFFIC STATES
We analyzed the differences in driving risk among the three traffic states in the previous Section. This Section compares the distributions of four individual driving behavior parameters across the three traffic states: each vehicle's longitudinal speed, lateral speed, longitudinal acceleration, and longitudinal deceleration.

1) LONGITUDINAL SPEED
Unsurprisingly, free flow has a higher mean longitudinal speed than congested flow and saturated flow. It is noteworthy that many drivers in free flow have a mean longitudinal speed higher than the speed limit, which is 33.33 m/s, or 120 km/h. In congested flow, more than 56% of drivers have a mean longitudinal speed lower than 25 m/s. The distribution of mean longitudinal speed in saturated flow lies between those in free flow and congested flow. Free flow has the lowest longitudinal speed fluctuation. About 38% of drivers in free flow have a near-zero longitudinal speed fluctuation. The fluctuation in saturated flow is higher than that in free flow, while congested flow has the highest fluctuation.

2) LATERAL SPEED
Due to congestion and fewer lane changes, more than 90% of drivers in the congested flow have a mean lateral speed close to zero, while this percentage in the saturated flow and free flow is about 80% and 65%, respectively.
Although drivers in congested flow have the highest fluctuation in longitudinal speed, they have a more stable lateral speed than drivers in free flow and saturated flow. The distributions of the standard deviation of lateral speed in free flow and saturated flow are similar.

3) LONGITUDINAL ACCELERATION AND DECELERATION
Since drivers' longitudinal speed in free flow is more stable than that in saturated flow and congested flow, their distributions of mean longitudinal acceleration and deceleration are more concentrated around zero. By contrast, the distributions of mean longitudinal acceleration and deceleration in congested flow are more dispersed.
In summary, we find that congested traffic flow has more drivers with unstable longitudinal speed, acceleration, and deceleration. However, as shown in Table 2, the percentage of risky drivers in congested flow is lower than in the other two flow states. This is inconsistent with the common-sense view that speed instability and abrupt acceleration/deceleration are indicators of risky driving. This contradiction implies that drivers should not be compared directly based on their driving parameters across different traffic states. To identify risky drivers, the traffic state should be determined first, and then drivers in the same context (traffic state) should be compared with each other.

C. MODEL RESULTS
This section tests the performance of three risky-driver-recognition structures: SR, TRC, and PTRC. SR is the baseline structure, and TRC and PTRC are our proposed structures. SR has only one recognition model: the risky driver recognition model (results are shown in Table 3). TRC and PTRC have four recognition models: one context recognition model (results are shown in Table 4) and three risky driver recognition models (results are shown in Tables 5-7), one for each traffic state.
For each model, we compare the performance of six classifiers: SVM, KNN, Adaboost, RF, Extra trees, and XGBoost. Table 3 shows that XGBoost is the best classifier for the risky driver recognition model in the SR structure, with the highest AUPRC, 0.781, and the highest F1, 0.730. The SVM and KNN classifiers perform much worse than the ensemble learning algorithms (Adaboost, RF, Extra trees, and XGBoost).
For the traffic state recognition model, XGBoost is the best choice. As shown in Table 4, XGBoost reaches the highest AUPRC, 0.804, the highest precision, 0.805, the highest recall, 0.804, and the highest F1, 0.804, among all classifiers. The second-best classifier is RF, with the second-highest AUPRC, precision, recall, and F1.
For the risky driver recognition model in each traffic state, XGBoost is still the best classifier, as shown in Tables 5-7. Comparing the XGBoost results from Table 3 with those from Tables 5-7, we find that awareness of traffic state can increase the model's risky driver recognition ability. For example, without knowing the driving context (traffic state), the precision and recall of the risky driver recognition model are 0.763 and 0.715, respectively. Knowing that a driver is in saturated flow, the precision and recall are 0.834 and 0.730, respectively; knowing that a driver is in congested flow, they are 0.800 and 0.783; knowing that a driver is in free flow, they are 0.813 and 0.698. All three traffic state-specific risky driver recognition models have higher precision and recall than the general risky driver recognition model, except the free-flow model, whose recall (0.698) is slightly lower than that of the general model (0.715).
Using XGBoost as the classifier for all recognition models in SR, TRC, and PTRC, we list the final risky driver recognition evaluation results in Table 8. The results for SR are the same as the results for XGBoost in Table 3, since SR is a single-layer structure and has only one recognition model. The results for TRC and PTRC combine context (traffic state) recognition and risky driver recognition. Given a vehicle whose context is unknown, TRC and PTRC first predict its context and the corresponding probabilities, and then use the traffic state-specific risky driver recognition models to predict driving style. PTRC (calibrated) applies Platt scaling to calibrate the predicted probabilities of XGBoost instead of using uncalibrated probabilities directly. Table 8 shows that TRC, PTRC, and PTRC (calibrated) all outperform SR according to AUPRC and F1 scores. We conduct a Student's t-test to see whether the difference between SR and each of the other structures is statistically significant. TRC, PTRC, and PTRC (calibrated) all pass the Student's t-test, which means the two-layer structure is significantly better than the single-layer structure commonly used in the field of risky driver recognition. PTRC (calibrated) has the highest AUPRC, 0.820, and the highest F1, 0.762.
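The way the two layers might be combined at prediction time can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes TRC routes each vehicle to the risk model of its single most likely traffic state, and PTRC weights every state-specific risk probability by the predicted state probability. The function names and all numbers are hypothetical.

```python
from typing import Dict


def trc_risk(state_probs: Dict[str, float], risk_by_state: Dict[str, float]) -> float:
    """Hard routing (assumed TRC): use only the model of the most likely state."""
    best_state = max(state_probs, key=state_probs.get)
    return risk_by_state[best_state]


def ptrc_risk(state_probs: Dict[str, float], risk_by_state: Dict[str, float]) -> float:
    """Soft routing (assumed PTRC): weight each state-specific risk by P(state)."""
    return sum(state_probs[s] * risk_by_state[s] for s in state_probs)


# First-layer output for one vehicle: predicted traffic state probabilities.
# Illustrative numbers only, not taken from the paper.
state_probs = {"free": 0.1, "saturated": 0.6, "congested": 0.3}

# Second-layer outputs: P(risky) from each state-specific model for that vehicle.
risk_by_state = {"free": 0.2, "saturated": 0.7, "congested": 0.4}

trc = trc_risk(state_probs, risk_by_state)    # saturated model only: 0.7
ptrc = ptrc_risk(state_probs, risk_by_state)  # 0.1*0.2 + 0.6*0.7 + 0.3*0.4 = 0.56

print(f"TRC risk: {trc:.2f}, PTRC risk: {ptrc:.2f}")
```

Under these assumptions, the weighted form depends directly on the quality of the first-layer probabilities, which is one reason calibrating them could matter.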

V. DISCUSSION
The impact of traffic state on driving behavior is evident. As traffic congestion increases, the vehicle's longitudinal speed becomes less stable, the variation in lateral speed decreases, and abrupt acceleration/deceleration occurs more frequently. Moreover, based on collision risk evaluation, we find that saturated flow has the highest percentage of risky drivers and congested flow has the lowest. Therefore, we believe that traffic state is an essential factor in risky driver recognition, and that the relationship between traffic state and collision risk is complicated: congestion does not always increase collision risk.
Our goal in this paper is to test whether we can increase machine learning's ability to recognize risky drivers by using novel two-layer structures without introducing any new features. Since the input features have no traffic state information, the two-layer structures need to both predict the traffic state and recognize risky drivers. The two-layer structure has two main advantages over the single-layer structure:
• More Flexibility in Machine Learning Models: Two-layer structures have K + 1 models, and each recognition model can apply a different classifier and different hyperparameter values to reach the highest performance possible. By contrast, a single-layer structure has only one model and can apply only one classifier with one hyperparameter setting.
• Adaptable to New Data: Traffic states and driving behavior are complex. Different locations could have different compositions of traffic states and driving styles. The data we applied in this paper have 1529 vehicles in free flow, 1720 vehicles in saturated flow, and 2004 vehicles in congested flow. The percentage of risky drivers in each traffic state is also different. For example, a heavily used highway is always congested, and its driving risk is different from a highway that is congested only at peak hours. When new data arrive from such a highway, the single-layer structure requires retraining the whole risky driver recognition model, while the two-layer structure only needs to retrain the risky driver recognition model for congested flow.
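Both advantages can be illustrated with a small sketch of the second layer. The class, the placeholder trainer, and the toy data below are hypothetical; a real system would plug in actual classifiers such as XGBoost in place of `fit_base_rate`.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[List[float], int]        # (feature vector, 1 = risky, 0 = not risky)
Model = Callable[[List[float]], float]  # maps features to a risk score
Trainer = Callable[[List[Sample]], Model]


def fit_base_rate(samples: List[Sample]) -> Model:
    """Placeholder trainer: always predicts the share of risky drivers it saw."""
    rate = sum(label for _, label in samples) / len(samples)
    return lambda features: rate


class StateSpecificModels:
    """Holds one risky-driver model per traffic state (the second layer)."""

    def __init__(self, trainers: Dict[str, Trainer]) -> None:
        self.trainers = trainers
        self.models: Dict[str, Model] = {}

    def fit_state(self, state: str, samples: List[Sample]) -> None:
        # Only the model for `state` is replaced; the others are untouched.
        self.models[state] = self.trainers[state](samples)


# Each state may use its own trainer (classifier and hyperparameters).
second_layer = StateSpecificModels(
    {"free": fit_base_rate, "saturated": fit_base_rate, "congested": fit_base_rate}
)

# Toy data, illustrative only.
old_data: List[Sample] = [([0.0], 0), ([0.0], 0), ([0.0], 0), ([0.0], 1)]
for state in ("free", "saturated", "congested"):
    second_layer.fit_state(state, old_data)

free_model_before = second_layer.models["free"]

# New data from a permanently congested highway: retrain one model only.
new_congested_data: List[Sample] = [([0.0], 0), ([0.0], 1)]
second_layer.fit_state("congested", new_congested_data)

print(second_layer.models["congested"]([0.0]))           # 0.5
print(second_layer.models["free"] is free_model_before)  # True
```

Keeping the trainers in a per-state mapping is one simple way to let each model use a different classifier while making single-state retraining a one-line operation.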
The two-layer structure's disadvantage is its relatively high computational cost. The SR structure only needs to train one risky driver recognition model, while TRC and PTRC need to train one traffic state recognition model and three traffic state-specific risky driver recognition models. Since the training of each traffic state-specific model involves only a subsample of the data, the training time of each state-specific model is much less than that of the model in the SR structure. Table 9 lists the computational cost of each structure through 10-fold cross-validation. The computation time of SR is 8.5 s, and the computation times of TRC and PTRC are both 19.0 s. PTRC (calibrated) takes 19.1 s since it needs a little extra time for probability calibration. Although the two-layer structures' computational cost is more than twice that of the single-layer structure, it may not be a major concern in practice, thanks to the growing computation power of electronic devices. Moreover, parallel computation can help to reduce the computational cost of two-layer structures since the K + 1 models can be trained independently.
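Because the K + 1 models do not depend on each other during training, they could be submitted to a worker pool. The following is only a sketch using Python's standard `concurrent.futures`; `train_model` is a hypothetical placeholder for fitting one classifier, and the row lists are toy stand-ins for the training subsets.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def train_model(name: str, rows: List[int]) -> Tuple[str, int]:
    """Placeholder for fitting one classifier; returns the name and sample count."""
    return name, len(rows)


# K + 1 = 4 training tasks for K = 3 traffic states.
tasks: Dict[str, List[int]] = {
    "traffic_state": list(range(12)),      # first layer: all vehicles
    "risky_free": list(range(0, 4)),       # second layer: one subsample per state
    "risky_saturated": list(range(4, 8)),
    "risky_congested": list(range(8, 12)),
}

# Submit all tasks at once; each one runs independently of the others.
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    futures = [pool.submit(train_model, name, rows) for name, rows in tasks.items()]
    trained = dict(future.result() for future in futures)

print(trained)
```

Threads are used here only to keep the sketch short. For CPU-bound training written in pure Python, a process pool would usually be the better fit, and with independent tasks the wall-clock time is bounded below by the slowest single model.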

VI. CONCLUSION
This paper proposes two risky driver recognition structures: Two-layer Risky-driver-recognition with Context-awareness (TRC) and Probability-based Two-layer Risky-driver-recognition with Context-awareness (PTRC). Using vehicle trajectory data, we demonstrate the impact of traffic state on driving behavior parameters and risky driving. Therefore, as an essential type of context information, traffic state should be considered in the risky driver recognition model. Statistical tests confirm that TRC and PTRC achieve significantly higher AUPRC and F1 scores than the single-layer structure. PTRC's performance can be further improved by probability calibration.
This research has a few limitations. The performance improvement of the proposed structures is moderate. Further improvement could be achieved by considering more traffic states. However, the optimal number of traffic states and the traffic state classification problem are not discussed in this paper. The performance evaluation of the proposed structures uses a single-source dataset recorded on a German highway. To prove the value of TRC and PTRC, we need more performance tests with new datasets.
For future research, more detailed context information should be considered, such as road type, weather, and vehicle type. Driving behavior data from other sources, such as naturalistic driving data and driving simulator data, are valuable for further testing the performance of the proposed structures.

He is currently a Full Professor with the College of Transportation Engineering, Tongji University. He has engaged in teaching and research in transportation safety, transportation management and planning, intelligent transport systems, systems engineering, and transportation disaster management for more than 30 years. He has completed over 100 projects funded by U.S. federal and state governments as well as China transportation agencies at national and provincial levels. He has published 15 books and over 200 academic articles in high impact international journals. His research interests include traffic safety, transportation planning and management, intelligent traffic systems, and traffic detection technology.
VOLUME 9, 2021