Comparing Factors Affecting Injury Severity of Passenger Car and Truck Drivers

This study aims to explore factors affecting passenger car and truck driver injury severity in passenger car-truck crashes. Police-reported crash data from 2007 to 2017 in Canada are collected. Two-vehicle crashes involving one truck and one passenger car are extracted for modeling. Different injury severities are not equally represented. To address the data imbalance issue, this study applies four different data imbalance treatment approaches, including over-sampling, under-sampling, a hybrid method, and a cost-sensitive learning method. To test the performances of different classifiers, five classification models are used, including multinomial logistic regression, Naive Bayes, Classification and Regression Tree, support vector machine, and eXtreme Gradient Boosting (XGBoost). In both the passenger car driver and truck driver injury severity analysis, XGBoost combined with cost-sensitive learning generates the best results in terms of G-mean, area under the curve, and overall accuracy. Additionally, the Shapley Additive Explanations (SHAP) approach is adopted to interpret the result of the best-performing model. Most of the explanatory variables have similar effects on passenger car and truck driver fatality risks. Nevertheless, six variables exhibit opposite effects, including the age of the passenger car driver, crash hour, the passenger car age, road surface condition, weather condition and the truck age. Results of this study could provide some valuable insights for improving truck traffic safety. For instance, properly installing traffic control devices could be an effective way to reduce fatality risks in passenger car-truck crashes. Besides, passenger car drivers should be extremely cautious when driving between midnight and 6 am on truck corridors.


I. INTRODUCTION
As the dominant mode of freight transportation in North America, trucking plays an important role in commodity flow and economic vitality. According to the 2017 Commodity Flow Survey conducted by the U.S. Department of Transportation, trucks move 73.0% of all goods by value, 71.5% by weight, and 41.6% by ton-miles [1]. Unfortunately, the considerable volume of truck traffic has also brought some regrettable safety issues. Compared to other types of vehicles, trucks have some unique characteristics, such as heavier gross weight, larger vehicle size, and larger blind spot area, which might increase the risk of severe crashes. According to the National Highway Traffic Safety Administration (NHTSA), there were 4,761 people killed in crashes involving large trucks in 2017, a 12% increase from 2008 [2]. It should be noted that 72% of these fatalities were occupants of other vehicles. Additionally, the involvement rate of large trucks in injury crashes was 31 per 100 million large-truck miles traveled in 2015, a 48% increase from 2008.
The associate editor coordinating the review of this manuscript and approving it for publication was Rashid Mehmood.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Compared to crashes involving other types of vehicles, truck crashes usually result in more severe economic losses and crash severity. As such, a significant amount of research has been conducted to explore factors affecting injury severity in truck-involved crashes. However, a review of the literature indicates that comparatively few studies have compared factors affecting injury severity of passenger car drivers and truck drivers in truck-involved crashes. Since the crash outcomes of passenger car drivers and truck drivers are significantly different, lacking such information could affect the effectiveness of safety improvement countermeasures. Besides, although a wide variety of modeling approaches have been adopted to study injury severity of truck-involved crashes, relatively little attention has been paid to the data imbalance issue. A dataset is considered imbalanced when one class has a much greater number of instances than the other classes [3]. In a typical traffic crash dataset, fatal crashes (minority instances) are considerably outnumbered by non-fatal crashes (majority instances), leading to a data imbalance problem. Without proper treatment, data imbalance could severely undermine the performance of classification models. This is mainly because standard classifiers (such as logistic regression, decision tree and Naive Bayes) are designed for balanced training data. When data imbalance is present, these classifiers often provide suboptimal results by classifying majority instances more accurately while misclassifying the minority instances [4]. Another reason is that the learning process of standard classifiers is guided by achieving the highest overall accuracy, inducing a bias towards the majority instances [5].
The value of a crash classification model depends largely on its accuracy in predicting more severe crashes, which happen to be minority instances in a crash dataset [6]. As such, to make crash classification models more informative, the data imbalance problem needs to be properly handled.
Another issue worth studying is the interpretability of model results. In recent years, various machine learning techniques have been used to study traffic injury severities, such as classification and regression tree [7], support vector machine [8], and gradient boosting model [9]. Compared to traditional safety models, these more sophisticated data-driven models have been shown to predict crash severity with relatively high accuracy [10]. Nevertheless, these models are often considered "black-box" methods because they lack inference ability. To unravel how each variable influences model prediction results, the current study adopts a recently proposed approach called SHAP (Shapley Additive Explanations), which is a unified approach for interpreting the output of any machine learning model [11]. Based on coalitional game theory, SHAP is able to explain a prediction by computing the contribution of each variable to the prediction.
The rest of this article is organized as follows: Section 2 reviews relevant literature in related domains; Section 3 introduces the dataset used in this study and provides the descriptive statistics of variables; Section 4 describes the proposed methodology in detail; Section 5 discusses the model results; Section 6 concludes the current study and Section 7 points out the study limitations.

II. LITERATURE REVIEW
Since the available traffic crash datasets typically report injury levels as discrete variables, many previous studies on injury severities of truck-involved crashes have adopted discrete outcome regression models. In an early study, Khattak and Targa applied the ordered probit model to examine factors affecting truck-involved crash severities in work zones [12]. Based on a unique dataset collected from North Carolina, the effects of various variables were tested. The results suggested that the following variables significantly affected the severities of multivehicle crashes involving trucks: the roadway configuration, posted speed limits, adjacent to the work zone, and whether a bypass was required on the opposite side. To study the impact of vehicle, driver, occupant and environmental attributes on injury levels of crashes involving heavy-duty trucks, Lemp et al. established an ordered probit model based on datasets consolidated from various data sources [13]. Results suggested that increasing the number of trailers could increase the likelihood of more severe crashes. Chu used a binary logit model to study factors contributing to severities of crashes involving gravel trucks [14]. This study found that lacking driver awareness, geometric improvement of roadways, and the desire to make more runs in a day significantly increased the likelihood of severe injury crashes. Choi et al. applied a binary logistic regression model to identify factors affecting truck-involved crash severities under normal and adverse weather conditions [15]. Based on the model results, speed-related variables were identified as the most important factors affecting crash severities. To explore factors affecting the frequency and severity of large-truck involved crashes, Dong et al. [16] proposed multinomial logit and negative binomial models.
This study concluded that truck percentage, annual average daily traffic, weather condition and driver condition significantly affected both the severity and frequency of crashes involving large trucks. Using 2009-2013 crash data in Ohio, Uddin and Huynh [17] developed six mixed logit models considering three lighting conditions and two area types to investigate factors affecting injury severities of truck-involved crashes. Results revealed that the impacts of variables on injury severity were quite different in different models, highlighting the necessity of investigating crashes based on different lighting conditions and area types. Newnam et al. [18] studied a unique safety issue: whether older truck drivers give rise to an increased safety risk. Chi-square statistics were used to explore differences in injury levels in middle-aged and older driver groups. Based on the results of this study, compared with middle-aged drivers, older drivers presented some safer driving behaviors. Moomen et al. [19] utilized a logistic regression model to analyze factors affecting truck crashes on Wyoming downgrades. Several countermeasures were identified to prevent such crashes. Osman et al. [20] analyzed injury severity of large truck crashes in work zones by using a generalized ordered response logit model. It was concluded that the following factors had higher elasticity: lower AADT, higher speed limits, and daytime. Useche et al. [21] examined the effect of various factors associated with serious injuries and fatalities among Spanish professional drivers. Results of the study indicated that the type of road and crash, light and vehicle conditions, along with individual driver's characteristics are significant factors for predicting serious injuries and fatalities of professional drivers. Unlike previous literature focusing on truck crashes in developed countries, Wang and Prato [22] analyzed injury severities of truck crashes on mountainous expressways in China.
A total of 2,695 truck crashes occurring on four mountain expressways were analyzed with a partial proportional odds model. This study focused on the geometric characteristics of expressways and proposed several road design suggestions to alleviate truck crash severity. Rahimi et al. [23] studied the injury severity of single-vehicle truck crashes in Iran. A random thresholds random parameter hierarchical ordered probit model was used to consider the heterogeneity across crashes. Several safety countermeasures were also proposed. A recent study conducted by Behnood and Mannering [24] examined the temporal instability of factors affecting injury severities of truck-involved crashes. Based on the results of random parameters logit models, this study found that the effects of factors influencing injury severities in truck crashes were unstable from year to year and across daily time periods. Behnood and Al-Bdairi [25] analyzed the weekly instability of factors affecting injury severities in large truck crashes. It was revealed that model estimation results were not transferable across weekends and weekdays. Haq et al. [26] investigated occupant injury severity of truck-related crashes based on vehicle types. It was found that separate models should be used for each occupant of each vehicle type. Besides, the actions of drivers had more significant impacts on crash severity. Although most previous studies have applied regression models to study injury severity of truck crashes, the application of nonparametric machine learning techniques has also attracted some attention. For instance, Chang and Chien [7] developed a classification and regression tree (CART) model to uncover the relationship between truck crash severities and various driver, roadway, environment and crash characteristics.
The results revealed that the following variables were the key determinants of truck crash severities: seatbelt use, crash type, vehicle type, driver action, crash location and number of vehicles involved in the crash. In another study, a more advanced gradient boosting model was developed to analyze commercial truck crash severities [9]. The model revealed that 22 variables significantly contributed to injury severities and 11 of them could explain more than 80% of the model forecasting.
Regarding the data imbalance embedded in traffic crash datasets, several studies have proposed corresponding treatments. To analyze factors affecting crash severity in Jordan, Mujalli et al. [27] used three different resampling techniques to address the data imbalance issue. It was found that using the balanced data set to train the classifier could improve the classification accuracy of crashes involving fatalities and severe injuries.
In another study, Goh et al. [28] applied logistic regression and six popular machine learning algorithms to uncover the relationship between different cognitive factors and unsafe behaviors. Since the unsafe behaviors are highly imbalanced, this study used an over-sampling technique to rebalance the training data. It was concluded that the decision tree algorithm achieved the best classification performance when training on the rebalanced dataset. Jeong et al. [6] proposed a hybrid approach for imbalanced traffic crash data analysis. They used two resampling techniques and five classification algorithms to classify injury severities in motor vehicle crashes. It was revealed that the best classification performance was achieved when Bootstrap aggregation was used with the decision tree, with over-sampling technique to treat data imbalance.
The current study proposes a threefold contribution to existing literature. Firstly, by comparing factors affecting injury severity of passenger car drivers and truck drivers in truck-involved crashes, this article provides some valuable insights for stakeholders to alleviate crash severity. Secondly, by implementing several algorithms to deal with imbalanced crash datasets, the current study significantly improves the classification accuracy of more severe crashes. Thirdly, to the best of our knowledge, this is the first study to apply SHAP to improve the interpretability of traffic crash classification models.

III. DATA PREPARATION
The data used in the current study are extracted from Canada's National Collision Database (NCDB). NCDB contains all police-reported vehicle crashes on public roads in Canada since 1999 [29]. For the modeling purpose, this study collects the 2007-2017 two-vehicle crashes involving one truck and one passenger car. According to NCDB, a truck is defined as a heavy vehicle with GVWR (Gross Vehicle Weight Rating) of more than 4,536 kg. From 2007 to 2017, there are 28,605 two-vehicle crashes involving one truck and one passenger car. Among these crashes, 1,274 passenger car drivers suffer from fatal injuries, accounting for 4.45% of total crashes. In contrast, 27 truck drivers are killed in these crashes, accounting for 0.09% of total crashes. Additionally, 80.46% of passenger car drivers are injured and only 15.36% of truck drivers suffer from injuries in these crashes. In the context of the current study, 16 variables are selected for the modeling purpose, including crash characteristics (e.g., crash month and day), infrastructure characteristics (e.g., roadway alignment and surface condition), vehicle characteristics (e.g., age of the passenger car and truck), and driver characteristics (e.g., driver's gender and age). Please refer to Table 1 for the detailed variable description and distribution.

IV. METHODOLOGY

A. TREATING DATA IMBALANCE
As shown in Table 1, the injury levels of passenger car drivers and truck drivers are both highly imbalanced. Without proper treatment, this could severely compromise the performance of the classifier. In the past few years, hundreds of algorithms have been proposed to address the data imbalance issue. Basically, these techniques can be divided into two groups: resampling and cost-sensitive learning [3]. To compare the performance of different data imbalance treatment approaches, the current study adopts three resampling algorithms (over-sampling, under-sampling, and a hybrid method combining under-sampling and over-sampling) and one cost-sensitive learning method.

1) OVER-SAMPLING
Over-sampling aims to eliminate the adverse impact of skewed class distribution by creating synthetic minority instances. A popular over-sampling technique called SMOTE (Synthetic Minority Over-sampling Technique) is used in the current study. Originally proposed by Chawla et al. [30], SMOTE is widely used in previous imbalanced learning studies [3], [27], [28], [31]. SMOTE aims to create a more balanced dataset by randomly generating artificial minority samples along the line segments joining each minority sample with its k nearest neighbors (in our case, k = 5). Depending on the amount of over-sampling instances required, neighbors from the k nearest neighbors are randomly chosen and one synthetic sample is created in each direction. This is done as follows. Firstly, SMOTE measures the difference between the feature vector (an n-dimensional vector representing the sample) under consideration and its nearest neighbor. Secondly, this measured difference is multiplied by a random number between 0 and 1, which is then added to the feature vector. This forces the selection of a random point and creates an artificial instance along the line segment joining two feature vectors.
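The interpolation step described above can be illustrated with a minimal Python sketch. It is not the imbalanced-learn implementation used in this study; the function name `smote_sample` and the toy minority samples are hypothetical, and Euclidean distance is assumed for the neighbor search.

```python
import math
import random

def smote_sample(minority, k=5, rng=None):
    """Create one synthetic sample from a list of minority feature vectors."""
    rng = rng or random.Random()
    # Pick one minority sample as the feature vector under consideration.
    base = rng.choice(minority)
    # Locate its k nearest minority neighbors (Euclidean distance assumed).
    others = [p for p in minority if p is not base]
    neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
    neighbor = rng.choice(neighbors)
    # Multiply the difference by a random number in [0, 1) and add it to base,
    # which yields a point on the segment joining the two feature vectors.
    gap = rng.random()
    return [b + gap * (n - b) for b, n in zip(base, neighbor)]

# Hypothetical two-dimensional minority samples (not data from the study).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7], [1.3, 1.0]]
rng = random.Random(0)
synthetic = [smote_sample(minority, k=5, rng=rng) for _ in range(4)]
```

Because every synthetic point lies on a segment between two existing minority samples, it always falls inside the bounding box of the minority class, which is also why some points can land inside a majority cluster when the classes overlap.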

2) UNDER-SAMPLING
Unlike over-sampling, under-sampling tries to create better-defined class clusters by removing samples according to a specific selection criterion. The current study applies the Edited Nearest Neighbor (ENN) method to perform under-sampling [32]. For each sample in the dataset, its three nearest neighbors are located. If this sample pertains to the minority class, and at least two of its three nearest neighbors belong to the majority class, then this sample is eliminated. Likewise, if this sample belongs to the majority class, and at least two of its three nearest neighbors pertain to the minority class, then this sample is also deleted. Through this in-depth data cleaning, the ENN method could generate a more balanced class distribution.
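The elimination rule above can be sketched in a few lines of Python. This is an illustrative sketch rather than the imbalanced-learn routine used in the study: the function name and toy data are hypothetical, Euclidean distance is assumed, and "belongs to the other class" is generalized to "has a different label" so that the same rule also applies with more than two classes.

```python
import math

def edited_nearest_neighbors(X, y, k=3):
    """Return the indices of the samples kept by the ENN rule."""
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Indices of the k nearest other samples in the original dataset.
        nearest = sorted((j for j in range(len(X)) if j != i),
                         key=lambda j: math.dist(xi, X[j]))[:k]
        # Count neighbors whose class differs from the sample's own class.
        disagree = sum(1 for j in nearest if y[j] != yi)
        # Eliminate the sample when at least two of its three neighbors disagree.
        if disagree < 2:
            keep.append(i)
    return keep

# Hypothetical data: a class-1 point (index 4) sits inside the class-0 cluster.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
     [5, 5], [5, 6], [6, 5], [6, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
kept = edited_nearest_neighbors(X, y)
```

In this toy example only the isolated class-1 point is removed, since all three of its nearest neighbors belong to class 0.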

3) THE HYBRID METHOD
The hybrid method combines the over- and under-sampling methods. Although the SMOTE method could generate synthetic samples by interpolating new points between existing feature vectors, it can also bring on other problems. As shown in Figure 1(b), the interpolation of minority samples could generate artificial samples too deeply in the majority class cluster. As a result, the classification algorithm might be overfitted and less informative. The hybrid method tries to solve this problem by applying the SMOTE and ENN methods in sequence. The SMOTE method is first applied to generate artificial minority instances, resulting in a more balanced dataset. Then, the ENN method is called for the data cleaning purpose. This would create better-defined class clusters. In the current study, the SMOTE method, ENN method and hybrid method are all coded in Python based on the imbalanced-learn library [33].

4) COST-SENSITIVE LEARNING METHOD
In addition to the aforementioned three resampling methods, this study also tests a cost-sensitive learning method. Cost-sensitive learning assumes a higher cost for misclassifying minority instances with respect to majority instances. To this end, a weight is calculated for each sample in the dataset according to the class frequency of this sample. For a sample $S_i$ belonging to class $i$, the weight is calculated as:

$$ \text{weight}(S_i) = \frac{\text{total number of samples}}{\text{number of classes} \times \text{number of class } i \text{ samples}} \tag{1} $$

Obviously, minority instances have higher weights relative to majority instances. This forces the classifier to put more emphasis on correctly classifying minority instances. Unlike data resampling methods, which are incorporated at the data level, the cost-sensitive learning method is incorporated at the algorithmic level by modifying the loss function. Compared to data resampling methods, the cost-sensitive learning method is more computationally efficient, which makes it more suitable for large-size datasets. The current paper adopts the scikit-learn Python library to compute each sample's weight [34].
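As a worked illustration of Equation (1), the following Python sketch computes the class weights for a small hypothetical label vector. The function name and the label counts are illustrative assumptions and are not taken from the NCDB data.

```python
from collections import Counter

def class_balanced_weights(labels):
    """Weight per class: n_samples / (n_classes * n_samples_in_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * n_c) for c, n_c in counts.items()}

# Hypothetical labels: 6 no-injury, 3 injury and 1 fatal outcome.
labels = ["no injury"] * 6 + ["injury"] * 3 + ["fatality"] * 1
weights = class_balanced_weights(labels)
# Each sample inherits the weight of its class.
sample_weights = [weights[c] for c in labels]
```

Here the single fatal sample receives a weight of 10 / (3 × 1) ≈ 3.33, against 10 / (3 × 6) ≈ 0.56 for each no-injury sample, so every class contributes the same total weight. The "balanced" option of scikit-learn's class-weight utilities applies the same formula; such per-sample weights would typically be passed to a classifier through a `sample_weight` argument at training time, although the exact call used by the authors is not stated in the text.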

B. CLASSIFICATION MODELS
To compare factors contributing to crash severity of passenger car and truck drivers, the crash outcomes of both drivers are modelled separately. To examine the performances of different classifiers on predicting crash severity, this study uses five classification models, including multinomial logistic regression, Naive Bayes, Classification and Regression Tree (CART), support vector machine, and eXtreme Gradient Boosting (XGBoost). This section briefly elaborates on each model, as well as the classification performance evaluation metrics.

1) MULTINOMIAL LOGISTIC REGRESSION
In the current study, the crash severity is divided into three categories: no injury, injury, and fatality. As a traditional unordered discrete outcome model, the multinomial logistic regression (MNL) model is suitable for exploring the potential relationship between contributing factors and three or more injury outcomes. Besides, the MNL model does not impose sometimes unrealistic restrictions on parameters, such as normality or homoscedasticity, which makes it a popular choice in crash severity analysis. The MNL works by selecting one injury outcome as the base category and the other injury outcomes are estimated relative to this base category. A standard MNL model is expressed as [35]:

$$ P_n(i) = \frac{\exp(\beta_i X_{in})}{\sum_{j \in I} \exp(\beta_j X_{jn})} \tag{2} $$

where $P_n(i)$ is the probability that crash $n$ results in injury outcome $i$; $\beta_i$ is the vector of estimated coefficients for the injury outcome $i$, and $X_{in}$ stands for independent variables which impact the injury outcome $i$ sustained by crash $n$. $I$ represents the set of possible injury outcomes.

Based on the results of the MNL model, the impact of each variable on the injury outcome can be easily interpreted by the estimated coefficient or the odds ratio (exponent of the coefficient). Nevertheless, the MNL model does require careful consideration of the correlation between each crash contributing factor and the crash outcome, as well as the possible multicollinearity among contributing factors.
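To make the interpretation of MNL coefficients concrete, the following Python sketch evaluates outcome probabilities and an odds ratio for one crash. The coefficient values are invented for illustration (they are not estimates from this study), and the base category "no injury" is assumed to have all coefficients fixed at zero.

```python
import math

def mnl_probabilities(x, betas):
    """Outcome probabilities for one crash from per-outcome coefficient lists."""
    utilities = {i: sum(b * v for b, v in zip(beta, x)) for i, beta in betas.items()}
    # Subtract the largest utility before exponentiating for numerical stability.
    m = max(utilities.values())
    expu = {i: math.exp(u - m) for i, u in utilities.items()}
    total = sum(expu.values())
    return {i: e / total for i, e in expu.items()}

# Hypothetical coefficients: [intercept, one binary indicator variable].
betas = {
    "no injury": [0.0, 0.0],   # base category
    "injury": [0.5, 0.2],
    "fatality": [-2.0, 0.8],
}
x = [1.0, 1.0]  # intercept term and indicator switched on
probs = mnl_probabilities(x, betas)
# Odds ratio of the indicator for fatality relative to the base category.
odds_ratio = math.exp(betas["fatality"][1])
```

With these assumed values, switching the indicator on multiplies the odds of a fatality relative to no injury by exp(0.8), which is how an estimated coefficient translates into an odds ratio.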

2) NAIVE BAYES
A Naive Bayes (NB) classifier is a popular supervised learning algorithm based on the Bayes theorem. It is called "Naive" because an NB classifier has a strong assumption of conditional independence between each pair of explanatory variables, given the class variable value. In other words, an NB classifier assumes that each explanatory variable contributes independently and equally to the class variable. Based on the Bayes theorem, the probability of class variable $Y = y$ given the explanatory variables $X = (x_1, x_2, \ldots, x_n)$ can be described as:

$$ P(Y = y \mid x_1, \ldots, x_n) = \frac{P(Y = y) \prod_{i=1}^{n} P(x_i \mid Y = y)}{P(x_1, \ldots, x_n)} \tag{3} $$

Then, Maximum A Posteriori (MAP) estimation could be used to estimate $P(Y = y)$ and $P(x_i \mid Y = y)$. It should be noted that NB classifiers are a set of classification algorithms rather than a single classifier. Different NB classifiers differ mainly in their assumptions regarding the distribution of $P(x_i \mid Y = y)$. The current paper adopts a Gaussian NB algorithm, which assumes that $P(x_i \mid Y = y)$ follows the Gaussian distribution.
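The Gaussian NB posterior can be computed directly from class priors and per-class feature means and variances, as in the following Python sketch. All names and numbers are hypothetical; in practice the priors, means and variances would be estimated from the training data.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gaussian_nb_posterior(x, priors, params):
    """Posterior P(Y = y | x) under the conditional independence assumption."""
    joint = {}
    for y, prior in priors.items():
        likelihood = 1.0
        # Product of per-feature likelihoods P(x_i | Y = y).
        for xi, (mean, var) in zip(x, params[y]):
            likelihood *= gaussian_pdf(xi, mean, var)
        joint[y] = prior * likelihood
    # The denominator P(x_1, ..., x_n) is the sum of the joint terms.
    evidence = sum(joint.values())
    return {y: j / evidence for y, j in joint.items()}

# Hypothetical priors and (mean, variance) pairs for a single feature.
priors = {"non-fatal": 0.95, "fatal": 0.05}
params = {"non-fatal": [(0.0, 1.0)], "fatal": [(2.0, 1.0)]}
posterior = gaussian_nb_posterior([1.0], priors, params)
```

In this toy case the observation is equally likely under both classes, so the posterior simply reproduces the imbalanced priors, which illustrates how a skewed class distribution can dominate the prediction.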

3) CART MODEL
The Classification and Regression Tree (CART) model is one of the most popular machine learning models, which has been widely used in traffic safety analysis [7], [31], [36], [37]. Compared with most regression models, the CART model does not impose any predefined relationship between explanatory variables and the class variable. As indicated by the model name, the CART model could handle both classification and regression tasks depending on the nature of the target variable. In the current study, the target variable is the injury severity of drivers, which is a discrete variable. Hence, a classification tree is developed. The CART modeling procedure includes two major steps: tree growing and tree pruning. Starting at the root node, tree growing aims to recursively partition the class variable to minimize the impurity of two child nodes. To this end, during each step, the CART model needs to select an explanatory variable as the splitter which can improve the purity of two child nodes most significantly. There are several indicators to measure the purity improvement, of which the Gini index is most commonly used. And this study selects the Gini index to measure the impurity of any child node. The tree keeps growing by recursively partitioning the class variable based on the Gini index. At some point, all samples within each child node belong to the same class and a saturated tree is generated. This saturated tree is most probably overfitting and could lead to high misclassification rate when classifying a new dataset. As such, this saturated tree should be pruned by adjusting parameters which control the tree growing, such as the maximum depth of the tree, the maximum number of leaf nodes in a tree, and the minimum number of samples required to be at a leaf node.
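The splitting criterion can be illustrated with a short Python sketch of the Gini index and of the impurity decrease produced by a candidate split. The function names and the toy label lists are hypothetical and serve only to show the calculation.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted

# Hypothetical node with four non-fatal and four fatal outcomes.
parent = ["non-fatal"] * 4 + ["fatal"] * 4
perfect = gini_decrease(parent, ["non-fatal"] * 4, ["fatal"] * 4)
useless = gini_decrease(parent,
                        ["non-fatal"] * 2 + ["fatal"] * 2,
                        ["non-fatal"] * 2 + ["fatal"] * 2)
```

A split that separates the two classes completely removes all of the parent impurity (0.5 here), whereas a split that leaves the class shares unchanged yields no improvement; the tree-growing step selects the splitter with the largest decrease.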

4) SUPPORT VECTOR MACHINE
The support vector machine (SVM) model has been a widely used non-parametric machine learning model in recent years, mostly because of its sound theoretical foundation and superior predictive performance. Originally proposed by Cortes and Vapnik [38], the SVM model is based on the structural risk minimization principle and the statistical learning theory. Similar to the CART model, the SVM model can also handle both classification and regression problems. For classification problems, the SVM model can map the input vector into a high dimensional feature space. Generally speaking, many hyperplanes can separate the data into different groups in the feature space. The purpose of the SVM model is to construct an optimal hyperplane which can maximize the margin between these groups. The optimal hyperplane is known as the maximum-margin hyperplane, and it can be represented by quadratic optimization modeling.
Although the SVM model was originally designed for two-category classification problems, it can be extended for dealing with multi-category classification problems after some modifications. In the current paper, the prevailing one-versus-one approach is used. For a classification problem with N classes, the one-versus-one approach trains N(N−1)/2 binary SVM models for all possible pairs of classes. Each binary model may predict one class label and the label with the most predictions or votes is determined as the severity level of the crash.
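The one-versus-one voting scheme can be sketched as follows. The function `toy_binary_model` is a hypothetical stand-in for a trained binary SVM (it simply picks the class with the nearer one-dimensional centroid), so the sketch illustrates only the pairing and voting logic, not SVM training.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(classes, binary_predict, x):
    """Collect one vote per pair of classes and return the winner and the votes."""
    votes = Counter(binary_predict(a, b, x) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0], votes

# Hypothetical class centroids on a single feature.
centroids = {"no injury": 0.0, "injury": 1.0, "fatality": 2.0}

def toy_binary_model(a, b, x):
    """Stand-in for a binary SVM trained on classes a and b."""
    return a if abs(x - centroids[a]) <= abs(x - centroids[b]) else b

classes = ["no injury", "injury", "fatality"]
label, votes = one_vs_one_predict(classes, toy_binary_model, 1.8)
```

With N = 3 severity levels, N(N−1)/2 = 3 pairwise models vote; in this toy case "fatality" wins two of the three votes.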

5) XGBoost MODEL
XGBoost stands for eXtreme Gradient Boosting. Originally proposed by Chen and Guestrin [39], XGBoost has been widely used in various machine learning competitions to achieve state-of-the-art results. XGBoost is a scalable tree-boosting system with the purpose of achieving extreme execution speed and model performance. As an advanced implementation of gradient boosting machines, XGBoost is also an ensemble tree method that aims to create a strong classifier based on a series of weak learners. The most commonly used weak learners are CARTs. A single CART might fail to incorporate predictive power from multiple feature space regions, which is why it is called a weak learner. In contrast, by iteratively training a set of weak classifiers, ensemble methods have been proven to be much more accurate than a single classifier [40]. The objective function of XGBoost consists of a training loss and a regularization term, which can be written as:

$$ \mathrm{obj}(\theta) = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k=1}^{t} \Omega(f_k) \tag{4} $$

where $\theta$ stands for model parameters which need to be learned from the training data; $\hat{y}_i$ is the model prediction for the $i$th data sample; $y_i$ is the actual label of the $i$th data sample; $l$ is the loss function of the $i$th data sample, measuring how well the model can fit the training data; $\Omega(f_k)$ stands for the regularization term, which is used to control the complexity of the model and avoid overfitting; $f_k$ is a scoring function to estimate the output in the $k$th tree, and $t$ is the total number of trees. Training all CARTs at once is very difficult. Instead, XGBoost adopts an additive training strategy. At training step $t$, the model prediction $\hat{y}_i^{(t)}$ is the summation of the prediction at step $t-1$ and the score of a new tree, which can be written as:

$$ \hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i) \tag{5} $$

where $x_i$ is the feature vector of the $i$th data sample. To this end, the objective function at step $t$ is:

$$ \mathrm{obj}^{(t)} = \sum_{i} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \sum_{k=1}^{t} \Omega(f_k) \tag{6} $$

The model parameter $\theta$ is updated at each step $t$ according to the new objective function. The loss function in XGBoost can take various forms, such as mean squared error or logistic loss. Besides, XGBoost supports custom loss functions. The regularization term is a major contribution of XGBoost, which is given as:

$$ \Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^{2} \tag{7} $$

where $\gamma$ stands for the complexity parameter of each leaf; $T$ is the total number of leaves; $\lambda$ is used to scale the penalty; $w$ is the vector of scores on leaves.
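The additive prediction and the regularization term can be illustrated with a small Python sketch. It does not call the XGBoost library; the two "trees" are hypothetical hand-written stumps, and the parameter values are arbitrary.

```python
def omega(leaf_weights, gamma, lam):
    """Regularization term: gamma * T + 0.5 * lambda * ||w||^2."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)

def additive_predict(trees, x):
    """Sum the scores f_k(x) of all trees built so far."""
    return sum(tree(x) for tree in trees)

# Two hypothetical stumps, each with two leaves.
def tree_1(x):
    return 0.4 if x[0] > 1.0 else -0.2

def tree_2(x):
    return 0.1 if x[1] > 0.0 else -0.3

x = [2.0, -1.0]
y_hat_1 = additive_predict([tree_1], x)          # prediction after step 1
y_hat_2 = additive_predict([tree_1, tree_2], x)  # prediction after step 2
penalty_1 = omega([0.4, -0.2], gamma=1.0, lam=2.0)
```

The prediction after the second step equals the first-step prediction plus the score of the new tree, and the penalty for the first stump is 1.0 × 2 + 0.5 × 2.0 × (0.16 + 0.04) = 2.2 under the assumed values of γ and λ.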

6) CLASSIFICATION PERFORMANCE EVALUATION
Probably the most intuitive metric in evaluating classification model performance is the overall accuracy, which can be derived from the confusion matrix in Table 2. The overall accuracy is calculated as:

$$ \text{Overall accuracy} = \frac{\text{number of correctly classified instances}}{\text{total number of instances}} \tag{8} $$

In general, the overall accuracy can be used to evaluate how accurately the classification model can predict the testing data. However, when the dataset is imbalanced, relying merely on the overall accuracy might produce biased evaluation. For instance, when a model tries to classify a dataset with 95 negative instances and 5 positive instances, it can easily achieve a 95% accuracy by classifying all instances as negative. Apparently, this high accuracy is doubtful, and the corresponding classification model might fail to be informative. In this article, geometric mean (G-mean) is selected to evaluate the classification performance of the proposed models together with the overall accuracy. As a widely used metric in the imbalanced learning field [3], [6], [41], G-mean aims to maximize the accuracy of each class while keeping these accuracies balanced. For an n-class classification problem, G-mean is calculated as:

$$ \text{G-mean} = \sqrt[n]{\text{class 1 accuracy} \times \text{class 2 accuracy} \times \cdots \times \text{class } n \text{ accuracy}} \tag{9} $$

As shown in Equation (9), G-mean is not affected by the number of instances within each class. In addition, the area under the curve (AUC) is also calculated to further compare the classification performance of each modeling scenario.
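The contrast between overall accuracy and G-mean in the 95/5 example above can be reproduced with the following Python sketch. The function names are hypothetical, and the per-class accuracy is taken to be the share of instances of each true class that are classified correctly.

```python
import math

def per_class_accuracy(y_true, y_pred):
    """Share of correctly classified instances within each true class."""
    acc = {}
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        acc[c] = sum(1 for i in idx if y_pred[i] == c) / len(idx)
    return acc

def g_mean(y_true, y_pred):
    """n-th root of the product of the n per-class accuracies."""
    acc = per_class_accuracy(y_true, y_pred)
    return math.prod(acc.values()) ** (1 / len(acc))

# 95 negative and 5 positive instances, all classified as negative.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
gm = g_mean(y_true, y_pred)
```

The trivial classifier reaches an overall accuracy of 0.95 but a G-mean of 0, because the accuracy on the positive class is zero.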

C. MODEL RESULTS INTERPRETATION
The purpose of developing a crash severity analysis model is to uncover the relationship between various features and crash outcomes. Subsequently, corresponding administrative and engineering countermeasures could be implemented to alleviate the crash severity. As such, the interpretability of the model output is as important as its accuracy. The output of a linear model, such as MNL, is straightforward and easy to understand: the parameter value of each feature could be used to measure the impact of this feature on the model outcome. However, such models are only able to uncover linear relationships. On the other hand, more sophisticated machine learning models, such as XGBoost or random forest, are able to uncover more complicated relationships and predict crash severities with relatively high accuracy [10]. Nevertheless, these models could be difficult to interpret. A common approach to explain these models' results is to calculate the importance of features based on gain or split counts. But this approach could suffer from inconsistency, i.e., the order in which a feature is added to the model could significantly affect the importance of this feature [11], [42].
This study adopts a novel approach called SHAP (Shapley Additive Explanations) to explain the output of machine learning models. Originally proposed by Lundberg and Lee [11], SHAP is designed to explain the output of any machine learning model in a consistent and accurate way based on game theory and local explanation. Generally speaking, SHAP measures the importance of a feature by comparing model predictions with and without this particular feature. Unlike other feature attribution approaches, SHAP is able to compute the exact SHAP value of each feature for each individual instance. As an additive feature attribution method, SHAP develops a linear explanation model $g$ for each instance within the dataset:

$$ g(z) = \phi_0 + \sum_{i=1}^{M} \phi_i z_i \tag{10} $$

where $g$ is the explanation model used to explain the model prediction on an instance; $\phi_0$ is the base value (the model output when no feature is present); $M$ is the number of features in the model; $\phi_i$ is the SHAP value for a feature $i$; $z_i = 1$ if a feature $i$ is present and $z_i = 0$ otherwise. The SHAP value for a feature $i$ is calculated by comparing the model predictions with and without this feature. Since the order in which features are added to the model could affect the model prediction, all possible orders are permuted, and the SHAP value is calculated as a weighted summation. This can be described in the following equation:

$$ \phi_i = \sum_{S \subseteq \{1, \ldots, M\} \setminus \{i\}} \frac{|S|! \, (M - |S| - 1)!}{M!} \left[ f_x(S \cup \{i\}) - f_x(S) \right] \tag{11} $$

where $S$ is the subset of features used in the model; $M$ is the number of features; $f_x(S \cup \{i\})$ and $f_x(S)$ are the model predictions with and without feature $i$. In this way, the individual prediction in the model could be accurately explained. The second term in Equation (11) indicates that $\phi_i$ could be negative, meaning feature $i$ could have a negative impact on the model output. For a classification problem, a SHAP value matrix with the same size as the input data could be obtained for each possible model output. The average of the absolute SHAP value of feature $i$ is used to measure the impact of this feature on the model output.
The current study uses the Python library SHAP to calculate SHAP values [11].
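To make Equation (11) concrete, the following minimal sketch enumerates every feature subset and computes exact Shapley values for a small toy function. It is an illustration only and not the routine used by the SHAP library, which relies on far more efficient algorithms for tree models. The function `toy_model`, its coefficients, and the three-feature setting are hypothetical and chosen purely for demonstration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by brute-force enumeration of all subsets S."""
    features = list(range(n_features))
    phi = [0.0] * n_features
    for i in features:
        others = [j for j in features if j != i]
        for size in range(len(others) + 1):
            # Weight |S|! (M - |S| - 1)! / M! depends only on the size of S.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to subset S.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_model(present):
    """Hypothetical model output (log-odds units) given the present features."""
    score = -2.0                      # base value when no feature is present
    if 0 in present:
        score += 1.5                  # feature 0 raises the output
    if 1 in present:
        score -= 0.5                  # feature 1 lowers the output
    if 0 in present and 2 in present:
        score += 1.0                  # interaction between features 0 and 2
    return score


phi = shapley_values(toy_model, 3)
base = toy_model(frozenset())
full = toy_model(frozenset({0, 1, 2}))
print("SHAP values:", phi)
print("base + sum(phi):", base + sum(phi), "full prediction:", full)
```

In this toy case the interaction term is split evenly between features 0 and 2, feature 1 receives a negative value, and the base value plus the sum of the SHAP values recovers the full prediction, which is the additive property expressed by the explanation model g.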

V. RESULTS AND DISCUSSION
The dataset is randomly divided into training and testing subsets according to a 7:3 ratio. The passenger car driver models and truck driver models are separately developed. The classification performances are reported in Table 3 and Table 4, respectively. These tables include overall accuracy, per-class accuracy, G-mean and AUC for each classification model and each data imbalance treatment approach. As shown in Table 3, for the passenger car driver crash severity analysis, the highest G-mean is achieved when XGBoost is used with the cost-sensitive learning approach (G-mean = 58.23%), with an associated overall accuracy of 60.37%. The second-best G-mean is achieved when XGBoost is combined with the hybrid data preprocessing approach (G-mean = 56.82%). Nevertheless, the overall accuracy in this scenario is considerably lower (overall accuracy = 46.94%). In addition, XGBoost combined with cost-sensitive learning achieves the highest AUC (0.72).
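The difference between overall accuracy and G-mean on imbalanced data can be illustrated with a short sketch. It assumes the usual multiclass definition of G-mean as the geometric mean of the per-class accuracies (recalls). The labels and predictions below are hypothetical toy values, not data from this study.

```python
from math import prod


def per_class_recall(y_true, y_pred, classes):
    """Share of correctly classified samples within each true class."""
    recalls = []
    for c in classes:
        n_c = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / n_c if n_c else 0.0)
    return recalls


def g_mean(recalls):
    """Geometric mean of per-class recalls; zero if any class is missed."""
    return prod(recalls) ** (1.0 / len(recalls))


def overall_accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical 3-class toy labels: 0 = no injury, 1 = injury, 2 = fatality.
classes = [0, 1, 2]
y_true = [0] * 6 + [1] * 3 + [2]
y_majority = [0] * 10                        # always predicts the majority class
y_balanced = [0, 0, 0, 1, 1, 2, 1, 1, 0, 2]  # spreads its errors across classes

for name, y_pred in [("majority", y_majority), ("balanced", y_balanced)]:
    recalls = per_class_recall(y_true, y_pred, classes)
    print(name, overall_accuracy(y_true, y_pred), g_mean(recalls))
```

Both toy classifiers reach the same overall accuracy, but the majority-class predictor has a G-mean of zero because it never identifies the minority classes. This is why G-mean is a useful complement to overall accuracy when severity levels are unequally represented.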
As for the truck driver crash severity analysis, results in Table 4 reveal that the highest G-mean is also achieved when XGBoost is combined with the cost-sensitive learning approach (G-mean = 55.55%), with an associated overall accuracy of 63.57%. Although the AUC of this modeling scenario is lower than that of the XGBoost model trained on the imbalanced dataset, the per-class accuracy and G-mean are greatly improved. The second-best G-mean is achieved when the decision tree model is combined with the hybrid data preprocessing approach (G-mean = 51.84%).
SHAP is then adopted to explain the results of the best-performing modeling scenarios. Figure 2 and Figure 3 present the impact of factors on passenger car driver and truck driver crash severity, respectively. The factors are sorted in descending order based on the average of the absolute SHAP values. It is worth mentioning that the units of SHAP values depend on the selected classification model. For XGBoost, SHAP values have log odds units.
As shown in Figure 2, the crash configuration is the strongest predictor for passenger car driver injury severity. Besides, the gender of the passenger car driver, the traffic control device, roadway configuration and the age of the passenger car driver also have significant impacts on crash outcomes. On the other hand, weather condition, road surface condition and the truck age have the least impact on passenger car driver crash outcomes.
Turning to factors affecting crash severity of truck drivers (Figure 3), crash configuration is also the strongest predictor, followed by the gender of the passenger car driver, roadway configuration, the age of the truck driver, and the traffic control device. Meanwhile, the passenger car age, the gender of the truck driver and road alignment have the least impact on truck driver crash outcomes.
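The ranking used in these feature importance figures can be sketched as follows: for each feature, the absolute SHAP values are averaged over all crashes, and features are sorted by this average. The SHAP matrix and the feature names below are hypothetical toy values chosen for illustration.

```python
def rank_by_mean_abs_shap(shap_matrix, feature_names):
    """Sort features by the mean absolute SHAP value over all instances."""
    n = len(shap_matrix)
    scores = {
        name: sum(abs(row[j]) for row in shap_matrix) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical SHAP values: one row per crash, one column per feature.
feature_names = ["crash_config", "driver_age", "weather"]
shap_matrix = [
    [0.8, -0.2, 0.05],
    [-0.6, 0.3, -0.05],
    [0.7, -0.1, 0.0],
    [-0.9, 0.2, 0.1],
]

ranking = rank_by_mean_abs_shap(shap_matrix, feature_names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")

# The signed mean can cancel out even for the most influential feature.
signed_mean = sum(row[0] for row in shap_matrix) / len(shap_matrix)
print("signed mean for crash_config:", signed_mean)
```

In this toy matrix the top-ranked feature has a signed mean close to zero because its positive and negative contributions cancel. Averaging absolute values avoids this cancellation, but it also means that the ranking carries no information about the direction of a feature's effect.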
Although feature importance figures are useful, they contain no information beyond the average impacts of features on model output magnitude. For more informative explanations, the contribution of each feature to a specific crash outcome should be illustrated. Figure 4 and Figure 5 present the impact of features on passenger car driver and truck driver fatal crashes, respectively. In SHAP, these figures are called summary plots, which combine feature importance with feature effects. The features (y-axis) are sorted in descending order according to their global impact on the model output (in this case, fatal crashes). Each point on the summary plot represents a SHAP value (x-axis) for a feature in a crash. The color represents the value of a feature. Overlapping points are piled up to show density. Again, for XGBoost, SHAP values on the x-axis have units of log odds.
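Because the SHAP values on the x-axis are in log odds units, a value of φ can be read as multiplying the odds of the outcome by exp(φ), holding all other contributions fixed. The sketch below illustrates this reading. The baseline probability and the SHAP values are hypothetical numbers, not results from this study.

```python
from math import exp


def odds_multiplier(shap_value):
    """Factor by which the odds change for a SHAP value in log-odds units."""
    return exp(shap_value)


def shifted_probability(base_prob, shap_value):
    """Probability after shifting the log odds of base_prob by shap_value."""
    odds = base_prob / (1.0 - base_prob) * exp(shap_value)
    return odds / (1.0 + odds)


# Hypothetical baseline fatality probability and SHAP contributions.
base_prob = 0.02
for phi in (-0.5, 0.0, 0.5):
    print(phi, odds_multiplier(phi), shifted_probability(base_prob, phi))
```

A positive SHAP value therefore raises the predicted fatality probability above the baseline and a negative value lowers it, but the change in probability is not linear in φ. Equal shifts in log odds translate into small probability changes when the baseline probability is close to 0 or 1.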
As shown in Figure 4, the crash configuration is the single most important predictor for passenger car driver fatalities. In crashes involving vehicles travelling in the same direction, passenger car drivers are less likely to suffer fatalities. Other crash configurations (such as head-on crashes, left/right turn crashes, and different direction sideswipe crashes) would increase the fatality risk of the passenger car driver. The traffic control device is the second most important predictor. Crashes that occurred in places with some kind of traffic control device (such as traffic signals, stop signs, and warning signs) are less likely to result in passenger car driver fatalities. Properly installing traffic control devices could be an economical and effective method to reduce fatality risks in passenger car-truck crashes. In general, older passenger car drivers are more likely to suffer fatalities. Besides, compared to female drivers, male passenger car drivers are more prone to fatal crashes. As such, traffic safety education campaigns targeted at older male passenger car drivers should be promoted. It is worth noting that passenger car safety device usage is not the most important feature, but driving without safety belts could significantly increase the fatality risk, given the long tail in the plot. Although the gender of the truck driver is one of the least influential factors, it seems that the presence of female truck drivers could reduce the fatality risk of the passenger car driver.
Compared with Figure 4, most features in Figure 5 have similar effects on truck driver fatalities. For instance, truck drivers are also more likely to suffer fatalities in crashes involving vehicles travelling in different directions. Crashes that occurred at intersections are less likely to result in truck driver fatalities. Nevertheless, six features exhibit opposite effects on passenger car driver and truck driver fatalities, including the age of the passenger car driver, crash hour, the passenger car age, road surface condition, weather condition and the truck age. Compared to younger passenger car drivers, older drivers are more prone to fatalities. However, the presence of older passenger car drivers would reduce the fatality risk of truck drivers. This result further emphasizes the vulnerability of older passenger car drivers. Regarding the crash hour, crashes that occurred earlier in the day (0:00∼6:59) would increase the fatality risk of passenger car drivers but decrease the fatality risk of truck drivers. This is probably because passenger car drivers are more likely to be affected by fatigue than truck drivers on night shift, who have more experience under these conditions. Newer passenger cars are less prone to passenger car driver fatalities, but more prone to truck driver fatalities. This may be because a newer car has more safety features and is in better mechanical condition. When the road surface is covered with ice, truck drivers are more likely to suffer fatalities. With a higher center of gravity, trucks are more prone to rolling over on icy roads, leading to fatal injuries. Likewise, an icy road surface would increase the fatality risk of passenger car drivers in some cases. But in other cases, it might decrease the fatality risk. This is probably because some passenger car drivers are more cautious in this situation. As such, they tend to reduce speed to avoid severe crashes.
Regarding weather condition, snow would always decrease the fatality risk of truck drivers. But it might increase the fatality risk of passenger car drivers in some cases. Compared to older trucks, newer trucks (within 5 years) are less prone to truck driver fatalities as the small values of V_YEAR_t (the truck age) are associated with negative SHAP values in Figure 5. Nevertheless, Figure 4 suggests that newer trucks would slightly increase the passenger car drivers' fatality risks.

VI. CONCLUSION
This study aims to explore factors affecting the passenger car driver and truck driver crash severity. For the modeling purpose, crashes involving one passenger car and one truck from 2007 to 2017 in Canada are collected and processed. To compare the classification performance of different classifiers, this study uses five different classification models: multinomial logistic regression, Naive Bayes, Classification and Regression Tree (CART), support vector machine, and eXtreme Gradient Boosting (XGBoost). In view of the imbalanced crash severity distribution, four data imbalance treatment approaches are separately applied, including over-sampling, under-sampling, a hybrid method combining under-sampling and over-sampling, and a cost-sensitive learning method. Each classification model is combined with one data imbalance treatment approach to generate the classification result. In light of the data imbalance issue, this study selects geometric mean (G-mean), overall accuracy, and AUC to evaluate the classification performance. To improve the interpretability of classification model results, a recently proposed approach called SHAP (Shapley Additive Explanations) is applied.
For the passenger car driver crash severity analysis, XGBoost combined with cost-sensitive learning generates the best result (G-mean = 58.23%, overall accuracy = 60.37%, AUC = 0.72). Likewise, regarding the truck driver crash severity analysis, the best result is also achieved by combining XGBoost with cost-sensitive learning (G-mean = 55.55%, overall accuracy = 63.57%, AUC = 0.70). The G-mean values obtained in the current study exceed the best results reported in previous literature on 3-class classification [6], [7].
To make the XGBoost model results more informative, this study adopts a recently proposed approach called SHAP. Based on game theory and local explanation, SHAP is designed to explain the result of any machine learning model in a consistent and accurate way. Impacts of features on passenger car driver and truck driver injury severity are separately reported. Additionally, to explore factors affecting driver fatalities, impacts of features on passenger car and truck driver fatal crashes are presented. Most of these features have similar effects on passenger car driver and truck driver fatalities. However, six features exhibit opposite effects, including the age of the passenger car driver, crash hour, the passenger car age, road surface condition, weather condition and the truck age. Results of the current study could provide some valuable insights to alleviate the severity of passenger car-truck crashes. For instance, both passenger car and truck drivers should be aware that wearing safety belts could significantly reduce their fatality risks in crashes. Besides, properly installing traffic control devices (such as traffic signals, stop signs or warning signs) could be an economical and effective method to reduce fatality risks in passenger car-truck crashes. Passenger car drivers should be extremely cautious when driving between midnight and 6 am, especially on roads with heavy truck traffic. Besides, traffic safety education campaigns targeted at elderly male passenger car drivers should be promoted. These drivers should be more careful when driving on truck corridors. Traffic administration departments should set up more safety warning signs on curved roads and mountain roads. For a snowy country like Canada, timely clearing of snow and ice from the road is essential to improve traffic safety, especially on truck corridors.
This study demonstrates that XGBoost combined with cost-sensitive learning is a decent method to identify factors affecting injury severity when data is highly imbalanced. Besides, SHAP could be a valuable tool for interpreting results of crash severity analysis models.

VII. STUDY LIMITATIONS
Several limitations of this study should be noted. Firstly, the variables used in this study could be expanded to increase the reliability of model results. Examples include traffic flow information prior to the crash, speeding, driver fatigue, and the working experience of the truck driver. Secondly, although this study applies four different approaches to handle the data imbalance issue, other methods in the imbalanced learning field are also worth exploring. For instance, Generative Adversarial Networks (GAN) seem promising in improving the performance of imbalanced learning models. Future studies should test the potential of GAN in traffic safety analysis. Thirdly, this study uses only police-reported crash data for modeling. Future studies should consider building models with data generated from traffic simulators. Then, the performance of different models could be compared. Lastly, the quality of model results is only as good as the input data. All crashes are recorded and coded by police officers based on the best available information. As such, some errors in the data are inevitable, which might affect the quality of model results.