A Spatial Analysis Methodology Based on Lazy Ensembled Adaptive Associative Classifier and GIS For Examining the Influential Factors on Traffic Fatalities



I. INTRODUCTION
Road traffic accidents (RTAs) have become a global public health and development problem, killing nearly 1.3 million people and disabling 20-50 million people annually, and costing most countries 3% of their gross domestic product [1]. RTAs have been reported as ''the eighth leading cause of death globally''. However, interventions implemented by countries in past years have proved that most traffic crashes are both predictable and preventable [1]. To support the prediction and prevention of RTAs, scholars and governments should have a proper understanding of the influential factors of traffic accidents. Identifying these factors can help better
understand the cause-effect relationships behind them, and thus help design relevant interventions. Some studies have investigated various kinds of influential factors, such as speed, alcohol, helmets, seat-belts, and road infrastructure [2]-[5]. To better model the relationships and evaluate the importance of the factors, scholars have proposed different kinds of methods to analyze these impact factors. Commonly used methods can be classified into the following groups.
The first group of methods is based on multivariate regression. For example, Zhong-Xiang et al. [6] combined Verhulst and multivariate linear regression models to analyze the fatalities of road traffic accidents in China from 2002 to 2011; Girotto et al. [2] investigated the relationship between professional experience and traffic accidents or near-miss accidents among truck drivers using multinomial logistic regression. Anastasopoulos et al. [3] used the multivariate Tobit regression model to analyze highway accident-injury-severity rates. However, the linear assumption behind these methods (e.g., Y = Xβ + ε) goes against the nonlinearity of the influential factors in real-world problems and affects model performance [7], [8].
The second group of methods relies more on latent class analysis. For example, Depaire et al. [9] applied latent class clustering to identify homogeneous traffic accident types. Adanu et al. [10] investigated the factors that influence the severity of single-vehicle crashes that happen on weekdays and weekends with the latent class logit model. Latent class analysis can capture unobserved heterogeneity by allowing parameters to differ across observations, but it does not account for the possibility of variation within a class, as it assumes homogeneous characteristics of the within-class observations [11], [12].
Some researchers applied non-parametric methods, such as the Bayesian network, to reveal the connection between traffic accidents and their influential factors. For example, de Oña et al. [13] used Bayesian networks as well as the latent class clustering method to study 3229 accidents on rural highways in Granada between 2005 and 2008. Theofilatos [4] deployed Bayesian and finite mixture logit models to investigate the accident likelihood and severity on urban arterials, finding that traffic variations had a significant effect on accident occurrence but mixed effects on accident severity. Elvik et al. [14] developed a before-after evaluation of road safety to study the impact of a new motorway in Norway with Empirical Bayes. The problem of the Bayesian network is that it is computationally expensive and not effective on high-dimensional datasets [15].
To avoid the shortcomings of the previous methods, scholars have recently developed artificial intelligence (AI) related algorithms. In data analysis, AI-related methods can be divided into machine learning methods and deep learning methods [16], [17]. Deep learning methods have attracted much attention in traffic analysis in recent years due to their superb nonlinear modeling ability [18], [19]. However, their black-box nature restricts them from analyzing impact factors [20]. Therefore, current research on this problem is shifting more towards machine learning (ML) methods. Among the reported ML methods, one typical example is association rule analysis. It can not only study the cause-effect relationship between a single item factor and the target but also investigate the relationships between multiple item factors and the target. For example, Montella et al. [5] applied association rules to reveal the characteristics of powered two-wheeler crashes. Xi et al. [21] analyzed the level of influence of causational factors for traffic accidents with association rules. Weng et al. [22] investigated the crash casualty patterns of work zones using association rules. However, the identification of the thresholds in association rule analysis remains a problem. Previous literature usually relied on the experience of researchers to identify the thresholds [23]-[26], which is not a method that can be generalized. Therefore, an association rule mining-based framework with the ability to identify the thresholds is worthy of attention.
Another limitation of the existing studies on the influential factors is that most of them only investigated the overall weights or relationships between the factors and traffic fatalities. Few of them went further and analyzed the spatial relationships. For example, an impact factor that has been recognized by many studies is driving under the influence of alcohol (drunk driving), but previous studies failed to answer the question of which places in the city should enhance alcohol control. Answering such questions is what we define as the spatial analysis of the influential factors in this study.
This study proposes a methodology framework based on association rule analysis and road-based GIS analysis to investigate the influential factors that cause traffic fatalities. The methodology integrates the Lazy ensembled adaptive Associative Classifier (LeaAC) to optimize the thresholds and uses a geographical information system (GIS) to interpret the cause-effect relationships spatially. Traffic accident data of Los Angeles are used to test the framework. By providing evidence-based information, our results can help governments identify the major causes of traffic fatalities in Los Angeles and formulate specific policies and legislation.
The rest of this paper is organized as follows. In Section 2, we describe the methodology framework. In Section 3, a case study in Los Angeles is presented. Discussions of results are given in Section 4. Conclusions and limitations of this paper are provided in Section 5.

II. METHODOLOGY FRAMEWORK
The proposed methodology framework is shown in FIGURE 1. It consists of three parts. The first part is data preprocessing. The second part is model implementation: association rule analysis is conducted to investigate the relationships between impact factors and traffic fatalities, and the lazy associative classifier is proposed to optimize the thresholds for support and confidence in association rule mining. The third part is post engineering, including rule mining and road-based GIS analysis.

A. PREPROCESSING
The first part is data preprocessing. The collected raw data usually have some problems, such as missing values, noisy data, and data imbalance. These problems should be addressed before the data are used in the model. The procedures to tackle these problems are typical in machine learning but may vary slightly from problem to problem. More details will be introduced in the case study.
Besides, the features need to be binarized since association rule mining can only analyze binary data. This study involves four kinds of features: binary features, categorical features, numerical features, and string features. Binary features do not need any formatting since they can directly fit association rule analysis. Categorical features are transformed into binary features using one-hot encoding [27]. This technique generates a new binary feature to represent each option of the categorical feature. FIGURE 2 presents an example of one-hot encoding.
The formatting of numerical features and string features is slightly more complicated. The idea is to transform these features into categorical features first and then convert them into binary features using one-hot encoding. Numerical features can use binning methods to achieve this, while string features are more complicated; the procedures may require much domain knowledge and can vary from problem to problem. More details will be introduced in the case study.
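As an illustration of these two formatting steps, the following sketch (using pandas, with hypothetical feature names) bins a numerical feature into equal-frequency categories and then one-hot encodes the categorical columns; it is a minimal example, not the exact preprocessing pipeline used in this study.

```python
import pandas as pd

# Hypothetical accident features: one categorical, one numerical.
df = pd.DataFrame({"WEATHER": ["clear", "rain", "fog", "clear", "rain"],
                   "PARTY_AGE": [23, 54, 37, 68, 45]})

# Numerical -> categorical: equal-frequency binning into k groups.
k = 3
df["PARTY_AGE_BIN"] = pd.qcut(df["PARTY_AGE"], q=k, labels=False, duplicates="drop")

# Categorical -> binary: one new 0/1 column per category (one-hot encoding).
binary = pd.get_dummies(df[["WEATHER", "PARTY_AGE_BIN"]].astype(str), dtype=int)
print(binary.head())
```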

B. MODEL IMPLEMENTATION
1) ASSOCIATION RULES ANALYSIS
Association rules analysis is a rule-based machine learning method for determining the connections between different fields of data. Due to its excellent performance in identifying strong rules in databases, it has been employed in many application areas, such as market basket analysis, web usage mining, and bioinformatics [28]. Association rules mining was first introduced by Agrawal et al. [29] and can be defined as follows.
Let I = {i_1, i_2, ..., i_m} be a set of m binary features. Let D = {s_1, s_2, ..., s_n} be a set of accidents that form the database. Each accident in D has a unique ID and contains a subset of the features in I. A rule is defined as an implication of the form X ⇒ Y, where X, Y ⊆ I and X ∩ Y = ∅. The feature sets X and Y are called the antecedent (left-hand-side or LHS) and consequent (right-hand-side or RHS) of the rule. In this study, Y has only one indicator of whether the accident is fatal or the victim is dead, while X is the combination of different accident situations and attributes. In this way, the identification of the strong rules in X ⇒ Y can help identify the influential factors.
Theoretically, numerous rules can be generated. However, not all rules can provide useful information. Rules that surpass the user-specified minimum support and minimum confidence thresholds are defined as interesting rules that may reveal valuable knowledge [30]. Support (S) and confidence (C) are two critical criteria in association rules mining. Support determines how often an item set appears in the given dataset, and confidence indicates how frequently items in Y appear in transactions that contain X. Their mathematical forms are expressed in Eq. (1) and (2).

Support, S(X ⇒ Y) = σ(X ∪ Y) / N    (1)
Confidence, C(X ⇒ Y) = σ(X ∪ Y) / σ(X)    (2)

where σ(·) is the support count, i.e., the number of accidents in D that contain the given feature set, and N is the total number of accidents. The identification of the rules and the calculation of their support and confidence can be time-consuming because, when the number of features m gets larger, the number of combinations of features in X can be massive. It is inefficient to conduct a brute-force procedure to accomplish this. Therefore, scholars proposed the Apriori algorithm to tackle this problem. The algorithm uses a breadth-first search strategy to count the support of feature sets and uses a candidate generation function that exploits the downward closure property of support. The pseudo-code of Apriori is shown in ALGORITHM 1. However, in many cases, the efficiency of Apriori is still not satisfactory, especially for long patterns [31]. In this study, since the length of the itemsets is less than four in the later experiments and the consequent item is fixed as the fatality of the traffic accident, the rule extraction process is not very complex, and there is little difference between these choices. Also, the main focus of the methodology is to propose an association rule-based classification model, so we adopt the most typical and widely accepted algorithm, Apriori, for rule extraction.

Algorithm 1 The Apriori Algorithm for Generating Candidates of Strong Rules
L_k: frequent feature sets of size k
1. for (k = 1; L_k ≠ ∅; k++) do
2.   C_{k+1} = candidates generated from L_k
3.   for each transaction t in database
4.     increment the count of all candidates in C_{k+1} that are contained in t
5.   end
6.   L_{k+1} = candidates in C_{k+1} with min_support
7. end
8. return ∪_k L_k
It can be inferred from ALGORITHM 1 that the identification of the minimum thresholds of support and confidence is one of the prerequisites of association rules analysis. Previous literature usually relied on the experience of the scholars to determine the thresholds [30], which, however, is not a method that can be generalized. In this paper, this problem is addressed by integrating a classification algorithm, namely the Lazy ensembled adaptive Associative Classifier (LeaAC).
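For concreteness, the rule-extraction step performed by ALGORITHM 1 can be reproduced with an off-the-shelf implementation. The sketch below uses the mlxtend library (our choice for illustration; the paper does not prescribe a library) and assumes a one-hot encoded 0/1 DataFrame named `binary`, such as the one produced in the preprocessing example, containing a hypothetical "FATAL" indicator column; the thresholds shown are the ones eventually selected in the case study and serve only as placeholders here.

```python
from mlxtend.frequent_patterns import apriori, association_rules

# `binary` is assumed to be a 0/1 DataFrame of encoded accident features,
# including a column "FATAL" that marks fatal accidents (hypothetical name).
frequent_itemsets = apriori(binary.astype(bool), min_support=0.04,
                            use_colnames=True, max_len=2)

rules = association_rules(frequent_itemsets, metric="confidence",
                          min_threshold=0.74)

# Keep only rules whose consequent is the fatality indicator (X => FATAL).
fatal_rules = rules[rules["consequents"].apply(lambda c: c == frozenset({"FATAL"}))]
print(fatal_rules[["antecedents", "support", "confidence"]])
```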

2) LAZY ENSEMBLED ADAPTIVE ASSOCIATIVE CLASSIFIER (LeaAC)
The idea is to build a classification model that has the same target as the association rule mining. For example, this study intends to mine the association rules that lead to fatal accidents. The proposed method then builds a classification model based on the boolean features to classify whether an accident is fatal or not. The classification algorithm we developed here is LeaAC, and in this algorithm, Support and Confidence are two parameters. Therefore, the set of parameters that helps LeaAC achieve the highest classification accuracy becomes the optimal value for Support and Confidence. This optimization idea is formulated in Eqs. (3)-(5):

(S*, C*) = arg max_{S, C} Acc(T, T̂)    (3)
T̂ = LeaAC(x_1, x_2, ..., x_m; S, C)    (4)
Acc(T, T̂) = (1/n) Σ_{i=1}^{n} I(T_i = T̂_i)    (5)

where T is the target, namely the fatality of the accidents in this study, T̂ is the prediction of the classification model, x_1, ..., x_m are the values of the binary features, and I(·) is the indicator function.
A traditional associative classifier mines all frequent class association rules (CARs) as essential decision rules [32]. It checks whether each CAR matches the test instance during the testing phase and chooses the first CAR matching the test instance to predict the class. However, it may generate a large number of rules, many of which may be useless, and in some cases, important rules may never be mined [33].
The lazy associative classifier (LAC) overcomes this problem by focusing the rule mining on the given test instance. Instead of creating the classification model during the learning phase using the training data, LAC postpones generalization and builds the classification model only when a query is given. Although the testing stage can therefore be slower, the accuracy can be improved significantly. In addition, this study upgrades the labeling process of LAC by introducing adaptive weights for the rules used for classification. The weights are calculated using the information gain of each rule, and the eventual output incorporates the idea of ensemble learning to gather the prediction results of all the rules. We name this algorithm the Lazy Ensembled Adaptive Associative Classifier (LeaAC). The pseudo-code of LeaAC is shown in ALGORITHM 2.

Algorithm 2 Lazy Ensembled Adaptive Associative Classifier (LeaAC)
D: the set of all n training instances; T: the set of all m test instances; y: the target class (traffic fatality in this study, 1 means fatal, while −1 means non-fatal)
1. for each t_i ∈ T do
2.   let D_i be the projection of D on features only from t_i
3.   let L_i be the set of all rules {X ⇒ y} mined from D_i passing min_support and min_confidence
4.   calculate the information gain vector G_i of all the rules in L_i
5.   ensemble the results G_i · ŷ_i and predict class y_i (positive/negative separation)
6.   insert y_i into Y
7. return Y
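To make the lazy, information-gain-weighted ensemble idea of ALGORITHM 2 concrete, the following is a minimal Python sketch of one possible reading of the algorithm, not the authors' implementation. It assumes 0/1 numpy arrays for the binary features and labels, restricts antecedents to single items (I_max = 2, the setting adopted later in the case study), and takes the sign of the gain-weighted vote sum as the prediction.

```python
import numpy as np
from itertools import combinations

def entropy(y):
    """Binary entropy of a 0/1 label array."""
    if len(y) == 0:
        return 0.0
    p = y.mean()
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def leaac_predict(X_train, y_train, x_test,
                  min_sup=0.04, min_conf=0.74, max_items=2):
    """Lazily mine class-association rules from the projection of the training
    data onto the test instance's active features, weight each rule by its
    information gain, and ensemble the weighted votes (1 = fatal, 0 = non-fatal)."""
    active = np.where(x_test == 1)[0]            # projection on the test instance
    votes = []
    for k in range(1, max_items):                # antecedent sizes 1 .. max_items-1
        for ante in combinations(active, k):
            covered = np.all(X_train[:, list(ante)] == 1, axis=1)
            support = covered.mean()
            if support < min_sup:
                continue
            conf_fatal = y_train[covered].mean()
            confidence = max(conf_fatal, 1 - conf_fatal)
            if confidence < min_conf:
                continue
            gain = entropy(y_train) - (support * entropy(y_train[covered])
                                       + (1 - support) * entropy(y_train[~covered]))
            votes.append(gain if conf_fatal >= 0.5 else -gain)
    return int(np.sum(votes) > 0) if votes else 0
```

For I_max = 3, the inner loop would also enumerate pairs of active features; in the study itself, results are additionally averaged over the 11 balanced datasets described in the case study.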

C. RULE MINING AND GIS ANALYSIS
After the optimized support and confidence values are obtained, association rules can be extracted using the Apriori algorithm. The rules are then examined through an analytic hierarchy process (AHP) to determine the real influential factors. AHP is one of the techniques of Multi-Criteria Decision Making (MCDM) used to weight and compare a set of elements and then select the best one. Different decision-makers first give their opinions on the factor weights and factor values, and AHP integrates their opinions using weighted regression. The top-ranked factors in the AHP process then become the most appropriate rules. Note that the AHP method relies on knowledge from domain experts, and their opinions may be subjective to some extent. However, the problem we target in this study is a complicated real-world city governing problem. The procedure cannot merely be a numerical analysis that avoids opinions from domain experts, and AHP is a scientific tool to collect and integrate the knowledge of experts, while the association rule analysis provides the essential preliminary results.
Then, GIS is used to study the spatial relationships between the impact factors and traffic fatality. Traditionally, when plotting the distribution of the factors, scholars may directly use the density plot [34], [35]. However, the density plot of the accidents associated with different factors generally follows the same distribution as the accident density, which makes the density plot less sensitive when studying the spatial relationships between impact factors and traffic fatalities.
In addition, instead of the number of fatal accidents, city managers might be more interested in the fatality rates of accidents. Places with more fatal accidents may simply have higher traffic volumes. However, places with higher fatality rates indicate that the place is dangerous and should be given more attention.
To achieve the spatial analysis of fatality rates, we propose a road-based analysis in GIS, because traffic accidents all happen on or near roads. This study collected all the road data from the Los Angeles County GIS Data Portal; these roads are those used by the US Census Bureau to help locate citizens during its decennial census. The proposed spatial analysis is conducted as follows (a brief code sketch is given after the list).
• Map the accident data onto the road maps. Since the roads are line features, the accident points cannot be directly joined to the roads. We created 10-meter polygon buffer zones along both sides of the roads and then mapped the accident points onto the roads.
• Transfer road features to point features. Although the accidents have been grouped onto the roads, the road features are not friendly for visualization. Some roads are long while some are short, so the plotted network can be visually messy and not friendly to analyze. To tackle this, we used points to represent the roads. Each road is transformed into a point located at the central position of the road. We then plot the relevant rates and relationships using those points in GIS.
• Value calculation. After mapping the accident points, we are able to calculate the fatality rate of each road and the relevant accident features, which allows the spatial analysis of these influential factors. More details will be introduced in the case study.
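The sketch below illustrates these three steps with GeoPandas (an assumed tooling choice; the file names, column names, and the EPSG:3310 metric projection are hypothetical placeholders rather than those of the original study).

```python
import geopandas as gpd

# Hypothetical inputs:
#   roads.shp     - line features with a unique "ROAD_ID"
#   accidents.shp - point features with a 0/1 "FATAL" column
roads = gpd.read_file("roads.shp").to_crs(epsg=3310)        # metric CRS for the LA area
accidents = gpd.read_file("accidents.shp").to_crs(epsg=3310)

# Step 1: 10 m buffer zones along both sides of each road, then spatial join.
buffers = roads[["ROAD_ID", "geometry"]].copy()
buffers["geometry"] = roads.geometry.buffer(10)
joined = gpd.sjoin(accidents, buffers, how="inner", predicate="within")

# Step 2: represent each road by a single point at its centroid.
road_points = roads.set_index("ROAD_ID").geometry.centroid

# Step 3: per-road fatality rate and accident count for the spatial analysis.
stats = joined.groupby("ROAD_ID")["FATAL"].agg(fatality_rate="mean",
                                               n_accidents="count")
result = gpd.GeoDataFrame(stats.join(road_points.rename("geometry")),
                          geometry="geometry", crs=roads.crs)
```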

III. AN EXPERIMENTAL CASE IN LOS ANGELES CITY
A. DATA COLLECTION
To validate the proposed methodology, we conducted a case study in the city of Los Angeles. We chose this city because it is reported to have the highest rate of injury-causing and fatal traffic accidents in the nation [36]. The data were extracted from the open dataset of the Transportation Department of California and the American Highway Control Center (https://dot.ca.gov/). This study focuses on the fatality of traffic accidents, where the fatality level of an accident is decided by whether any victim was killed. Therefore, this study uses the fatality of the victim as the research target. This target can provide more insights from the perspective of the victim to understand fatal accidents. From the ten-year accident records (2003-2012) in the raw data, we obtained 526,123 victims and the information of the related accidents.
Based on the fatality of the victims, we obtained 43,668 positive cases (fatal) and 482,455 negative cases (non-fatal). TABLE 1 presents the detailed features of the dataset. The 73 features are divided into 5 groups, representing features about the collisions (28 features), features regarding the victims (8 features), features of the parties involved in the accidents (11 features), features concerning the time and location of the accidents (19 features), and features related to the environment (7 features). The second column shows the feature abbreviation and description. The third column shows the data type, and the fourth column presents more details of the features. If the feature is categorical, the number of categories is shown. If the feature is numeric, the standard deviation is provided [37].

B. DATA PREPROCESSING
1) DATA FORMATTING
The raw data cannot be directly input into the methodology framework due to some flaws, so several preprocessing procedures need to be conducted. The first is data formatting. There are 18 numerical features in the dataset, and these features cannot be directly used in association rules analysis; they need to be converted into binary categorical data. This study implemented an equal bin method. This method first ranks the numerical values from the smallest to the largest and then divides the cases into k different groups with the same frequency. The samples in each group share the same categorical value. FIGURE 3 presents an example of formatting a numerical feature into a categorical feature. Using this method, this study transformed the 18 numerical features into categorical features, with k set to 5 (k = 5 provides the highest accuracy in later experiments).
Note that the only string feature in this study refers to the names of the roads, which are not useful in this study, so it was excluded from the experiment. After obtaining the categorical features, we transformed them into binary features using the one-hot encoding method introduced in the methodology section. After these steps, the data dimension of this experiment expands to 682.

FIGURE 3. An example of formatting a numerical feature into a categorical feature using the equal bin method (k = 3).

2) DATA CLEANING
Besides formatting the data into a model-friendly manner, the noisy data need to be excluded. Data cleaning can help reduce the computational complexity and better interpret the relationships [38], [39]. In this study, we removed two kinds of noisy data: redundant features and highly correlated features. Redundant features describe information that is useless for mining the influential features on victim fatalities, so they are excluded from the experiment [40]. For example, features such as ''POINT_X'', ''POINT_Y'', and ''LAPDDIV'' describe the spatial coordinates and jurisdictional information, which are not causes of traffic fatalities, and therefore they are deleted.
Highly correlated features are features that are too similar to each other; their existence provides limited additional information for data mining but increases the complexity, and therefore they should be excluded as well. Since the features in this study have already been transformed into binary features, the Pearson correlation is not suitable for measuring the correlation, so the Spearman correlation is used in this experiment. The difference between these two measures is that Pearson uses numerical values, while Spearman uses rank values. For a binary feature, positive values rank first, while negative values rank second. This study excluded one feature in each pair that has an absolute correlation higher than 0.9: the retained one has a higher correlation with the target (victim fatality), while the deleted one has a lower value. After these two steps of data cleaning, the data dimension drops to 399.
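A simple way to implement this filtering step is sketched below with pandas (the function name and threshold argument are ours; this illustrates the rule described above rather than the study's exact code).

```python
import pandas as pd

def drop_correlated(df, target_col, threshold=0.9):
    """Drop one feature of each pair with |Spearman rho| > threshold,
    keeping the feature that is more correlated with the target."""
    features = df.drop(columns=[target_col])
    corr = features.corr(method="spearman").abs()
    target_corr = features.corrwith(df[target_col], method="spearman").abs()

    to_drop = set()
    cols = list(corr.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if a in to_drop or b in to_drop:
                continue
            if corr.loc[a, b] > threshold:
                to_drop.add(a if target_corr[a] < target_corr[b] else b)
    return df.drop(columns=sorted(to_drop))
```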

3) NEGATIVE ASSOCIATION RULES
Traditional association rule analysis can only discover positive rules because, when calculating support and confidence, it neglects the negative class. However, the negative classes can sometimes provide valuable insights [41], [42]. For example, according to the results of this study, whether the victim has insurance can strongly influence the fatality rate; however, it is not the rule ''the victim has insurance leads to a fatal accident'' that is strong, but the reverse, ''the victim has no insurance leads to a fatal accident''.
To identify these negative but strong rules, we generated a set of negative features by reversing the positive and negative classes of each feature. The newly created features have a −1 correlation with the original features. After this step, the feature dimension increases to 798.
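In code, this augmentation amounts to appending a flipped copy of every binary column; a minimal sketch (assuming a 0/1 DataFrame `binary` as in the earlier examples, with the fatality target excluded beforehand) is:

```python
import pandas as pd

# Reverse each 0/1 feature so that "absence" rules (e.g., no insurance)
# can appear as positive antecedents; the "not" prefix marks the reversed copy.
negated = (1 - binary).add_prefix("not")
augmented = pd.concat([binary, negated], axis=1)
```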

4) DATA BALANCING AND CROSS-VALIDATION
Another problem in this study is data imbalance. As introduced in the data collection section, this experiment has 43,668 positive cases and 482,455 negative cases, so the dataset is very imbalanced: the ratio between positive and negative cases is around 1:11.
Traditionally, scholars would use either under-sampling or over-sampling to address this problem. However, with such a large imbalance ratio, over-sampling can easily cause overfitting [43], while under-sampling can miss a large proportion of the data. Therefore, this study proposes a combined strategy to address this issue. This strategy divides the negative cases into 11 segments without replacement and then conducts the modeling procedure 11 times. Each time, the positive cases are combined with one segment of the negative cases to form a dataset for modeling and calculation. The averaged results of these 11 runs give the eventual results.
Also note that, thanks to the 11-run strategy, there is no need for cross-validation in this study. The averaged performance of the traditional 30/70 testing/training partition over these 11 runs can already provide stable and reliable results for both the classification using LeaAC and the association rule mining. Note that the random seed in each run is different, so the train/test partition of the positive cases in these 11 runs is also different.
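A sketch of this 11-segment balancing strategy (in numpy, with hypothetical variable names) could look as follows; each yielded balanced dataset would then be split 70/30 for training and testing and modeled independently before averaging.

```python
import numpy as np

def balanced_runs(X, y, n_segments=11, seed=0):
    """Split the negative class into n_segments disjoint parts and pair each
    part with all positive cases, yielding n_segments balanced datasets."""
    rng = np.random.default_rng(seed)
    pos_idx = np.where(y == 1)[0]
    neg_idx = rng.permutation(np.where(y == 0)[0])
    for segment in np.array_split(neg_idx, n_segments):
        idx = np.concatenate([pos_idx, segment])
        yield X[idx], y[idx]
```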

IV. RESULTS AND DISCUSSIONS
A. IDENTIFICATION OF THE OPTIMAL THRESHOLDS FOR SUPPORT AND CONFIDENCE
This experiment aims to study the influential factors on victim fatalities using association rule analysis. Support and Confidence are two criteria used to filter out numerically strong rules. One problem in the existing literature is that it cannot identify a set of proper thresholds for these two criteria. This study proposes the implementation of LeaAC models to address this gap. The idea is to use the influential features as the variables and the victim fatalities as the target to build binary classification models on the balanced data. Support and Confidence are two critical parameters of this model, so the model that provides the best classification performance defines the optimal thresholds for Support and Confidence.
After preprocessing, the dataset can be fed into the classification model built by the LeaAC algorithm. Besides Support and Confidence, there is another parameter that strongly affects the performance of LeaAC: the maximum number of items in a rule, denoted I_max. Therefore, in order to identify the best set of Support and Confidence, this study optimizes these three parameters together.
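In practice, this joint optimization can be organized as a simple grid search; the sketch below is illustrative only. The function `evaluate_leaac` is a hypothetical placeholder standing in for training and scoring LeaAC on the balanced datasets, and the grid ranges shown are arbitrary placeholders rather than the grid used in the study.

```python
import numpy as np
from itertools import product

def evaluate_leaac(i_max, min_support, min_confidence):
    """Hypothetical placeholder: train LeaAC with these parameters on the
    11 balanced datasets and return the averaged classification accuracy."""
    raise NotImplementedError

support_grid = np.round(np.arange(0.01, 0.21, 0.01), 2)
confidence_grid = np.round(np.arange(0.50, 0.96, 0.02), 2)

best_params, best_acc = None, -1.0
for i_max, s, c in product([2, 3], support_grid, confidence_grid):
    acc = evaluate_leaac(i_max, s, c)
    if acc > best_acc:
        best_params, best_acc = (i_max, s, c), acc
```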
This experiment explored the model performance with I_max = {2, 3, 4}. Note that when I_max = 2, the rules generated by the Apriori algorithm have only two items, one antecedent and one consequent (such as A ⇒ B), while when I_max = 3, three-item rules such as {A, C} ⇒ B can be generated.
After some tests, we found that when I_max = 4, the training time of the model is too long to be acceptable and the accuracy drops significantly, so we explored I_max = {2, 3}. FIGURE 4 and FIGURE 5 present the optimization procedures of Support and Confidence with different I_max. It is discovered that when I_max = 2, the highest modeling accuracy is 82.31%, and when I_max = 3, the highest modeling accuracy becomes 77.78%. Therefore, I_max is set to 2, and the best-performing model is given by the parameter set with Support = 0.04 and Confidence = 0.74. As a result, this set of parameters is used as the thresholds for association rule mining in this study. We think the reason why I_max = 2 outperforms I_max = {3, 4} is that longer rules contain more constraints and have higher risks of overfitting.
Besides, to further verify the effectiveness of the proposed methodology, we added a comparison with three other methods commonly used for modeling feature weights: multiple linear regression (MLR), logistic regression (LR), and Naïve Bayes. MLR and Naïve Bayes are typical algorithms of the first and third groups of methods mentioned in the introduction, while logistic regression is the most commonly used nonlinear regression method in industry. The latent class analyses mentioned in the introduction are unsupervised learning methods and do not support regression, so we did not pick them for comparison here. TABLE 2 presents the results of the comparison. The proposed LeaAC method has the highest modeling accuracy, which, from another angle, supports the superiority of the proposed method.

B. STRONG RULES ON VICTIM FATALITY
After obtaining the optimal Support and Confidence, we applied the thresholds to the 11 datasets generated in the balancing step. To reduce the impact of data variance, only the rules that survive in all 11 runs are extracted as strong rules. This resulted in 69 strong rules in this study.
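This "survives all runs" criterion is simply the intersection of the per-run rule sets; a minimal sketch (with `rules_per_run` as a hypothetical list of the antecedent sets mined in each of the 11 runs) is:

```python
from functools import reduce

# rules_per_run: list of 11 iterables, each containing the antecedents
# (as frozensets of feature names) mined in one balanced run.
strong_rules = reduce(set.intersection, (set(r) for r in rules_per_run))
print(len(strong_rules))   # 69 in this study
```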
However, not all of the 69 rules are of practical use, and some of them may have become ''strong'' because of data variance. Therefore, this study used the Analytic Hierarchy Process (AHP) to further identify the practical strong rules. AHP is a decision-making process that first collects opinions from domain experts to decide the weights of the decision-making factors and then decides the score of a candidate on each factor [44], [45]. We defined three criteria to identify strong rules: besides Support and Confidence, we added ''practicality'', which also ranges in [0, 1], to measure the practicality of a rule. This ''practicality'' collects the domain experts' opinions on the practicality of a rule through a questionnaire. An expert first marks the practicality score of a rule (P in the equation below) and then provides his or her opinion on the decision-making weights of these three criteria (W_S, W_C, and W_P in the equation below). Support and Confidence already have calculated values, so the expert does not need to score them. The following equation then gives the final score of a rule:

Score = W_S · S + W_C · C + W_P · P
where S and C are the Support and Confidence values, P is the practicality score, and W_S, W_C, and W_P are the weights provided by the experts.
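Numerically, the screening is a weighted average of the three criteria with expert-averaged weights; a small illustrative sketch (with made-up expert weights and rule values, not those collected in the study) is:

```python
import numpy as np

# Hypothetical expert answers: one row per expert, columns = (W_S, W_C, W_P).
expert_weights = np.array([[0.30, 0.40, 0.30],
                           [0.25, 0.45, 0.30],
                           [0.20, 0.50, 0.30]])
W_S, W_C, W_P = expert_weights.mean(axis=0)

# Hypothetical values for one candidate rule.
S, C, P = 0.06, 0.81, 0.70          # support, confidence, averaged practicality
score = W_S * S + W_C * C + W_P * P
```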
The questionnaire was sent to fifty scholars with related publications on machine learning or statistics in accident research. Sixteen of them replied, and eleven completed the questionnaire. We gathered their opinions, averaged their weights, and calculated the final scores of all 69 rules. TABLE 3 lists the top 10 rules with the highest scores; these are recognized as the strong practical rules in this study.

C. DISCUSSIONS AND SPATIAL RELATIONSHIPS
It can be seen from TABLE 3 that some of the rules have similar meanings. For example, VIOLCAT_11 and are both talking about pedestrians, while notINSURED_Y and INSURED_N both mean that, in the accident, proof of insurance was not obtained or insurance was not applicable. Therefore, in this discussion section, we combined the discussion of the rules with similar meanings and obtained six major influential factors: VSAFETY2_W, VEJECTED_1, VTYPE_3, PSOBER_B, notINSURED_Y, and notVSAFETY2_G.
These features have been partly discussed in previous literature [4], [10], [22], [46]. All of them exhibit explicit threats that lead to fatal accidents. For example, VSAFETY2_W and VTYPE_3 concern two typical groups of Vulnerable Road Users (VRUs), namely motorcycle drivers and pedestrians. Studies have shown that these VRUs have fatality rates five times higher than typical car-car accidents [46] because they have no protection equipment such as seat belts or airbags.
VEJECTED_1 refers to victims who are fully ejected from their seats. This relationship may arise in two situations. The first is that the victim may not have fastened the seatbelt during a severe accident, which is also part of the situation described by the feature notVSAFETY2_G; this apparently can lead to higher fatality rates. The second situation goes back to motorcycle accidents: the drivers or victims on a motorcycle do not have seatbelts and can easily be ejected from their seats.
PSOBER_B refers to accidents involving alcohol, which has been a well-known killer in traffic accidents. Although governments have tried different policies and strategies, drunk driving still causes many traffic fatalities. This experiment will later point out where the government should focus more when controlling drunk driving.
notINSURED_Y describes the group of victims and parties that do not have insurance. The cause-effect behind this feature may result from two aspects. The first refers to those from low-income families or under the poverty level, who are not willing to buy insurance due to economic issues and therefore cannot receive proper treatment after a car accident; the fatality rate is then increased. The second aspect may result from those who are not fully aware of the importance of insurance and do not want to spend money on it. This group of people may have limited education, and they may also have a weak sense of traffic rules or proper driving behaviors. These factors potentially lead to higher fatality rates.
To spatially analyze the relationship between the six major influential factors and the fatality rate, the road-based GIS analysis introduced in the methodology section is utilized. FIGURE 6 presents the distribution of traffic accidents and the victim fatality rate. The red lines represent the road network, and the yellow circles represent the density of traffic accidents or the victim fatality rate; a larger circle means denser accidents or higher rates. It can be observed from the accident distribution that traffic accidents are mainly scattered in areas A, B, and C, which might be caused by the dense population and large traffic volume there. The distribution of the victim fatality rate shows that areas D to H are more dangerous because they have higher fatality rates than other areas. As a result, instead of discussing the phenomenon behind the high density in areas A to C, this study is more interested in analyzing the influential factors behind the areas with high fatality rates (D to H). To achieve this, we analyzed and plotted the percentage distributions of the accidents related to the six features in FIGURE 7 [47].
FIGURE 7 (1) indicates that the percentage of motorcycle accidents is quite high in areas F and G. Therefore, to better control and reduce the fatality rate in these two areas, the government is suggested to put more constraints on motorcycle driving there, such as speed control and forbidding motorcycles in bad weather. FIGURE 7 (2) shows that the percentage of ejected-from-seat accidents is higher in areas F and G. The distributions in FIGURE 7 (2) are quite close to those in FIGURE 7 (1), so we might guess that most of the ejected-from-seat accidents relate to motorcycle accidents. FIGURE 7 (3) reflects that the high fatality rates in areas G and H may be caused by the high rates of pedestrian accidents there. Therefore, the government should consider enhancing pedestrian safety in these areas, for example, by designing more pedestrian overpasses and underpasses and building more pedestrian guardrails.
FIGURE 7 (4) is the distribution of the percentage of accidents involving alcohol. It seems that most of the dangerous places involve a high percentage of alcohol-impaired driving, such as areas D, E, G, and H. Although controlling drunk driving has been a tough task all over the country for many years, the LA government should know that these four areas, D, E, G, and H, should be its focus.
The percentage distribution of the victims who do not have insurance is shown in FIGURE 7 (5). According to the preceding analysis, we suggest that the government may enhance the management of compulsory insurance in areas F, G, and H. Also, proper financial support for insurance in those areas can be considered.
The last panel in FIGURE 7 refers to the percentage of victims not using seatbelts. The government may therefore consider increasing the penalty for not wearing seatbelts in areas E, G, and H, or investing in AI-empowered video surveillance of seatbelt use to strengthen management.
To sum up, this section discussed the identified influential factors for traffic fatality in LA. Through the road-based spatial analysis in GIS, we provided several suggestions to the government on improving traffic safety. Note that these suggestions are only the results of numerical studies; the real cause-effect relationships and the effectiveness of these suggestions require further research to verify.

V. CONCLUSION
This paper studied the relationships between fatal traffic accidents and their influential factors in Los Angeles over ten years, using association rule analysis and a Geographical Information System (GIS). The problem of determining the minimum thresholds of support and confidence in association rules mining is addressed by applying the Lazy Ensembled Adaptive Associative Classifier (LeaAC). Spatial analysis of the relationship between the influential factors and the locations is conducted with the help of GIS. The contributions of this study are as follows:
• The proposed methodology can not only numerically identify the most critical rules on traffic fatality, but also spatially analyze the relationships between the features and fatality rates. This method is expected to be applicable in other cities or regions as well.
• The LeaAC model addresses the threshold problem in association rule mining and can be viewed as an advanced machine learning method for analyzing influential factors.
• The case study in LA uncovered six important influential factors on traffic fatality, and the road-based analysis in GIS provided several actionable suggestions to the government.
On the other hand, this study has limitations. Due to data availability, we only tested the traffic accidents in Los Angeles and did not examine the method's performance in other cities and countries. Also, the data used in this study are from 2003 to 2012 and do not reveal the situation in recent years. Future studies can be extended to address these gaps and validate the proposed method on other accident datasets.

FIGURE 2. An example of one-hot encoding.

FIGURE 6. (1) The density distribution of the traffic accidents; (2) the distribution of the fatality rates in LA.

TABLE 2. Classification accuracy of the four algorithms. The results are presented in the mean ± standard deviation format of the 11 runs.