Network-level Safety Metrics for Overall Traffic Safety Assessment: A Case Study

Driving safety analysis has recently experienced unprecedented improvements thanks to technological advances in precise positioning sensors, artificial intelligence (AI)-based safety features, autonomous driving systems, connected vehicles, high-throughput computing, and edge computing servers. In particular, deep learning (DL) methods have empowered large-volume video processing to extract safety-related features from massive videos captured by roadside units (RSU). Safety metrics are commonly used measures to investigate crashes and near-conflict events. However, these metrics provide limited insight into overall network-level traffic management. On the other hand, some safety assessment efforts are devoted to processing crash reports and identifying spatial and temporal patterns of crashes that correlate with road geometry, traffic volume, and weather conditions. This approach relies solely on crash reports and ignores the rich information in traffic videos that can help identify the role of safety violations in crashes. To bridge these two perspectives, we define a new set of network-level safety metrics (NSM) to assess the overall safety profile of traffic flow by processing imagery taken by RSU cameras. Our analysis suggests that NSMs show significant statistical associations with crash rates. This approach is different from simply generalizing the results of individual crash analyses, since all vehicles contribute to calculating NSMs, not only the ones involved in crash incidents. This perspective considers the traffic flow as a complex dynamic system where actions of some nodes can propagate through the network and influence the crash risk for other nodes. We also provide a comprehensive review of surrogate safety metrics (SSM) in Appendix A.


Introduction
Vehicular technology has witnessed key milestones in recent years. Most cars are heavily equipped with advanced visual and radio sensors, cameras, control units, and artificial intelligence (AI) platforms that make driving safer and more convenient than ever. Electric vehicles (EVs) equipped with automated driving systems (ADS) have achieved higher levels of autonomy and continue to expand their share of the global car market [1]. Crowd-sourcing and connected vehicle (CV) services have been utilized to improve the overall operation of vehicular networks through data and model sharing. For instance, Uber Advanced Technologies Group has recently proposed a unified deep learning (DL) framework that assists automated vehicles (AVs) to map, perceive, predict, and plan sequentially to enhance driving safety [2]. Another example is Tesla, which employs a cluster with 5,760 A100 GPUs to train its multi-modality neural network with a 1.5-petabyte dataset [3].
The use of AI platforms is not limited to car manufacturing. It has also brought revolutionary changes to traffic monitoring and control systems and to roadside infrastructure. In particular, web-based high-performance computing (HPC) and vehicular edge computing (VEC) servers with graphics/tensor processing units (GPUs/TPUs) have made large-volume data aggregation and processing more feasible than ever [10].
Despite these technological advances, driving safety remains one of the key challenges of today's society. Statistics show that the mortality of motor-vehicle-related injuries was almost constant from 2015 to 2019 in the US [11], while car crash fatalities have even increased since the COVID-19 pandemic started [12,13]. This is a global issue: about 1.3 million people die in car accidents worldwide, and millions are injured every year, according to the World Health Organization (WHO) [14]. These statistics reveal that modern technology has not yet been fully utilized to prevent avoidable accident casualties and fatalities.
Driving safety is pursued from different perspectives by research communities, as shown in Fig. 1. New safety and warning systems are under design and development. Examples are applying eye-tracking technology to assess drivers' distraction and fatigue [15,16], and onboard collision avoidance systems that offer safety alerts and active braking upon detecting a danger [17,18,19]. Another research direction is exploring the causality of incidents based on crash surrogate events (as measures of accident proximity) from long-period naturalistic driving data [20,21]. These works show that the expected number of crashes during a specific time period is related to the number of observed surrogate events and crash-to-surrogate factors [22]. A triggered event is often determined by a set of key parameters known as surrogate safety metrics (SSM), designed for human-driven cars [23] as well as AVs equipped with ADS [24,25]. We also provide a comprehensive review of SSM in Appendix A.
The community has also taken advantage of recent developments in DL-based image/video processing that yield performance far beyond conventional methods. DL methods have also enabled the development of well-annotated large-volume datasets (e.g., the highD dataset, 2018) [26], which in turn led to even more powerful video processing methods for autonomous and safe driving applications, such as vehicle detection [27,28], plate recognition [29,30], traffic sign classification [27,28], lane detection [31,32], and abnormal driving detection from surveillance video [33,34].
To the best of our knowledge, most current methods favor the investigation of individual and independent crashes based on extracted incident-level safety metrics and disjoint safety events. Therefore, they are not well-positioned to relate crash distributions to the dynamics of the entire traffic flow.
Fig. 2 demonstrates a merge scenario where individual safety metrics may not be capable of capturing overall safety risks and hence can yield misleading results. Gray rectangles in this figure represent the normal traffic flow of the highway, and vehicle v1 intends to join the traffic. Figs. 2(a) and 2(b) present an aggressive merge before and after the joining epochs, while Figs. 2(c) and 2(d) show a safe merge. Let us investigate this scenario using the time-to-collision (TTC) metric, which is one of the most commonly used safety metrics, especially for rear-end crashes [38,39]. Calculating TTC for the leading and following cars (v1 and v2) will favor the aggressive join, since a faster merge leaves more reaction time for the following car v2 and hence appears safer from v2's perspective. However, it causes more risk to the following vehicle on the highway after joining the traffic (v3 in Fig. 2), hence disrupting the stability of the traffic flow. This simple scenario shows how individual two-car investigations can lead to misleading results. Indeed, traffic flow can represent a complex dynamic system with many factors interacting with one another, requiring an overall network-level analysis. For instance, a traffic blockage in one intersection can influence the traffic volume (and hence the traffic safety) of alternative nearby routes.

Figure 2: The top row shows the unsafe (aggressive) join before and after the merge. This is considered favorable by the following car (v2), since it provides a higher TTC, while disrupting the overall highway traffic stability (c). The bottom row shows the safe join by v1 after yielding to the traffic flow, before (d) and after the merge (e). Although this merge provides a lower TTC for the following vehicle on the ramp (v2), it is advantageous from the traffic stability point of view (f). This can be reflected in the TTC of the car following v1 after joining the highway (v3).
It is noteworthy that simply extending the results of individual crash analyses does not fully address this issue, since those safety metrics are calculated only for vehicles involved in crashes, while our perspective profiles the overall traffic safety through network-level metrics.
One may argue that an easy solution would be averaging individual safety metrics over all vehicles. However, we show that carefully designed network-level safety metrics can provide further information about safety risks. For instance, we conjecture that the spatial distribution of vehicles on the road correlates with the crash probability. In this respect, we group vehicles into clusters and calculate cluster-wise TTC metrics (TTC-CV), as elaborated in Section 3. Then, we find the correlation between the TTC metrics and crash rates. As shown in Table 1, the proposed TTC-CV metric shows three times higher correlation with crash rates than the individual TTCs averaged over all vehicles, E(TTC). This holds consistently for all crash types and for rear-end crashes under three correlation measures: Pearson, Spearman, and Kendall. The importance of macro-scale analysis of crash data has been recognized by some researchers [40,41,42,43,44]. However, most of these studies investigate the geo-spatial distribution of crashes and their consequences, with limited insight into the diverse causes of crashes. More specifically, they emphasize geographical mapping of traffic properties (e.g., volume, density, and congestion condition) as well as the road network topology. High-resolution micro-level driving behaviors, such as multi-agent trajectory prediction [45] and motion planning [46], have also been studied. These analyses predict crash probabilities and safety factors, but they do not verify their results against readily available crash reports.
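For reference, the three correlation measures named above (Pearson, Spearman, and Kendall) can be computed as in the following minimal sketch. The function name `correlation_summary` and the numbers are hypothetical and purely illustrative; they are not data or results from the study.

```python
import numpy as np
from scipy import stats

def correlation_summary(metric, crash_rate):
    """Return Pearson, Spearman, and Kendall correlation coefficients."""
    metric = np.asarray(metric, dtype=float)
    crash_rate = np.asarray(crash_rate, dtype=float)
    pearson_r, _ = stats.pearsonr(metric, crash_rate)      # linear association
    spearman_rho, _ = stats.spearmanr(metric, crash_rate)  # rank (monotonic) association
    kendall_tau, _ = stats.kendalltau(metric, crash_rate)  # concordant vs. discordant pairs
    return {"pearson": pearson_r, "spearman": spearman_rho, "kendall": kendall_tau}

# Synthetic values for illustration only (one value per time interval).
ttc_cv = [0.8, 1.1, 1.5, 1.9, 2.4, 3.0]
crashes_per_hour = [0.10, 0.12, 0.20, 0.22, 0.30, 0.41]
summary = correlation_summary(ttc_cv, crashes_per_hour)
```

Reporting all three measures guards against a relation that is monotonic but not linear, which Pearson's coefficient alone would understate.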
A brief review of these methods is provided in Section 2. It is noteworthy that our analysis nicely integrates with these studies, since these methods find the baseline crash probability for each road section, whereas our method quantifies the modulated crash probability (the amount of increase/decrease) based on the temporal safety profiling of the traffic flow.
In this work, we offer a generalizable framework for finding meaningful correlations between representative traffic safety indicators and the geo-distribution of crash probability through integrative analysis of crash reports and traffic video captured by the roadside infrastructure. Our contribution is two-fold. First, we define a set of network-level safety metrics (NSM) from an external observer's perspective that capture the inherent relations between local traffic flows and gauge the overall safety profile of the traffic network (Fig. 3(b)). These metrics are not claimed to be comprehensive; they are presented as a proof of concept to show the utility of such metrics for traffic analysis and can be extended to a more comprehensive list. Second, we provide a formal association analysis to assess the contribution of each safety metric in mediating the crash frequency. This is done by investigating the spatial and temporal correlation between the crash data points and the traffic disruption represented by the proposed safety metrics. The results of our framework can be used for developing online traffic advisory systems, crash explanation, risk prediction, and traffic optimization. It is noteworthy that crash severity analysis is another legitimate problem that is out of the scope of this paper. We acknowledge that our metrics may not be directly usable for analyzing crash severity, since crash severity is more heterogeneous by nature and involves more confounding factors beyond the developed NSMs.

Figure 4: The proposed network-level traffic safety analysis framework includes multiple modules: (i) video processing, which includes video preparation (e.g., video stabilization, noise reduction, and personal information masking), trajectory extraction, geo-mapping with perspective translation, and network-level safety metric extraction; (ii) crash analysis, which includes crash temporal and spatial mapping, crash distribution (histogram), and crash type mapping (representation); and (iii) temporal and spatial association analysis.
The overall pipeline of the proposed integrative framework is presented in Fig. 4, which includes two parallel processing modules for crash analysis and video-based safety metric extraction followed by an association analysis step.

Related Work on Network-level Analysis
Traffic analysis can be performed at different levels. In micro-level analysis, an individual incident is typically analyzed by a set of fine-resolution parameters, such as the involved vehicles' geometry and motion dynamics as well as the local road and traffic flow properties, based on sensor readings and captured imagery. Comprehensive reviews of micro-level crash analysis, DL-based driving behavior modeling, and AI platforms for driving safety enhancement are provided in [47,45,46].
In macro-level analysis, the traffic flow is considered a dynamic network with inherent relations among network nodes. This view is adopted for network structural analysis, traffic forecasting, abnormal pattern detection, and global traffic safety analysis. For instance, [48,49,50,51] consider traffic as an application of complex network theory, where the network dynamics can be represented by a collection of small-world networks [52] and random scale-free networks [53]. A small-world network is a network with a high clustering coefficient and small average geodesics, namely the pairwise shortest path lengths. In other words, most network nodes are expected to be reachable within a small number of hops (e.g., logarithmic in the number of nodes). This model best fits a collection of lattice-shaped local neighborhood roads linked by a few highways.
A scale-free network is a network whose degree distribution obeys a power law, meaning that there exist only a few nodes with high degrees (such as downtown areas or traffic hubs). Urban traffic can be modeled by such networks [48].
Some other works [54,55,50,56,57,58,51] analyze the network structure using complex network theory to evaluate, design, and optimize traffic networks for sustainability and maintainability. Detecting network bottlenecks with poor connectivity and high congestion vulnerability is studied in [59,60,61,62,63,64,65]. The authors of [66] provide a comprehensive review of continuum models that consider the traffic flow as a compressible fluid and employ physical knowledge to explain real-world traffic phenomena. It is noteworthy that some recent works [67,68,69] address mixed traffic by taking autonomous vehicles (AVs) into consideration in their analysis.
Traffic safety can be improved by more accurate traffic forecasting, namely by predicting the future properties of the traffic flow based on current and historical features. Traffic risk modeling and prediction can provide hints and guidelines on traffic management to minimize factors that elevate the crash risk. Specifically, [70,71] transform the road network into 2D grids and then apply a convolutional neural network (CNN) to predict the traffic crowd flow. In [72], the traffic flow is modeled as a diffusion process on a directed graph, and a diffusion convolutional recurrent neural network (DCRNN) is deployed to learn the spatio-temporal features of traffic based on the historical data and road structure. A method called Deep Transport is proposed in [73], which combines CNN and recurrent neural network (RNN) architectures equipped with an attention mechanism to predict traffic volume. Recently, graph neural networks (GNNs), a class of DL networks performing inference over arbitrary graphs, have been shown to yield superior performance in predicting traffic-related parameters [74,75,76,77,78].
Crashes can also be viewed as severe disruptions in the network flow. Therefore, many papers have focused on abnormal network pattern detection, such as inferring abnormal patterns caused by unexpected events (e.g., natural disasters, serious car accidents, traffic control) [79,80,81]. For instance, the following works interpret crashes by analyzing the frequency of irregular patterns over traffic networks. The role of road factors in characterizing the severity of incidents is studied in [82] using logistic regression with a chi-squared test. Kernel density estimation (KDE) is used in [40,41] to find the spatio-temporal patterns of traffic accidents and rank them based on their statistical significance. Negative binomial and Poisson models are used in [42] to identify traffic factors that contribute to the crash frequency. Crash prediction based on random forest (RF), gradient boosting decision tree (GBDT), and XGBoost is performed in [43]. A spatial econometric model is adopted in [44] to find associations between road links and incidents.
The above-mentioned works provide insightful results for different traffic problems by network-level analysis of traffic flow and crash statistics. However, they still suffer from a few drawbacks. Some works are devoted to explaining the traffic flow and crash distributions using deep learning models. Although successful from modeling and prediction perspectives, these methods lack interpretability and generalizability. Moreover, due to the difficulty of creating crash scenarios, the micro-level analysis is hardly verifiable, and most works settle for verifying the results with virtual simulators such as car learning to act (CARLA) [83] and simulation of urban mobility (SUMO) [84], ignoring the valuable information exploitable from a rich set of well-documented crash data. Furthermore, the utilized safety metrics are appropriate only for individual crash analysis and hence fail to model the inherent and complex relations among network nodes. In this work, we make a connection between the global analysis and the deep analysis of individual incidents by introducing newly defined network-level metrics. The integrative analysis of traffic video and crash data enables us to look for statistical relations between traffic properties and crash risks and to draw general conclusions on traffic safety enhancement. It is notable that there is a gap between association analysis using data-driven methods and the real causality identification studied in [85].
Finally, we note that our method does not replace the micro-level and macro-level analyses, but complements them. It extends the notion of safety metrics to more insightful network-level metrics for global crash analysis while profiling the overall traffic flow safety, which can fine-tune the baseline crash probability obtainable from macro-scale analysis of crash reports.

Proposed Metrics for Network-level Analysis
Conventional safety metrics are defined for scenarios where only two (or a few) vehicles are involved. A summary of the most commonly used safety metrics is provided in Appendix A for the sake of completeness. In this appendix, we also introduce a taxonomy for safety metrics based on the level of access required to ADS data when calculating these metrics for self-driving cars. In this paper, we introduce a set of NSMs that can be used for the overall and long-term safety assessment of traffic flow. For instance, the composition of traffic (e.g., the ratio of trucks to all vehicles) can contribute to the frequency and severity of collisions. Likewise, the overall variations of car velocities on the road can reveal information about potential risk factors. A list of the proposed network-level safety metrics is provided in Table 2.
Time-to-Collision-Cluster Variation (TTC-CV) is defined to evaluate the relative velocity of car clusters that can pose safety risks. TTC is perhaps the most commonly used safety metric; it evaluates the risk of a rear-end crash by quantifying the time until the following car collides with the leading car if both retain their current speeds. We develop a new metric that extends TTC to car clusters, since vehicle clustering naturally occurs on the road. We also conjecture that the instantaneous geo-distribution of the cars on the road can play an essential role in crash rates. For instance, car platooning is considered an important feature for autonomous and connected vehicles. For such scenarios, due to the coordination between platooning vehicles, inter-platoon crashes are rare and cluster-based TTCs can be useful.
Our approach is to cluster vehicles based on a predefined distance threshold and assess the relative mobility of car clusters. More specifically, we use a down-sampled version of the traffic video (e.g., at a rate of 1 FPS) to cluster the cars based on their pairwise distances. If $S$ is the set of cars in the current video frame, then a cluster $C_i = \{n_j\} \subset S$ is defined so that for every $n_j \in C_i$, there exists at least one node $n_k \in C_i$, $j \neq k$, with $d(n_j, n_k) \leq d_C$, and likewise any car $n_l$ satisfying $d(n_l, n_j) \leq d_C$ for some $n_j \in C_i$ must be a member of $C_i$. Here, $d(n_i, n_j)$ is the Euclidean distance between vehicles $n_i$ and $n_j$, and $d_C$ is a predefined threshold. The clusters are non-overlapping, i.e., $C_i \cap C_j = \emptyset$ for all $i \neq j$. Then, the cluster $C_i$ is considered as a virtual point object at the centroid of the cluster, i.e.,

$$l(C_i) = \frac{1}{|C_i|} \sum_{n \in C_i} l(n), \qquad v(C_i) = \frac{1}{|C_i|} \sum_{n \in C_i} v(n).$$

Here, $l(n)$ and $v(n)$ represent the location and velocity of node $n$, and $|C|$ is the cardinality of set $C$ (i.e., the number of cars in cluster $C$). The TTC of $C_i$ is calculated with respect to the potential collision point with the latest leading cluster $C_j$ moving at a lower speed, $v(C_j) \leq v(C_i)$, as follows:

$$\mathrm{CTTC}(C_i) = \frac{d\big(l(C_i), l(C_j)\big)}{v(C_i) - v(C_j)}.$$

For segments with crossing roads (intersections and merging points), we use the stationary intersection point as the potential collision point when calculating cluster TTCs. Also, note that cluster TTCs are calculated using the original video at its full rate of 30 FPS. The cluster TTC reduces to the regular inter-vehicle TTC if the threshold $d_C$ is selected close to 0, so that each car becomes its own cluster. Next, the coefficient of variation of the cluster-level TTCs for frame $f_j$ is calculated as an instantaneous network-level collision risk factor for the road segment covered by that frame. The metric also involves $\rho_v = N_v/N_C$, the average number of vehicles per cluster, where $N_v$ and $N_C$ are the numbers of vehicles and clusters in the frame; this factor is used to emphasize higher risk for more crowded clusters. This metric is more robust against outliers and extreme values (which often occur in computing TTC) than other statistics of cluster-level TTCs, including min(CTTC), mean(CTTC), or max(CTTC).
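The clustering and cluster-level TTC steps described above can be sketched as follows. This is a minimal illustration under several assumptions that are not stated in the text: vehicles travel in one direction along a straight segment (so a single along-road coordinate orders them), the reference cluster is taken to be the nearest slower cluster ahead, and the coefficient of variation uses the population standard deviation. How $\rho_v$ enters the final TTC-CV value is not reproduced here, so it is returned separately. All function names are hypothetical.

```python
import numpy as np

def cluster_vehicles(positions, d_c):
    """Label vehicles so that two vehicles share a cluster whenever a chain of
    neighbors, each within distance d_c of the next, connects them."""
    positions = np.asarray(positions, dtype=float)
    labels = -np.ones(len(positions), dtype=int)
    current = 0
    for seed in range(len(positions)):
        if labels[seed] >= 0:
            continue  # already assigned to a cluster
        labels[seed] = current
        stack = [seed]
        while stack:
            k = stack.pop()
            dist = np.linalg.norm(positions - positions[k], axis=1)
            for m in np.where((dist <= d_c) & (labels < 0))[0]:
                labels[m] = current
                stack.append(m)
        current += 1
    return labels

def cluster_ttc(s, v, labels):
    """TTC of each cluster w.r.t. the nearest slower cluster ahead (assumption).
    s: along-road positions, v: speeds; clusters are reduced to their centroids."""
    s, v = np.asarray(s, dtype=float), np.asarray(v, dtype=float)
    ids = np.unique(labels)
    cs = np.array([s[labels == c].mean() for c in ids])  # centroid positions
    cv = np.array([v[labels == c].mean() for c in ids])  # centroid speeds
    ttcs = []
    for i in range(len(ids)):
        ahead = [j for j in range(len(ids)) if cs[j] > cs[i] and cv[j] < cv[i]]
        if not ahead:
            continue  # no slower leading cluster, hence no finite TTC
        j = min(ahead, key=lambda idx: cs[idx] - cs[i])
        ttcs.append((cs[j] - cs[i]) / (cv[i] - cv[j]))
    return np.array(ttcs)

def coefficient_of_variation(x):
    """Standard deviation divided by the mean (population standard deviation)."""
    x = np.asarray(x, dtype=float)
    return float(x.std() / x.mean())

# Synthetic frame: along-road positions (m) and speeds (m/s), single lane.
s = np.array([0.0, 5.0, 8.0, 60.0, 66.0, 150.0])
v = np.array([30.0, 30.0, 30.0, 25.0, 25.0, 20.0])
labels = cluster_vehicles(np.column_stack([s, np.zeros_like(s)]), d_c=10.0)
ttcs = cluster_ttc(s, v, labels)
cv_ttc = coefficient_of_variation(ttcs)
rho_v = len(s) / len(np.unique(labels))  # average number of vehicles per cluster
```

Setting `d_c` close to zero makes every vehicle its own cluster, in which case `cluster_ttc` falls back to ordinary vehicle-to-vehicle TTCs, matching the remark in the text.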
Individual Velocity Variation Rate (IVVR) is defined as the variation of velocities for each vehicle in a specific zone or time interval. It is defined in terms of $v_i^{\max}$, $v_i^{\min}$, and $v_i^{\mathrm{av}}$, the maximum, minimum, and average velocities of vehicle $i$. Higher values of this metric mean that, on average, each vehicle changes its speed more often by accelerating and decelerating. This can be due to the road profile, density of intersections and exits, traffic volume, or road conditions.
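As an illustration only, one plausible form of IVVR consistent with the quantities named above is the average over vehicles of the normalized speed range, $(v_i^{\max} - v_i^{\min})/v_i^{\mathrm{av}}$. This specific form is an assumption made for the sketch below, not a formula confirmed by the text, and the function name `ivvr` is hypothetical.

```python
import numpy as np

def ivvr(speed_traces):
    """Assumed form: mean over vehicles of (v_max - v_min) / v_av."""
    ratios = []
    for trace in speed_traces:
        v = np.asarray(trace, dtype=float)
        ratios.append((v.max() - v.min()) / v.mean())
    return float(np.mean(ratios))

# Two synthetic vehicles: one varying its speed, one cruising at constant speed.
traces = [[28.0, 30.0, 32.0], [25.0, 25.0, 25.0]]
ivvr_value = ivvr(traces)
```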
Overall Velocity Variation Rate (OVVR) is defined as the variation of average velocities among vehicles in a specific zone or time interval. It is defined in terms of $v_i^{\mathrm{av}}$, the average velocity of vehicle $i$, and $v^{\mathrm{av}} = \sum_{i=1}^{N} v_i^{\mathrm{av}}/N$, the average velocity of all $N$ vehicles. Similar to IVVR, this metric can be associated with the crash rate in specific highway sections.
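A hedged sketch of OVVR follows. It assumes the metric is the coefficient of variation of the per-vehicle average speeds $v_i^{\mathrm{av}}$ around $v^{\mathrm{av}}$; the normalization and the use of the population standard deviation are assumptions for illustration, and `ovvr` is a hypothetical name.

```python
import numpy as np

def ovvr(speed_traces):
    """Assumed form: std of per-vehicle average speeds divided by their mean."""
    v_av_i = np.array([np.mean(trace) for trace in speed_traces], dtype=float)
    v_av = v_av_i.mean()  # average velocity over all vehicles
    return float(np.sqrt(np.mean((v_av_i - v_av) ** 2)) / v_av)

# Three synthetic vehicles with average speeds 30, 25, and 35 m/s.
traces = [[28.0, 30.0, 32.0], [25.0, 25.0, 25.0], [34.0, 36.0]]
ovvr_value = ovvr(traces)
```

Unlike the IVVR sketch, this quantity is zero when all vehicles share the same average speed, even if each one accelerates and decelerates along the way.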
Over Speeding Rate (OSR) is defined as the rate of over-speeding vehicles, computed using the speed limit $v_L$ and the indicator function $I(x > y)$, with $I(x > y) = 1$ for $x > y$ and $I(x > y) = 0$ otherwise. Speed limits are typically set based on a standardized set of national guidelines, taking into account road geometry (e.g., radii of curves, sight distance, weather conditions) and the location profile (e.g., residential versus rural areas). A high OSR, when associated with a high crash rate, may indicate the need for more warning and prevention measures against over-speeding, noting that over-speeding can be an important contributor to crashes. On the other hand, a high OSR that does not correlate with a high accident rate can indicate that the speed limit could be considered for a potential increase without compromising traffic safety. The aforementioned traffic metrics can be obtained by processing roadside camera video or by crowd-sourcing and accumulating information provided by vehicles' dash cameras.
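The OSR description can be sketched as the fraction of vehicles whose speed exceeds the limit, i.e., the mean of the indicator $I(v_i > v_L)$. Which per-vehicle speed is compared against $v_L$ (average, maximum, or instantaneous) is not recoverable from the text; the sketch assumes one representative speed per vehicle. The name `osr` is hypothetical.

```python
import numpy as np

def osr(vehicle_speeds, speed_limit):
    """Fraction of vehicles with speed strictly above the limit (assumed form)."""
    v = np.asarray(vehicle_speeds, dtype=float)
    return float(np.mean(v > speed_limit))  # mean of the indicator I(v_i > v_L)

# Four synthetic vehicles (mph) on a segment with a 65 mph limit.
osr_value = osr([60.0, 70.0, 66.0, 64.0], speed_limit=65.0)
```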
Table 2 (excerpt):
TTC-CV: Extends the conventional TTC to the cluster level; evaluates the global risk of the traffic flow at the network level.
IVVR: Reflects the instability of traffic flow by car speed variations and crash rate; can partially offer network re-design suggestions.
OVVR: Overall speed variation of vehicles; defined similarly to IVVR for overall traffic speed variation; not easily affected by outliers.

Traffic Composition Indicator (TCI) is defined to gauge the diversity of vehicle types in specific road sections. For instance, it is known that a higher density of trucks on the roads can correlate with the frequency and severity of road accidents; hence, trucks are prohibited in some road sections in highly populated areas [86]. This can be due to trucks' larger deceleration inertia, more frequent brake failures, and lower maneuverability. In general, if vehicles are classified into classes $c = 1, 2, 3, \ldots, C$, then the traffic composition can be defined as

$$\mathrm{TCI} = \frac{\left(\sum_{c=1}^{C} N_c\right)^2}{C \sum_{c=1}^{C} N_c^2},$$

where $N_c$ is the number of vehicles in class $c$, and the metric is defined similarly to the Jain fairness index. This metric ranges from $1/C$ for the most unbalanced composition to 1 for an equal number of vehicles of each type. If one is interested in evaluating the ratio of a specific class, such as trucks, to all vehicles, a class-specific ratio can be used instead. When classifying vehicles into two classes, $c = 1$ for trucks and $c = 2$ for non-trucks, these two metrics are directly related. Lower TCI values mean that most cars are of the same type with lower risks. This metric can be estimated by processing roadside videos but requires high-complexity learning methods for vehicle detection and classification.
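The TCI definition, read as a Jain-style fairness index over the class counts $N_c$, can be sketched as below. For the two-class case, writing the truck ratio as $r = N_1/(N_1 + N_2)$ (the symbol $r$ is introduced here for illustration) and substituting into the Jain form gives $\mathrm{TCI} = 1/\big(2(r^2 + (1-r)^2)\big)$. This relation is derived here from the Jain form and is not copied from the source. Function names are hypothetical.

```python
import numpy as np

def tci(class_counts):
    """Jain-style index: (sum N_c)^2 / (C * sum N_c^2)."""
    n = np.asarray(class_counts, dtype=float)
    return float(n.sum() ** 2 / (len(n) * np.sum(n ** 2)))

def tci_from_truck_ratio(r):
    """Two-class TCI written in terms of the truck ratio r = N_1 / (N_1 + N_2)."""
    return 1.0 / (2.0 * (r ** 2 + (1.0 - r) ** 2))

balanced = tci([10, 10, 10])     # equal number of vehicles in each class
single_class = tci([30, 0, 0])   # all vehicles in one class
two_class = tci([20, 80])        # 20 trucks, 80 non-trucks
```

The two limiting cases reproduce the stated range: `balanced` reaches 1 and `single_class` reaches $1/C$.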
Normalized Traffic Density (NTC) is defined as the density of vehicles on road sections as

$$\mathrm{NTC} = \frac{\sum_{i} l_i}{N_l \, L},$$

where $l_i$ is the length of vehicle $n_i$, $N_l$ is the number of lanes, and $L$ is the length of the road section. We normalize by the vehicle length to capture the effect of vehicle size. This parameter can be easily extracted from roadside videos by video processing and vehicle detection. Higher NTC values are expected to be associated with higher crash rates and may call for traffic load-balancing strategies.

Traffic Recovery Time (TRT) is defined as the time required for traffic flow recovery after incidents. The system of vehicular flow can be considered as a non-equilibrium system of interacting particles [87], and the instability of a free-flow state is induced by the collective effects of the increase of fluctuations.
It is known that any traffic event can suddenly lead to a jamming state, and TRT reflects the time span from an event epoch to the time point at which the status changes back to free flow. For flow stability, this time is expected to be much shorter than the interval between consecutive events. TRT can be defined as

$$\mathrm{TRT} = \frac{1}{N_e} \sum_{i=1}^{N_e} \big(t_r(i) - t_b(i)\big),$$

where $N_e$ is the number of events in the monitoring interval, and $t_b(i)$ and $t_r(i)$, respectively, denote the event start epoch and the flow recovery epoch for event $i$. This parameter can be easily obtained from roadside videos or by crowd-sourcing the position information obtained from dash cameras. The summary of the proposed network-level safety metrics is presented in Table 2. We believe that further efforts are needed to develop a more complete list of network-level safety metrics.
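The two remaining metrics can be sketched in a few lines. The forms used here are assumptions inferred from the symbol definitions: NTC as the total vehicle length divided by the available lane length $N_l L$, and TRT as the mean recovery duration $t_r(i) - t_b(i)$ over the $N_e$ events. Function names are hypothetical.

```python
import numpy as np

def ntc(vehicle_lengths, n_lanes, section_length):
    """Assumed form: total vehicle length over total lane length (N_l * L)."""
    return float(np.sum(vehicle_lengths) / (n_lanes * section_length))

def trt(event_starts, recovery_times):
    """Assumed form: mean time from event start t_b(i) to flow recovery t_r(i)."""
    t_b = np.asarray(event_starts, dtype=float)
    t_r = np.asarray(recovery_times, dtype=float)
    return float(np.mean(t_r - t_b))

# Synthetic example: three cars and one truck on a 2-lane, 100 m section.
ntc_value = ntc([4.5, 4.5, 4.0, 12.0], n_lanes=2, section_length=100.0)
# Two synthetic events, with epochs in seconds.
trt_value = trt([0.0, 600.0], [120.0, 900.0])
```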

Data acquisition
In this work, we use six camera feeds collected by the Arizona Department of Transportation (ADOT) roadside infrastructure.Each camera covers one segment of highway I-10 and records five 2-hour MP4 videos with resolution 1280x720 and FPS 30 (each video file is about 8 Gigabytes).Six exemplary covered highway segments along with the approximate camera locations are shown in Fig. 5.

Video preprocessing
In order to calculate safety metrics, we need to extract motion trajectories of vehicles from the video files. To this end, we integrate a tracking algorithm called simple online and realtime tracking with a deep association metric (DeepSORT) [88] with the you only look once (YOLO)-v5 [89] detector to obtain trajectories of labeled objects. DeepSORT is a real-time multi-object tracking algorithm based on Kalman filtering and the Hungarian algorithm, which considers both bounding box parameters and appearance simultaneously. Known for its speed and accuracy, YOLOv5 is an advanced proposal-free detector adopted to sustain high-quality tracking. The centroid of each detected bounding box is used as the position of the object. This combination offers real-time tracking (∼40 FPS with a GeForce RTX 2070 GPU) with acceptable accuracy (>90%). This setup is sufficient to perform experiments on certain road segments. We expect that, as computer science and industry advance, more lightweight and accurate algorithms with hardware acceleration will arise that could allow widespread deployment of our framework on the RSUs of large-scale road networks. Another advantage of this approach is tracking objects even through long occlusion periods, a frequent issue in multi-vehicle tracking.
The extracted trajectories are in the pixel domain from the camera's perspective; hence, the extracted distances and velocities are not proportional to real values. In order to extract safety metrics from trajectories, we translate the position information $(u, v)$ from the 2D pixel domain into 3D GPS positions $(x, y, z)$ using perspective projection, as shown in Fig. 5 and Fig. 6. To this end, we solve the following projection equations for a set of key points with known GPS positions. Considering a flat surface with no elevation change, we can skip $z$ in our calculations.

$$\begin{bmatrix} x \\ y \\ z \\ \lambda \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{bmatrix} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix},$$

where $(x/\lambda, y/\lambda, z/\lambda)$ denotes the GPS position of the pixel after transformation from the pixel index values $(u, v)$, with $\lambda$ being the scale factor. The optimal transformation coefficients $a_{11}, \ldots, a_{43}$ are obtained by applying least squares estimation (LSE) to the set of selected keypoint pairs with known GPS positions. Under a linear transformation (no camera edge distortion), four keypoints are sufficient to recover the projection matrix, but a higher accuracy can be achieved using more keypoints.
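The least-squares fit of the projection coefficients can be sketched as follows for the flat-surface case. Two simplifications are assumptions of this sketch rather than details from the text: dropping $z$ reduces the mapping to a $3 \times 3$ matrix, and the last coefficient is fixed to 1 to remove the overall scale ambiguity. Ground coordinates are treated as local planar coordinates in meters, and the function names are hypothetical.

```python
import numpy as np

def fit_planar_projection(pixels, ground):
    """Least-squares fit of a 3x3 projective map from pixel (u, v) to ground (x, y)."""
    rows, rhs = [], []
    for (u, v), (x, y) in zip(pixels, ground):
        # From x = (h11*u + h12*v + h13) / (h31*u + h32*v + 1), rearranged to be linear.
        rows.append([u, v, 1.0, 0.0, 0.0, 0.0, -u * x, -v * x])
        rhs.append(x)
        # Same rearrangement for y.
        rows.append([0.0, 0.0, 0.0, u, v, 1.0, -u * y, -v * y])
        rhs.append(y)
    h, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def pixel_to_ground(H, u, v):
    """Apply the projection and divide by the scale factor lambda."""
    x, y, lam = H @ np.array([u, v, 1.0])
    return x / lam, y / lam

# Synthetic check: generate keypoints from a known matrix and recover the mapping.
H_true = np.array([[0.05, 0.01, 2.0], [0.0, 0.08, -1.0], [1e-4, 2e-4, 1.0]])
pixels = [(100, 200), (1200, 220), (640, 360), (150, 700), (1100, 680)]
ground = [pixel_to_ground(H_true, u, v) for u, v in pixels]
H_fit = fit_planar_projection(pixels, ground)
```

Each keypoint pair contributes two equations for eight unknowns, which is why four pairs are the minimum; additional pairs turn the system into an overdetermined one that `np.linalg.lstsq` solves in the least-squares sense.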
The obtained trajectories exhibit noise-like fluctuations, mainly due to the drift of the bounding box position around the object. We utilize a Savitzky-Golay (SG) filter [90] to smooth out the trajectories before performing the subsequent association analysis (Fig. 7).
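A minimal sketch of the smoothing step using SciPy's `savgol_filter` is given below. The window length and polynomial order are illustrative assumptions, since the text does not state the filter settings, and the trajectory is synthetic.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
t = np.arange(0, 10, 1 / 30)                 # timestamps for a 30 FPS video (s)
true_x = 25.0 * t                            # synthetic constant-speed motion (m)
noisy_x = true_x + rng.normal(0.0, 0.5, size=t.size)  # bounding-box jitter

# Fit a cubic polynomial in a sliding 31-frame window (about one second).
smooth_x = savgol_filter(noisy_x, window_length=31, polyorder=3)

mse_noisy = float(np.mean((noisy_x - true_x) ** 2))
mse_smooth = float(np.mean((smooth_x - true_x) ** 2))
```

Smoothing the positions before differentiating matters because frame-to-frame differences amplify jitter, which would otherwise inflate velocity-based metrics.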

Crash data
The crash data is also provided by ADOT and includes crash incident details (date, time, location, collision type, etc.) for 5 years, from 2015 to 2019. We extract the data for the covered segments and use the most dominant crash types, namely rear-end, side-swipe, and all-types, since other types such as head-on, angle, and rear-to-side crashes are too sparse for association analysis. Figs. 8 and 9 show the geo-distribution of crashes and the crash statistics of the six segments, respectively. The temporal analysis aims to verify the utility of the proposed metrics in predicting the crash count in each road segment for a given time interval. The spatial analysis investigates the generalization of the identified relations to other road segments with similar conditions. To this end, we split time into 1-hour intervals. For each interval, we calculate the average of the safety metrics extracted from the traffic video for a specific road segment. Likewise, we take the average of crash counts for the same interval and road segment over the 5-year period.
The validity of the association analysis relies on two assumptions: (i) different time intervals (e.g., [8:00 am-9:00 am] and [12:00 pm-1:00 pm]) are statistically different, and (ii) the crash counts over the same time intervals are statistically identical across different dates. Table 3 presents the results of the contingency Chi-square test, after applying Yates's correction, used to determine the statistical difference between the entire population and a sampled subset. For this test, we randomly select 10% of the crash dataset, then compare the crash counts per 1-hour time interval between the entire and the sub-sampled dataset. Examples of this count for six 1-hour intervals and three crash types are presented in this table. The achieved p-values are higher than 0.6, meaning that there is no significant difference between the subset and the entire dataset (usually, p-values larger than 0.05 mean that the null hypothesis cannot be rejected). Therefore, the 5-year average of crash reports can be used for temporal analysis with a 1-day traffic video. To avoid sampling bias, we repeat this test for 1,000 different subsamples and present the average results in Table 4, which shows the same trend. We also performed similar tests over different weekdays, months, and years and obtained similar results, suggesting that crash counts over time intervals are consistent among weekdays, months, and years. We observed the same consistency among different road segments, as shown in Fig. 9.
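The subsample test described above can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function names `contingency_from_subsample` and `yates_chi_square`, the input format (one hour-of-day integer per crash record), and the choice to apply the 0.5 continuity correction to every cell of the 2 x k table are all assumptions.

```python
import random
from collections import Counter


def contingency_from_subsample(crash_hours, fraction=0.1, seed=0):
    """Build a 2 x k table: row 0 = full dataset, row 1 = random subsample.

    crash_hours is a hypothetical input: one hour-of-day value per crash.
    """
    rng = random.Random(seed)
    k = max(1, int(len(crash_hours) * fraction))
    subsample = rng.sample(crash_hours, k)
    hours = sorted(set(crash_hours))
    full_counts, sub_counts = Counter(crash_hours), Counter(subsample)
    return [[full_counts[h] for h in hours], [sub_counts[h] for h in hours]]


def yates_chi_square(table):
    """Chi-square statistic with a 0.5 continuity correction on each cell.

    Returns (statistic, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            diff = max(abs(observed - expected) - 0.5, 0.0)
            stat += diff * diff / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Toy data: 300 synthetic crash records spread over three 1-hour intervals.
toy_hours = [8] * 100 + [12] * 150 + [17] * 50
table = contingency_from_subsample(toy_hours, fraction=0.1, seed=1)
statistic, dof = yates_chi_square(table)
```

A p-value would then follow from the chi-square survival function with the returned degrees of freedom; repeating the call with different seeds mirrors the repeated-subsample averaging reported in Table 4.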
To investigate the statistical difference between hours, we apply a one-way Chi-square test over different intervals averaged over the entire dataset. We repeat the test independently for all-types, rear-end, and sideswipe crashes. The p-values are extremely small for all cases, suggesting that different intervals are statistically different, as is also noticeable in Fig. 9.

Safety metrics extraction
The utilized trajectory extraction algorithm provides the vehicle IDs, the objects' locations (in terms of bounding box corner points per video frame), and the category of each object. We model vehicles as point objects located at the center of the bounding box. The extracted trajectories are used to compute the proposed metrics after proper handling, which includes perspective transformation, denoising and smoothing, filling the missing values by linear interpolation, and excluding transitional and stationary non-vehicle objects.
This information is sufficient to calculate most of the metrics, including the IVVR, OVVR, and OSR metrics, which are based solely on the position and velocity of the vehicles. More particularly, the position of vehicle $n_i$ at time $t$ is

$$x_i(t) = \frac{x_{i1}(t) + x_{i2}(t)}{2}, \qquad y_i(t) = \frac{y_{i1}(t) + y_{i2}(t)}{2},$$

where $x_{i1}(t)$, $x_{i2}(t)$, $y_{i1}(t)$, $y_{i2}(t)$ represent the corner points of the corresponding bounding box at time point $t$. The instantaneous velocity at time $t$ is simply the derivative of the position, approximated by a finite difference over the time step $dt = 1/\mathrm{FPS}$, with $\alpha$ being a scale factor into the desired metric unit. Higher-order derivatives and joint smoothing of the positions and velocities using methods such as Kalman filtering can also be used for higher accuracy, which we avoid in this work to keep the computational complexity at a reasonable level.
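As a rough illustration of the position and velocity computation described above, the following sketch takes the bounding-box center as the vehicle position and a backward finite difference as the speed. The function names, the per-frame box format `(x1, y1, x2, y2)`, and the use of the Euclidean displacement between consecutive frames are assumptions made for this example.

```python
import math


def box_center(x1, y1, x2, y2):
    """Center of a bounding box given two opposite corner points."""
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def speeds(boxes, fps, alpha=1.0):
    """Finite-difference speed between consecutive frames.

    boxes: list of (x1, y1, x2, y2) tuples, one per frame, for one vehicle.
    fps:   frames per second, so the time step is dt = 1 / fps.
    alpha: scale factor converting image units into the desired metric unit.
    """
    dt = 1.0 / fps
    centers = [box_center(*b) for b in boxes]
    result = []
    for (xa, ya), (xb, yb) in zip(centers, centers[1:]):
        displacement = math.hypot(xb - xa, yb - ya)
        result.append(alpha * displacement / dt)
    return result


# Two frames of a hypothetical track: the center moves from (1, 1) to (4, 5).
track = [(0, 0, 2, 2), (3, 4, 5, 6)]
v = speeds(track, fps=10, alpha=0.1)
```

In practice the trajectories would be denoised and smoothed first, since a raw finite difference amplifies detection jitter.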
For some other metrics, such as TCI and NTC, we also need the vehicle counts and types. In this work, we use only two classes of vehicles: small vehicles (e.g., cars, SUVs) and large commercial vehicles (e.g., trailer trucks, buses). Since the dimensions of each object are already provided by the object detection stage (after the perspective transformation and scaling), we use the object dimensions for object classification to incur minimal additional computational cost. Given that the average lengths of personal cars and trucks are about 4.5 m and 22 m, respectively [91,92], the classification results are fairly accurate, except for overlapping and temporarily occluded objects. For most of these objects, taking the average over consecutive frames resolves the transitional issues. There exists some prior work on fine-resolution vehicle classification into multiple subtypes (sedans, SUVs, trucks, minivans, etc.) [93,94,95,96], which we skip here in favor of low computational complexity for real-time monitoring systems. A summary of the safety metrics is shown in Table 6.
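The length-based two-class assignment can be sketched as follows. The decision rule used here (assign the class whose reference length is closer to the frame-averaged length) is an assumption for illustration; the text only states that object dimensions are used and that averaging over consecutive frames helps with occlusions.

```python
from statistics import mean

# Reference lengths quoted in the text (meters).
CAR_LENGTH_M = 4.5
TRUCK_LENGTH_M = 22.0


def classify_vehicle(lengths_m):
    """Classify one tracked object from its per-frame lengths in meters.

    Averaging over frames dampens transient errors from occlusion or overlap.
    The nearest-reference rule is a hypothetical choice for this sketch.
    """
    avg_length = mean(lengths_m)
    if abs(avg_length - CAR_LENGTH_M) <= abs(avg_length - TRUCK_LENGTH_M):
        return "small"
    return "large"


label_a = classify_vehicle([4.2, 4.8, 9.0])    # one inflated frame
label_b = classify_vehicle([18.0, 21.0, 12.0])  # one truncated frame
```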

Association analysis
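To investigate the correlation between each metric and the crash rate, we use three correlation coefficients (Pearson, Spearman, and Kendall), defined as

$$r_P = \frac{\mathrm{cov}(x, y)}{\sigma(x)\,\sigma(y)} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}},$$

$$r_S = \frac{\mathrm{cov}(R(x), R(y))}{\sigma(R(x))\,\sigma(R(y))},$$

$$\tau = \frac{2}{n(n-1)} \sum_{i<j} \mathrm{sgn}(x_i - x_j)\,\mathrm{sgn}(y_i - y_j),$$

where $\bar{x}$, $\mathrm{cov}(x)$, $\sigma(x)$, $R(x)$, and $\mathrm{sgn}(x)$ are the expected value, covariance, standard deviation, rank, and sign of $x$, respectively, and $n$ is the number of observations. Pearson's coefficient quantifies the strength of linear correlations and is most appropriate for normally distributed variables, whereas Spearman's correlation is a rank-based method that does not assume linearity or normality of the variables. Kendall's coefficient offers a rank correlation based on the concordance of pairs of observations, which is less informative but more robust than the other two methods.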
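For readers who want to reproduce the three coefficients described above without a statistics package, a plain-Python sketch is given below. The function names are illustrative, ties in the rank transform are resolved with average ranks, and the Kendall variant shown is the simple tau-a form; whether the original analysis used a tie-adjusted variant is not stated.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance normalized by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def sign(a):
    return (a > 0) - (a < 0)


def kendall(x, y):
    """Kendall tau-a: normalized count of concordant minus discordant pairs."""
    n = len(x)
    s = sum(
        sign(x[i] - x[j]) * sign(y[i] - y[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 2.0 * s / (n * (n - 1))


# A monotone but non-linear toy relation between a metric and crash counts.
metric = [1, 2, 3, 4]
crashes = [1, 4, 9, 16]
r_p, r_s, tau = pearson(metric, crashes), spearman(metric, crashes), kendall(metric, crashes)
```

On this toy input the two rank-based coefficients reach 1 while Pearson's coefficient stays slightly below 1, which illustrates the linearity assumption mentioned in the text.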
We used correlation methods to find pairwise relations between the individual NSMs and crash rates. We can also use regression models to evaluate the collective power of NSMs in predicting the crash rate (not crash incidents).
It is noteworthy that crash counts can be used as an approximate surrogate for risk probability; therefore, the discrete values can be viewed as quantized [and noisy] versions of the continuous-valued crash probabilities. This is more reasonable when the crash counts are larger numbers (as for the 5-year crash data).
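More specifically, we have $y_i = x_i^T \beta + \varepsilon_i$, where $x_i = [x_{i1}, x_{i2}, \ldots, x_{iM}]$ is the set of $M$ extracted safety metrics for a given time interval, and $y_i$ is the crash count in the same interval. Note that the solution of the linear regression model $y_i = x_i^T \beta + \varepsilon_i$ (or its compact format $y = X\beta + \varepsilon$) using ordinary least squares (OLS) is $\hat{\beta} = (X^T X)^{-1} X^T y$. From the frequentist's perspective, the true $\beta$ is deterministic but unknown. Based on these assumptions, $\hat{\beta}$ is normally distributed as [97]:

$$\hat{\beta} \sim \mathcal{N}\left(\beta,\; \sigma^2 (X^T X)^{-1}\right),$$

where $\sigma^2$ is the variance of the noise term $\varepsilon_i$.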
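The OLS estimate can be illustrated with a small dependency-free sketch that solves the normal equations $(X^T X)\beta = X^T y$ by Gaussian elimination. The helper names are illustrative, and the constant column in the toy design matrix (an intercept term) is an assumption added for the example.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear(xtx, xty)


# Toy data generated from y = 1 + 2 * x; the first column is a constant 1.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 3, 5, 7]
beta_hat = ols(X, y)
```

For real data a numerically stabler route (QR or a library least-squares solver) is usually preferable to forming $X^T X$ explicitly.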

Metrics
This justifies the validity of using the subsequent statistical analysis for the relevance of normally distributed model parameters.
• Adjusted R-squared: This test is commonly used to validate the goodness of fit for linear regression models. The R-squared statistic is defined as

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},$$

where $\hat{y}_i$ and $\bar{y}$ are the estimated value and the average of the outputs $y_i$. Obviously, more predictors can result in stronger models in the presence of sufficient data samples. In order to account for the number of predictors and the trivial gain from using more predictors, we use the adjusted R-squared, which increases only when a new predictor improves the model performance more than would be expected by chance. It is defined as:

$$R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1},$$

where $n$ is the number of samples and $p$ is the number of predictors.
• F-test: This test evaluates the significance of the model by evaluating all observed variables simultaneously. More specifically, we have the following null hypothesis and alternative hypothesis:

$H_0$: $\beta_1 = \beta_2 = \cdots = \beta_M = 0$,
$H_a$: Not all $\beta_j$ are zero.

The F-test statistic can be computed by:

$$F = \frac{\sum_i (\hat{y}_i - \bar{y})^2 / p}{\sum_i (y_i - \hat{y}_i)^2 / (n - p - 1)}.$$

If the achieved p-value is less than the given significance level $\alpha$, we reject the hypothesis of all model parameters being zero (irrelevant).
• Shapley Value: This concept is borrowed from game theory and can be used to assess the value contributed by a player (predictor) $x_i$ when combined with every possible set of preceding players $S$ in coalition games [98]. In our case, the NSM predictors are the players, and the value of coalition $S$ is the prediction power of a linear model constructed using these features. More specifically, we have

$$\phi(x_i) = \sum_{S \subseteq X \setminus \{x_i\}} \frac{|S|!\,(M - |S| - 1)!}{M!}\,\bigl[v(S \cup \{x_i\}) - v(S)\bigr],$$

where $X = \{x_1, x_2, \ldots, x_M\}$ is the ordered set of NSMs, and $v(S)$ is the accuracy of the linear regression model made by features $x_i \in S$. Here, $\phi(x_i)$ quantifies the added accuracy, in terms of the adjusted R-squared score, of the model made by a set of preceding features $S$ after $x_i$ joins the coalition, $v(S \cup \{x_i\}) - v(S)$, averaged over all combinations of $S$ with preceding predictors.
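The three evaluation tools listed above can be sketched compactly. This is an illustrative implementation under stated assumptions: the F statistic is written in its equivalent R-squared form (valid for a model with an intercept), the Shapley computation enumerates all subsets exactly (feasible only for a handful of predictors), and the additive toy value function with made-up weights stands in for the adjusted R-squared of a fitted model.

```python
from itertools import combinations
from math import factorial


def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n samples and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def f_statistic(r2, n, p):
    """Overall F statistic expressed through R-squared."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


def shapley_values(players, value):
    """Exact Shapley values.

    players: list of hashable predictor names.
    value:   callable mapping a frozenset of players to a coalition value,
             e.g. the adjusted R-squared of a model built on that subset.
    """
    m = len(players)
    phi = {}
    for player in players:
        others = [q for q in players if q != player]
        total = 0.0
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {player}) - value(s))
        phi[player] = total
    return phi


# Hypothetical additive game: each metric contributes a fixed amount.
toy_gain = {"TCI": 0.2, "NTC": 0.3, "OSR": 0.1}
phi = shapley_values(list(toy_gain), lambda s: sum(toy_gain[k] for k in s))
```

For an additive value function each Shapley value equals the player's own contribution, which makes the toy game a convenient sanity check before plugging in a real model-fitting routine as `value`.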

Results
The association analysis is performed for six cameras covering six disjoint road segments, as shown in Fig. 5. Video is collected for 10 hours, from 8:00 am to 6:00 pm, on different days.
In this analysis, each data point is a pair $(x_i, y_i)$ with two components: $x_i$, the average of the extracted safety metrics during a 10-min interval, and $y_i$, the crash count during the same interval averaged over 5 years of crash reports. Some data points are excluded due to camera issues (camera off, covered, or misoriented).

Crash risk assessment
We perform this analysis to show that monitoring RSU videos and extracting NSMs can be used for risk assessment by predicting expected crash counts. To this end, we build a full-predictor model, which is a linear regression model that predicts the crash count during a given interval based on all NSMs, using 5-fold cross-validation. The prediction results are presented in Fig. 11, Fig. 12, and Table 7. Fig. 11 demonstrates a high alignment between the true and predicted rear-end crash counts using the full-predictor model. The same results are shown for all-type crash counts for all six segments as a scatter plot in Fig. 12, where most of the data samples concentrate around the unit-slope line (true value = predicted value). The results verify the usefulness of the proposed approach of using safety metrics to predict crash rates.
A more formal statistical analysis is provided in Table 7 using the metrics discussed in section 3 to validate the relevance of the constructed linear regression model. The results are based on the average of the 5-fold cross-validation (instead of the best results), which further confirms the validity of the developed models. It can be seen that the p-value is less than 0.05 for all crash types (using three different tests), which shows that the combination of all predictors (NSMs) makes a non-zero (significant) contribution to predicting crash rates, with any reasonable significance value. Similarly, the normalized MSE values for both the linear regression model and the Poisson regression model are in an acceptable range for all-type and rear-end crashes. The results suggest that predicting crashes based on NSMs is more relevant for rear-end crashes than for sideswipes. This is justifiable since most metrics (such as TTC, TTC-CV, IVVR, OVVR, OSR) consider longitudinal motions, while sideswipe crashes highly depend on lateral motions. This reveals the need for developing a richer set of network-level safety metrics that are capable of predicting sideswipe crashes; for example, metrics that indicate zigzag driving can be helpful. Also, sideswipe crashes seem to be more complicated in nature and potentially depend on other factors such as human mistakes.

Table 7: The average performance of the all-predictor model relating the crash rate to all predictors (i.e., safety metrics) by 5-fold cross-validation. The table also reports the normalized MSE (N-MSE).

Temporal correlation between NSM and crash count
Now that the collective power of NSMs in predicting crash rates is established, we develop further analysis to estimate the relevance of each metric individually. To this end, we compute the individual correlation of each safety metric during a given interval with each of the crash types for each segment during the same time interval, using the Pearson, Spearman, and Kendall correlation measures. To further highlight the importance of the achieved correlations, we compare the results against two reference values: (i) the correlation between the crash types and the traffic volume, and (ii) the correlation between the crash count and individual TTCs averaged over the entire traffic, E(TTC), discussed in Section 1. The results are presented in Table 8 and Fig. 13, showing that the proposed metrics exhibit a high correlation (0.25 ∼ 0.5) for rear-end and all-type crashes, which is much higher than the baseline correlation between the traffic volume and crash counts (0.01 ∼ 0.08). Also, the achieved correlation for TTC-CV is in the range (0.09 ∼ 0.38), much higher than that of E(TTC), which is in the range (0.004 ∼ 0.08). This observation supports the idea of developing new network-level safety metrics.

Spatial analysis
So far, we showed the high correlation between the NSMs and crash rates of different types by investigating each segment individually. To investigate the generalization of the proposed method, we perform a cross-segment analysis. Specifically, we apply the same correlation analysis to all combinations of 2 to 5 segments, as shown in Table 9. The results are consistent with Table 8, which shows a reasonable consistency across segments.
To further investigate the spatial generalizability of the results, we perform an out-of-segment validation. We construct a linear model (to predict crash counts based on NSMs) for five segments and evaluate the model on the remaining segment. Then, we repeat the test for all other combinations, so that each segment is tested once. The results are shown in Table 10.
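The out-of-segment validation loop can be sketched as below. To stay self-contained, the example fits a single-predictor least-squares line as a stand-in for the multi-predictor NSM model; the function names, the `{segment: [(x, y), ...]}` data layout, and the use of mean squared error as the score are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def leave_one_segment_out(data):
    """Train on all segments but one, test on the held-out segment.

    data: dict mapping a segment name to a list of (metric, crash_count) pairs.
    Returns a dict mapping each held-out segment to its mean squared error.
    """
    scores = {}
    for held_out in data:
        train = [pt for seg, pts in data.items() if seg != held_out for pt in pts]
        a, b = fit_line([x for x, _ in train], [y for _, y in train])
        errors = [(y - (a + b * x)) ** 2 for x, y in data[held_out]]
        scores[held_out] = sum(errors) / len(errors)
    return scores


# Synthetic segments that all follow y = 1 + 2 * x, so every score is near 0.
toy = {
    "seg1": [(0, 1), (1, 3)],
    "seg2": [(2, 5), (3, 7)],
    "seg3": [(4, 9), (5, 11)],
}
scores = leave_one_segment_out(toy)
```

Each segment appears exactly once as the test set, matching the procedure described above; swapping `fit_line` for a multi-predictor fit leaves the loop unchanged.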

Coalitions of predictors
It is known that the predictive power of each feature can depend on the presence of other features, due to inter-feature linear and non-linear correlations [98]. To account for this fact, we calculate the Shapley value for each metric, as discussed in section 4.5. Here, $v(S)$ is the value of a coalition $S$, calculated as the adjusted R-squared statistic of a model built using predictors $x_i \in S$. The results are presented in Table 11 and show that these metrics play a relatively balanced role in the cooperative operation. Therefore, it is advantageous to use the full-predictor model based on all proposed NSMs.
Overall, we made the following observations, some of which require further investigation using more data samples to draw stronger conclusions: i) monitoring RSU traffic video and extracting network-level safety metrics can be used for crash risk analysis; ii) the models built for some road segments are generalizable to roads with similar geometry; iii) OSR exhibits a negative correlation with crash count, which is counterintuitive and may reflect the fact that crashes are more likely in busy hours than in light traffic (this requires further investigation with larger datasets); iv) traffic composition represented by TCI shows that a more unbalanced traffic flow (extremely different numbers of small cars and trucks) is more prone to crashes; v) cluster-level analysis of TTC presents credibility in analyzing traffic safety, since it participates in most of the restricted models; vi) sideswipe crashes are harder to predict with metrics derived from longitudinal motions, and perhaps human factors or road geometry play more significant roles in modulating crash rates.

Conclusion
In this paper, we offered a set of network-level safety metrics to assess the overall safety characteristics of traffic flow in a given driving zone. This concept extends the popular notion of safety metrics to network-level analysis. We showed that the proposed safety metrics are highly correlated with crash frequency (temporally and spatially). We conducted a case study in the state of Arizona through an integrative analysis of video files collected from the I-10 highway and 5-year crash reports, which verifies the association between the network-level safety metrics and crash frequency in the same time intervals. More specifically, metrics that gauge the overall speed variation of vehicles, the traffic composition and diversity of vehicles, the density of traffic volume, and the relative mobility of car clusters are highly correlated with the crash frequency (with p-values much lower than 5% for most scenarios). We also observed that rear-end crashes are easier to predict than side-swipe crashes, perhaps due to the stronger role of road geometry and human mistakes in side-swipe accidents. This also shows the need for developing safety metrics that capture lateral motions in addition to longitudinal motions.
The practical use of this analysis is identifying risk factors by constant monitoring of traffic flow using AI-based roadside infrastructure to broadcast warning messages and make more efficient traffic control decisions. Also, traffic control teams can make redesign and long-term decisions to keep the safety metrics in an acceptable range and enhance the overall driving safety on the road. Developing lightweight deep learning models to process traffic video and extract safety metrics in real time can pave the way for online risk assessment systems.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment
We would like to thank the NSF and the Institute of Automated Mobility (IAM) for supporting this work. We also thank the Arizona Department of Transportation (ADOT) for sharing roadside infrastructure, crash reports, and other resources with us during the performance of this project. The opinions, findings, and conclusions expressed in this manuscript are those of the authors and not necessarily those of the IAM and ADOT.

Grey Box Metric:
A metric that allows measurement of data that can only be obtained with limited access to ADS data. EXAMPLE: ADS DDT Execution (ADE).

White Box Metric:
A metric that allows measurement of data that can only be obtained with significant access to ADS data. EXAMPLE: Perception Precision (PP).
The second rank in the taxonomic hierarchy is the classification rank, which consists of the following (again, an example metric that is described later in the section is included):
It is important to note that not all classification rank types link to data source rank types in the current version. For example, the Component Metric is not linked to the Black Box Metric because data at the component level is deemed to require more access to (potentially proprietary) data than is allowed for the Black Box Metric.
The third rank in the taxonomy hierarchy is the Leading/Lagging rank, which relates (in binary fashion) to either the prediction (i.e., before) of a potential operational safety outcome or the report (i.e., after) of an operational safety outcome after it has occurred. Operational safety outcomes include conflicts, collisions, an ADS disengagement, or a violation of a traffic law:
1. Leading: An OSA identification of a potential future operational safety outcome. EXAMPLE: Any safety envelope metric.
2. Lagging: An OSA identification of a report of an operational safety outcome. EXAMPLE: Any traffic law violation metric.
The metrics currently being considered for inclusion in the SAE J3237 Recommended Practice are included in Table A.12 (although the list may change). The IAM has focused on Black Box Metrics and Grey Box Metrics as part of the comprehensive set introduced in [23].
We also mention the approach that an infrastructure-based (i.e., off-board the vehicle) observer system takes to monitor video and extract OSA metrics. The observer system, also called the system in the rest of this appendix for convenience, is a terrestrial or aerial monitoring system that collects traffic video for processing from an external observer's point of view. A list of these metrics, along with their brief descriptions, is presented in Table A.13.
Appendix A.1. Summary of Basic Operational Safety Metrics
Maximum Speed (MaxS), when associated with a collision, denotes the maximum speed of the involved vehicles before the crash starts until the full stop (e.g., between time points $t_1$ and $t_4$ in Fig. A.15).
Time to Collision (TTC) is a commonly used surrogate measure to define the time of an upcoming rear-end collision between two vehicles if they continue at their current speed (the interval from $t_2$ to $t_3$ in Fig. A.15). The system needs to observe the relative position and calculate the relative velocity of the two involved vehicles. TTC is computed as:

$$TTC = \frac{X_L - X_F}{v_F - v_L},$$

where $X_i$ denotes the position, $v_i$ denotes the velocity, and the indexes $L$ and $F$, respectively, denote the leading and following vehicles. The collision occurs only if $v_F \geq v_L$. Usually, we

The thresholds vary by jurisdiction and culture; AD only involves the subject vehicle; implies the inherent and potential risks of natural driving behaviors; should be included when evaluating ADS.
Table A.13: Summary of operational safety metrics along with their key properties. Note that '*' denotes the metrics that are not selected by IAM but are basic metrics employed in other papers.
Deceleration Rate to Avoid a Crash (DRAC) is defined as the minimum deceleration rate of the following vehicle to avoid a crash with the leading vehicle. To characterize this metric, the observer system should keep track of the relative positions and the relative velocities of the two vehicles. Note that DRAC fails to evaluate lateral movements and is applicable only to scenarios where both cars are in the same lane. Mathematically, DRAC is defined as

Figure 1: Different aspects, tools, opportunities, and challenges related to traffic safety.

Figure 2: The individual analysis fails to interpret a merge scenario. Vehicle v1 on the entrance ramp intends to join the highway traffic. The top row (a, b) shows an unsafe (aggressive) join before and after the merge. This is considered favorable by the following car (v2), since it provides a higher TTC, while disrupting the overall highway traffic stability (c). The bottom row shows a safe join by v1 after yielding to the traffic flow, before (d) and after the merge (e). Although this merge provides a lower TTC for the following vehicle on the ramp (v2), it is advantageous from the traffic stability point of view (f). This can be reflected in the TTC of the car following v1 after joining the highway (v3).

Figure 3: Different levels of traffic safety along with their regions of interest for (a) individual safety analysis from the vehicle's perspective using standard safety metrics, (b) global safety analysis from an external observer's perspective using the proposed network-level metrics, and (c) local analysis of traffic clusters.
Indication of speed limit violations; can offer suggestions for network design.
TCI: 2 f1 f2) , two classes. Flow composition indicator; can be associated with elevated crash risks.
NTC: $NTC = \frac{\sum l_i}{N_l \cdot L}$. Considers vehicle shape; reflects traffic density; can be associated with elevated crash risks.
TRT: $TRT = \frac{1}{N_e} \sum_{i=1}^{N_e} |t_r(i) - t_b(i)|$. A simple and reasonable way to evaluate the severity of the accidents; can also take traffic stability into consideration.

Figure 5: Six road segments (equivalently six selected areas for perspective transformation) are shown by red squares, and the camera locations are shown by green triangles.

Figure 6: Perspective transformation for fixed camera view of Segment 6: (a) Selected area covered by an RSU camera field of view, (b) GPS view of the selected area, (c) aligned perspective of the selected area, and (d) lane segmentation of selected area.

Figure 7: Some smoothed trajectories by SG filter and their original form.

Figure 8: Geo-distribution of crash data is represented as a heatmap.

Figure 9: Crash count for six segments. The temporal trends for segments are consistent.

Figure 11: True and estimated rear-end crash counts for two segments using the full-predictor model.

Figure 12: Estimated crash count versus ground truth for all segments. The results are for unseen samples in the validation set using the full-predictor model. Each data point represents a 10-min interval.

Figure 13: Individual correlations between different metrics and all-types crash counts.
MaxS is a simple but effective measure directly related to the severity of the collision (see Fig. A.15). Differential Speed ($\Delta s$) is defined as the relative speed between the involved vehicles. It occurs at $t_2$ in Fig. A.15. The system needs to calculate the speed of the involved vehicles (using methods like DL-based object tracking, with or without explicitly extracting the motion trajectories) to determine MaxS and $\Delta s$.
Indicate the ADS ability to determine the safety distances.
CI: Indicate the subject vehicle is involved in a collision. Active when $d_{lat}$ and $d_{long}$ are equal to 0. Capture instances of collisions as a metric; the severity of the collision can be defined by the KABCO scale [103]: K (Fatal Injury), A (Incapacitating Injury), B (Non-Incapacitating Injury), C (Possible Injury), O (No Injury).
TLV: Indicate the subject vehicle that violates a traffic law. Emphasizes that an ADS vehicle must follow existing laws; exceptions are made, for example, when road closures require temporary violations of driving exclusively inside a traffic lane.
ABC: Indicate the subject vehicle can execute a specific behavior correctly. Indicate the safety of ADS; included in the preliminary list of metrics.
ADSA: Indicate the ADS is active when executing behaviors. Indicate the safety of ADS; included in the preliminary list of metrics; has some dispute in California [104].
HTCDER: $HTCDER = \frac{GTI - CDI}{CDI}$. One of the HTC measurements; manual instructions by officers should still be considered.
HTCVR: $HTCVR = \frac{CDI - CCI}{CDI}$. One of the HTC measurements; manual instructions by officers should still be considered.
AD: Indicate the maneuvers (longitudinal/lateral accelerations) of a subject vehicle exceed specified thresholds.

Figure A.15: The space-speed pair diagram of a rear-end collision. $t_1$ is the time epoch at which the leading vehicle begins to decelerate (encroachment begins). $t_2$ is the epoch at which the following vehicle begins to decelerate. The interval from $t_1$ to $t_2$ is the reaction time of the driver of the following vehicle. $t_3$ is the projected arrival moment of the following vehicle (under no deceleration). $t_4$ is the actual collision moment.

Figure A.16: The schematic diagram of PET.

Table 1: Comparison of the correlation to crashes with a simple average value of TTC and our proposed TTC-CV.

Table 2: A set of proposed network-level safety metrics.

Table 3: Contingency Chi-square test with Yates's correction between the entire dataset and a randomly selected 10% sub-sample. Each 1-hour interval is a data point.

Table 4: Contingency Chi-square test between the entire dataset and randomly selected 10% subsets (as shown in Table 3) after taking the average over 1,000 different subsets.

Table 5: One-way Chi-square test for 1-hour time intervals [averaged over the entire dataset], for different crash types.

Table 6: The summary of proposed safety metrics along with the utilized hyperparameters. *: Vehicle's length is calculated as 4,500 mm for cars and 16,000 mm for trucks.

In the linear regression model $y_i = x_i^T \beta + \varepsilon_i$, $\beta = [\beta_1, \beta_2, \ldots, \beta_M]$ is the vector of model parameters, and $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ is the zero-mean model noise with variance $\sigma^2$. This model is appropriate for continuous-valued outputs. To cover count data, we also used a Poisson regression model, where the crash counts $y_i$ are Poisson distributed with mean $\lambda_i$ linearly related to the respective NSMs in the same interval, $x_{i1}, x_{i2}, \ldots, x_{iM}$.

Table 8: The absolute value of Pearson correlation, Spearman's correlation, and Kendall correlation for the metrics across all segments.

Table 9: The average absolute Pearson correlation between NSMs and crash count across segments. The numbers in parentheses denote the number of combinations.

Table 10: The cross-segment analysis for the full model. The model is trained using 5 segments and tested with the unseen segment.

Table 11: The Shapley value of the proposed metrics for different types of crashes.

Delivers more accurate information than typical micro-simulation outputs; allows comparison between simulation scenarios or links; some parameters are difficult to capture automatically; only applicable to identical trajectories.
MSE: Indicate the subject vehicle violates the safety boundary of another. Safety boundary: $d_{long,same}$ Provides how to calculate the quantified risky distances; a basis of many assessment methods.
PR: An action to recover when $d_{long}^{min}$, $d_{lat}^{min}$ and the safe range of $a_{lat}$, $a_{long}$ are violated. Adds more information beyond MSDV.