Test Scenario Fusion: How to Fuse Scenarios From Accident and Traffic Observation Data

Scenario-based testing will help to validate automated driving systems (ADS) and establish safer road traffic. To date, most data-driven test scenario generation methods rely primarily on one data source such as police accident data (PD), naturalistic driving studies, or video-based traffic observations (VOs). However, none of these data sources perfectly satisfies all the layers of the six-layer model for the description of test scenarios. Moreover, not all available data sources cover the same location and period of time. Therefore, we fused information from 1,648 scenarios extracted from a German VO with information from 74 scenarios extracted from German PD into a comprehensive new PD* database. Finally, PD* consisted of 74 accident scenarios extended, for example, by variables containing the dynamic information of the VO scenarios. Thus, PD* contained more than 350 variables, whereas PD contained only 269 and VO only 122 variables. For fusion, we followed the Find-Unify-Synthesize-Evaluation (FUSE) for Representativity (FUSE4Rep) process model using statistical matching. Subsequently, we derived three logical scenarios from PD* to test an autonomous emergency braking system (AEB) in a stochastic traffic simulation incorporating driver-behavior models. The quality of the fusion itself was satisfactory, and we propose improving the VO data collection process and observation time to obtain even better results.


I. INTRODUCTION
The second wave of automated driving is starting right now [1], aiming to make road traffic safer by introducing automated driving systems (ADSs). Thus, ADSs should drive more safely than attentive human drivers to achieve safer road traffic [2]. One way of proving that ADSs drive more safely than human drivers is to compare their safety performance with that of human drivers in selected test scenarios representing real-world traffic. In the early development stage of ADSs, these prospective assessments are performed virtually, for example, by comparing the number of accidents per test scenario for human drivers and ADSs in a large number of virtual simulation runs. To obtain the greatest possible variety of differently executed scenarios over all simulation runs, stochastic simulations can be used, which can vary test scenario-specific parameters, such as ego-/agent speed, stochastically per simulation run [3]. Moreover, they can also incorporate driver behavior models to compare ADSs with human driving behavior [3], [4]. The idea of comparing the performance based on test scenarios emerges from the so-called scenario-based testing approach [5], which attempts to compress the interesting parts of the road traffic in the ADS operational design domain (ODD) into test scenarios. The ODD describes the domain and conditions for which the ADS is developed to operate safely, such as the road layout, speed ranges, and environmental conditions [6]. Generally, test scenarios are "a kind of flip book representing the temporal sequence of scenes with different actions (e.g. lane change) and events (e.g. collision)" [7, p. 226]. The description of the test scenarios can follow the six-layer model (6LM) [8], specifying, for example, road networks, traffic guidance objects, dynamic objects, and environmental conditions. Additionally, for ADS comparisons with human driver behavior, information on the road users involved, such as the driver's age and driving experience [4], is also helpful in the scenario description. Furthermore, Menzel et al. differentiate between logical and concrete scenarios [9]. Logical scenarios, suitable for stochastic traffic simulations [3], [7], provide parameter ranges of the information contained, whereas concrete scenarios, suitable for proving ground tests, provide specific parameter values from these ranges.
The associate editor coordinating the review of this manuscript and approving it for publication was Xiangxue Li.
One way to generate test scenarios is to extract them from road traffic data [10]. Possible road traffic data sources include police accident data (PD), video-based traffic observations using drone/stationary cameras (VOs), and naturalistic driving studies (NDSs) [10], [11]. Ideally, one would continuously collect all of these different road traffic data in the corresponding ODD of the ADS to obtain a holistic scenario catalog covering the entire traffic event of the ODD, including accident and critical, complex, and normal driving scenarios [3], [12], [13]. In other words, the test scenario catalog should be representative of the ODD traffic event, meaning that the distributions of the scenarios in the test scenario catalog and in real traffic are similar at the time of the ADS testing. However, three main drawbacks hinder the creation of holistic and representative test scenario catalogs:
1) Limited availability: To date, there has been no continuous, representative [13], or standardized traffic data collection covering real-world driving behavior that is comparable to the nationwide collection of road traffic accidents by the police [14]. So far, various VO [11], [15], [16] and NDS [11], [17] data sets exist; however, they show only a temporally and spatially limited section of road traffic, for example, two intersections over three months [16]. Owing to the large organizational and financial effort involved, continuous drone VOs of all ODD-relevant traffic sections, for example, will remain unrealistic for the foreseeable future.
2) Differing information: Existing data sources vary in the information content available to describe all six layers of the 6LM and the road users. For example, the German PD provide conflict situations leading to accidents ("AccidentType") and personal information about the road users involved (e.g., age, driving experience, car details) [18]; however, they do not contain any dynamic information (trajectory, speed course, etc.) of the road users from which logical/concrete scenarios could be derived. By contrast, VOs provide dynamic information for all road users, such as detailed trajectories, traffic volume, and overall traffic behavior, but do not contain any personal information that is not visible to the sensor/camera. Finally, NDSs can provide personal and dynamic information of the road users involved, but are limited to the information perceivable by the sensors of the vehicles participating in the study; the surrounding traffic is only perceived selectively here, depending on the "ego-vehicle-view".
3) Reliable coverage: Even if all data are available, it is almost impossible to reliably identify all relevant scenarios using supervised, unsupervised, or rule-based scenario identification approaches [10]. Owing to the large amount of necessary road traffic data, there are always unknown scenarios that cannot be identified by the algorithms used, be it due to missing rules, missing training data, or incorrect hyperparameterization.
To overcome these drawbacks, we propose two approaches. First, we propose fusing concrete scenarios [9] identified in different data sources to overcome the challenges of limited availability (1) and differing information (2), motivated by the Find-Unify-Synthesize-Evaluation (FUSE) for Representativity (FUSE4Rep) process model [7], [19]. The fusion of the scenarios identified from PD and VO can create a representative accident test scenario catalog containing the dynamic and personal information of the road users involved. Using this representative test scenario catalog, the following question can be answered to identify edge cases for testing: Which is the most likely police-recorded traffic conflict leading to accidents, and how should it be parameterized? To create the test scenario catalog, the FUSE4Rep process model uses statistical matching (SM) and provides a general procedure for fusing scenarios identified in road traffic data [7], [20]. SM is a technique to combine "statistically heterogeneous samples to construct a new sample that can be regarded as having come from an unobserved joint distribution of interest" [21, p. 6]. In contrast to other fusion techniques such as record linkage, the samples to be fused do not require identical observations linked by a unique identifier [21].
Second, we propose to derive logical scenarios from the identified (concrete) scenarios and vary them in a stochastic traffic simulation to overcome the coverage challenge (3), as shown in [3]. Thus, the accident scenarios are extended to normal driving and critical scenarios. Consequently, the stochastic component of the simulation reveals scenarios that were not included in the real-world data sample or were not identified in the real-world data.
Building on these two proposals, this study explores the following overall research question (RQ): How can a representative test scenario catalog applicable for testing in stochastic traffic simulations be derived by fusing scenarios identified in PD and VO using the FUSE4Rep process model? To answer this question, we applied the complete FUSE4Rep process model for the first time. The aim was to fuse the scenarios identified in the PD and VO data [16] from selected intersections in Dresden, Germany. To provide a practical example, the scenarios to be fused are relevant for a hypothetical autonomous emergency braking system (AEB), which supports drivers in car-car conflicts. The contributions of this study are as follows:
• Demonstration of all necessary steps for fusion, starting with the collection of PD and VO data.
• Fusion of the scenarios identified in PD and VO using SM.
• Derivation of three exemplary logical scenarios for application in a stochastic traffic simulation to test a hypothetical AEB.
In the following section, we first introduce the existing data-driven scenario generation approaches, the basics of SM, and the FUSE4Rep process model. Next, we demonstrate how to apply the FUSE4Rep process model to the PD and VO data, and how to derive logical test scenarios. Finally, we conclude the paper with a discussion and directions for future research.

II. BACKGROUND/RELATED WORK
The following section provides an overview of the current data-driven scenario generation approaches and SM. Moreover, the section introduces the FUSE4Rep process model.

A. PROCESS OF DATA-DRIVEN TEST SCENARIO GENERATION
To provide an overview of the current data-driven test scenario generation approaches, we extended the data-driven test scenario generation process proposed in [10], [22], and [23] by adding the proposed fifth step of scenario fusion, which overcomes the challenges of limited data availability and differing information (see Figure 1).
[Steps 1 & 2] After determining the ADS under test (SuT) and specifying the corresponding ODD in step one, step two determines the minimum set of data source types required to generate the test scenarios. To date, only a few approaches have required more than one primary data source to identify the scenarios. When using more than one primary data source, the approaches rely either on a combination of real driving (NDS/VO) and accident data [7], [19], [24], [25], [26], [27] or on a combination of real driving and simulated/synthetic data [28], for example, recorded in driving simulators.
[Step 5] Step five (optional) requires the fusion of scenarios identified from different road traffic data sources.
To date, [25] has matched the three-digit accident type (3AT) classification [65] used by the German police to driving situations from an NDS by comparing attributes such as the ego-maneuver using record linkage. However, [25] did not aim to create a statistically valid and representative database. In contrast, [26] fused VO and PD data for one intersection in Dresden, Germany, using SM. Here, the SM process assigned traffic densities and average velocities, extracted from VO data, to past accidents based on time-related variables, such as day, hour, and minute [26]. Consequently, the matching in [26] did not create test scenarios, which is also due to the low level of common information, that is, common variables, shared between the two data sources to be fused. Therefore, [7] proposed the FUSE4Rep process model (see Section II-D) to maximize the common information to be fused by designing the data collection accordingly. Building on this, [19] illustrates how to collect and prepare VO data following the FUSE4Rep process model and identifies scenarios in VO data according to the 3AT classification scheme using a rule-based approach. However, a complete fusion of the scenarios identified in the PD and VO data using SM has not yet been achieved. We want to emphasize that the fusion of, for example, weather-related information to already identified scenarios using timestamps, as proposed by [24], belongs more to the field of "data enrichment" relying on record linkage [21].
[Steps 6 & 7] Finally, steps six and seven, which are not the focus of this study, aim to generate executable test scenarios by using, for example, parameter sampling or driver models [10], and to evaluate the created scenarios regarding, for example, their criticality [10].

B. APPLICATION OF GENERATED TEST SCENARIOS IN STOCHASTIC TRAFFIC SIMULATION
Scenario-based testing of ADSs in virtual traffic simulations will help compare ADSs with the current traffic event and thus help answer whether ADSs (a) reduce the occurrence of accidents/critical situations in the scenarios tested, (b) mitigate the consequences of accidents/critical situations in the scenarios tested, and (c) cause new critical situations/accidents [7], [66].
As a representative of the wide range of possible virtual traffic simulations, we focus on comparing ADS driving behavior with human driving behavior in a stochastic traffic simulation [3], [66]. Owing to the stochastic approach, only logical scenarios are required, which allows a new parameterization in each simulation run. Here, the required logical scenarios must contain at least the following information on the dynamic objects: (a) start positions of ego vehicles and agents (i.e., other road users), (b) maneuvers of ego vehicles and agents (for example, going straight, turning left, or turning right), and (c) speed at start (e.g., min/mean/max) of ego vehicles and agents [3], [7]. Interestingly, the 3AT classification, originally developed for PD [64], [65] and applied by [19] to identify scenarios in VO data, helps determine both the start positions and maneuvers of the ego vehicle and agents. As stated, for ADS comparisons with human driver behavior, information on the road users involved, such as the driver's age and driving experience, can also be helpful [4]. To date, the scenarios used in [3] for comparing an AEB system to human driving behavior have been selected based on accident statistics and expert opinion, and parameterized based on a literature review and a descriptive analysis of VO data.
FIGURE 1. Data-driven scenario generation process based on [10], [22], and [23]. The proposed fifth step of scenario fusion to overcome the challenges of limited availability and differing information of data sources is highlighted. The overall process was mapped to the FUSE4Rep process model [7]. Steps six and seven are masked out because they are not the main focus of this study and do not belong to the FUSE4Rep process model.
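The minimum content of a logical scenario listed above can be sketched as a small data structure. The following is a minimal illustration only, not the simulation format used in [3]: the class and field names, the triangular sampling with the mean as mode, and all numeric values are assumptions made for this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeedRange:
    """Start speed of one road user as min/mean/max (unit assumed: km/h)."""
    minimum: float
    mean: float
    maximum: float

    def sample(self, rng: random.Random) -> float:
        # Assumption: triangular distribution with the mean as mode.
        return rng.triangular(self.minimum, self.maximum, self.mean)


@dataclass(frozen=True)
class LogicalScenario:
    """Logical scenario: start positions, maneuvers, and start-speed ranges."""
    accident_type: str    # 3AT code of the traffic conflict
    ego_start: str        # virtual gate at which the ego enters
    ego_maneuver: str     # "going straight", "turning left", "turning right"
    agent_start: str
    agent_maneuver: str
    ego_speed: SpeedRange
    agent_speed: SpeedRange

    def concretize(self, rng: random.Random) -> dict:
        """Draw one concrete scenario, i.e., fixed values from the ranges."""
        return {
            "accident_type": self.accident_type,
            "ego_start": self.ego_start,
            "ego_maneuver": self.ego_maneuver,
            "agent_start": self.agent_start,
            "agent_maneuver": self.agent_maneuver,
            "ego_start_speed": self.ego_speed.sample(rng),
            "agent_start_speed": self.agent_speed.sample(rng),
        }


# Hypothetical values for illustration; they are not taken from PD or VO.
example = LogicalScenario(
    accident_type="201",
    ego_start="C", ego_maneuver="going straight",
    agent_start="C", agent_maneuver="turning left",
    ego_speed=SpeedRange(20.0, 45.0, 60.0),
    agent_speed=SpeedRange(5.0, 20.0, 35.0),
)
rng = random.Random(42)
concrete_runs = [example.concretize(rng) for _ in range(3)]
```

Each call to `concretize` corresponds to one new parameterization per simulation run; a stochastic traffic simulation would typically draw from the fused distributions rather than from the fixed triangular shape assumed here.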

C. STATISTICAL MATCHING (SM)
SM originates from the social sciences [21] and aims to combine information from independently acquired samples into a new synthetic one [20]. The new synthetic sample can then be considered as a sample of a virtually joint distribution emerging from the fused samples [21]. Therefore, the two samples must already address the same population before fusion; for example, all traffic conflicts in a specific space and time [19].
FIGURE 2. Schematic representation of an asymmetric data fusion using statistical matching [26].
Figure 2 [26] illustrates the SM for two independently acquired data sets $A$ and $B$. Both data sets have some variables $X_1, \ldots, X_P$ in common, and each lacks one variable: data set $A$ lacks variable $Z$, and data set $B$ lacks variable $Y$. In the case of asymmetric SM, the smaller data set, here $A$, receives information about $Z$ from $B$. Thus, $A$ is also called the "recipient" and $B$ the "donor". After the fusion, $A^*$ represents a new, complete, and synthetic micro-data set containing all variables $X_1, \ldots, X_P$, $Y$, and $Z$. Depending on the fusion quality, $A^*$ can be used for further analysis.
For data sets containing categorical variables, such as PD, non-parametric fusion methods are suitable [67]. In the case of fusing PD with VO data, suitable non-parametric fusion methods, such as the distance hot-deck (DHD) method, use the Gower distance metric [26], [68]. Furthermore, (ensembles of) machine learning classifiers are also suitable, such as random forests and their boosted derivatives, support vector machines, and neural nets [26], [69], [70]. However, because we want to fuse unique scenarios described by several single variables in this case, the aforementioned single-label machine learning classifiers are not applicable; thus, we focus on DHD, as illustrated in [26]. Furthermore, according to [26], DHD outperforms other hot-deck methods, such as random hot-deck, and also outperforms random forests in preserving marginal distributions (see the upcoming validity proof). We also emphasize that the Gower distance metric can handle categorical and metric variables simultaneously [67].
As presented in Figure 3 [26], DHD aims to replace the missing variable $Z$ of the recipient by comparing the common variables $X$ of $B$ and $A$. Usually, DHD does not use all common variables $X$, but only those that best explain the missing variable $Z$; this subset $X_{MV}$ of the common variables used for comparing/fusing is called the matching variables (MVs). In detail, DHD selects a suitable observation to be fused from $B$ by minimizing the local and global distances of the MVs following a distance function, which here is the Gower distance, suitable for handling categorical and non-categorical variables [67]. In constrained mode, DHD selects every observation only once for fusing, whereas in unconstrained mode, DHD can select every observation multiple times. When two observations have the same distance, DHD chooses one of them at random.
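The donor selection described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the implementation used in [26]: the function names are hypothetical, records are plain dictionaries, and the constrained mode is realized greedily (each donor is removed after its first use) instead of minimizing a global distance over all pairs. The variable `Light` and all values are invented for the example.

```python
import random


def gower_distance(rec_a, rec_b, matching_vars, metric_ranges):
    """Gower distance over the matching variables.

    Categorical variables contribute 0 (equal) or 1 (different); metric
    variables contribute |a - b| divided by the range in `metric_ranges`.
    """
    total = 0.0
    for var in matching_vars:
        if var in metric_ranges:
            span = metric_ranges[var]
            total += abs(rec_a[var] - rec_b[var]) / span if span > 0 else 0.0
        else:
            total += 0.0 if rec_a[var] == rec_b[var] else 1.0
    return total / len(matching_vars)


def distance_hot_deck(recipients, donors, matching_vars, donate_vars,
                      metric_ranges=None, constrained=False, seed=0):
    """Copy `donate_vars` from the nearest donor to every recipient."""
    rng = random.Random(seed)
    metric_ranges = metric_ranges or {}
    available = list(range(len(donors)))
    fused = []
    for rec in recipients:
        if not available:
            raise ValueError("constrained mode needs enough donors")
        scored = [(gower_distance(rec, donors[i], matching_vars,
                                  metric_ranges), i) for i in available]
        best = min(dist for dist, _ in scored)
        ties = [i for dist, i in scored if dist - best <= 1e-12]
        chosen = rng.choice(ties)          # random choice among equal distances
        if constrained:
            available.remove(chosen)       # greedy simplification, see lead-in
        donated = {var: donors[chosen][var] for var in donate_vars}
        fused.append({**rec, **donated})
    return fused


# Hypothetical donor (VO-like) and recipient (PD-like) records.
vo = [
    {"AccidentType": "201", "Light": "day", "EgoStartSpeed": 42.0},
    {"AccidentType": "302", "Light": "day", "EgoStartSpeed": 31.0},
    {"AccidentType": "302", "Light": "night", "EgoStartSpeed": 25.0},
]
pd_records = [
    {"AccidentType": "302", "Light": "night", "DriverAge": 34},
    {"AccidentType": "302", "Light": "night", "DriverAge": 58},
]
unconstrained = distance_hot_deck(pd_records, vo, ["AccidentType", "Light"],
                                  ["EgoStartSpeed"])
constrained = distance_hot_deck(pd_records, vo, ["AccidentType", "Light"],
                                ["EgoStartSpeed"], constrained=True)
```

In the unconstrained run, both recipients receive the speed of the same night-time 3AT 302 donor; in the constrained run, the second recipient falls back to the next-closest donor, which illustrates how the two modes can differ in preserving the donor distribution.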
After fusing successfully, we can check the validity of the fusion using a plausibility check performed by experts [26], by comparing the results with real-world data, and by comparing the (marginal) distributions [20]. Specifically, the (marginal) distributions of $Z$ in $A^*$, that is $\tilde{f}_Z$, and in $B$, that is $f_Z$, as well as the common distributions of $Z$ with single MVs in $A^*$, that is $\tilde{f}_{Z X_{MV}}$, and in $B$, that is $f_{Z X_{MV}}$, must coincide [20]:
$$\tilde{f}_Z = f_Z, \qquad \tilde{f}_{Z X_{MV}} = f_{Z X_{MV}} \tag{1}$$
To generate logical scenarios, the preservation of the (marginal) distributions is more important than assigning every observation correctly, which is measured by the "hit rate" [20]. This means that for logical scenario generation, which relies on parameter ranges/distributions, the fulfillment of the best possible hit rate, as a strong, never measurable indicator for valid data fusion [20], is not in focus.

D. THE FUSE4REP PROCESS MODEL
In the following, we present the Find-Unify-Synthesize-Evaluation (FUSE) for Representativity (FUSE4Rep) process model in detail, as proposed in [7], for fusing PD and VO data. In addition, we mapped the FUSE4Rep process model to the corresponding steps of the data-driven scenario generation process (see Figure 1). The FUSE4Rep process model incorporates the following four steps [7]:
• Find a possible common population and information shared between the PD and VO data. This information serves as the basis for shared variables. Accordingly, using smartphone apps in VO data collection to collect additional common information, such as the incorrect behavior of road users, is reasonable [19].
• Unify the possible common information in the form of shared variables; for example, identify scenarios according to the 3AT scheme in both data sets [19], [64]. This also includes an assessment of the crash risk of the scenarios identified in the VO data using generalized extreme value (GEV) distributions [19], [71].
• Synthesize both data sets by SM.
• Evaluate the fusion using statistical indicators, comparable real-world data, or expert opinions.
Regarding the data-driven scenario generation process, the step "Find" depends on the "scope definition" (1), the "primary data source selection" (2), and the "primary data collection" (3). In contrast, the step "Unify" belongs to the "scenario identification" (4) step because, for example, determining the 3AT is a type of scenario identification. Finally, the steps "Synthesize" and "Evaluate" belong to the optional fifth step "scenario fusion".

E. RESEARCH GAP AND OBJECTIVE
The overall objective is to fuse PD and VO data for the first time using the FUSE4Rep process model to be able to derive a representative test scenario catalog applicable for testing in stochastic traffic simulations. Therefore, we consider the following four RQs in detail to answer the overall RQ stated in the Introduction:
1) What are the suitable MVs for fusing the PD and VO data?
2) Does the constrained or unconstrained DHD perform better?
3) How good are the final fusion results obtained?
4) Is it possible to derive logical scenarios applicable for testing in stochastic traffic simulations?
16358 VOLUME 12, 2024 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

III. METHODOLOGY
The methodology mainly follows the four steps of the FUSE4Rep process model [7] and is based on the exemplary data fusion presented in [26], as well as on VO data preparation according to the FUSE4Rep process model presented in [19]. We use the terms "accident" and "crash" in the following as synonyms, considering only accidents with two participants and excluding, for example, accidents with only one participant due to driving errors. We also use the terms "scenario" and "observation" synonymously: the scenario identified in the PD/VO data is equivalent to an observation stored in the data sets. Finally, we use the following convention regarding the different sets of variables considered:
• Shared variables $X_S$: All variables with the same name, content, and categories in PD and VO.
• Common variables $X_C$: All shared variables that have a similar distribution in PD and VO and thus belong to the same common population.
• Specific variables $Z$: All variables that are specific to the donor data set, that is, VO.
• Matching candidates $X_{MC}$: All common variables investigated as candidates for matching variables.
• Matching variables $X_{MV}$: All matching candidates that are finally selected as matching variables for fusion.

A. FIND
The first step "Find" encompasses all steps to find a common population and common information between the PD and VO data [7] because a shared, unobserved superordinate population is essential for every data fusion. The identified population must be described spatially, factually, and temporally [13], [72]. Following the data-driven scenario generation process (see Figure 1), the common population depends on the ODD of the ADS to be tested. Thus, the acquisition of PD should be based on the ODD defined beforehand. Subsequently, the VO data set should attempt to cover the same ODD as a random sample [13] of the common population. To define a common population, a link between PD and VO is required [19]. According to [73] and [74], every accident is preceded by a traffic conflict, but not every traffic conflict leads to an accident. In Germany, PD includes the traffic conflict preceding the accident described by the variable "AccidentType" [65]; in some states, AccidentType is also described by the detailed 3AT scheme [65]. The 3AT describes the traffic conflict leading to an accident, whereby a conflict is defined as the simultaneous approach of road users to a road location where they may collide [65]. VO can likewise reveal traffic conflicts [75], which makes the traffic conflict a possible link between PD and VO. Thus, the shared, unobserved superordinate population can consist of all traffic conflicts that occur in the ODD in a given period. To provide a
With respect to the required sample sizes for the PD and VO data, the size of the PD must be sufficient to encompass the variations in the VO at the desired level of confidence and error tolerance, and vice versa [26].
Regarding possible common information, the VO data set/data collection should be oriented towards PD collection, as PD collection is normally standardized and independently performed by the police. For example, in VO, a suitable approach would be to record the incorrect behavior of road users in observed traffic conflicts according to the PD categories with the help of drone pilots conducting VO [19], [76]. Moreover, the 6LM [8] can help find information associated with the six layers between both data sets, whereby the ListDB specification already provides a comprehensive codebook of possible variables [76]. Table 1 summarizes all necessary "Find" steps.

B. UNIFY
The second step "Unify" encompasses all steps to unify the possible common information in the form of shared variables $X_S$ between the PD and VO data [7]. With respect to the required format, both PD and VO should be tabulated such that each row contains an accident or observed traffic conflict described by means of shared variables $X_S$, as well as variables $Y$ and $Z$ specific to each data set (see Figure 2). As an example of unification, we demonstrate in part how to determine the AccidentType variable according to the 3AT scheme as a shared variable that links PD and VO. The detailed instructions can be found in [19] and [64]. First, if not already available, the 3AT should be determined for all accidents contained in the PD by manually analyzing the accident descriptions (see Table 2) with expert knowledge or by using machine learning approaches [64].
TABLE 2. Exemplary accident descriptions of 3AT 201 and 3AT 302 recorded by the police (street names were replaced for easier reading). Pictograms provided by [65]. "W" in a pictogram indicates a road user who has to wait / give priority.
Second, after determining the set of different 3ATs occurring in the PD, the equivalent conflict situations according to the 3AT scheme must be identified in the VO. The prerequisites for identification are the trajectories of road users already extracted from the video data using commercial [77], open-source [78], or self-developed [19], [79] video analysis frameworks. After determining the maneuvers (going straight, turning left, turning right) and travel directions (e.g., from virtual gate C to virtual gate A at a 3-way intersection) of all road users, the rule-based determination of conflict constellations following the 3AT scheme (for details, see [19]) is applicable (see Figure 4). For example, when two cars at a 3-way intersection are visible in the video simultaneously, they can have a potential conflict constellation according to 3AT 201 [65], when
• the first car, called "agent", entering the intersection at virtual gate C wants to turn to virtual gate A and thus slows down/stops to turn, and
• the second car, called "ego", follows the first car and meanwhile goes straight.
In the following, the road user causing the conflict is called "ego" and the opponent road user is called "agent". After identifying a potential conflict constellation, the question is whether it can also be a traffic conflict, that is, whether it bears the potential risk of a crash. Therefore, every identified conflict constellation, called 3AT, is described by the time course of a surrogate safety measure (SSM) to assess its maximum risk over time [80], [81]. 3ATs involving a longitudinal approach between ego and agent, for example, 3AT 201, are described by the modified time-to-collision (MTTC) [82], which expands the concept of time-to-collision (TTC) [83] by considering road users' accelerations. The TTC describes "the time until a collision between the vehicles would occur if they continued on their present course at their present rates" [80, p. 155]. MTTC is calculated as follows:
$$\mathrm{MTTC} = \frac{-v \pm \sqrt{v^2 + 2as}}{a} \tag{2}$$
where $v$ is the relative speed, $a$ is the relative (tangential) acceleration, and $s$ is the relative distance between the ego and agent. The MTTC is equal to TTC when $a = 0$ and $v > 0$. When $a \neq 0$ and both results are positive, the MTTC corresponds to the smaller value [82]. When $a \neq 0$ and one result is negative and one is positive, the MTTC corresponds to the positive result [82]. In contrast, 3ATs consisting mostly of perpendicular approaches between ego and agent, for example, 3AT 302 [65], are described using post-encroachment time (PET) [84]. PET is defined as "the time between the moment that a road user (vehicle) leaves the area of potential collision and the other road user [vehicle] arrives [at the] collision area" [80, p. 155] and is calculated as follows:
$$\mathrm{PET} = t_2 - t_1 \tag{3}$$
where $t_1$ is the moment at which the rear bumper of the first vehicle leaves the area of potential collision and $t_2$ is the moment the front bumper of the second vehicle enters the area of potential collision. After calculating the corresponding SSMs and their time course $SSM(t)$, only the conflict constellations with a potential risk of crash are stored by removing those whose minimum value $SSM_{min}$ is greater than five seconds. Here, five seconds corresponds to the length of the GIDAS pre-crash matrix [19], [85], which is a simulation format widely used in accident research. Subsequently, for each of the identified and stored 3ATs, the corresponding metadata of the VO, including the incorrect behavior of road users, can be matched. Finally, the population of traffic conflicts identified from VO should also be assessed for their overall risk of crashes using an approach based on real-time crash prediction modeling [19], [71].
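The two surrogate safety measures and the five-second filter can be sketched as follows. This is a minimal illustration of the definitions above: the function names and the sign convention (positive $v$ and $a$ mean that the gap $s$ is closing, so that $s = vt + \tfrac{1}{2}at^2$ at the moment of collision) are assumptions made for this example, and the sample values are hypothetical.

```python
import math


def mttc(s, v, a, eps=1e-9):
    """Modified time-to-collision in seconds, or None if no collision occurs.

    s: relative distance (m), v: relative closing speed (m/s),
    a: relative closing acceleration (m/s^2).
    """
    if abs(a) < eps:
        # Without relative acceleration, MTTC reduces to TTC = s / v.
        return s / v if v > 0 else None
    discriminant = v * v + 2.0 * a * s
    if discriminant < 0:
        return None                      # the gap never closes
    root = math.sqrt(discriminant)
    positive = [t for t in ((-v - root) / a, (-v + root) / a) if t > 0]
    # Both positive -> smaller value; one positive -> that value.
    return min(positive) if positive else None


def pet(t1, t2):
    """Post-encroachment time: first vehicle leaves at t1, second enters at t2."""
    return t2 - t1


def has_crash_potential(ssm_time_course, threshold_s=5.0):
    """Keep a conflict constellation only if its minimum SSM is <= 5 s."""
    values = [value for value in ssm_time_course if value is not None]
    return bool(values) and min(values) <= threshold_s


# Hypothetical time course of one 3AT 201 constellation: (s, v, a) per frame.
frames = [(40.0, 5.0, 0.0), (30.0, 8.0, 1.0), (20.0, 10.0, 2.0)]
course = [mttc(s, v, a) for s, v, a in frames]
keep = has_crash_potential(course)
```

For the last frame, the quadratic has one negative and one positive root, so the positive root is used; with a negative relative acceleration and a sufficiently large gap, the discriminant becomes negative and no MTTC is defined.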
Reference [19] proposed using only 3AT populations with a positive risk of crashes, $R_{C,3AT} > 0$. The $R_{C,3AT}$ is calculated as follows [19], [71], [86]: Here, $G_{3AT}$ is the generalized extreme value (GEV) distribution obtained via maximum likelihood estimation based on the $SSM_{min}$ distribution obtained for every type of 3AT [19]. Specifically, $G_{3AT}$ is described by the scale parameter $\sigma$, location parameter $\mu$, and shape parameter $\xi$. After sorting out the 3ATs that do not have a positive crash risk and transforming, if possible, all other variables to the shared variables $X_S$, the actual fusion can begin. Table 3 summarizes all the necessary "Unify" steps.
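How a fitted GEV distribution can yield a crash risk is sketched below. The sketch rests on an assumption that is common in the extreme value safety literature but is not confirmed here as the exact form used in [19]: the GEV is fitted to the negated $SSM_{min}$ values, so that a crash corresponds to the negated measure reaching zero and the risk is $1 - G(0)$. The function names and parameter values are hypothetical, and the maximum likelihood fit itself is omitted.

```python
import math


def gev_cdf(z, mu, sigma, xi):
    """Cumulative distribution function of the GEV distribution."""
    if sigma <= 0:
        raise ValueError("scale parameter sigma must be positive")
    y = (z - mu) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-y))       # Gumbel limit for xi -> 0
    t = 1.0 + xi * y
    if t <= 0:
        # Outside the support: below the lower bound (xi > 0) the CDF is 0,
        # above the upper bound (xi < 0) it is 1.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def crash_risk(mu, sigma, xi):
    """Assumed risk measure: probability that the negated SSM reaches 0."""
    return 1.0 - gev_cdf(0.0, mu, sigma, xi)


# Hypothetical GEV parameters per 3AT, fitted to negated SSM_min values.
fitted = {
    "201": {"mu": -2.0, "sigma": 1.0, "xi": 0.0},
    "302": {"mu": -3.0, "sigma": 1.0, "xi": -0.5},
}
risks = {code: crash_risk(**params) for code, params in fitted.items()}
retained = [code for code, risk in risks.items() if risk > 0.0]
```

With these invented parameters, the distribution for 3AT 302 has an upper bound below zero, so its crash risk is exactly zero and the population would be sorted out, whereas 3AT 201 keeps a small positive risk.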

C. SYNTHESIZE
The third step, "Synthesize", encompasses all steps to fuse both data sets, the donor data set VO and the recipient data set PD, into the new data set PD* by SM [7]. First, the assumption that both data sets to be fused belong to the same population is verified by investigating the shared variables $X_S$ with the same name, content, and categories identified previously. The more shared variables exist and the more the distributions of the shared variables in PD and VO coincide (the more common variables $X_C$ exist), the more likely it is that a common population is given. Reference [87] recommended using the Hellinger distance $H_D \in [0, 1]$ as a similarity measure to compare the distributions of categorical (nominal) shared variables. To compare a shared categorical (nominal) variable $x$ between PD and VO, $H_D(p_{x,VO}, p_{x,PD})$ is calculated based on the relative frequencies $p_x$ of the individual categories $j = 1, \ldots, J$ of the shared variable $x$:

$$H_D(p_{x,VO}, p_{x,PD}) = \sqrt{\frac{1}{2} \sum_{j=1}^{J} \left( \sqrt{p_{x,VO,j}} - \sqrt{p_{x,PD,j}} \right)^2} \tag{5}$$

The more similar the distributions are, the closer $H_D(p_{x,VO}, p_{x,PD})$ is to zero. Reference [88] refers to the rule of thumb that distributions are similar when $H_D(p_{x,VO}, p_{x,PD}) \leq 0.05$, which need not hold for every data fusion. Therefore, we determine the threshold for similarity empirically by randomly splitting the VO data set $n$ times into two data sets following the same ratio as PD to VO and applying the maximum/median of all $H_D$ measured between the synthetically created VO data sets as the similarity threshold. Subsequently, all common variables $X_C$ belonging to the same population are investigated for their suitability as matching candidates $X_{MC}$. In this data fusion, the specific (missing) variables $Z$ to be fused are all metric and assumed to be normally distributed; examples are the variables EgoStartSpeed and AgentStartSpeed. By contrast, the shared variables $X_S$ are all categorical and nominal and can thus be represented as dichotomous variables. Thus, we use the point-biserial correlation measure $r_{pb} \in [-1, 1]$ to find the categorical matching candidates $x$ in $X_{C,VO}$ that best explain the (metric) specific variables $Z$ to be fused [89]:

$$r_{pb,xz} = \frac{\bar{Z}_1 - \bar{Z}_0}{s_Z} \sqrt{\frac{N_0 N_1}{N^2}} \tag{6}$$

where $\bar{Z}_0$ / $\bar{Z}_1$ are the means of the specific variable $z$ when the common variable $x$ is coded 0 and 1, respectively; $s_Z$ is the (population) standard deviation of $z$ over all observations; $N_0$ / $N_1$ are the numbers of observations when $x$ is 0 / 1; and $N$ is the number of all observations, that is, the sum of $N_0$ and $N_1$ (= VO size). A perfect positive correlation is given at $r_{pb,xz} = +1$, no association is given at $r_{pb,xz} = 0$, and a perfect negative correlation is given at $r_{pb,xz} = -1$.
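The two measures described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names and the toy data are hypothetical, and the point-biserial correlation uses the population standard deviation.

```python
import math
from collections import Counter


def relative_frequencies(values, categories):
    """Relative frequency of each category in `values` (0.0 if a category is absent)."""
    counts = Counter(values)
    return [counts.get(c, 0) / len(values) for c in categories]


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions given as aligned lists."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * squared)


def point_biserial(x, z):
    """Point-biserial correlation between a 0/1 coded variable x and a metric variable z."""
    z0 = [zi for xi, zi in zip(x, z) if xi == 0]
    z1 = [zi for xi, zi in zip(x, z) if xi == 1]
    n0, n1, n = len(z0), len(z1), len(z)
    mean_z = sum(z) / n
    s_z = math.sqrt(sum((zi - mean_z) ** 2 for zi in z) / n)  # population SD (assumption)
    return (sum(z1) / n1 - sum(z0) / n0) / s_z * math.sqrt(n0 * n1 / n ** 2)


# Hypothetical toy data: a shared categorical variable in both data sets.
categories = ["three-way", "four-way"]
geometry_vo = ["three-way"] * 6 + ["four-way"] * 4
geometry_pd = ["three-way"] * 5 + ["four-way"] * 5
hd = hellinger_distance(
    relative_frequencies(geometry_vo, categories),
    relative_frequencies(geometry_pd, categories),
)

# Hypothetical toy data: a dichotomous candidate and a metric specific variable.
r_pb = point_biserial([0, 0, 1, 1], [1.0, 2.0, 3.0, 4.0])
```

For a dichotomous variable, the point-biserial correlation is numerically identical to the Pearson correlation, so either routine can be used as a cross-check.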
Because it is also difficult to define a threshold for selecting matching candidates $X_{MC}$ using the point-biserial correlation, we perform test fusions with the selected matching candidates and thereby identify their best combination. The best combination corresponds to the final matching variables $X_{MV}$ for the subsequent fusion. The best fusion is the one that best preserves the distributions of the specific variables $Z$ between VO and the fused data set PD*, among other criteria (see Sections II-C and III-D). With regard to fusion using constrained/unconstrained DHD, we consider every scenario/observation stored in VO as unique. Consequently, all the specific variables $Z$ that are fused always originate from the same scenario identified by one matching combination of MVs. This approach has the advantage that the parameter ranges/distributions required to describe logical scenarios can be derived flexibly, because values that have already been aggregated over multiple scenarios, such as the mean values of EgoStartSpeeds, are not matched during the fusion. Table 4 summarizes all "Synthesize" steps.
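The difference between unconstrained and constrained DHD can be illustrated with a small sketch. It rests on several assumptions that are not taken from this study: a simple mismatch count (Hamming distance) is used for the categorical matching variables, the constrained variant is solved by brute force (feasible only for toy sizes), and all records and values are hypothetical.

```python
from itertools import permutations


def hamming(a, b):
    """Number of matching variables on which two records disagree."""
    return sum(ai != bi for ai, bi in zip(a, b))


def unconstrained_dhd(recipients, donors):
    """For each recipient, pick the index of the closest donor (donors may repeat)."""
    return [
        min(range(len(donors)), key=lambda j: hamming(r, donors[j]))
        for r in recipients
    ]


def constrained_dhd(recipients, donors):
    """Use each donor at most once and minimise the total distance (brute force)."""
    best = min(
        permutations(range(len(donors)), len(recipients)),
        key=lambda perm: sum(hamming(r, donors[j]) for r, j in zip(recipients, perm)),
    )
    return list(best)


# Hypothetical records: (AccidentType, Geometry, BusStop).
recipients = [("6021", "four-way", 1), ("6021", "four-way", 1), ("302", "three-way", 0)]
donors = [
    ("6021", "four-way", 1),
    ("6021", "four-way", 0),
    ("302", "three-way", 0),
    ("322", "three-way", 1),
]
donor_start_speed = [8.2, 7.5, 12.1, 9.9]  # hypothetical specific variable, in m/s

free = unconstrained_dhd(recipients, donors)
unique = constrained_dhd(recipients, donors)

# Every fused value comes from exactly one donor scenario, never from an aggregate.
fused_speed = [donor_start_speed[j] for j in free]
```

In this toy case, the unconstrained variant reuses the first donor for both identical recipients, whereas the constrained variant must fall back to a slightly more distant donor for one of them.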

D. EVALUATE
The fourth and last step, "Evaluate", encompasses all steps to evaluate the data fusion [7]. We do not compare the results with real-world data because we lack access to, for example, the German In-Depth Accident Study [90], which provides comparable accident data with a five-second pre-collision simulation [85]. The metrics used to compare the (marginal) distributions $f_Z$ of VO and PD* and the common distributions $f_{ZX_{MV}}$ of VO and PD* depend on the type of variables compared (metric vs. categorical). In the case of distributions $f_Z$ / $f_{ZX_{MV}}$ formed by categorical variables, we use the Hellinger distance $H_D(p_{VO}, p_{PD^*})$, as introduced in (5). When comparing distributions $f_Z$ formed by metric $Z$, we calculate the two-sample Smirnov test statistic $D$ for $\alpha = 0.05$, which is a common approach for comparing two empirical distributions [91]. The corresponding critical value of the test statistic, $D_{crit,\alpha=0.05}$, which forms the upper threshold for judging similarity and is used for large samples such as $n_{VO}$ and $n_{PD^*}$, is calculated as follows [92]:

$$D_{crit,\alpha=0.05} = 1.36 \sqrt{\frac{n_{VO} + n_{PD^*}}{n_{VO} \, n_{PD^*}}} \tag{7}$$

However, a comparison of the common distributions $f_{ZX_{MV}}$ formed by metric $Z$ and categorical MVs $X_{MV}$ is not possible. Hence, we compare how the correlation between $Z$ and $X_{MV}$ is preserved between VO and PD*, and thus compare the corresponding difference $\Delta r_{pb,X_{MV}Z,VO\,PD^*}$ using (6).
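As a worked example, the two-sample Smirnov statistic and its large-sample critical value can be computed as follows. This is a sketch under assumptions: the coefficient 1.36 is the commonly tabulated value for α = 0.05, and the function names are hypothetical. The sample sizes are those of VO (1,648) and PD* (74).

```python
import math
from bisect import bisect_right


def smirnov_statistic(a, b):
    """Two-sample Smirnov statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in a + b
    )


def critical_value(n1, n2, c_alpha=1.36):
    """Large-sample critical value; c_alpha = 1.36 corresponds to alpha = 0.05."""
    return c_alpha * math.sqrt((n1 + n2) / (n1 * n2))


d_crit = critical_value(1648, 74)
print(round(d_crit, 3))  # 0.162
```

Two distributions are judged similar under this criterion when their statistic $D$ stays below the critical value.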
As stated in Section III-C, applying pre-defined thresholds to judge similarity is difficult. Additionally, relying only on statistical tests may be too strict. Hence, we empirically determine thresholds for all comparisons, as introduced, based on the mean of all metrics computed for $n$ random splits of the donor data set VO. Please note that some of the determined thresholds are already necessary in the step "Synthesize" to judge, for example, the common population of the common variables. Table 5 summarizes all "Evaluate" steps.
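The empirical threshold determination can be sketched as follows for a categorical variable and the Hellinger distance. The sketch makes assumptions that are not confirmed by this paper: the split size is derived as 74 : 1,648 parts of the donor data, the donor values are synthetic, and the function names are hypothetical.

```python
import math
import random
import statistics
from collections import Counter


def hellinger(a_values, b_values):
    """Hellinger distance between the category frequencies of two samples."""
    ca, cb = Counter(a_values), Counter(b_values)
    squared = sum(
        (math.sqrt(ca[c] / len(a_values)) - math.sqrt(cb[c] / len(b_values))) ** 2
        for c in set(ca) | set(cb)
    )
    return math.sqrt(0.5 * squared)


def split_thresholds(donor_values, n_small, n_splits=100, seed=0):
    """Median and maximum distance over random splits of the donor data."""
    rng = random.Random(seed)
    distances = []
    for _ in range(n_splits):
        shuffled = donor_values[:]
        rng.shuffle(shuffled)
        distances.append(hellinger(shuffled[:n_small], shuffled[n_small:]))
    return statistics.median(distances), max(distances)


# Synthetic donor variable with 1,648 observations (illustrative only).
rng = random.Random(42)
vo_geometry = [rng.choice(["three-way", "four-way"]) for _ in range(1648)]

# Split the donor data in the ratio PD : VO = 74 : 1648 (assumed interpretation).
n_small = round(1648 * 74 / (1648 + 74))
lower, upper = split_thresholds(vo_geometry, n_small)
```

The median serves as the lower and the maximum as the upper similarity threshold; the same scheme applies to the other metrics by swapping the distance function.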

IV. RESULTS
The results follow the structure of the FUSE4Rep process model.

A. FIND
The ODD of the hypothetical AEB for which the test scenarios were identified is defined as follows: "The AEB shall operate in 2022 at 20 different, but in their geometry, traffic volume and traffic signage comparable, inner-city intersections (see Figure 5) in Dresden, Germany, at good weather conditions (dry road, no rain, no fog, no snow, no strong wind) during daylight and shall support the drivers in car-car turning / crossing / longitudinal conflicts." To address this ODD, we used, for demonstration purposes, two data sets from the representative and publicly available ListDB data set collection [16] as VO data, both collected according to the FUSE4Rep process model [19]: ListDBRepOne [93] and ListDBRepTwo [94]. Each contains a representative three-month VO (June to August 2022) at one intersection in Dresden, Germany. According to [19], June to August are accident-prone months in Dresden and may thus lead to more observed traffic conflicts in the data sets used. The ListDBRepOne data set focuses on a three-way intersection called "Tharandter Straße/Frankenbergstraße" and encompasses 790 min of video and 10,324 car trajectories. The ListDBRepTwo data set focuses on a four-way intersection called "Kohlenstraße/Dorfhainer Straße" and encompasses 855 min of video and 7,570 car trajectories. Both intersections were recorded during daylight at four different time slots throughout the day, four times a month, and under good weather conditions (no rain, fog, snow, or stronger wind) [19]. In addition to the original video material and the trajectories of all road users analyzed by DataFromSky [77], both data sets contain extensive metadata (122 variables) according to the ListDB specification [76], which is already geared towards the closest possible conformity with PD [19]. Thus, the metadata include variables of environmental information (e.g., temperature and road surface temperature), location information, and information on incorrect behavior (e.g., failure to observe the right of way) of road users [19], [76]. Both intersections are located in the city and have a speed limit of 50 km/h on priority roads.
We then obtained all PD available to us from 01/01/2005 to 12/31/2021 for all 20 intersections (see Figure 5) specified in the ODD (the 2 VO intersections plus 18 additional intersections) and filtered them according to the following criteria:
• Car-car accidents not involving trailers.
• No influence of alcohol or drugs, because such influence is not observable in VO.
• In the case of the seven intersections connected with "Tharandter Straße": only accidents since 2009, because construction work in 2007-2008 removed the train tracks embedded in the road.
• Plausible accident information coded by the police.
• Available accident description that allows the determination of the 3AT.
• 3AT is of type "turning", "crossing", or "longitudinal" [65].
Finally, we obtained a PD data set of 74 car-car accidents described by 269 variables and an accident description. Overall, the shared superordinate population between the acquired PD and VO data sets is estimated as:
• [Factual] All turning, crossing, or longitudinal traffic conflicts that potentially lead to accidents between two cars. Conflicts occur during daylight under good weather conditions (no rain, fog, or snow).

B. UNIFY
Overall, ten different types of 3ATs describing the conflict situations of the 74 car-car accidents are represented in the PD (see Figure 6): three turning conflicts (3ATs 201, 211, 231), four crossing conflicts (3ATs 302, 303, 321, 322), and three longitudinal conflicts (3ATs 681, 601, 621). Based on the ten 3ATs represented in PD, we identified the same conflict constellations in the VO by analyzing the provided trajectory data. However, we collapsed 3ATs 601 and 621 into one category, called "6021", resulting in the identification of nine different 3ATs. The reason was that 3ATs 601 and 621 are difficult to distinguish at the observed intersections.
In fact, they differ only in that 3AT 621 specifies a possible reason for the leading car to slow down: it must wait at the intersection entry. Figure 7 illustrates the distribution of the nine different 3ATs identified in the 74 accidents recorded in PD and in the 1,648 conflict constellations, that is, the scenarios identified in VO. Accordingly, the three most frequent 3ATs are the same in both data sets (3ATs 6021, 302, and 322), and 3AT 6021 leads the second most common 3AT by a large margin in both data sets. Interestingly, 3AT 681 was recorded twice in PD but was never observed in VO.
Table 6 presents 3AT-dependent crash risk estimates $R_{C,3AT}$ based on the calculated minimum SSMs ($SSM_{min}$). Accordingly, every 3AT GEV distribution has a positive crash risk, meaning that the identified conflict constellations bear the inherent risk of a crash. The crash risk for 3AT 231, $R_{C,231} = 1 \times 10^{-8}$, is very small, however, so its inclusion seems questionable. Nevertheless, when inspecting the empirical and modeled GEV distributions for 3AT 231 (see Figure 8), it is evident that the modeled GEV underestimates the crash risk $R_{C,231}$ compared to the real observation (see the left area under the corresponding curve, next to the yellow vertical line). Therefore, we included the 3AT 231 population for subsequent data fusion. Finally, when comparing the standard error estimates (see Table 6), it is notable that the modeling quality correlates with the available number of extreme values, and the highest error estimates are observed for the GEV of 3AT 321, which is based on only 29 extreme values.
In conclusion, the recipient data set PD encompasses 74 scenarios/observations and the donor data set VO encompasses 1,648 scenarios/observations. Moreover, PD and VO share 38 categorical variables $X_S$ (Table 7), which are further described in the ListDB codebook [76].

C. SYNTHESIZE
Of the 38 shared categorical variables $X_S$ between PD and VO, 27 did not show any variance between PD and VO, that is, $H_D = 0$ (Figure 9). Consequently, we can assume a common population between PD and VO regarding these 27 variables, whereby 16 of the 27 variables specify the locations of the VO/PD (e.g., Bridge, BusLane, and Bypass). Of the remaining 11 variables showing $H_D > 0$, six were below the empirically determined similarity threshold of $H_D = 0.148$ (maximum $H_D$ observed in 100 random splits of VO). However, AccidentType, the variable specifying the 3AT, did not seem to share a common population: with $H_D = 0.249$, it exceeded the (maximum) similarity threshold by 68%. When calculating $H_D$ only for the 3ATs occurring in both PD and VO, that is, when neglecting 3AT 681, which occurs only in PD, $H_D$ for AccidentType decreased to 0.222 but still exceeded the (maximum) threshold by 50%. Overall, VO and PD appeared to share a common population with 33 common variables $X_C$.

FIGURE 9. Hellinger distance for all 38 variables shared between PD and VO. The variables are explained in [76]. The yellow dot indicates the Hellinger distance for the variable "AccidentType" when dropping the 3AT 681 in PD. The dashed lines indicate empirically determined thresholds for similarity (yellow: maximum/blue: median) by measuring the Hellinger distance for 100 random splits of VO according to the ratio of PD/VO.
To select the matching candidates $X_{MC}$, we investigated the explanatory power of the common variables $X_C$ for selected specific variables $Z$ (see Table 7), describing mainly the dynamic behavior of the ego at the start of the scenario: EgoSpeedStart, EgoAccelerationStart, EgoAccelerationTanStart, and $SSM_{min}$. All variables (except $SSM_{min}$) were calculated using the median of the first ten data points of the corresponding trajectory. Regarding the common variables $X_C$, the 27 common variables showing zero variance between PD and VO can be disregarded because they cannot be used for matching.
Instead, we also included the shared variables AccidentType and CrashType in the selection process because these variables are assumed to have a high correlation with, and thus explanatory power for, the $Z$ variables. Figure 10 supports this assumption: the variable AccidentType had the highest significant positive correlation with all selected specific variables: $SSM_{min}$ (0.44), EgoSpeedStart (0.31), EgoAccelerationStart (0.16), and EgoAccelerationTanStart (0.25). In contrast, the variables HourMinute and Weekday had the weakest correlations, with a maximum correlation of 0.08, for example, for HourMinute ∼ EgoSpeedStart. Moderate correlations were observed for BusStop (max: 0.16), CrashType (max: 0.23), CycleLane (max: 0.16), and Geometry (max: 0.16). Based on these results, we selected the following five variables as matching candidates $X_{MC}$ for the test fusions, even though not all five variables belong to the common population of PD and VO: AccidentType, CrashType, Geometry, CycleLane, and BusStop. The variable CrashType specifies the (estimated) type of collision and, in the present data set, distinguishes between collisions with leading cars, oncoming cars, and turning/crossing cars. The variable CrashType was estimated for VO based on the 3AT during the step "Unify". The variable Geometry differentiates between three-way and four-way intersections. The variables CycleLane and BusStop indicate whether a cycle lane or a bus stop, respectively, is located at an intersection.
In the 32 test fusions, we tried all 16 combinations of the five selected matching candidates $X_{MC}$ that include AccidentType, each with constrained and unconstrained DHD. The variable AccidentType always had to be included because it is the strongest link between PD and VO and had the highest correlation with the specific variables $Z$. To save time and computational resources when identifying the best test fusion, we compared only the distributions $f_{Z_{sel}}$ of the seven most important specific variables, $Z_{sel}$, between VO and PD*. The most important specific variables $Z_{sel}$ are those that specify the dynamic behavior of ego and agent at the start of the scenario (except $SSM_{min}$) and are all metric: Ego/AgentSpeedStart, Ego/AgentAccelerationStart, Ego/AgentAccelerationTanStart, and $SSM_{min}$.
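The count of 32 test fusions follows from simple enumeration: with AccidentType fixed, the four remaining candidates can each be included or excluded (2^4 = 16 combinations), and each combination is run with both DHD variants. A short sketch, in which the variable names are taken from the text and everything else is illustrative:

```python
from itertools import combinations, product

# AccidentType is always included; the other four candidates are optional.
optional = ["CrashType", "Geometry", "CycleLane", "BusStop"]

combos = [
    ("AccidentType",) + extra
    for k in range(len(optional) + 1)
    for extra in combinations(optional, k)
]

# Each combination is fused once per DHD variant.
test_fusions = list(product(combos, ["constrained", "unconstrained"]))

print(len(combos), len(test_fusions))  # 16 32
```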
Figure 11 illustrates the 32 test fusion results, where all test fusions were below the upper threshold $D_{Z,upper} = 0.23$ (yellow line). Of the 32 test fusions, 26 produced similar distributions according to the Smirnov test ($D < D_{crit,\alpha=0.05} = 0.162$). Seven test fusions were also below the lower threshold $D_{Z,lower} = 0.09$, whereby the best results were delivered by an unconstrained DHD relying on the matching candidates AccidentType, Geometry, and BusStop (median: 0.068). Interestingly, in 31 of the 32 test fusions, the unconstrained DHD outperformed the constrained DHD, except for the fusion that used the matching candidates AccidentType and BusStop. Furthermore, more matching candidates did not necessarily lead to better fusion results, as demonstrated by the fusion using all matching candidates; this phenomenon coincides with the experience in [88]. Table 9 (see Appendix) provides an overview of the 74 scenarios with an excerpt of 13 selected variables.

D. EVALUATE
In the following, we evaluate the best fusion PD* (unconstrained; MVs = AccidentType, Geometry, BusStop) by comparing the (marginal) distributions $f_Z$ of all specific variables $Z$ (see Table 7) and the common distributions $f_{ZX_{MV}}$ of the specific variables $Z$ with the MVs in VO and PD*.
Figure 12 shows that the median (0.07) of all compared distributions of the metric specific variables $Z_{metric}$ is below all the determined thresholds; thus, the distributions can be considered similar. Furthermore, all the compared metric distributions of $Z_{metric}$ were below the critical value of the Smirnov statistic ($D_{crit,\alpha} = 0.162$), with the variable AgentAverageSpeed (0.118) closest to the threshold. Six of the seven most important specific variables, $Z_{sel}$, were below or equal to the lower threshold $D_{Z,lower} = 0.09$: EgoSpeedStart (0.07), Ego/AgentAccelerationStart (0.059/0.065), Ego/AgentAccelerationTanStart (0.093/0.068), and $SSM_{min}$ (0.063). Only AgentSpeedStart was slightly above the lower threshold (0.108).
In contrast, the median of the compared categorical distributions (0.04) for $Z_{cat}$ was slightly higher than the lower empirical threshold ($H_{D,Z,lower} = 0.03$). However, all categorical specific variables, $Z_{cat}$, were below the upper empirical threshold ($H_{D,Z,upper} = 0.15$), except for EgoDirection (0.167). Regarding the preservation of the correlation between the metric specific variables, $Z_{metric}$, and the MVs, $X_{MV}$, the median of the compared differences in the correlations in VO and PD* exactly coincided with the identified lower threshold ($\Delta r_{pb,MVZ,lower} = 0.05$), and all differences were below the upper threshold $\Delta r_{pb,MVZ,upper} = 0.35$.
Finally, the common distributions of the categorical specific variables, $Z_{cat}$, and categorical MVs exceeded the lower threshold, $H_{D,MVZ,lower} = 0.09$, by 144%, resulting in a median value of 0.22. Only 21 out of 42 compared common distributions were below the median of 0.22 and thus can be considered similar. In addition, the common distributions of AccidentType + Wind (0.356) and BusStop + RoadUserSecondMost (0.367) exceeded the upper threshold $H_{D,MVZ,upper} = 0.33$.
Figure 13 shows the cumulative distribution function plots for Ego/AgentSpeedStart and Ego/AgentAccelerationTanStart for the VO and PD*. As revealed by the Smirnov test statistic, the AgentSpeedStart curves did not coincide as well as the others, and the median of the fused start speeds in PD* (6.7 m/s) was about 10% higher than that of the original donor speeds in VO (6.1 m/s). In conclusion, we consider the data fusion to be valid for the subsequent derivation of logical scenarios because of the similar distributions of VO and PD*. Moreover, the unconstrained DHD used 73 different VO scenarios to fuse them with the 74 scenarios contained in PD. The one scenario that the unconstrained DHD fused twice was based on 3AT 322.

E. DERIVATION OF LOGICAL SCENARIOS
The most likely traffic conflict leading to an accident in the considered ODD, which the hypothetical AEB could face, was 3AT 6021 (44.6%, see Table 8). The observed EgoStartSpeeds were between 2.43 m/s and 17.32 m/s, with a mean of 10.71 m/s. Interestingly, in the VO data set for scenario 6021, the minimum observed EgoStartSpeed was 0.91 m/s, and thus 62.5% lower than in PD*. The variable EgoAccelerationStart shows a comparable effect, as its minimum initial acceleration rises from 0.05 m/s² in VO to 0.16 m/s² in PD*. The observed AgentStartSpeeds were between 3.23 m/s and 16.23 m/s, with a mean of 10.64 m/s. The same phenomenon as for the ego occurs for the agent: the observed minimum AgentStartSpeed increases from 0.05 m/s in VO to 0.16 m/s in PD*, and the observed minimum AgentAccelerationStart increases from 0.01 m/s² to 0.11 m/s² in PD*. The accident severity ranged from no one injured to the agent being slightly injured, and the maximum property damage observed was €8,500. The driver age of the egos was between 19 and 71 years, whereas that of the agents was between 19 and 79 years.
Table 8 also introduces exemplary logical scenarios for 3ATs 231 and 321, which display the minimum, mean, and maximum values of the considered PD* variables. As expected, the EgoSpeeds at the start differ between the 3ATs: while 3AT 6021 has a minimum start speed of 2.43 m/s (mean: 10.71 m/s), 3AT 231 has 9.72 m/s (mean: 10.01 m/s) and 3AT 321 has 12.24 m/s (mean: 14.72 m/s), probably because drivers wrongly assumed the right of way. For testing in a stochastic simulation, it is necessary to transfer real intersection geometries into an OpenDRIVE file [96]. Table 8 provides the location IDs (LocIDs), which can be used to create the corresponding OpenDRIVE files from aerial images.
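The derivation of such parameter ranges from a fused data set can be sketched as a simple group-and-summarize step. The function name is hypothetical, and the records below are made-up toy values that are NOT taken from PD*; only the variable names follow the text.

```python
from collections import defaultdict
from statistics import mean


def derive_logical_scenarios(records, variables, key="AccidentType"):
    """Group fused records by accident type and summarise each metric variable."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    scenarios = {}
    for acc_type, rows in groups.items():
        summary = {"share": len(rows) / len(records)}
        for var in variables:
            values = [row[var] for row in rows]
            summary[var] = {"min": min(values), "mean": mean(values), "max": max(values)}
        scenarios[acc_type] = summary
    return scenarios


# Hypothetical toy records; the values are illustrative only.
toy_pd_star = [
    {"AccidentType": "6021", "EgoSpeedStart": 2.0, "AgentSpeedStart": 4.0},
    {"AccidentType": "6021", "EgoSpeedStart": 10.0, "AgentSpeedStart": 9.0},
    {"AccidentType": "6021", "EgoSpeedStart": 18.0, "AgentSpeedStart": 14.0},
    {"AccidentType": "231", "EgoSpeedStart": 9.0, "AgentSpeedStart": 6.0},
]

catalog = derive_logical_scenarios(toy_pd_star, ["EgoSpeedStart", "AgentSpeedStart"])
```

Because every fused record stems from a single donor scenario, other summaries (for example, fitted distributions instead of min/mean/max) can be derived from the same grouped values.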

V. DISCUSSION
In the following section, we answer the research questions, relate the results to the literature, and discuss the limitations of the study.Finally, we provide an outlook for researchers and practitioners.

A. RESEARCH QUESTIONS
The overall RQ asked for a method to create a representative test scenario catalog suitable for testing in stochastic traffic simulations.We answered the overall RQ by fusing 74 accidents obtained from PD and 1,648 scenarios identified in the VO data following the FUSE4Rep process model.The generated representative test scenario database PD*, containing 74 scenarios described by over 350 variables, can be considered representative of the defined population derived from AEB's ODD.We want to emphasize that it would not have been possible to derive concrete scenarios from PD by analytically reconstructing accidents because of missing information.Even if the PD contains extensive information for reconstruction, the analytical reconstruction of any accident is very complex and expensive.RQ1 asked for suitable MVs to fuse PD and VO data.Among the 32 test fusions conducted, AccidentType, Geometry, and BusStop were identified as the best combinations of MVs.Unfortunately, AccidentType did not belong to the common population of PD and VO, and attention should be paid to this in future fusions.An improvement could possibly already be achieved by a VO that overlaps more closely in time with the PD and/or a VO that is carried out over a longer period of time instead of just three months.
RQ2 asked which type of DHD performed better: constrained or unconstrained.In this study, unconstrained DHD performed the best and outperformed constrained DHD in 31 of the 32 test fusions.
RQ3 asked how good the final fusion results were overall: almost all conducted comparisons of distribution similarity between VO and PD* showed that the distributions are similar, which is the most important quality criterion for a valid data fusion. In particular, the distributions of the most important metric variables (Ego/AgentSpeedStart, Ego/AgentAccelerationStart, Ego/AgentAccelerationTanStart, and $SSM_{min}$) show a high degree of similarity.
Finally, RQ4 asked whether it was possible to derive logical scenarios applicable for testing in a stochastic traffic simulation.In fact, the logical scenarios derived from PD* do contain all necessary variables [3], [4], [7]: Ego/AgentEntryStart specifying the start position as the distance to the intersection entry, AccidentType and Ego/AgentManeuver describing the maneuvers of ego and agent, Ego/AgentSpeedStart describing the initial speed at the start, Ego/AgentAge as the corresponding driver's age, and Ego/AgentDrivingLicenseYear as the year in which the driver's license was acquired.The latter can be an indicator of the driving experience.It has also been shown that logical scenarios derived from PD* instead of VO have a narrower range of values, e.g. for the initial start speeds Ego/AgentStartSpeed, which allows the speed ranges to be narrowed down to higher start speeds in the stochastic simulation.
In addition, the fused data set PD* allows the derivation of the parameters in the way they are needed to generate logical scenarios for stochastic traffic simulations [3], [7], be it min/max specifications for metric variables or, for example, normal distributions. The quality of the data fusion is suitable for the derivation of logical scenarios because the distributions of the specific variables $Z$ are reproduced well in PD*. Moreover, it is also possible to derive, for example, the variables describing Ego/AgentSpeed depending on the current intersection phase (approaching, deceleration, crossing, and exit) [97]. Next, the three identified logical scenarios can be used for AEB evaluation in a stochastic simulation, as in [3]. The logical scenarios and the stochastic component of the simulation can then be used to test the AEB in normal driving, critical, and accident scenarios [3]. However, we emphasize that the concrete scenarios stored in PD* do not necessarily have to be plausible, because the DHD minimizes both the single and the overall distances of the matching variables [26].

B. COMPARISON TO EXISTING RESEARCH
When comparing the results to the PD-VO fusion conducted in [26], it is notable that the MVs used are no longer time-related (e.g., Weekday, Hour, Minute), but conflict-related (AccidentType) and location-related (Geometry, BusStop). Moreover, the quality of the data fusion itself is significantly better than that of [26], for example, with regard to the $Z$ distributions in PD* and VO (a direct comparison of figures is not reasonable, because in [26] all specific variables $Z$ were categorical). The reasons for this may be found in the use of the unconstrained DHD instead of the constrained one, in the more meaningful MVs, and in the longer VO (three months instead of eight days). Regarding the FUSE4Rep process model proposed in [7], no major drawbacks were identified in the application. Only matching the manually recorded incorrect behavior of road users (provided in VO [93], [94]) to the identified 3ATs (see [19]) did not work properly. In the future, location-based information (e.g., the zone of incorrect behavior) should be included in the matching process. Furthermore, while convenient, the 3AT scheme for describing a conflict situation finds only pre-defined conflict patterns. This situation is exacerbated by the fact that the SSMs used (MTTC and PET) are not suitable for describing every form of traffic conflict [80].

C. LIMITATIONS
Limitations arise owing to the conditional independence assumption (CIA) necessary for valid data fusion, according to [88], [98], and [99]. The CIA cannot be measured and must be assumed by the expert performing SM [88]. The CIA assumes that the variables $Y$ and $Z$, which are specific to each data set, are independent given the common variables $X$. In other words, the joint distribution of $Y$ and $Z$ is unknown and difficult to estimate. However, the more common variables $X$ exist, the more likely it is that the CIA can be assumed [100]. We assume that the CIA holds given the 38 shared and 33 common variables.
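For reference, the CIA described above can be written compactly. The following is the standard formulation of conditional independence, with $f$ used as generic density notation; it is added here as an illustration and is not quoted from [88]:

```latex
f(y, z \mid x) = f(y \mid x)\, f(z \mid x)
\quad\Longrightarrow\quad
f(x, y, z) = f(y \mid x)\, f(z \mid x)\, f(x)
```

Under this assumption, the joint distribution can be assembled from the two data sets separately, since $f(y \mid x)$ is estimable from one data set and $f(z \mid x)$ from the other.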
We also emphasize that only the fourth validity level, the similarity of f Z and f ZX MV between PD* and VO, according to [20], can be verified in real-world fusions.Therefore, subsequent statistical analyses of PD* must be performed carefully because of potential limitations.In addition, the overlap between the periods of PD (16 years) and VO (3 months) in the example shown is small; therefore, care should be taken to ensure a better match for future fusions.
Another limitation arises from the empirically determined thresholds, which were obtained by randomly splitting VO 100 times into two data sets.Further studies should investigate the optimal number of random splits to properly determine thresholds.
Finally, we want to point out that the trajectories contained in the VO data sets [93], [94] describe only the estimated center point of every road user, combined with an estimated standard geometry (length/width).Therefore, the calculated SSMs can overestimate or underestimate the corresponding conflicts owing to varying real-world geometries.

D. FOR RESEARCHERS
We recommend further research efforts to determine the incorrect behavior of road users in VO to gain additional MVs. Moreover, the 3AT determination, especially the SSM calculation in VO, can be improved using video analysis techniques that allow the determination of exact road user geometries. In addition, scenarios/traffic conflicts can be identified in PD and VO by relying on rule-based techniques as well as supervised/unsupervised techniques [10].
Regarding SM, we recommend investigating imprecise imputation techniques, which are independent of CIA [101], and testing further validation techniques, such as relying on bias and variance estimates [102].Finally, the construction of a fused data set using a symmetric SM [101] can lead to a larger database of test scenarios.

E. FOR PRACTITIONERS
We recommend that practitioners compare the demonstrated fusion results with real-world data, such as comparable reconstructed accidents stored in the GIDAS database [90]. In addition, we recommend extending the shared variables between VO and PD by improving the VO data collection process. Also, the collection of metric variables instead of categorical variables can be expanded (e.g., precipitation amount instead of Rain: yes or no). Moreover, we recommend performing longer VO observation periods and conducting these continuously alongside PD collections so that the superordinate shared population between the VO and PD matches better. Furthermore, we recommend the additional use of weather-independent sensors. The drones used in [93] and [94] can fly only under good weather conditions, which excludes traffic conflicts related to bad weather.

VI. CONCLUSION
Fusing concrete scenarios identified from different data sources helps close the gaps in data availability and differing information when creating representative test scenario databases and catalogs.As part of the FUSE4Rep process model, statistical matching can generate a new representative accident test scenario database, PD*, by asymmetrically fusing scenarios identified in police accident data (PD) with those identified in video-based traffic observation (VO) data.The necessary matching variables are AccidentType (describing the underlying traffic conflicts), BusStop (indicating the presence of bus stops), and Geometry (describing, for example, the number of nodes of intersections).The best matching algorithm is the unconstrained distance-hotdeck method.Consequently, the fused scenario database PD* includes information on the road users involved in an accident (age, year of driving license acquisition), the conflict situation, and the corresponding starting speeds and accelerations.Thus, the most frequent conflicts leading to accidents and how they should be parameterized for ADS testing in stochastic simulations can be determined and transferred to a test scenario catalog.
The FUSE4Rep process model was proposed and applied in part to determine scenarios from VO data.As our results demonstrate for the first time, it is also possible to use the FUSE4Rep process model to fuse PD and VO scenario data sets into PD* and subsequently derive a test scenario catalog containing logical scenarios.These logical scenarios can then be used to assess, for example, autonomous emergency braking systems (AEBs), in stochastic traffic simulations.Future research should focus on improving the scenario identification process in the PD and VO data sets and increasing the number of shared variables between the scenario data sets to be fused.Moreover, the process of statistical matching can be improved by investigating imprecise imputation techniques, performing symmetric statistical matching, and expanding validation techniques.
Overall, the collection of VO data (samples) should be continuous with the PD collection, and a fixed VO collection system should be established. If this happens, we expect that our method will help to develop test scenario databases containing comprehensively described test scenarios that are representative of clearly defined operational design domains. In this way, we contribute to safe ADSs and, thus, to safer road traffic in the future.

APPENDIX
See Tables 8 and 9.
16370 VOLUME 12, 2024 Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

FIGURE 4. Process of 3AT determination in VO according to [19] (left). Possible maneuvers at a 3-/4-way intersection without turn-around (right). Virtual gates are indicated by the letters A-D.

FIGURE 6. Ten different 3ATs represent the conflict situations of 74 car-car accidents in the PD. Pictograms provided by [65]. "W" in a pictogram indicates a road user that has to wait / give priority.

FIGURE 8. Comparison of empirical and modeled extreme value distributions of 3AT 231 with MTTC [s] as SSM. The vertical yellow line marks zero.

FIGURE 11. Boxplots illustrating the calculated Smirnov test statistics D of the selected specific variables Z between VO and PD*. The results depend on the combination of matching candidates used for the test fusions and the type of DHD (constrained/unconstrained). The blue and yellow dashed lines indicate empirically determined thresholds for similarity (yellow: maximum/blue: median) by measuring the Hellinger distance for 100 random splits of VO according to the ratio of PD/VO. The gray dashed line indicates the critical value of the test statistic (α = 0.05; 0.162) under which all distributions can be considered similar.

FIGURE 12. Boxplots illustrating the fourth validity level. The blue and yellow labels indicate empirically determined thresholds for similarity (yellow: maximum/blue: median). The gray label indicates the critical value of the test statistic (α = 0.05; 0.162) under which all distributions can be considered similar.
occurring in the absence of rain, fog, and snow [Factual] at German intersections [Spatial] in 2023 [Temporal]".

TABLE 6. Results of modeled extreme value distributions. Minimum and maximum standard error estimates are bold in the columns. Scale σ, location µ, shape ξ.

TABLE 7. Shared $X_S$ and specific variables $Z$ (alphabetically sorted).

TABLE 8. Logical scenarios derived from the fused database containing n = 74 scenarios following the 6LM when applicable (extract).

TABLE 9. Newly created data set PD*, described by 13 of 350+ selected variables. "9999" is coded if unknown.