Object-Level Data-Driven Sensor Simulation for Automotive Environment Perception

The gradually evolving automated driving and ADAS functions require increasingly capable environment perception. Reliable environment perception depends on large amounts of data that are hard to collect. Several simulators provide realistic raw sensor data based on physical sensor models. However, besides their high price, they also require very high computation capacity. Furthermore, most sensor suppliers provide high-level data, such as object detections, that is complicated to reproduce from simulated raw sensor data. This paper proposes a method that directly simulates the detections or object tracks provided by smart sensors. The model covers several uncertainties of the sensors, such as missed detections, false detections, and measurement noise. In contrast to conventional sensor models, this method uses a state-dependent clutter model and considers the field of view in the detection model. The parameters of the proposed model are identified for an automotive smart radar and camera based on pre-evaluated real-world measurements. The resulting model provides synthetic object-level data with higher fidelity than the conventional probabilistic models, differing by less than 2% from the precision and recall metrics of the actual sensors.


I. INTRODUCTION
The ADAS (Advanced Driver Assistance System) and AD (Automated Driving) functions promise a number of benefits in terms of sustainability [1], traffic-flow optimization [2], and economics [3]. Besides these advantages, the greatest motivation of ADAS functions is to increase traffic safety [4]. Therefore, the EU's GSR (General Safety Regulations) forces the OEMs (Original Equipment Manufacturers) to introduce new ADAS features into their vehicles. None of the automotive sensors available on the market can satisfy the requirements of these advanced functions in terms of environment perception [5]. Therefore, the perception of intelligent vehicles equipped with ADAS functions is performed by the fusion of several sensors with different advantages [6], [7], [8]. Several tests ensure the functional safety of these systems on different levels, such as unit, module, and integration tests. In the early stages of development, testing and verification on real-world measurements are not cost-efficient, and data acquisition is time-consuming; hence, using simulation tools is beneficial [9]. Furthermore, in the case of HAD (Highly Automated Driving), testing of corner cases is complicated in a real environment due to safety and reproducibility problems. Simulations of traffic flow and vehicle dynamics are already widespread techniques in the testing of high-level decision-making and control algorithms [10], [11], [12]. Synthetic data augmentation is also commonly used to develop and test environment perception functionalities such as image processing, object detection, and sensor data fusion. In this article, we propose a generic model of various smart sensors with different outputs, such as detections and tracked objects. The proposed model can generate synthetic sensor data, supporting the development and testing of multi-object tracking and sensor fusion algorithms on different abstraction levels, such as detection or track level [13], [14].
However, physical sensor model-based simulation tools generally have significant computational and hardware resource requirements [24], [25], [52], [53]. The probabilistic sensor models can provide a reasonable trade-off between computational complexity and fidelity level [54]. Another question lies in the abstraction level of the data [55]. Most automotive sensor suppliers develop smart sensors that provide processed, object-level data, such as detections or tracked objects, instead of raw measurements. Therefore, physical sensor models should incorporate the processing algorithms of these sensors (e.g., detection and tracking) to reproduce their data. However, these algorithms of the automotive sensors are not public, and there is a tremendous number of approaches for the different sensors, like radars [56], [57], cameras [58], [59], [60], and LiDARs [61]. Therefore, it is challenging to reproduce the object-level output from the raw data of a specific sensor with high fidelity, and this approach is not modular for different types of sensors. The possible errors of the different sensors for probabilistic models are collected in [62]. Some works provide a generic model for sensor simulation, but they mainly tackle the measurement noise [63], [64] and the detectability considering the sensor FoV [65]. Object-level simulations with more detailed detection models are proposed in [66], [67], considering the physical aspects of radar reflections. The radar model in [66] was extended with further physical aspects [68] and clutter detection simulation [69]. A generic object-based data-driven model is proposed in [70]; however, its detection model follows a Bernoulli distribution, which is unsuitable for tracked objects. Its clutter model uses a uniform distribution, neglecting the static reflecting areas (e.g., guardrails, bridges) that result in a non-uniform spatial distribution of false detections.

B. Contributions of the Article
In this paper, we propose a high-level generic sensor model that simulates the object-level information, such as detections or tracked objects, of automotive smart sensors with low computational effort. The proposed sensor model consists of three sub-models, the detection and tracking, clutter, and measurement model, tackling the missed and clutter detections and measurement noise. The sensor model parameters are identified using real-world measurements of a simulated camera and radar, resulting in a data-driven simulator with high fidelity. This simulation can support developing and validating object tracking and sensor fusion algorithms. The detection and tracking model imposes a state-dependent detectability of the objects considering single-shot detections and tracked objects. Compared to the

II. SENSOR MODEL
The simulation of the automotive environment sensors has to face two questions: what the sensor detects and how reliably. The proposed sensor model is derived from the multi-object tracking algorithms applied in a cluttered environment as in [71], [72], considering the detectability, clutter detections, and the measurement uncertainties of the sensors with the three modules shown in Fig. 1. This concept provides an end-to-end simulation of the objects provided by smart sensors, modeling their built-in perception algorithm. The detection and tracking model and the clutter model answer the first question: what does the sensor detect? The sensor detections may be generated by real targets, but some of them are usually false detections. Therefore, the simulation workflow illustrated in Fig. 1 consists of two main processes generating the object and clutter detections, i.e., true positive and false positive detections. The detection and tracking model determines which scenario objects, such as passenger cars, trucks, and buses, are assumed to be detected by the sensor, as shown in Fig. 1. The clutter model describes the appearance of false detections generated by static environmental objects, such as guardrails, bridges, and traffic signs, or by other unknown reasons. Since automotive smart sensors usually include a built-in tracking algorithm, the false detections do not disappear immediately but survive a couple of cycles. Therefore, the detection and tracking model is utilized in the simulation of clutter detections to update previous false detections, determining which of them survive. The overall clutter detections are formed by the union of the appearing and tracked detections. The measurement model simulates the measurements generated by the targets, i.e., the surrounding and static objects corresponding to the object and clutter detections, respectively. It considers the observation noise of the simulated sensors. The modules of the proposed model, the detection and tracking, clutter, and measurement models, are detailed in the following subsections.

A. Detection and Tracking Model
The detection model describes how likely the sensor is to detect a present object, thus simulating false negative detections. In multi-object tracking algorithms, the detection probability $p_D(x)$ corresponds to the event that an object with state $x$ is detectable; it is usually given by a constant value, as in [73], [74]. Some works tackle object tracking problems in environments with varying detectability, as in [75], [76], [77], [78]. In this article, we propose a state-dependent detection probability model for $p_D(x)$, considering the sensor FoV (field of view) and the signal processing of smart sensors. A general detection model is demonstrated in Fig. 2, where the color intensity corresponds to the value of the detection probability, and the solid line represents the manufacturer datasheet FoV. It is common to model the FoV of a sensor by circle sectors or annuli defined by a distance and angle range. Therefore, the detection probability over the surveillance area is approximated by the following bivariate function of the distance $d$ and azimuth angle $\phi$ of the object with state $x$: where $p_{D,\max}$ denotes the constant corresponding to the maximum detection probability. The distance-dependent function $f_d(d)$ and the angle-dependent function $f_\phi(\phi)$ in (1) are defined as where $c_d$, $c_\phi$, $b_d$, $b_\phi$ are the coefficients and breakpoints of the distance- and angle-dependent linear functions, respectively, and $\phi_0$ denotes the orientation angle of the sensor. Many automotive radars have multiple FoVs (e.g., a near-range and a far-range).
In this case, the approximated detection probability function $\hat{p}_D(d, \phi)$ is constructed separately for each FoV. The simulation of the object detections provided by a sensor requires the set of surrounding objects $X$ of the simulated scenario. Furthermore, most sensors have an interface limitation on the number of provided detections due to the bandwidth; therefore, the scenario objects should be sorted according to their relevancy (e.g., distance), assuming that the sensor provides the detections in this order. Then, the set of single-shot detections $Z_o^s$ at the $k$-th timestamp is simulated, assuming that the perception follows a Bernoulli distribution with parameter $\hat{p}_D(d, \phi)$ and considering the maximum number of detections $n_{Z,\max}$ provided by the sensor.
However, several smart sensors are equipped with an object tracking algorithm resolving some missed detections of a single-shot detector. Therefore, the Bernoulli distribution is not suitable for modeling the detectability of tracked objects. In this case, the initiation probability $p_{init}(x_k^i)$ and the deletion probability $p_{del}(x_k^i)$ of object $x_k^i$ are defined as where $p_{rc}(x_k^i)$ denotes the recall of object $x_k^i$ at the $k$-th timestamp, computed as the ratio $N_d(x_k^i)/N_p(x_k^i)$ of the number of cycles in which the object was detected to the number of cycles in which it was present. The tracking probability $p_t(x_k^i)$ considers the detection probability of the object during its life cycle as Then, the set of tracked object detections $Z_o^t$ at the $k$-th timestamp is simulated according to Algorithm 1, where $\gamma(x_{k-1}^i)$ indicates whether the object $x_{k-1}^i$ was detected at timestamp $k-1$, and $p_{del,th}$ is the deletion threshold probability establishing a more stable tracking model. If an object $x_k^i$ is assumed to be perceived at time $k$, the sensor measurement considering the observation noise is simulated according to the measurement model $g_k(x_k^i)$ detailed in Section II-C.
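The single-shot branch described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the product form `p_max * f_d * f_phi`, the linear roll-off shape, and every numeric default are hypothetical and are not the fitted model of the paper; objects are plain `(d_x, d_y)` tuples, and no angle wrapping is applied.

```python
import math
import random


def ramp(value, full, zero):
    """Linear roll-off: 1 up to `full`, 0 beyond `zero` (hypothetical shape)."""
    if value <= full:
        return 1.0
    if value >= zero:
        return 0.0
    return (zero - value) / (zero - full)


def detection_probability(dx, dy, p_max=0.95, d_full=60.0, d_zero=80.0,
                          phi_full=math.radians(20.0),
                          phi_zero=math.radians(30.0), phi_0=0.0):
    """State-dependent p_D from a Cartesian position.

    Assumption: p_D is the product of a distance term and an angle term
    scaled by p_max; all default values are made up for illustration.
    """
    d = math.hypot(dx, dy)
    phi = abs(math.atan2(dy, dx) - phi_0)  # offset from the sensor axis
    return p_max * ramp(d, d_full, d_zero) * ramp(phi, phi_full, phi_zero)


def simulate_single_shot(objects, n_max, rng=random):
    """Bernoulli-sample detections, nearest objects first, capped at n_max."""
    detected = []
    for obj in sorted(objects, key=lambda o: math.hypot(o[0], o[1])):
        if len(detected) >= n_max:
            break  # interface limit on the number of detections reached
        if rng.random() < detection_probability(obj[0], obj[1]):
            detected.append(obj)
    return detected
```

Sorting by distance mirrors the relevancy ordering mentioned in the text; a tracked-object sensor would replace the Bernoulli draw with the initiation and deletion logic of Algorithm 1.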

B. Clutter Model
The clutter model intends to produce the false positive detections of the simulated sensors as the physical sensors would provide them. In many multi-object tracking and sensor fusion algorithms, the clutter model is given by a Poisson Point Process (PPP) as in [79], [80]. The PPP model assumes uniformly distributed clutter detections in a given measurement volume with its cardinality distributed according to a Poisson distribution. Due to its simplicity, it is also commonly used in sensor simulations, as in [69], [70]. However, most of the sensors have non-uniform clutter density, and the cardinality of the false detections does not follow a Poisson distribution.

Algorithm 1: Constructing tracked object detections.
In this article, we provide a non-uniform clutter model that enables a realistic simulation of the sensors' clutter detections. The first step of such a clutter model is to explore the reasons for false detections. For instance, radars usually detect static objects with significant reflecting areas, such as guardrails, poles, and highway bridges, resulting in a non-uniform clutter density. The simulation of ADAS and AD functions is usually performed based on standard static environment and scenario descriptors such as OpenDRIVE and OpenSCENARIO. The descriptor of the environment, including the road network, lanes, and stationary objects (e.g., traffic lights, traffic signs), enables one to identify the static and potentially false objects. Smart cameras usually provide fewer false detections, but it is very challenging to identify their exact spatial distribution since it depends highly on the image processing algorithm. Therefore, the proposed clutter model considers the false detections generated by static environment objects and other unknown reasons (e.g., random sensor noise, object detection errors). The workflow of simulating the clutter detections is illustrated in Fig. 3. The set of clutter detections, $C_k$, at time $k$ consists of the Gaussian detections $C_G$ and the uniform detections $C_U$, generated by the static environmental objects and other random events, respectively. The clutter model considers that the false detections provided by smart sensors with a built-in tracking algorithm usually do not appear for only one frame. Therefore, in the proposed method, the clutter detections from the previous step, $k-1$, are first updated using the detection and tracking model. Although the total cardinality of the tracked clutter detections does not follow a Poisson distribution as assumed by conventional PPP models, the Poisson distribution gives a reasonable estimate of the number of new false detections. Therefore, the new clutter detections are generated by Poisson distributions with unique parameters corresponding to the different kinds of assumed origins (e.g., guardrails, lamp poles, bridges) of the detections.
In the following, the construction of the C G Gaussian and C U uniform clutter groups is detailed.The two groups are generated separately, starting with the Gaussian group.The first step of the static clutter detection simulation is identifying the Gaussian components corresponding to the potentially false static objects.The Gaussian components can be divided into two types: A cluster C i S S that corresponds to the l i S S class of static, potentially false object (i.e., l 1 S = guardrail, } series of Gaussian components as in Fig. 3, where i c denotes the index of the components within the i S series.The components are following each other with d res distance resolution along the road path within the sensor's view range.The ĉi c S state of the i c -th component is determined by the location of the corresponding objects, i.e., guardrails and lamp poles, stored in the map data of the simulated environment.This information can usually be extracted from many simulation tools in standard format (e.g., OpenDRIVE).The covariance, P i c S , is set so that the eigenvector corresponding to the longitudinal position of the component is parallel to the road path.The series of the guardrail's and lamp poles' Gaussian components are demonstrated in Fig. 4. The w i c S weights of the Gaussian components are computed based on γ(c i c S ), that indicates whether the c i c S component is already detected at the current timestamp and p dist (ĉ i c S , l i c S ) distance probability density as The number of new false detections is computed for each , (8) where c λ (l i S S ) and n max (l i S S ) denote the constant Poisson rate factor and the maximum number of components constructing static false object with label l i S S .The one-sample Gaussian components, shown in Fig. 
3 O and are computed as where the Poisson rate λl i O O of the static object class with label , highway bridges) is given similar to (8) by neglecting the number of components of the cluster as The f w O function that computes the Poisson rate of each component is defined by the following two rules: imposing that the λl of the previous cycle, k − 1, are updated by the measurement model g k detailed in Section II-C using the Gaussian components c k ∈ {G s , G s } corresponding to the static object that originally generated the detection.The detection survives if its survive probability is less than the uniformly distributed r ∼ U(0, 1) random number.The survive probability p s (c i G G,k−1 ) is given according to the detection and tracking model by the pD (ĉ i G G,k−1 ) detection probability if a single-shot detector is simulated and by when the sensor provides tracked objects.
In the second step, the appearance of the new Gaussian false detections are simulated.9), respectively.The c b G state of a new false detection corresponding to a series of Gaussian components is performed by a random vector distributed according to a weighted Gaussian mixture in line 24, where the weight is computed according to (7).The one-sample components initiate the state of a newborn false detection defined by the corresponding Gaussian distribution.Furthermore, the maximum number of detections n Z,max of the sensor is considered.
Algorithm 2: Constructing Gaussian False Detections.

The uniform clutter group $C_{U,k}$, representing the false detections not generated by static objects but by other unknown reasons, is constructed similarly. In the first step, the survival of the previous uniform clutter detections is simulated based on the detection or tracking model as in lines 3-15 of Algorithm 2, but their states $c_k$ are updated using a CA (Constant Acceleration) model prediction. In the second step, the new uniform clutter detections are initiated at time $k$ according to Algorithm 3. The number $n_{U,b}$ of new, uniformly distributed false objects is drawn from a Poisson distribution with rate $\hat{\lambda}_U$ that is computed similarly to (10). However, it is assumed to be proportional to the elapsed time $t_k - t_{k-1}$ instead of the traveled distance. The states of the new uniform clutter detections follow an intermittently uniform distribution. This means that the sensor FoV $V_{FOV}$ is divided into subspaces $V_i \in V_{FOV}$ with equivalent distance intervals. The distance $d$ of the detection is defined by the distance probability density $p_{dist,U}$, and the state of the newborn clutter detection, $c_{U,b}$, is assumed to follow a uniform distribution within the subspace $V$ that corresponds to the distance $d$.

Algorithm 3: Constructing Uniform False Detections.
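The birth step of the uniform clutter group can be illustrated with a short Python sketch. It is a simplified reading of the description above, not the authors' Algorithm 3: the FoV is assumed to be a single circle sector, the distance density is approximated by per-interval weights instead of a piecewise-linear PDF, and the names `c_lambda`, `bins`, `densities`, and `half_fov` are hypothetical.

```python
import math
import random


def poisson(rate, rng):
    """Draw a Poisson variate with Knuth's method (adequate for small rates)."""
    limit, count, prod = math.exp(-rate), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def new_uniform_clutter(dt, c_lambda, bins, densities, half_fov, rng=random):
    """Birth of uniform clutter: Poisson count, then positions per subspace.

    bins      -- distance edges [d0, d1, ..., dn] of the FoV subspaces
    densities -- relative weight of each distance interval (len(bins) - 1)
    half_fov  -- half opening angle [rad]; a simple sector FoV is assumed
    """
    n_new = poisson(c_lambda * dt, rng)  # rate proportional to elapsed time
    births = []
    for _ in range(n_new):
        # Pick the distance interval according to the distance density.
        i = rng.choices(range(len(densities)), weights=densities)[0]
        # Uniform over the area of the annular sector: sample d^2 uniformly.
        d = math.sqrt(rng.uniform(bins[i] ** 2, bins[i + 1] ** 2))
        phi = rng.uniform(-half_fov, half_fov)
        births.append((d * math.cos(phi), d * math.sin(phi)))
    return births
```

For example, `new_uniform_clutter(0.07, 2.0, [0, 20, 40, 60], [0.5, 0.3, 0.2], math.radians(30))` returns a (possibly empty) list of Cartesian positions for the clutter detections born in one 70 ms cycle.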

C. Measurement Model
The measurement model describes the observations and the measurement uncertainties of the sensors. The state space of the scenario and static objects is defined by

$$x = [d_x, d_y, v_x, v_y, a_x, a_y]^T,$$

where $d_x$, $d_y$, $v_x$, $v_y$, $a_x$, and $a_y$ denote the longitudinal and lateral position, velocity, and acceleration, respectively. Note that the state-dependent detection probability $\hat{p}_D(d, \phi)$ is given in the polar coordinate system; therefore, the distance $d$ and azimuth $\phi$ are computed from the Cartesian coordinates $d_x$ and $d_y$.
It should be mentioned that the state space of the objects can be extended with other attributes (e.g., dimensions, orientation) if one of the sensors can provide information on them.
A single-shot measurement $z_k$ generated by an object with state $x_k$ is simulated by adding the measurement noise $\eta_k$ to the real object state $x_k$ transformed to the measurement space with the observation model $h_k(x_k)$ as

$$z_k = h_k(x_k) + \eta_k, \tag{14}$$

where $\eta_k$ is assumed to follow a Gaussian distribution $\mathcal{N}(\cdot\,; 0, R_k)$ with zero mean and covariance $R_k$. This assumption is a commonly used measurement model in different recursive filtering algorithms such as the Kalman Filter (KF) [81] or the Extended Kalman Filter (EKF) [82]. However, in the case of tracked detections, the state $x_k$ is recursively estimated by a KF or EKF, resulting in $\hat{x}_{k|k}$. In many cases, the measurement $z_k$ is not identical to the estimated state, but it is a result of the transformation

$$z_k = h_{s,k}(\hat{x}_{k|k}),$$

where $h_{s,k}$ denotes the state estimate transformation function that is not necessarily identical to the function $h_k$. The first step of the state estimation is to compute the a priori state estimate $\hat{x}_{k|k-1}$. The process model $f_k(x_k)$ predicting the objects is given by the commonly used constant acceleration (CA) model with transition matrix $F_k$ as

$$\hat{x}_{k|k-1} = F_k \hat{x}_{k-1|k-1}, \qquad F_k = \begin{bmatrix} I_2 & \Delta t\, I_2 & \tfrac{\Delta t^2}{2} I_2 \\ 0 & I_2 & \Delta t\, I_2 \\ 0 & 0 & I_2 \end{bmatrix}, \tag{16}$$

where $\Delta t$ denotes the time elapsed between $t_k$ and $t_{k-1}$, and $I_2$ is the $2 \times 2$ identity matrix for a state ordered as position, velocity, and acceleration. The linear process model in (16) is applied for the state prediction of the uniform false detections as well, since they are not generated by the static environment objects. The predicted estimate covariance $P_{k|k-1}$ is computed as

$$P_{k|k-1} = F_k P_{k-1|k-1} F_k^T + Q_k,$$

where $Q_k$ denotes the process noise covariance that is given based on the acceleration scales $\sigma_{a,x}$ and $\sigma_{a,y}$ as

$$Q_k = q_k \otimes \operatorname{diag}(\sigma_{a,x}^2, \sigma_{a,y}^2).$$

The submatrix $q_k$ of the process noise covariance is computed by the following dyadic product:

$$q_k = \begin{bmatrix} \tfrac{\Delta t^2}{2} \\ \Delta t \\ 1 \end{bmatrix} \begin{bmatrix} \tfrac{\Delta t^2}{2} & \Delta t & 1 \end{bmatrix}.$$

The a priori state estimate $\hat{x}_{k|k-1}$ is updated by the simulated measurement $z_k^{sim}$ based on the Kalman gain $K_k$ as

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \left( z_k^{sim} - h_k(\hat{x}_{k|k-1}) \right).$$

The measurement $z_k^{sim}$ is simulated according to the simple measurement model described in (14). The Kalman gain $K_k$ and the a posteriori estimate covariance $P_{k|k}$ are computed by the equations of the Kalman Filter.
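The two building blocks of this subsection, the CA prediction and the noisy single-shot measurement, can be sketched as follows. The sketch assumes a state ordered as `[dx, dy, vx, vy, ax, ay]`, a linear observation model that simply selects state components, and a diagonal measurement covariance; the function names and the `observed`/`sigmas` parameters are illustrative and do not come from the paper.

```python
import random


def predict_ca(state, dt):
    """Constant-acceleration prediction of [dx, dy, vx, vy, ax, ay]."""
    dx, dy, vx, vy, ax, ay = state
    return [dx + vx * dt + 0.5 * ax * dt ** 2,
            dy + vy * dt + 0.5 * ay * dt ** 2,
            vx + ax * dt,
            vy + ay * dt,
            ax,
            ay]


def simulate_measurement(state, observed, sigmas, rng=random):
    """z = H x + eta, with H selecting `observed` indices and diagonal R.

    observed -- indices of the state components the sensor reports
    sigmas   -- standard deviation of the Gaussian noise per component
    """
    return [state[i] + rng.gauss(0.0, s) for i, s in zip(observed, sigmas)]
```

As a usage example, `simulate_measurement(predict_ca(x, 0.07), observed=(0, 1, 2), sigmas=(0.5, 0.3, 0.4))` would produce a noisy position and longitudinal velocity reading one cycle ahead; the noise levels here are placeholders, not identified sensor parameters.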

III. PARAMETER IDENTIFICATION
The parameters of the sensors are identified from a pre-evaluated real-world measurement in the offline training phase to obtain a more realistic simulation. Then, in the online phase, the identified parameters can be used to simulate the real sensors in arbitrary environments and scenarios generated with a 3D simulation environment (e.g., CARLA, IPG CarMaker), as illustrated in Fig. 5. The simulated sensor cluster consists of a Continental ARS408 smart radar and a Mobileye EyeQ2 smart camera, providing tracked object detections. Therefore, in this article, only the model of sensors providing tracked objects is validated; however, the single-shot detectors have a much more straightforward model. The field of view of the simulated sensors and the overall surveillance area are illustrated in Fig. 6. The measurement setup also includes a separate digital camera, synchronized with the aforementioned simulated smart sensors, allowing video annotation for ground truth generation. The measurement that serves as the basis of the parameter identification was recorded on a Hungarian highway in nice weather and usual afternoon traffic conditions, and it consists of 7000 frames (≈ 120 seconds), including guardrails, lamp poles, and highway bridges. Therefore, the clutter model of static environment objects includes these classes, but it can be extended with other classes of potentially false, static objects (e.g., traffic signs, buildings) by labeling them in the evaluation of the training data used for the parameter identification. Straightforward rules on the relative velocity and position of the false positive detections can support the labeling. The ground truth objects of the road scenario are identified by annotating the video record of the digital camera in the open-source DarkLabel tool. The annotated objects are transformed onto the ego vehicle coordinate system by a projective transformation.
The first step of the measurement evaluation is the association between the annotated ground truth-and detected objects which is performed by a GNN (Global Nearest Neighbor) algorithm [83].The algorithm assigns the detected objects to the reference objects within a d c = 10 m overall and d c,lat = 1.5 m lateral cutoff distance so that the global distance of the association is minimal.The distance matrix D ij ∈ R +(N,M ) , as the basis of the GNN algorithm, is defined by the Mahalanobis-distance d MH (x i , z j ) of the reference object x i and object detection z j as where N and M denote the number of reference-and detected objects, while x pos where α denotes the angle of the road path at the longitudinal distance of object x i .
The detected objects that cannot be assigned to any of the ground truth objects within the distance d c are assumed to be false detections, while the matched objects are considered as true positive detections.The reference objects that do not have any associated object detections provided by the evaluated sensor are denoted as false negatives.
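The association and labeling step can be illustrated with a small Python sketch. Several simplifications are assumed here and are not taken from the paper: plain Euclidean distance replaces the Mahalanobis distance, the road is treated as straight (so the lateral offset is simply the difference in `d_y`), an unassigned reference costs exactly the cutoff `D_C`, and the global optimum is found by exhaustive search, which is only practical for a handful of objects per frame.

```python
import math
from functools import lru_cache

D_C = 10.0      # overall cutoff distance [m], value from the text
D_C_LAT = 1.5   # lateral cutoff distance [m], value from the text


def gated_distance(ref, det):
    """Euclidean distance, or None if the pair violates a cutoff."""
    dx, dy = det[0] - ref[0], det[1] - ref[1]
    dist = math.hypot(dx, dy)
    if dist > D_C or abs(dy) > D_C_LAT:
        return None
    return dist


def associate(refs, dets):
    """Return (pairs, false_negatives, false_positives) of minimal total cost."""
    cost = [[gated_distance(r, z) for z in dets] for r in refs]

    @lru_cache(maxsize=None)
    def best(i, used):
        # `used` is a bitmask of detections already assigned.
        if i == len(refs):
            return 0.0, ()
        # Option 1: leave reference i unassigned at the price of the cutoff.
        total, pairs = best(i + 1, used)
        best_total, best_pairs = total + D_C, pairs
        # Option 2: assign reference i to any free detection inside the gate.
        for j, c in enumerate(cost[i]):
            if c is None or used & (1 << j):
                continue
            total, pairs = best(i + 1, used | (1 << j))
            if total + c < best_total:
                best_total, best_pairs = total + c, ((i, j),) + pairs
        return best_total, best_pairs

    _, pairs = best(0, 0)
    matched_refs = {i for i, _ in pairs}
    matched_dets = {j for _, j in pairs}
    false_negatives = [i for i in range(len(refs)) if i not in matched_refs]
    false_positives = [j for j in range(len(dets)) if j not in matched_dets]
    return list(pairs), false_negatives, false_positives
```

The returned pairs are the true positives, unmatched references are false negatives, and unmatched detections are false positives, following the labeling described above. A production implementation would typically use a polynomial-time assignment solver instead of the exhaustive search.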

A. Detection Model
The parameters of the proposed detection model are estimated based on the recall map of the evaluated sensor. The overall recall of a sensor is computed by the ratio of the true positive and reference objects' cardinality as

$$Rc = \frac{TP}{TP + FN}, \tag{23}$$

where $TP$ and $FN$ denote the number of true positive and false negative objects as in [6]. The surrounding environment is divided into elementary cells by discretizing the distance and azimuth angle of the measurement space with 1 m distance and  I.
The fitted detection models of the smart camera and radar are visualized and compared with the manufacturer datasheet FoV of the sensors in Figs.7 and 8.It can be seen that the fitted detection models give a reasonable estimate of the FoV limits, except in the far scan zone of the radar.Since it is challenging to annotate the distant objects of the road scenario, the reference objects are provided up to 100 meters.Therefore, the detection model cannot be fitted with high reliability over this distance.
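A recall map of the kind used for fitting the detection model can be computed with a few lines of Python. This is a minimal sketch: the 1 m distance resolution follows the text, whereas the 1° azimuth resolution is only a placeholder, and the input format (one `(d_x, d_y, detected)` tuple per reference object and frame) is an assumption.

```python
import math
from collections import defaultdict


def recall_map(samples, d_res=1.0, phi_res_deg=1.0):
    """Per-cell recall over a polar grid.

    samples -- iterable of (dx, dy, detected), one entry per reference
               object and frame; `detected` is truthy for a true positive
    Returns {(distance_bin, azimuth_bin): TP / (TP + FN)}.
    """
    tp = defaultdict(int)
    total = defaultdict(int)
    for dx, dy, detected in samples:
        cell = (math.floor(math.hypot(dx, dy) / d_res),
                math.floor(math.degrees(math.atan2(dy, dx)) / phi_res_deg))
        total[cell] += 1            # TP + FN: object was present in the cell
        tp[cell] += bool(detected)  # TP: object was also detected
    return {cell: tp[cell] / total[cell] for cell in total}
```

Cells without any reference object are simply absent from the result, which reflects the limitation noted above: where no ground truth is available (e.g., beyond 100 m), the detection model cannot be fitted reliably.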

B. Clutter Model
The parameter identification of the clutter model is more complex since it is not only necessary to identify the false positive detections, but one must also classify them according to the clutter types. The classification of the clutter detections has been performed by several rules depending on their relative lane position and velocity. This research has considered the clutter detections generated by guardrails, lamp poles, bridges, or other unknown reasons (e.g., random sensor noise, detection algorithm). The statistics of these detections, such as the total cardinality, the number of new detections, and their spatial distributions, are determined frame by frame. The new detections generated by an object class with label $l$ at the $k$-th cycle are identified based on the GNN data association performed between the detections of the current and previous cycles. The Poisson rate factor $c_\lambda(l)$ of object class $l$ is estimated, in the timestamp interval $[k_s, k_e]$ within which the object is present, as nearly constant as where $k_s$ and $k_e$ denote the starting and ending timestamp index of the time interval, and $v_{ego}$ is the velocity of the ego vehicle.
The λl rate at the average velocity within the estimation time interval is computed by the Matlab Distribution Fitter by fitting a Poisson distribution on the number of new detections.The distance resolution d res (l) and the maximum number of Gaussian components n max (l) constructing the objects with labels are determined based on the maximum distance d max the objects can be still perceived and the maximum number of detections max(n(z i ↔ C i S S ) assumed to be generated by the object class l.The parameters of the different false classes are detailed in Table II, where c λ denotes the birth constants in ( 8) and ( 10) of the corresponding clutter class, d res is the distance between the n max number of Gaussian components of a series.Note that d res and n max are only interpreted for the classes modeled by a series of Gaussian components and not for bridges and uniformly distributed false detections.The parameters σ 2 x and σ 2 y in Table II define the longitudinal and lateral position variances in P of the Gaussian components modeling the corresponding potentially false object that is not interpreted for uniformly distributed false objects.The distance PDF p dist (ĉ, l) of newborn clutter detections with label l is described by a custom piece-wise linear distribution function based on the density values of the discrete distance intervals The weight function f wo ( λl , ĉ) of a one-sample Gaussian component that computes the state-dependent Poisson rate of the component with label l can be formed by the following linear system of equations: Although the mean value of the Gaussian components is given by the static environment descriptor of the road scenario, their initial covariance must be specified.As it was described in Section II-B, the spatial distribution of a Gaussian component S is set to be parallel with the road path.The eigenvalue σ 2 x of the position covariance sub-matrix

C. Measurement Model
The parameters of the measurement model consist of the observation model h k (x k ), the state estimate model h s,k (x k|k ), and the measurement covariance R k of the sensors.The state estimate transformation h s,k (x k|k ) considers the measurements provided by the sensors.Besides their longitudinal and lateral positions, d x and d y , the smart camera provides the longitudinal velocity v x and acceleration a x of the objects.The radar divides both the velocity and position vector into d x , d y , v x , and v y longitudinal and lateral components, but it does not give information about the acceleration of the objects.Since the measurements of both sensors are included in the state vector xk|k , the function h s,k (x k|k ) can be described by the linear transformation: where the state transform matrices of the sensors are given as The observation model h k (x k ) defines the connection between the state space and the raw measurements.In general, radars can measure objects' distance, azimuth angle, and radial velocity, while smart cameras observe their position and, in some cases, the velocity based on optical flow.However, for simplicity, the sensors are assumed to measure the subspace of the state directly so the observation model can also be expressed as a linear transformation as The observation matrix H radar of the radar represents that it can measure the position of the objects, and since it is a front-looking radar, the radial velocity is estimated by the longitudinal velocity.The H camera is given identically since it can directly observe the position of the objects and the longitudinal velocity based on optical flow as The observation matrix must be extended accordingly if the target sensors can measure other attributes (e.g., dimensions).The covariance, R i k , of the measurement noise η i k is estimated based on the x i k reference object's state and z i k measurement: where N k denotes the number of cycles on which the expected value of the covariances 
of the objects are computed.Since the sensors provide tracked object detections, to eliminate the effect of noise filtering, N k is set to 5, considering only the initial cycles of a track during the state estimation are not stable yet.The generic R k measurement covariance is determined based on the average of the R i k object covariances, assuming that the measurements are independent as where n denotes the total number of objects.The variances of the sensor measurements are included in Table III.

IV. RESULTS
The simulation is validated on a different 7000-frame section of the measurement along three aspects: the fidelity of the detection and tracking model, the similarity between the clutter density of the actual and synthetic sensor data, and the overall performance metrics of the real and simulated sensors. Since the detection and tracking model describes the detectability of the valid objects, it is evaluated by comparing the object detections provided by the simulation and the actual sensor, neglecting the clutter detections. The association between the simulated and actual detections is performed by the same GNN method used for parameter identification. Then, the performance metrics, i.e., the recall, precision, and F1 score of the detection and tracking model, are computed. The recall given in (23) indicates the proportion of the object detections provided by the sensor that are found by the simulation too. The precision expresses how many of the simulated detections correspond to an actual sensor detection as

$$Pr = \frac{TP}{TP + FP},$$

where $FP$ denotes the total number of false positive detections. The comprehensive F1 score performance metric is computed based on the precision $Pr$ and recall $Rc$ [84] as

$$F_1 = \frac{2 \cdot Pr \cdot Rc}{Pr + Rc}.$$

The performance metrics of 10 simulations provided by the proposed and a conventional Bernoulli distributed detection model are averaged and compared in Table IV. The proposed model provides false detections less frequently when the actual sensor does not detect the object either, resulting in about 3-5% higher precision for the radar and camera, respectively. According to the recall, the proposed method reproduces almost 10% more of the cases in which the actual sensor is able to detect the surrounding objects. Therefore, the proposed detection and tracking model provides 7.22% and 5.75% higher fidelity for the camera and radar than the conventional Bernoulli distributed model. The less significant difference in the case of the radar is due to the aforementioned reason that the parameters of the multiple scan zone 
detection model are more complex to be identified.The clutter detections of the simulation do not have to match frame-by-frame perfectly the ones of the simulated sensor but their spatial distribution should be similar.Therefore, the clutter model is evaluated by comparing the κ(z) clutter density, i.e., the frequency of the clutter occurrence in a specific part of the environment, of the proposed model, the commonly used conventional PPP, and the actual sensor detections.The κ(z) clutter density is approximated by κ([z i , z i+1 ]) discretized clutter map.In the evaluation, we considered the κ(Ω i ) spatial distribution of the clutter density, where Ω i denotes the grid cell given by [d x,i x , d x,i x +1 ] longitudinal, and [d y,i y , d y,i y +1 ] lateral position intervals.The κ(Ω i ) clutter density value of the i-th grid cell is computed as    ))) maximum simulated and measured intensity.However, this method does not consider the fact that the clutter intensity value may be greater than the dynamic range given by the highest intensity.Since the 1 × 1 meter size of the grid cells is smaller than the object separation distance of the sensors, the p c (Ω i ) ∈ {[0, 1] ⊂ R} clutter probability gives approximately the density as Therefore, the dynamic range interval [0,1] can be applied for SSIM providing a more meaningful measure of clutter model's fidelity.Furthermore, we applied four different Gaussian weighting circles in SSIM.The higher the radius of the circle, the more neighboring pixels are looked at in the similarity measure, considering the structure of the map instead of pixel-wise similarity.The similarity metrics of the conventional PPP and proposed clutters model are detailed in Table V, including the results of the two dynamic ranges and radius values of four weighting circles.Since the camera provides few clutter detections, the clutter density map of the actual camera measurement, consisting of many 0 intensity values, is easier to be 
simulated.Therefore, the similarity metrics in the dynamic range [0, 1] show a high, more than 99% fidelity of both the conventional and the proposed clutter model.However, the clutter density map of the proposed model, considering the tracking of the false detections, is ≈ 2% more similar to the simulated camera in the [0, max κ ] range.The radar has a much more complex non-uniform clutter model resulting in lower similarity metrics, particularly in the case of max κ range.Still, the difference between the proposed and conventional PPP clutter model is more significant, increasing  Finally, the overall performance metrics of the simulated sensor data, namely the aforementioned recall, the precision, and the F1 score, are compared to the same metrics of the actual sensor data.The averaged metrics of 10 simulations are compared to the actual sensors in Table VI, including the proposed model and a conventional one using Bernoulli distributed detection and PPP clutter model.According to the F1 scores, both sensors are simulated with high fidelity by the proposed method since the differences to the actual sensors are less than 1% in contrast with the conventional model that under-and overestimates the camera and radar performance, respectively.The precision related to the false positive detections is more realistic for the radar simulation than for the camera since the latter considers only the clutter detections due to unknown reasons, i.e., partially uniform clutter appearance.The precision of the radar data simulated by the proposed model is much closer to the actual precision than the conventional model, indicating that the proposed clutter model is more reliable than the commonly used PPP clutter model.In contrast, the camera's recall estimation is more reliable because the radar has a complex detection model consisting of multiple scan zones.The proposed model considering the built-in tracking module of the actual sensors, achieves 5% better camera recall in 
simulation than the conventional model using Bernoulli distribution.Both the precision and recall of the sensors simulated by the proposed data-driven sensor model have more than 98% fidelity.
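The performance metrics discussed above can be sketched in a few lines. This is only an illustrative sketch: the function names `detection_metrics` and `average_metrics` are hypothetical, the recall is assumed to follow the standard definition $TP/(TP+FN)$ (the paper's own definition in (23) is not reproduced here), and the counts are assumed to come from an association step that has already been performed.

```python
def detection_metrics(tp, fp, fn):
    """Return (precision, recall, F1) from true/false positive and false negative counts.

    Assumption: recall = TP / (TP + FN), the standard definition.
    Undefined ratios (zero denominator) are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall > 0:
        f1 = 2.0 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


def average_metrics(runs):
    """Average (precision, recall, F1) over several simulation runs.

    `runs` is a non-empty list of (tp, fp, fn) tuples, one per run.
    """
    if not runs:
        raise ValueError("at least one run is required")
    per_run = [detection_metrics(tp, fp, fn) for tp, fp, fn in runs]
    n = len(per_run)
    return tuple(sum(m[k] for m in per_run) / n for k in range(3))
```

Averaging the per-run metrics, rather than pooling the counts of all runs, mirrors the statement that the metrics of 10 simulations are averaged; pooling would be an equally plausible design if the runs differ strongly in length.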
The fidelity of the radar clutter model is confirmed by Figs. 11 and 12, which demonstrate the output of the guardrail, lamp pole, and highway bridge simulations on a single frame. Fig. 11 shows that both the spatial and the cardinality distribution of the false positive objects generated by the guardrail accurately match the detections provided by the clutter model for guardrails. It also shows that the simulated radar detections corresponding to street lamp poles do not precisely match the actual sensor measurement. This is expected, since lamp poles are located less frequently along the road path and their radar cross-section, and thus their detection probability, is lower than that of guardrails. Fig. 12 shows an example of highway bridge clutter simulation. The number of simulated bridge detections gives a suitable estimate of the detections observed by the actual sensor. Although the positions of the points reflected on the bridge do not precisely fit the measured reflections, the simulation still provides a fair estimate. Furthermore, since the clutter is simulated according to a random distribution, the result should be evaluated statistically.

The performance of the detection model and its limitations in a more complex scenario involving standstill objects in the emergency lane are illustrated in Figs. 13 and 14. As shown in Fig. 13, the detectability of the police car, which is detected by both the actual radar and the camera, is simulated correctly, while the farther car is missed by the real sensors and by the simulator as well. Moreover, the measurement error regarding the position of the detections is also simulated accurately. The slight difference in Fig. 13 comes from the simulated clutter detections assumed to be generated by the lamp poles. However, as mentioned earlier, the simulation data are not significantly distorted if the clutter model does not match the detections of the actual sensors frame-by-frame, as long as they are statistically similar. The limitations of the proposed detection model are illustrated in Fig. 14, which shows that the radar detects both standstill cars in the emergency lane, whereas the simulator treats the closer one as undetected. Furthermore, although the simulated camera detects the police car, the real sensor misses both standstill objects, and false detections occur over the guardrail. The pedestrians appearing in Fig. 14 are considered neither in the simulation nor in the evaluation because of the lack of pedestrian detections in the training data used for parameter identification, which is also a limitation of the current state of the proposed method.

Although the simulation does not perfectly reconstruct the detections of the actual sensors frame-by-frame due to the limitations of the proposed method, statistically, it provides high-fidelity synthetic data. Using this simulated data for testing, one can evaluate and further improve the performance of the environment perception algorithms in different scenarios, increasing the safety of the ADAS and HAD functions.
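The statistical, rather than frame-by-frame, comparison of clutter mentioned above can be sketched as a per-cell clutter probability map on a 1 × 1 m grid. This is a minimal sketch under stated assumptions: the function name `clutter_probability_map` is hypothetical, and the clutter probability of a cell is assumed here to be the fraction of frames in which at least one clutter detection falls into that cell, which is one plausible reading of $p_c(\Omega_i)$ and not necessarily the definition used in the paper.

```python
from math import floor


def clutter_probability_map(frames, x_range, y_range, cell=1.0):
    """Estimate the per-cell clutter probability from a sequence of frames.

    frames  : list of frames, each a list of (x, y) clutter positions in meters
    x_range : (x_min, x_max) longitudinal extent of the grid
    y_range : (y_min, y_max) lateral extent of the grid
    cell    : edge length of the square grid cells in meters

    Returns a nested list p[ix][iy] with values in [0, 1].
    """
    if not frames:
        raise ValueError("at least one frame is required")
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    counts = [[0] * ny for _ in range(nx)]
    for detections in frames:
        # Collect the cells hit in this frame so that each cell counts once per frame.
        hit_cells = set()
        for x, y in detections:
            ix = floor((x - x_range[0]) / cell)
            iy = floor((y - y_range[0]) / cell)
            if 0 <= ix < nx and 0 <= iy < ny:
                hit_cells.add((ix, iy))
        for ix, iy in hit_cells:
            counts[ix][iy] += 1
    n_frames = len(frames)
    return [[c / n_frames for c in row] for row in counts]
```

Two maps built this way, one from simulated and one from measured clutter, can differ in every single frame and still be close cell by cell, which is the sense in which the simulated and real clutter are compared statistically.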

Fig. 2. The detection model derived from the sensor FoV.

Fig. 3. The generic overview of the proposed clutter model.
of individual one-sample components as shown in Fig. 3, where $n_{C_S}$ and $n_{c_O}$ denote the number of component clusters and of individual one-sample components, respectively. In the parameter identification and evaluation, we considered the guardrails ($i_S = 1$) and lamp poles ($i_S = 2$) as series components since they usually generate a sequence of detections; therefore, $n_{C_S} = 2$ in this case. The highway bridges are regarded as individual Gaussian components. The number of bridges in the view range of the simulated sensors determines $n_{c_O}$.

Fig. 4. The spatial PDF of the guardrail (left side) and lamp pole (right side) Gaussian components.
, are generated based on the static environment description of the generated scenario. The state $\hat{c}_O^{i_O}$ is computed according to the location of the object and the dynamics of the ego vehicle. The covariance matrix $P_O^{i_O}$ is given so that the spatial distribution of the components corresponds to the shape and angle of the object and $\mathrm{tr}(P_O^{i_O})$ is proportional to the object's reflecting area $|A_r|$. The weights $w_O^{i_O} = \lambda_{c_O^{i_O}}$, as opposed to the clusters including the series of Gaussian components, directly define the Poisson rate of the cardinality of new clutter detections generated by the one-sample component $c_O^{i_O}$ [...] overall Poisson rate of the object class with label $l_O$ does not change and the state-dependent Poisson rate of the component $c_O^{i_O}$ is proportional to $p_{dist}(\hat{c}_O^{i_O}, l_O^{i_O})$. The Gaussian clutter detections $C_{G,k}$ at timestamp $k$ are constructed by Algorithm 2. First, the false Gaussian objects,

Fig. 5. The relation of parameter identification and simulation.

Fig. 6. The fields of view (FoV) of the simulated sensors.

$z_i^{pos}$ and $z_j^{pos}$ are the position vectors of the corresponding objects. The covariance matrix $S$ is given based on the ratio $d_r = d_c / d_{c,lat}$ of the cutoff distances as $S = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cdots \end{bmatrix} \cdots$

1° angle resolution. The recall map $Rc([d_i, d_{i+1}], [\phi_j, \phi_{j+1}])$ is constructed by computing the local recall of the $[d_i, d_{i+1}] \times [\phi_j, \phi_{j+1}]$ intervals. The coasted objects, i.e., objects that are missed but still tracked and that are currently outside but were previously located within the FoV, are not considered in the recall map in order to neglect the tracking of the sensor. Finally, the proposed model is fitted to the recall map by minimizing the Mean Squared Error (MSE). We used the MATLAB Optimization Toolbox to identify the optimal parameters of the proposed detection model under the constraints that the distance breakpoints $b_d$ and the angle breakpoints $b_\phi$ cannot be located outside the FoV and that the maximal detection probability $p_{D,max}$ shall be in the range [0, 1]. The optimal parameters in (2) and (3) determining the detection probability map $p_D(d, \phi)$ of the sensors, including the two different scan zones of the radar, are summarized in Table
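The construction of the recall map and the MSE criterion described above can be sketched as follows. This is an illustrative sketch only: the names `recall_map` and `recall_mse` are hypothetical, the detection model is passed in as an arbitrary callable `p_model(d, phi)` because the parametric form in (2) and (3) is not reproduced here, and the model is assumed to be evaluated at the cell centers. The constrained optimization itself, which the text performs with the MATLAB Optimization Toolbox, is not shown.

```python
from bisect import bisect_right


def recall_map(objects, d_edges, phi_edges):
    """Local recall per [d_i, d_{i+1}] x [phi_j, phi_{j+1}] cell.

    objects   : iterable of (distance, azimuth, detected) tuples of valid,
                non-coasted ground-truth objects
    d_edges   : increasing list of distance bin edges
    phi_edges : increasing list of azimuth bin edges

    Cells without any ground-truth object are returned as None.
    """
    n_d, n_phi = len(d_edges) - 1, len(phi_edges) - 1
    hits = [[0] * n_phi for _ in range(n_d)]
    total = [[0] * n_phi for _ in range(n_d)]
    for d, phi, detected in objects:
        i = bisect_right(d_edges, d) - 1
        j = bisect_right(phi_edges, phi) - 1
        if 0 <= i < n_d and 0 <= j < n_phi:
            total[i][j] += 1
            hits[i][j] += 1 if detected else 0
    return [
        [hits[i][j] / total[i][j] if total[i][j] else None for j in range(n_phi)]
        for i in range(n_d)
    ]


def recall_mse(p_model, rc, d_edges, phi_edges):
    """Mean squared error between a detection model and the recall map.

    p_model is a hypothetical callable p_model(d, phi) -> probability,
    evaluated at the cell centers; empty cells (None) are skipped.
    """
    errors = []
    for i, row in enumerate(rc):
        for j, value in enumerate(row):
            if value is None:
                continue
            d_c = 0.5 * (d_edges[i] + d_edges[i + 1])
            phi_c = 0.5 * (phi_edges[j] + phi_edges[j + 1])
            errors.append((p_model(d_c, phi_c) - value) ** 2)
    if not errors:
        raise ValueError("recall map contains no populated cells")
    return sum(errors) / len(errors)
```

An optimizer would then search the model parameters that minimize `recall_mse` subject to the breakpoint and probability bounds stated above.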

Fig. 7. Comparison of the fitted detection probability map and the manufacturer datasheet FoV of the smart camera.

Fig. 8. Comparison of the fitted detection probability map and the manufacturer datasheet FoV of the smart radar.

Fig. 9. The clutter density maps of the simulated (left) and the real radar detections (right).

Fig. 10. The clutter density maps of the simulated (left) and the real camera detections (right).
simulated clutter detections. The SSIM algorithm determines the quality, i.e., the similarity, of digital images compared to a given reference. The clutter density maps simulated with the conventional and the proposed clutter models are compared to the density map of the actual sensor measurements as a reference. The dynamic range parameter of the SSIM defines the maximum intensity value in the input maps (e.g., 255 in the case of RGB images). If the density maps in Figs. 9 and 10 are compared as images, the dynamic range is given by the $\max_\kappa = \max(\max(\kappa(\Omega_i^{sim})), \max(\kappa(\Omega_i^{meas})))$ [...]

[...] with the circle radius. It means that the clutter density map of the proposed model is significantly closer in structure to the actual radar data than the uniformly distributed detections of the PPP model. Furthermore, the similarity values of the proposed clutter model are greater than 93% for all radius values in the [0, 1] range, which represents the fidelity of the clutter detections better than $\max_\kappa$, considering the valid range of clutter intensity values.
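The influence of the dynamic range parameter on the SSIM can be illustrated with a simplified sketch. It is not the evaluation used in the paper: instead of Gaussian weighting circles it uses one global window over the whole map, the function name `global_ssim` is hypothetical, the constants $K_1 = 0.01$ and $K_2 = 0.03$ are the commonly used SSIM defaults, and the two small maps are made-up values chosen only for illustration.

```python
def global_ssim(x, y, dynamic_range, k1=0.01, k2=0.03):
    """SSIM of two equally sized maps (flat lists) using a single global window.

    dynamic_range is the assumed maximum intensity, e.g. 1.0 for probabilities
    or the largest value occurring in the two maps.
    """
    if len(x) != len(y) or not x:
        raise ValueError("maps must be non-empty and of equal size")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Stabilizing constants scale with the square of the dynamic range.
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    numerator = (2.0 * mean_x * mean_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Made-up sparse clutter maps, flattened row by row (illustrative values only).
measured = [0.0, 0.0, 0.1, 0.0]
simulated = [0.0, 0.05, 0.0, 0.0]
max_kappa = max(max(measured), max(simulated))

ssim_unit_range = global_ssim(simulated, measured, 1.0)
ssim_max_range = global_ssim(simulated, measured, max_kappa)
```

Because the stabilizing constants grow with the dynamic range, the same pair of sparse maps is rated as more similar in the [0, 1] range than in the $[0, \max_\kappa]$ range in this example, which is consistent with the higher similarity values reported for the [0, 1] range.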

Fig. 11. Qualitative result illustrating the simulated (left) and real (right) clutter detections generated by guardrail and street lamp poles.

Fig. 12. Qualitative result illustrating the simulated (left) and real (right) clutter detections generated by a highway bridge.

Fig. 13. Qualitative result illustrating the simulated (left) and real (right) detections generated by standstill objects in the emergency lane.

Fig. 14. Qualitative result illustrating the simulated (left) and real (right) detections generated by standstill objects in the emergency lane.
V. CONCLUSION
The proposed sensor model can simulate the object-level data, i.e., detections or tracked objects, of a generic smart sensor with high fidelity based on the data-driven identification of the model parameters. The simulator also handles tracked object detections by involving a tracking model that extends the detection model fitted to the actual sensor. The measurement model can also handle tracked objects, simulating the measurement uncertainties of the sensors. The clutter model reproduces the clutter provided by the sensor with high accuracy regarding its cardinality and spatial distribution, considering different types of false detections. Since the surrounding scenario and the static environmental objects are usually provided by the commonly used 3D simulation environments on which the proposed model relies, the simulator can generate the sensor data in an arbitrary road scenario. Therefore, the simulator can support the development and testing of different environment perception modules, such as sensor data fusion, even in corner cases, increasing the safety of ADAS and HAD functions. We intend to improve the proposed model with a more detailed radar and camera clutter model considering multi-path detections and digital image processing failures, which are currently covered by the partially uniformly distributed clutter attributed to unknown causes. Furthermore, we plan to extend the proposed simulation with the consideration of object occlusion in the detection and tracking model.

Tamás Bécsi (Member, IEEE) received the M.Sc. and Ph.D. degrees from the Budapest University of Technology and Economics, Budapest, Hungary, in 2002 and 2008, respectively. Since 2005, he has been an Assistant Lecturer and since 2014, he has also been an Associate Professor with the Department of Control for Transportation and Vehicle Systems, Budapest University of Technology and Economics. His research interests include linear systems, embedded systems, traffic modeling, and simulation. His research and industrial works have involved railway information systems and vehicle control.

Olivér Törő received the M.Sc. degree from the Eötvös Loránd University, Budapest, Hungary, in 2010 and the Ph.D. degree from the Budapest University of Technology and Economics, Budapest, in 2022. Since 2018, he has been an Assistant Research Fellow with the Department of Control for Transportation and Vehicle Systems, Budapest University of Technology and Economics. His research interests include object detection and tracking in road traffic applications, multi-object state estimation, and nonlinear filtering.

Péter Gáspár received the M.Sc. and Ph.D. degrees from the Faculty of Transportation Engineering and Vehicle Engineering (KJK), Budapest University of Technology and Economics (BME), Budapest, Hungary, in 1985 and 1997, respectively, and the D.Sc. degree in control from the Hungarian Academy of Sciences, Budapest, in 2007. Since 1990, he has been a Senior Research Fellow with the Institute for Computer Science and Control and since 2016, he has also been a Research Professor. In 2004, he became the Head of the Vehicle Dynamics and Control Research Group and then in 2017, he became the Head of the Systems and Control Laboratory, SZTAKI. He was habilitated at the BME in 2008, and he was appointed University Professor. Since 2013, he has also been the Head of the Department of Control for Transportation and Vehicle Systems (KJIT), BME KJK. His research interests include linear and nonlinear systems, robust control, multi-objective control, system identification, and identification for control and artificial methods. His research and industrial works have involved mechanical systems, vehicle structures, and vehicle dynamics and control. Since 2016, he has also been a Corresponding Member of MTA. He is also a Member of the IFAC Automotive Control and Transportation Systems Technical Committee, and Chair of the International Federation of Automatic Control (IFAC) Hungary National Member Organization.

TABLE I ESTIMATED PARAMETERS OF THE PROPOSED DETECTION MODEL IN (2) AND (3)

TABLE II ESTIMATED PARAMETERS OF THE CLUTTER CLASSES
xy is given by $\sigma_x^2 = (d_{res}(l_S^{i_S})/3)^2$ to obtain overlapping spatial PDFs. Whereas the $\sigma^2$ eigenvalues indicate the shape and the area of the reflecting surface $A_r$. The eigenvalues of the position covariance corresponding to the different object classes are detailed in Table II. The other elements of the complete covariances $P_S^{i_c}$ and $P_O^{i_O}$ are set in accordance with the parameters of the measurement model detailed in Section III-C.
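The choice $\sigma_x = d_{res}/3$ can be checked with a short worked calculation. It rests on an assumption that is not stated in the surviving text, namely that consecutive components of a series are spaced $d_{res}$ apart along the longitudinal axis. Under that assumption, neighboring means are $3\sigma_x$ apart and the midpoint between them lies $1.5\sigma_x$ from each mean, so the density of each component at the midpoint, relative to its peak, is

```latex
\frac{\mathcal{N}\!\left(d_{res}/2;\, 0,\, \sigma_x^2\right)}{\mathcal{N}\!\left(0;\, 0,\, \sigma_x^2\right)}
  = \exp\!\left(-\frac{(d_{res}/2)^2}{2\sigma_x^2}\right)
  = \exp\!\left(-\frac{(d_{res}/2)^2}{2\,(d_{res}/3)^2}\right)
  = \exp\!\left(-\frac{9}{8}\right)
  \approx 0.32 .
```

Each component therefore still contributes roughly a third of its peak density halfway to its neighbor, so the sum of the two neighboring densities does not drop to zero between the components, which is what overlapping spatial PDFs means in this setting.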

TABLE III ESTIMATED PARAMETERS OF THE SENSORS' MEASUREMENT MODEL

TABLE IV COMPARISON OF THE PROPOSED AND CONVENTIONAL DETECTION MODEL

TABLE V STRUCTURAL SIMILARITY OF CLUTTER DENSITY MAPS

TABLE VI PERFORMANCE METRICS OF THE REAL AND SIMULATED SENSORS