GM-PHD Filter Based Sensor Data Fusion for Automotive Frontal Perception System

László Lindenmaier, Szilárd Aradi, Member, IEEE, Tamás Bécsi, Member, IEEE, Olivér Törő, and Péter Gáspár, Member, IEEE

Abstract—Advanced driver assistance systems and highly automated driving functions require an enhanced frontal perception system. The requirements of a frontal environment perception system cannot be satisfied by any of the existing automotive sensors alone. A commonly used sensor cluster for these functions consists of a mono-vision smart camera and an automotive radar. The sensor fusion is intended to combine the data of these sensors to perform a robust environment perception. Multi-object tracking algorithms have a suitable software architecture for sensor data fusion. Several multi-object tracking algorithms, such as JPDAF or MHT, have good tracking performance; however, the computational requirements of these algorithms are significant owing to their combinatorial complexity. The GM-PHD filter is a straightforward algorithm with favorable runtime characteristics that can track an unknown and time-varying number of objects. However, the conventional GM-PHD filter has a poor performance in object cardinality estimation. This paper proposes a method that extends the GM-PHD filter with an object birth model that relies on the sensor detections and a robust object extraction module, including a Bayesian estimation of the objects' existence probability, to compensate for the drawbacks of the conventional algorithm.
Index Terms—Advanced driver assistance, Gaussian mixture model, multi-object tracking, object detection, PHD filter, radar detection, sensor fusion, smart cameras.

I. INTRODUCTION
The need for autonomous vehicles results in a gradual increase in the number of automated functions [1]–[3]. The role of environment perception becomes more significant with the higher automation level of vehicles. Most advanced driver assistance functions, such as Autonomous Emergency Braking (AEB) or Adaptive Cruise Control (ACC), rely on an enhanced frontal perception system [4], [5], hereinafter referred to as EFPS. The EFPS consists of three main modules: lane detection, object detection, and multi-object tracking subsystems. Object detection and tracking have several difficulties: the number of objects within the surveillance area is time-varying and unknown, and missed detections and false alarms may occur due to sensor uncertainty. Furthermore, the sensors cannot observe all the relevant state variables of the true objects, and the measurements are noisy. This paper focuses on multi-object tracking that aims to accurately estimate the cardinality and the state of the objects present in front of the ego vehicle. However, none of the available sensors alone can satisfy the requirements of the EFPS. For example, radars can measure the spatial distance of the targets precisely, but they have a poor performance in azimuth estimation. Meanwhile, mono-vision cameras have a good lateral position resolution but estimate the objects' longitudinal distances inaccurately. Thus, sensor data fusion is needed to obtain a robust and precise representation of the environment [6]–[8].

A. Related Work
Multi-object tracking algorithms have a suitable architecture for sensor fusion purposes. The commonly used Probabilistic Data Association (PDA) [9] and Joint Probabilistic Data Association (JPDA) [10] multi-object tracking algorithms estimate the state of a known number of objects, often fusing the data of multiple sensors [11]–[13]. Since, in real applications, the number of objects is time-varying and unknown, an estimation of the object cardinality is needed. The Integrated Probabilistic Data Association (IPDA) [14] and Joint Integrated Probabilistic Data Association (JIPDA) [15] filters extend the previous approaches to estimate the existence probability of the objects as the basis of object management. However, the performance and runtime of these algorithms depend highly on the gating size applied around the objects. In addition, the complexity of JPDA and JIPDA is combinatorial, which makes real-time application complicated. The Multiple Hypothesis Tracker (MHT) [16], [17] considers the dynamics of data association by propagating it in time. This multi-object tracking algorithm is also often used as the sensor fusion module of object perception, as in [18], [19]; however, the computational cost of this approach is more significant compared to IPDA and JIPDA. Random Finite Sets (RFS) provide an alternative way to model the object list consisting of object states and possibly labels in an arbitrary order. RFS-based multi-object tracking can handle object birth and death, clutter measurements, and missed detections by inserting the multi-object PDF of the RFS into the recursive Bayesian framework. The implementation of optimal and exact RFS-based recursive filtering is complex and not tractable due to multiple set integrals. A suitable alternative approach, the Probability Hypothesis Density (PHD) filter, propagates and updates the
first-order moment of the RFS [20]. Although the PHD filter does not have a closed-form solution, the Gaussian Mixture (GM-PHD) [21] and particle [22] approximations can provide tractable recursions.
The GM-PHD filter is a straightforward algorithm with a low computational cost that recursively estimates the first-order moment of the multi-object RFS. However, it requires an additional module to extract the potential objects from the updated PHD function consisting of Gaussian components. The GM-PHD filter provides an estimation of the object cardinality, but this variable follows a Poisson distribution. Since the variance of a Poisson distribution equals its mean, this estimation leads to noisy cardinality estimates in the case of many targets. Another approach to object extraction is to apply a threshold on the existence probability of the Gaussian components computed by the GM-PHD filter. However, this approach underestimates the number of present objects. Numerous papers aim to overcome the limitations of the PHD filter. One approach is to introduce state-dependent detection probabilities, as reported in [23], [24], [25], [26]. In [27], a state-dependent survival probability is considered in a scenario with low detectability. For visual tracking, the GM-PHD filter proposed in [28] creates a cost function for object assignment and builds trajectories using an auction algorithm. A multi-frame scheme is introduced in [29] to deal with estimates of undetected targets. To preserve the standard measurement model, a penalization scheme [30] and a competitive algorithm [31] are introduced to renormalize weights along the objects and measurements.

B. Contributions of the Paper
This paper extends the conventional GM-PHD filter with a robust object extraction module that estimates the existence probability of the objects in a Bayesian manner. The proposed method efficiently tackles missed detections and clutter measurements. Therefore, it provides a better estimation of the object cardinality, obtaining a robust object perception for the EFPS of ADAS and highly automated functions. The object extraction module first performs a data association between the Gaussian components and the sensor detections by two different approaches based on the component weights, considering them as association probabilities instead of existence probabilities. Then, the updated PHD function components and the detections are clustered by the k-Nearest Neighbor algorithm [32], avoiding exponential complexity. The elements of a cluster are fused, relying on the IPDA algorithm. The IPDA weights and the objects' PoE (Probability of Existence) are computed from the updated weights of the Gaussian components. This approach results in a non-parametric IPDA algorithm in terms of gating size. Although the Bayesian IPDA algorithm can be derived based on RFS theory [33], this leads to the already mentioned implementation problems. A connection between the JIPDA tracking algorithm and the GM-PHD filter has been pointed out in [34], but there the object's existence probability is not computed based on the Bayesian rule. Furthermore, we introduce a method for modeling the birth probability of measurements that efficiently identifies new components in the PHD function. The proposed robust GM-PHD filter significantly improves the conventional GM-PHD filter, and it achieves the performance of the state-of-the-art JIPDA algorithm while preserving the beneficial complexity of the conventional GM-PHD filter.

II. THEORETICAL BACKGROUND
This section briefly summarizes the theoretical background of the proposed method. The GM-PHD filter is based on finite set statistics (FISST) theory [35], which provides the mathematical apparatus for computations with random finite sets. After introducing RFS, the section discusses how the PHD approach simplifies the filtering of the multi-object PDF and presents a practical variant, the Gaussian Mixture PHD filter.

A. Random Finite Sets
A random finite set X consists of n random vector variables, X = {x_1, ..., x_n}, where the cardinality |X| = n is also a random variable with probability mass function ρ(n). The PDF of a multi-object RFS is given by

p(X) = p({x_1, ..., x_n}) = n! ρ(n) p_n(x_1, ..., x_n),    (1)

where p_n is a symmetric joint PDF. The multiplier n! corresponds to the fact that the elements of an RFS are in arbitrary order; thus, any permutation of the elements results in an equivalent RFS. Suppose X' = [x_1, ..., x_n] is an ordered array with PDF p(X') = ρ(n) p_n(x_1, ..., x_n); then the connection between the two multi-object PDFs is written as

p(X) = n! p(X').    (2)

Although the multi-object PDF given in (1) can be inserted into a recursive Bayes filter, the exact solution has a high computational cost because of its combinatorial complexity. The PHD filter provides a good approximation to compute the posterior RFS in real time.

B. Gaussian Mixture PHD Filter
The PHD function is the first-order moment of the multi-object PDF, which is defined as

D(x) = E[ Σ_{x′∈X} δ_{x′}(x) ] = ∫ p({x} ∪ W) δW.    (3)

The higher the value of the PHD function, the more likely the corresponding state represents a real object. The expected number of objects (cardinality of the RFS) over a given region Ω of the state space can be computed as the integral of the PHD function over this region:

E[ |X ∩ Ω| ] = ∫_Ω D(x) dx.    (4)

In the PHD filter, the object model is based on a Poisson Point Process (PPP), in which the cardinality of the RFS follows a Poisson distribution [36]. Thus, the PDF p(X) of a multi-object RFS and its PHD (or intensity) function D(x) can be written as

p(X) = e^{−λ} Π_{x∈X} λ p(x),    (5)
D(x) = λ p(x),    (6)

where p(x) denotes the PDF in the single-object state space and λ is the Poisson rate of the object cardinality, which equals the expected number of objects.
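As a small numerical illustration of (4) together with the PPP intensity D(x) = λp(x) (a hypothetical 1-D example, not from the paper), the expected object count over a region is simply the integral of the PHD over that region:

```python
import math

# 1-D PHD of a Poisson RFS: D(x) = lam * p(x) with p(x) = N(x; 0, 1)
lam = 3.0  # Poisson rate = expected total number of objects

def phd(x):
    return lam * math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def integrate(f, a, b, n=10000):
    """Composite midpoint rule, accurate enough for this smooth integrand."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Expected number of objects inside Omega = [-1, 1]
n_hat = integrate(phd, -1.0, 1.0)
# Analytic value: lam * (Phi(1) - Phi(-1)) = lam * erf(1 / sqrt(2))
assert abs(n_hat - lam * math.erf(1.0 / math.sqrt(2.0))) < 1e-6
```

Integrating over the whole state space would return λ itself, the expected total cardinality.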
The PHD function can be inserted into the Bayesian framework to obtain an estimation of the multi-object RFS. The prediction for the PHD function, given the posterior D_{k−1|k−1}(x), is

D_{k|k−1}(x) = b_{k|k−1}(x) + ∫ p_S(x′) f_{k|k−1}(x|x′) D_{k−1|k−1}(x′) dx′.    (7)

In equation (7), b_{k|k−1}(x) denotes the birth PHD that corresponds to the appearing objects, while the second part of the equation is the Chapman–Kolmogorov prediction of the prior, where p_S(x) denotes the survival probability and f_{k|k−1}(x|x′) is the transition density. The PHD function is updated using the measurement set Z_k = {z^1_k, ..., z^{m_k}_k}:

D_{k|k}(x) = (1 − p_d(x)) D_{k|k−1}(x) + Σ_{z∈Z_k} [ p_d(x) g_k(z|x) D_{k|k−1}(x) ] / [ κ(z) + ∫ p_d(ξ) g_k(z|ξ) D_{k|k−1}(ξ) dξ ],    (8)

where p_d(x) denotes the detection probability of an object with state x, κ(z) is the clutter density, and g_k(z|x) is the measurement likelihood. It should be noted that the measurement set contains not only the object-originated elements O_k ⊆ Z_k but also the clutter measurements C_k ⊆ Z_k. In equation (8), the first term corresponds to miss-detected objects, and the second part of the expression refers to the measurement update. This equation does not have a closed-form solution; the PHD function is usually approximated by an assumed density.
The GM-PHD filter approximates the PHD function by a weighted sum of H Gaussian components:

D(x) = Σ_{h=1}^{H} w_h N(x; x̄_h, P_h),    (9)

where N(x; x̄_h, P_h) denotes the normal distribution of the component with mean x̄_h and covariance P_h, while w_h is the weight of the corresponding component.
Given the posterior PHD function D_{k−1|k−1}(x) in the above form, the predicted GM-PHD function can be written as

D_{k|k−1}(x) = D^S_{k|k−1}(x) + b_{k|k−1}(x),    (10)

where D^S_{k|k−1}(x) denotes the predicted PHD of the previous components representing the surviving objects:

D^S_{k|k−1}(x) = Σ_{h=1}^{H_{k−1|k−1}} p_S w^h_{k−1|k−1} N(x; x̄^h_{k|k−1}, P^h_{k|k−1}),    (11)

and b_{k|k−1}(x) is the birth PHD:

b_{k|k−1}(x) = Σ_{b=1}^{H_b} w^b_{k|k−1} N(x; x̄^b_{k|k−1}, P^b_{k|k−1}).    (12)

The birth PHD should capture the locations in the state space where objects are assumed to appear. The birth model of our work is detailed in Section III-C.
The prediction of both surviving and birth components can be separated into two steps. In the first step, the spatial distribution of the components is predicted by a Kalman prediction [37]. In the second step, the weights are scaled by the survival probability:

w^h_{k|k−1} = p_S w^h_{k−1|k−1}.    (13)

The number of predicted components is computed as

H_{k|k−1} = H_{k−1|k−1} + H_b.    (14)

Then the updated PHD consists of two parts: the PPP part representing the miss-detected objects and the multi-Bernoulli (MB) RFS, which corresponds to the detected objects:

D_{k|k}(x) = D^{PPP}_{k|k}(x) + D^{MB}_{k|k}(x).    (15)

The PPP intensity consists of H_{k|k−1} components:

D^{PPP}_{k|k}(x) = Σ_{h=1}^{H_{k|k−1}} (1 − p_d(x̄^h_{k|k−1})) w^h_{k|k−1} N(x; x̄^h_{k|k−1}, P^h_{k|k−1}).    (16)

The MB part of the posterior PHD contains m_k H_{k|k−1} components capturing all the possible component–measurement pairs:

D^{MB}_{k|k}(x) = Σ_{i=1}^{m_k} Σ_{h=1}^{H_{k|k−1}} w^{h,i}_{k|k} N(x; x̄^{h,i}_{k|k}, P^{h,i}_{k|k}),    (17)

where the spatial PDF N(x; x̄^{h,i}_{k|k}, P^{h,i}_{k|k}) of the updated components is proportional to the product of the measurement likelihood N(z^i_k; H_k x, R^i_k) and the predicted PDF N(x; x̄^h_{k|k−1}, P^h_{k|k−1}). Thus, the posterior spatial PDF is computed by the Kalman update [37] based on the H_k measurement model and the R^i_k measurement covariance matrix. The weight w^{h,i}_{k|k} representing the existence probability of the component is updated as

w^{h,i}_{k|k} = p_d(x̄^h_{k|k−1}) w^h_{k|k−1} q^{h,i}_k / ( κ(z^i_k) + Σ_{h′=1}^{H_{k|k−1}} p_d(x̄^{h′}_{k|k−1}) w^{h′}_{k|k−1} q^{h′,i}_k ),    (18)

where

q^{h,i}_k = N(z^i_k; H_k x̄^h_{k|k−1}, S^h_k),    (19)

and S^h_k denotes the innovation covariance of measurement z^i_k and component state x̄^h_{k|k−1}. The most beneficial property of the GM-PHD filter is that the complexity of the algorithm depends linearly on the number of predicted components H_{k|k−1}. However, the posterior PHD function consists of H_{k|k} = H_{k|k−1}(m_k + 1) components, which means that the complexity of the algorithm would diverge. In practical applications, a mixture reduction is needed to avoid increasing computational costs.
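One measurement-update step of the filter can be sketched in a scalar setting (an illustrative 1-D sketch with direct state observation; the constants p_d, kappa, R and the test values are assumptions, not the paper's parameters):

```python
import math

def gauss(x, m, var):
    return math.exp(-0.5 * (x - m) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gmphd_update(components, measurements, p_d, kappa, R=0.5):
    """One GM-PHD measurement update in 1-D (H = 1, so z = x + noise).

    components: list of predicted (weight, mean, variance) tuples.
    kappa: constant clutter density; R: measurement noise variance.
    Returns the (m + 1) * H updated components: PPP (missed) + MB (detected).
    """
    # PPP part: miss-detected components, weights scaled by (1 - p_d)
    updated = [((1.0 - p_d) * w, m, v) for (w, m, v) in components]
    for z in measurements:
        # Predicted measurement likelihoods q^h = N(z; mean_h, S_h), S_h = P_h + R
        q = [gauss(z, m, v + R) for (_, m, v) in components]
        denom = kappa + p_d * sum(w * qh for (w, _, _), qh in zip(components, q))
        for (w, m, v), qh in zip(components, q):
            K = v / (v + R)  # Kalman gain
            updated.append((p_d * w * qh / denom,  # normalized weight
                            m + K * (z - m),       # posterior mean
                            (1.0 - K) * v))        # posterior variance
    return updated
```

With a single component of weight 0.9 and one nearby detection, the missed-detection copy keeps weight 0.09 while the detected copy gets a weight close to one; the weight sum approximates the expected cardinality.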
The reduction of the number of components is performed by merging and pruning. A simple manner of mixture reduction is to keep the first n̂_{k|k} components of the updated PHD function after sorting the updated weights w^h_{k|k}, where n̂_{k|k} can be computed as

n̂_{k|k} = Σ_{h=1}^{H_{k|k}} w^h_{k|k}.    (20)

In particle implementations of the PHD filter, the resampled particles can also be given based on n̂_{k|k} by Sequential Monte Carlo (SMC) simulation as in [22]. However, since the multi-object RFS is modeled as a PPP, this estimate of the expected number of objects follows a Poisson distribution. It means that the higher the number of objects, the more uncertain the cardinality estimation is. A more robust way of extracting objects from the updated PHD is to compare the weights w^h_{k|k} to a defined threshold γ_∃. If the weight of a component is greater than the threshold, the component initiates an object in the multi-object RFS as in [21], [23].
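A minimal sketch of the pruning/capping step and the two conventional extraction rules (the threshold values are illustrative assumptions):

```python
def reduce_and_extract(components, prune_thresh=1e-3, max_comps=100,
                       gamma_exist=0.5):
    """Prune low-weight components, cap the mixture size, extract objects.

    components: list of (weight, mean, variance).
    Returns (kept components, thresholded objects, cardinality estimate n_hat).
    """
    kept = sorted((c for c in components if c[0] > prune_thresh),
                  key=lambda c: -c[0])[:max_comps]
    # Poisson-style estimate, eq. (20): expected cardinality = sum of weights
    n_hat = sum(w for (w, _, _) in kept)
    # Threshold-style extraction: one object per component above gamma_exist
    objects = [(m, v) for (w, m, v) in kept if w > gamma_exist]
    return kept, objects, n_hat
```

Here the third component is pruned, only the first one survives the existence threshold, and the weight sum gives the cardinality estimate.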

III. METHODOLOGY
This section details the proposed robust GM-PHD filter fusing heterogeneous data from multiple sensors for the EFPS of ADAS functions. In the evaluation, besides a simulation setup, we consider real measurement data to investigate our method in practical applications. The most commonly applied sensor cluster of frontal perception systems consists of a forward-looking radar and a camera [38], [39]. The proposed method was evaluated on a similar sensor cluster with the characteristics detailed in Section IV-B. The system architecture depicted in Fig. 1 consists of two main layers and an additional step. The fusion layer is separated from the sensor layer, resulting in a modular fusion architecture. In the sensor layer, only the acquisition of the sensor data is performed. The sensor and fusion layers are connected by the data synchronization that handles asynchronous sensor data and identifies the processable measurements at a given time. The fusion layer involves the GM-PHD filter and the object extraction module. If the data synchronization deems a sensor measurement processable at a given time, it updates the GM-PHD filter, and the object extraction updates the object list. Since the fusion layer works on a joint object list, it utilizes the data of both sensors, fusing the measurements in the order defined by the data synchronization. Finally, the posterior delay of the fused data is compensated by predicting the fused objects to the current time. The state prediction is performed based on a straightforward constant acceleration (CA) model detailed in Section IV-C.

A. Data Synchronization
Data synchronization is a fundamental part of every sensor fusion algorithm. This block is needed because:
- sensors provide data with different frequencies,
- sensors work asynchronously,
- out-of-sequence measurements (OOSM) may occur.

A measurement is assumed to be out of sequence if its timestamp is older than the timestamp of the latest measurement. This phenomenon can occur because of sensor data latency. If the timestamp of a measurement were older than the latest update time of the filter, the prediction would have to be performed to the past, referred to as a "negative-time measurement update". There are different approaches to handle OOSM in multi-object tracking. One way is to approximate the solution by neglecting the process noise in the prediction step as in [40]. An exact solution is also provided in [41]. Other works deal with OOSM based on data buffering techniques as in [42], [43]. According to [42], the sensor data latency consists of three main parts: the data acquisition, the pre-processing, and the transfer, as shown in Fig. 2. Deterministic data buffering means that the frequency and latency information of the sensors is provided. In [42], when a new measurement is provided, the timestamp of the following data is assumed to be known. Although this information is not provided in our sensor configuration, the maximum latency of each sensor can be accurately estimated based on measurement analysis. A measurement updates the object list at a given time if none of the sensors may send data with an older timestamp; otherwise, it is buffered. A buffered measurement is assumed to be processable if:
- its timestamp is lower than the difference of the current time and the maximum latency of each sensor, and
- it is the latest measurement of the sensor.
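The buffering rule above can be sketched as follows (the sensor names, timestamps, and latency figures are illustrative assumptions):

```python
def processable(buffer, now, max_latency):
    """Select buffered measurements that are safe to fuse at time `now`.

    buffer: dict sensor -> list of (timestamp, data), newest last.
    max_latency: dict sensor -> worst-case sensor latency in seconds.
    A measurement is released once no sensor can still deliver older data:
    its timestamp must not exceed `now` minus the maximum latency over all
    sensors, and it must be the latest measurement of its own sensor.
    """
    horizon = now - max(max_latency.values())
    released = []
    for sensor, measurements in buffer.items():
        if measurements and measurements[-1][0] <= horizon:
            released.append((sensor, measurements[-1]))
    return released

# The radar sample is too fresh (the camera could still deliver older data),
# so only the camera measurement is released.
buffer = {"radar": [(0.90, "r1"), (0.95, "r2")],
          "camera": [(0.80, "c1")]}
out = processable(buffer, now=1.0, max_latency={"radar": 0.05, "camera": 0.15})
assert out == [("camera", (0.80, "c1"))]
```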
The other role of the data synchronization module is scheduling. According to [43], the two types of fusion algorithms are asynchronous and synchronous. Asynchronous fusion means that each sensor measurement triggers the algorithm. However, in the synchronous case, the fusion is called by one of the sensors or by an independent scheduler with a constant frequency. The deterministic data buffer allows asynchronous running of the fusion; however, since the frequency of the smart camera is significantly lower compared to the radar, the camera triggers the fusion. Therefore, at least one camera measurement updates the fusion in almost every cycle.

B. Implementation of GM-PHD Filter
The GM-PHD filter described in Section II-B is implemented as in Alg. 1, where F_k, Q_k, H_k, and R_k denote the model parameters of the Kalman filter detailed in Section IV-C: the transition matrix, the process noise covariance, the observation matrix, and the measurement covariance, respectively, while S_k, e_k, and K_k are the computed variables, namely the innovation covariance, the innovation, and the Kalman gain.
The miss-detected components are scaled based on the detection probability p_d(x); therefore, it is crucial to obtain a proper detection model. Some papers focus on state-dependent detection models in poor detectability circumstances, as in [23], [24]. We also propose a state-dependent detection probability; however, this model considers the field of view (FoV) of the sensor. The missed detections are resolved by the robust object extraction detailed in Section III-D. A different detection probability is given depending on whether the object is within the sensor FoV or outside it. Therefore, the detection probability is given as

p_d(x̄^h_k) = p̄_d if d^h_{xy,k} ∈ Ω_FOV, and p_d(x̄^h_k) = p_∅ otherwise,    (21)

where p_∅ and p̄_d denote the detection probability constants outside and within the sensor FoV. In (21), d^h_{xy,k} refers to the subspace of x̄^h_k including the estimated position of the object. The expression d^h_{xy,k} ∉ Ω_FOV corresponds to the event that the object x̄^h_k is located outside the space Ω_FOV of the FoV. Wang et al. proposed a method that considers a state-dependent survival probability [27]; however, in our implementation the survival probability p_S(x) = p_S is given by a constant.
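A sketch of such an FoV-dependent detection model (the wedge-shaped FoV geometry and the two probability constants are illustrative assumptions, not the calibrated sensor values):

```python
import math

def detection_probability(x, y, fov_half_angle, fov_range,
                          p_d_in=0.95, p_d_out=0.05):
    """Piecewise-constant detection probability in the spirit of eq. (21).

    (x, y): estimated object position in the sensor frame (x forward).
    Returns p_d_in inside the wedge-shaped FoV, p_d_out otherwise.
    """
    r = math.hypot(x, y)
    inside = (0.0 < x and r <= fov_range
              and abs(math.atan2(y, x)) <= fov_half_angle)
    return p_d_in if inside else p_d_out
```

An object 10 m ahead on the boresight gets the in-FoV constant; one behind the sensor gets the out-of-FoV constant.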
Most ADAS and HAD functions refer to the objects by their unique ID after the relevant target selection. Therefore, the parameters {w_{k|k}, x̄_{k|k}, P_{k|k}} of the GM-PHD components are extended with the required label l_{k|k} consisting of the unique ID of the objects. The labels of the updated components are inherited from the corresponding previous components; hence, the labels of the updated components are not unique. In the proposed robust GM-PHD filter, the unique ID of the present objects is established by the object extraction module.

Algorithm 1: GM-PHD Filter.

C. Birth Model
The birth model is intended to capture newborn objects, considering the locations where new ones can appear (e.g., on the edge of the sensor FoV). According to [23], [44], it is beneficial to compose the birth components based on the measurement set Z_{k−1} at time k − 1. Some works extend the IPDA and JIPDA algorithms with a birth model that computes the birth probability of the detections based on the present objects [45], [46]. This paper proposes an extension of the conventional GM-PHD filter with a similar birth model; therefore, only the measurements are used to initialize new components. Each measurement with a birth probability above a given threshold p_{b,t} initiates a new component in the prior PHD based on the inverse measurement model, and it will be updated in the next cycle. The pseudo-code of the birth model is given in Alg. 2, where P_0 denotes the initial covariance containing the initial uncertainties of the state variables unobserved by the sensor providing the measurement. The weights w^b_{k|k} of the birth components, equal to the initial existence probability p_init(∃x^b_k) of the corresponding object, are computed based on the sensor confidence and the birth density b(z^i_k). The labels l^b_{k|k} of the birth components are defined based on the labels l^h_{k|k} of the updated components and the user-defined maximum number of objects n_max by identifying the first unoccupied object ID. The GM-PHD filter extended with the proposed birth model is represented in Fig. 3.
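The core of the birth routine can be sketched as follows (1-D measurements; the birth-probability model, threshold, and initial variance are illustrative assumptions, not the exact Alg. 2):

```python
def birth_components(measurements, birth_prob, p_b_thresh, init_var, next_id):
    """Create birth components from detections with high birth probability.

    measurements: list of measured positions z (1-D for brevity).
    birth_prob: callable z -> birth probability (e.g., high near the FoV edge).
    Each accepted detection seeds a component via the inverse measurement
    model, with an initial covariance for unobserved states and a fresh label.
    """
    births = []
    for z in measurements:
        p_b = birth_prob(z)
        if p_b > p_b_thresh:
            births.append({"w": p_b, "mean": z, "var": init_var,
                           "label": next_id})
            next_id += 1  # next unoccupied object ID
    return births, next_id
```

A detection far ahead (near the FoV edge) spawns a component, while one close to the ego vehicle does not.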

D. Object Extraction
Algorithm 2: Birth Model.

The GM-PHD filter detailed in Alg. 1 updates the PHD function with (m_k + 1) H_{k|k−1} components. Without a mixture reduction, the number of components would diverge, and so would the algorithm's runtime. The purpose of the mixture reduction is not just to prune the unlikely components but also to extract the states of present objects. Therefore, the mixture reduction and the state extraction greatly impact the performance of the multi-object tracking algorithm. Erdinc et al. investigated the effect of missed detections on the existence probability of an object in a single-object scenario [47]. In the case of single-object tracking, the expected number of objects n̂_{k|k} at time step k equals the existence probability p_{k|k}(∃x_i) of object x_i, which is computed as

p_{k|k}(∃x_i) = n̂_{k|k} = (1 − p_d) n̂_{k|k−1} + Σ_{z∈Z_k} p_d g_k(z|x_i) n̂_{k|k−1} / ( κ(z) + p_d g_k(z|x_i) n̂_{k|k−1} ),    (22)

where

n̂_{k|k−1} = p_S(x_i) n̂_{k−1|k−1}    (23)

denotes the predicted number of objects (existence probability). However, this expression underestimates the existence probability (the expected number of objects), which can result in false-negative objects in the environment representation. In a Bayesian manner, the correct estimate of the existence probability would be

p_{k|k}(∃x_i) = [ (1 − p_d) n̂_{k|k−1} + Σ_{z∈Z_k} p_d g_k(z|x_i) n̂_{k|k−1} / κ(z) ] / [ (1 − n̂_{k|k−1}) + (1 − p_d) n̂_{k|k−1} + Σ_{z∈Z_k} p_d g_k(z|x_i) n̂_{k|k−1} / κ(z) ].    (24)

Some papers also focus on robust object extraction to overcome the underestimation of the object cardinality [48], [49]. Choi et al. proposed a robust GM-PHD filter that considers component merging, a duplication check, and the tracking score of the objects [49]. However, their method requires a gating parameter, as IPDA or JIPDA does, which may affect the tracking algorithm's performance and runtime. The proposed method associates the predicted components and the measurements Z_k based on the association probabilities p(x^h_{k|k} ↔ z^i_k) defined by the updated component weights w^h_{k|k}; therefore, an adaptive gating is realized. Furthermore, the existence probability of the objects is computed based on the Bayesian rule given in (24); hence, the underestimation of the number of objects is resolved without computing tracking scores. The flowchart of the object extraction is shown in Fig. 4.
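The gap between the two estimates can be illustrated in a single-object, at-most-one-measurement setting (a sketch of the standard IPDA-style existence update under these simplifying assumptions, not the paper's full implementation):

```python
def existence_phd(r_pred, p_d, g, kappa):
    """Existence as the summed PHD weight; underestimates after a miss.

    r_pred: predicted existence, g: measurement likelihood (0 = no detection),
    kappa: clutter density at the measurement.
    """
    detected = p_d * r_pred * g / (kappa + p_d * r_pred * g) if g else 0.0
    return (1.0 - p_d) * r_pred + detected

def existence_bayes(r_pred, p_d, g, kappa):
    """Bayesian (IPDA-style) existence update for the same setting."""
    # Likelihood of the received data given that the object exists
    l_exist = (1.0 - p_d) + (p_d * g / kappa if g else 0.0)
    num = l_exist * r_pred
    return num / (num + (1.0 - r_pred))

# Pure miss with prior existence 0.9 and p_d = 0.9:
assert abs(existence_phd(0.9, 0.9, 0.0, 1.0) - 0.09) < 1e-9
assert abs(existence_bayes(0.9, 0.9, 0.0, 1.0) - 0.09 / 0.19) < 1e-9
```

After a single missed detection, the summed-weight estimate collapses to 0.09, while the Bayesian update still yields about 0.47, which matters for keeping briefly occluded objects alive.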
The first step of the module is the data association that obtains the clusters whose members will be fused. This step is performed by a locally adaptive k-nearest neighbor (Local kNN) algorithm [50]. The parameter k_max is applied as a saturation value for the number of cluster members, and the neighbors are identified by a greedy algorithm. Two data association approaches are proposed: the measurement-oriented and the track-oriented, compared in Fig. 5 and described in Alg. 3 and Alg. 4. The measurement-oriented approach assigns the predicted components {c^h_{k|k−1}}_{h=1}^{H_{k|k−1}} to the current measurements {z^i_k}_{i=1}^{m_k}; in contrast, the detections are assigned to the predicted components in the track-oriented approach. The measurement-oriented algorithm creates two cluster groups: the measurement-updated group C_Z and the misdetection group C_∅. The measurement-updated group consists of clusters that contain one measurement, while the misdetection group involves those components that are not assigned to any of the measurements.

Algorithm 3: Object Extraction – Measurement-Oriented Data Association.

The drawback of this approach is that the clusters in the measurement-updated group, in most cases, represent one Bernoulli component of the updated PHD, neglecting the misdetection probability, which is resolved in the merging process. The track-oriented approach assigns the current measurements to the predicted components, meaning multiple measurements can be associated with a component. Furthermore, since the bases of the clusters are the predicted components, each cluster is initialized with its missed-detected component c_∅. The track-oriented data association creates one cluster group {C_h}_{h=1}^{H_{k|k−1}} with clusters centralized on the predicted components. In both approaches, the cluster members correspond to a component of the updated PHD function: a measurement–previous-component pair represents a Bernoulli component, and a singleton component refers to the PPP part of the updated PHD.
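A greedy, measurement-oriented clustering step can be sketched as follows (1-D Euclidean distances and the parameters are illustrative; the actual module uses the Local kNN algorithm [50] on the component states):

```python
def measurement_oriented_clusters(components, measurements, k_max, d_max):
    """Greedily assign predicted components to measurements.

    components: list of (label, mean); measurements: list of scalar z.
    Each measurement collects up to k_max nearest unassigned components
    within distance d_max; leftovers form the misdetection group C_empty.
    """
    unassigned = set(range(len(components)))
    clusters = []
    for z in measurements:
        near = sorted((abs(components[h][1] - z), h) for h in unassigned)
        members = [h for d, h in near[:k_max] if d <= d_max]
        for h in members:
            unassigned.discard(h)
        clusters.append({"z": z, "components": members})
    return clusters, sorted(unassigned)  # (C_Z, C_empty)
```

Two components near a detection join its cluster, while a distant component ends up in the misdetection group.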
The second step of the object extraction is the merging and PoE estimation described in Alg. 5, which takes as inputs the weights and labels of the predicted components and the previous objects defined by their estimated state, covariance, and unique ID. First, the corresponding predicted component of a cluster must be identified by its unique label l^i_C for the recursive Bayesian PoE estimation. This process is slightly different for measurement-oriented and track-oriented clusters. Since the measurement-oriented clusters may contain more than one predicted component, the corresponding predicted component is identified by fusing the labels l^c_{k|k} of the cluster members. Furthermore, suppose a measurement-oriented cluster is a member of the measurement-updated cluster group C_Z. In that case, the cluster does not consider the misdetection of the corresponding predicted component. Therefore, after identifying the ID l^i_C of the corresponding predicted component, the miss-detected component c ∈ C_∅ with l^c_{k|k} = l^i_C generated by it is appended to the cluster and removed from the misdetection cluster group C_∅. In the track-oriented case, the corresponding predicted component's ID equals the cluster's index because the clusters are organized around the predicted components. In both cases, the label of the updated cluster is inherited from the corresponding predicted component. The state of the merged cluster is extracted based on the IPDA weights β_c of the cluster members. The existence probability p_{k|k}(∃x^i_C) of the objects represented by the fused clusters x^i_C is computed based on the Bayes rule as in the conventional IPDA algorithm. The Gaussian components of the reduced posterior are given by the clusters that contain at least one measurement and the miss-detected components with weight above a given component existence threshold γ_h. If a component meets one of these criteria, it is drawn into the reduced PHD function, but with weight w^{i}_{k|k} = p_{k|k}(∃x^i_C) updated in a Bayesian manner. The present objects are extracted similarly; however, two existence thresholds are applied. If the cluster label l^i_C is represented in the previous object list O_{k−1}, the existence probability p_{k|k}(∃x^i_C) of the cluster is compared to a lower existence threshold γ_{∃,low}. If the object is still tentative, the existence probability must reach a higher threshold γ_{∃,upp} for confirmation. The clusters that meet the existence criteria are called confirmed objects, and the other clusters are referred to as tentative objects. The output of the object extraction is the list O_k of the confirmed (present) objects and the reduced posterior PHD considering both the confirmed and tentative objects.
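The weighted merge of a cluster can be sketched by standard Gaussian-mixture moment matching, with the normalized member weights playing the role of the IPDA weights β_c (a 1-D sketch; the label handling and PoE bookkeeping of Alg. 5 are omitted):

```python
def merge_cluster(members):
    """Moment-matched merge of cluster members given as (weight, mean, var).

    Returns (total weight, merged mean, merged variance); the merged variance
    includes the spread-of-means term so the first two moments are preserved.
    """
    w_tot = sum(w for w, _, _ in members)
    beta = [w / w_tot for w, _, _ in members]  # normalized (IPDA-like) weights
    mean = sum(b * m for b, (_, m, _) in zip(beta, members))
    var = sum(b * (v + (m - mean) ** 2) for b, (_, m, v) in zip(beta, members))
    return w_tot, mean, var
```

Two equally weighted members at 0 and 2 merge to a component at 1 whose variance absorbs their spread.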

IV. EVALUATION
In the evaluation, the proposed GM-PHD based sensor fusion extended with the robust object extraction methods is compared with two conventional GM-PHD filters, namely the Poisson and threshold approaches, and the state-of-the-art JIPDA algorithm. In the Poisson approach, the expected number of objects n̂_{k|k} is computed according to (20), and after sorting the components by their weights, the first n̂_{k|k} components of the PHD function initiate objects in the output. In the other conventional approach, the states of the components with weight greater than the threshold γ_∃ are extracted as present objects. The existence threshold γ_∃ is set equal to γ_{∃,upp}, the object initiation threshold of the proposed object extractions. Both of the conventional GM-PHD filters are extended with the proposed birth model described in Section III-C. The JIPDA filter is parameterized in the same manner as the GM-PHD filters, but according to [15] it expects the sensor detections in a validation gate V^t_k around track t at time k, neglecting unlikely data associations. We applied 7 different constant validation gate parameters V = V^t_k, but only the results of the optimal V = 100 are discussed in Section V.
The performance of the five sensor fusion algorithms is evaluated in a simulation environment and based on real-world measurement data as well. In the simulation environment, the performance of the algorithms is investigated based on the GOSPA (Generalized Optimal Sub-pattern Assignment) metric [51], which considers the localization error of the objects and the cardinality errors. The cardinality error involves both the false-negative (missed) and false-positive objects. The GOSPA is parametrized by the cut-off distance threshold d_c = 10, the order p = 2, and the α parameter. In many cases, the performance of environment perception is given by the precision Pr and recall Rc metrics, as in [4]. If the α parameter is set to α = 2, the localization error, the missed detection, and the false positive components of the total GOSPA metric can be separated. Therefore, the precision and recall can be computed as

Pr = TP / (TP + FP),  Rc = TP / (TP + FN),    (25)

where TP, FP, and FN denote the total number of true-positive, false-positive, and false-negative (missed) detections.
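The resulting counts map to the reported metrics as follows (a direct transcription of the precision/recall definitions, with the F1 score used for the real-data evaluation below):

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true/false positives and missed detections."""
    pr = tp / (tp + fp) if tp + fp else 0.0
    rc = tp / (tp + fn) if tp + fn else 0.0
    return pr, rc

def f1_score(pr, rc):
    """Harmonic mean of precision and recall."""
    return 2.0 * pr * rc / (pr + rc) if pr + rc else 0.0

pr, rc = precision_recall(tp=8, fp=2, fn=4)
assert pr == 0.8
assert abs(rc - 8.0 / 12.0) < 1e-12
assert abs(f1_score(pr, rc) - 8.0 / 11.0) < 1e-9
```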
The ground truth of the real measurement data is produced by video annotation; thus it is unsuitable for computing an accurate localization error. Therefore, in the evaluation of the real measurement data, the F1 score and the MOTA (Multi-Object Tracking Accuracy) metric of the algorithms and sensors are computed as in [52], considering the precision, recall, and track ID mismatches. The annotated objects are transformed to the vehicle coordinate system based on the homography matrix of the mono-vision camera. The output tracks are associated with the ground truth by a GNN (Global Nearest Neighbor) algorithm with a pre-defined d_c cut-off distance threshold as in GOSPA; however, the localization error is neglected.

Algorithm 5: Object Extraction — Merging and PoE Estimation. (Pseudo-code listing omitted.)
Besides the performance metrics, the proposed robust GM-PHD filters are compared with the state-of-the-art JIPDA filter in terms of computational effort. Since the real-world measurement detailed in Section IV-B includes different sections with a varying number of objects, the runtime comparison is detailed only for the measurement evaluation in Section V-B. Since the O(H_k|k−1, m_k) algorithm complexity depends on the number H_k|k−1 of tracked components corresponding to potential objects and on the number m_k of detections, the runtime characteristics of the fusion algorithms are investigated by computing the average runtime for each (H_k|k−1, m_k) pair.
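Such a runtime map can be sketched as below; the log entries and their values are purely illustrative, not measured data from the paper.

```python
from collections import defaultdict

def runtime_map(samples):
    """Average the measured cycle runtimes per (H, m) pair.

    samples: iterable of (H, m, runtime) tuples, where H is the number of
    tracked components, m the number of detections, runtime in seconds.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for h, m, t in samples:
        sums[(h, m)] += t
        counts[(h, m)] += 1
    # one averaged cell per (H, m) pair of the runtime map
    return {key: sums[key] / counts[key] for key in sums}

# illustrative log entries: (H, m, runtime in seconds)
log = [(5, 3, 0.010), (5, 3, 0.014), (8, 4, 0.021)]
avg = runtime_map(log)
```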

A. Simulation Environment
The simulation environment performs a high-level simulation of the radar and smart camera detections. The real sensor data is pre-evaluated to obtain the clutter and detection models of the sensors, providing a realistic sensor simulation. The detection model describes the detection probability with the parameters given in (21) based on the sensor FoVs. The clutter model is more complex, since according to the real sensor data, the false alarms are not uniformly distributed. For example, the radar frequently detects the poles of the highway guardrail and tracks these irrelevant detections as actual objects. Therefore, the spatial distribution of these false alarms is assumed to be a Gaussian mixture instead of a uniform distribution, and the clutter model of the sensor is described by the sum of two independent parts, the uniform distribution and the Gaussian mixture, as in

κ(z) = λ_U U(V_z) + λ_N Σ_i w_i N(z; ĉ_i, P̂_i),

where U(V_z) and N(z; ĉ_i, P̂_i) denote the uniform distribution over the V_z measurement space volume and the i-th component of the Gaussian mixture with mean value ĉ_i and covariance P̂_i, and w_i are the normalized mixture weights. The cardinality of the false detections is assumed to be Poisson distributed, with different Poisson rates λ_U and λ_N for the uniformly distributed false alarms and for the Gaussian mixture. The states of the guardrail poles are identified based on the dynamic state of the ego vehicle and the lane information. Since the radar tracks the poles, the parameters of the Gaussian components are updated by a Kalman filter, as detailed in Section IV-C. The measurement delay is simulated by a random value between the minimum and maximum latency of the sensors. The parameters of the sensors in the simulation environment are summarized in Table I. Two common ADAS scenarios, Adaptive Cruise Control (ACC) and Autonomous Emergency Braking (AEB), are tested in the simulation environment. The scenarios, generated with IPG CarMaker, are shown in Fig. 6. The track consists of a 900-meter-long straight section and an arc with a radius of 750 meters and a total turn angle of 90°. In both scenarios, six traffic vehicles (peer objects) are involved, and each scenario lasts 40 seconds. During the ACC scenario, a peer vehicle cuts in between the ego vehicle and the ACC target vehicle and forces the ego vehicle to decelerate. Four other objects also participate in this scenario to simulate more realistic traffic. The AEB scenario is more complex and dynamic, including a lane change performed by the
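Under the stated assumptions (Poisson cardinalities λ_U and λ_N, a uniform component over the measurement volume, and a Gaussian mixture modeling structured false alarms such as guardrail poles), one frame of the clutter simulation could be sketched as follows; all parameter values and the axis-aligned covariances are illustrative simplifications.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's method for sampling a Poisson-distributed count."""
    l, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= l:
            return k
        k += 1

def sample_clutter(lam_u, lam_n, xy_range, mixture, rng):
    """Draw one frame of false alarms from the two-part clutter model.

    lam_u, lam_n -- Poisson rates of the uniform and Gaussian-mixture parts
    xy_range     -- ((x_min, x_max), (y_min, y_max)) measurement volume
    mixture      -- list of (weight, (mx, my), (sx, sy)) Gaussian components
    """
    clutter = []
    # uniformly distributed false alarms over the measurement volume
    for _ in range(poisson(lam_u, rng)):
        clutter.append((rng.uniform(*xy_range[0]), rng.uniform(*xy_range[1])))
    # structured false alarms, e.g. guardrail poles (Gaussian mixture)
    weights = [c[0] for c in mixture]
    for _ in range(poisson(lam_n, rng)):
        _, (mx, my), (sx, sy) = rng.choices(mixture, weights=weights)[0]
        clutter.append((rng.gauss(mx, sx), rng.gauss(my, sy)))
    return clutter
```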

B. Real-World Measurement Setup
The sensor cluster consists of an automotive Continental ARS408 radar and a Mobileye EyeQ2 smart camera, providing the object detections as input data for the EFPS. Fig. 7 depicts the FoV of the sensor cluster and the surveillance area. Although the surveillance area extends up to 200 meters due to the radar, the measurement evaluation is performed on a smaller area, since the ground truth objects can be annotated confidently only up to 80 meters. The measurement setup is extended with a mono-vision camera supporting the annotation-based ground truth generation and visualization. The sensor data was logged through CAN communication by Vector CANape, and the video of the mono-vision camera was recorded in sync. The measurement took place on the beginning section of the Hungarian M1/M7 highway in Budapest, towards Budaörs. The ground truth objects are created on a video section of 6000 frames (∼60 FPS, 100 seconds). The annotated frames were selected to evaluate the multi-object tracking in the heaviest traffic and in various dynamic scenarios, such as lane-keeping in different lanes and lane changes performed by the surrounding objects as well as by the ego vehicle.

C. Model Description
The proposed sensor fusion tracks dynamic objects with state x_k at time k that are relevant to planning a safe trajectory, such as pedestrians, bicycles, and road vehicles, including cars, trucks, and motorbikes. In the GM-PHD filter, the states of the Gaussian components are estimated by a Kalman filter that requires a process model and a measurement model. It is beneficial to apply multi-model estimation considering different vehicle maneuvers, as in [53], [54]. Since this paper focuses on the object extraction module, the object state x_k at timestep k is computed by a constant acceleration (CA) model based on the object state x_{k−1} at timestep k − 1 as in

x_k = F_k x_{k−1} + ν_k,

where F_k and ν_k denote the transition matrix and the process noise with covariance Q_k. The state vector is defined as

x_k = [d_x, d_y, v_x, v_y, a_x, a_y]^T,

capturing the longitudinal (x) and lateral (y) components of the position d, velocity v, and acceleration a in the ego vehicle coordinate system. The transition and process noise covariance matrices are defined as

F_k = [ I_2  Δt·I_2  (Δt²/2)·I_2 ;  0_2  I_2  Δt·I_2 ;  0_2  0_2  I_2 ]   (30)

and

Q_k = σ_a² G_k G_k^T,   G_k = [ (Δt²/2)·I_2 ;  Δt·I_2 ;  I_2 ],   (31), (32)

where I_2 and 0_2 denote the 2 × 2 identity and zero matrices. In (30) and (32), Δt denotes the elapsed time between timesteps k and k − 1; in (31), σ_a refers to the acceleration scale.
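As an illustration, the CA prediction step can be written out directly. This is a generic sketch of the constant-acceleration model above in pure Python, not the authors' implementation.

```python
def ca_matrices(dt, sigma_a):
    """Build the CA transition matrix F and process noise covariance Q
    for the state ordering [dx, dy, vx, vy, ax, ay]."""
    def block(a, b, c):
        # one 2x2-block row of the 6x6 matrix: a*I2, b*I2, c*I2
        return [[a, 0, b, 0, c, 0], [0, a, 0, b, 0, c]]
    F = block(1, dt, dt * dt / 2) + block(0, 1, dt) + block(0, 0, 1)
    # Q = sigma_a^2 * G G^T with G = [dt^2/2 I2, dt I2, I2]^T;
    # entries couple only components on the same axis (matching index parity)
    g = [dt * dt / 2, dt * dt / 2, dt, dt, 1.0, 1.0]
    Q = [[sigma_a ** 2 * g[i] * g[j] * (1 if i % 2 == j % 2 else 0)
          for j in range(6)] for i in range(6)]
    return F, Q

def predict_state(x, F):
    """Noise-free prediction x_k = F x_{k-1}."""
    return [sum(F[i][j] * x[j] for j in range(6)) for i in range(6)]
```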
The smart camera provides detections in a Cartesian coordinate system with longitudinal velocity and acceleration estimates. The radar also provides the position of the targets in a Cartesian coordinate system; however, it estimates the velocity of the objects with both longitudinal and lateral components. The observation matrices of the smart camera, H_k,c, and of the radar, H_k,r, are therefore

H_k,c = [ 1 0 0 0 0 0 ;  0 1 0 0 0 0 ;  0 0 1 0 0 0 ;  0 0 0 0 1 0 ],
H_k,r = [ 1 0 0 0 0 0 ;  0 1 0 0 0 0 ;  0 0 1 0 0 0 ;  0 0 0 1 0 0 ].

The radar estimates the measurement covariance R_k,r of each detection; however, the R_k,c of the camera detections is built from the position, velocity, and acceleration uncertainties σ_dx, σ_dy, σ_vx, σ_ax given in the technical specification of the sensor, as in

R_k,c = diag(σ_dx², σ_dy², σ_vx², σ_ax²).

The parameters of the GM-PHD filter are obtained in accordance with the simulation environment, since it simulates the real sensors. The state-dependent detection probability p_d(x) is given with the same parameters as in Table I. The survival probability p_S(x) is given by the constant value p_S(x) = p_S = 0.9999. However, the clutter model κ(z) of the simulation environment is not adopted in the tracking algorithm, because the proposed method does not consider the lane information, and the parameters of that model depend highly on the environmental circumstances. Therefore, the clutter density κ(z) = λ_c / V_z is given by a simple uniform distribution over the V_z measurement volume.
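A minimal sketch of the two observation models, assuming the state ordering [dx, dy, vx, vy, ax, ay]; the covariance values below are placeholders, not the sensors' real specification.

```python
# rows select the measured state components
H_CAM = [[1, 0, 0, 0, 0, 0],   # dx
         [0, 1, 0, 0, 0, 0],   # dy
         [0, 0, 1, 0, 0, 0],   # vx
         [0, 0, 0, 0, 1, 0]]   # ax
H_RAD = [[1, 0, 0, 0, 0, 0],   # dx
         [0, 1, 0, 0, 0, 0],   # dy
         [0, 0, 1, 0, 0, 0],   # vx
         [0, 0, 0, 1, 0, 0]]   # vy

def diag_cov(*sigmas):
    """Diagonal measurement covariance from standard deviations."""
    n = len(sigmas)
    return [[sigmas[i] ** 2 if i == j else 0.0 for j in range(n)]
            for i in range(n)]

def predict_measurement(H, x):
    """z = H x, the expected detection for state x."""
    return [sum(hij * xj for hij, xj in zip(row, x)) for row in H]

# placeholder camera uncertainties (sigma_dx, sigma_dy, sigma_vx, sigma_ax)
R_cam = diag_cov(0.5, 0.25, 0.4, 0.6)
```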

V. RESULTS
In this section, the evaluation results obtained in the simulation environment and on the real-world measurement data are described. The performance is investigated from different aspects in the two settings. In the simulation environment, the exact states of the ground truth objects are known. Therefore, the GOSPA metric of the different multi-object tracking approaches and its components (missed detections, false detections, localization error) are compared. In practical applications, the precision and recall of the environment perception are essential metrics; hence, they are also included in the simulation results. However, the states of the objects annotated on the mono-vision images of the real-world measurement are approximated with uncertainty, suitable only for the assignment between fused objects and ground truth objects. Thus, the sensor fusion algorithms are evaluated by the MOTA metric, considering the precision, recall, and track ID switches while neglecting the localization error. Furthermore, since the real sensors also provide tracked objects, their performance metrics are obtained as well. The performance gained by the sensor fusion compared to the raw sensor performance is a significant aspect of the EFPS.
The complexity of the proposed robust GM-PHD filter is investigated via the runtime map over the number of tracked objects and the number of sensor detections, compared to the state-of-the-art JIPDA algorithm. The runtime of the conventional GM-PHD filters is not detailed separately, since the proposed sensor fusion relies on them and their complexity has the same characteristics. The runtime evaluation is performed only on the real-world measurement, since it includes different sections with relatively few objects as well as sections with many objects.

A. Simulation Results
The results of the simulated Adaptive Cruise Control (ACC) and Autonomous Emergency Braking (AEB) scenarios are summarized in Tables II and III, detailing the precision, recall, average localization error, and GOSPA metric of the algorithms, with the proposed algorithms highlighted in bold. In the dynamic AEB scenario, the two proposed robust GM-PHD filters reduce the GOSPA metric of the JIPDA by 27% and 32%, respectively. The evolution of the results over the simulation time is shown in Fig. 8, detailing the number of missed and false detections, the localization error, and the GOSPA metric.
The localization error and the GOSPA metric are smoothed by a moving average filter with a 2-second window. Since the two conventional (Poisson and Threshold) and the two proposed robust (MO and TO) GM-PHD filters have similar results to their respective counterparts, only the better ones (Threshold and TO) are visualized in Fig. 8. According to Fig. 8(a) and Fig. 8(b), the proposed robust object extraction produces far fewer missed detections than the conventional GM-PHD filter in both scenarios. The estimation of the existence probability explains this: since the detection probability is high, if an object is miss-detected, its PoE is underestimated by the conventional GM-PHD filter and the object is pruned. Although the radar provides much clutter, the difference between the algorithms is smaller in terms of false detections. The conventional tracker confirms an object provided by the radar if it detects it in two consecutive cycles; therefore, it can filter most false detections. However, the comparison between the conventional and the TO object extraction approaches shows that the object cardinality is estimated more robustly by the proposed methods in the corner cases of the scenarios. The proposed robust GM-PHD filter has a performance similar to JIPDA in object cardinality estimation. Still, it filters the false detections more efficiently in both scenarios, because JIPDA duplicates some objects. The localization error of the algorithms increases when the relative velocity between the surrounding objects and the ego vehicle is high. Under these conditions, the conventional GM-PHD filter has slightly more accurate localization than the proposed method, because the fused state of the objects relies more on the measurements. However, the robust GM-PHD filter estimates the object state more precisely than the state-of-the-art JIPDA. According to the simulation results, the proposed GM-PHD filters provide a robust object extraction that slightly exceeds even the JIPDA filter's performance, thanks to the existence probability computed in a Bayesian manner. Furthermore, the data association method does not influence the performance, because the results of the MO and TO approaches do not differ.

B. Real-World Measurement Results
The results of the real-world measurement are summarized in Table IV, including the precision, recall, F1 score, and MOTA metric of the raw sensor data and of the five sensor fusion algorithms. As described in Section IV-A, the radar frequently detects and tracks the poles of the highway guardrail and other irrelevant objects, which are counted as false positives; hence the precision of the radar is low compared to the smart camera. Its MOTA metric is below zero because the number of false detections exceeds the number of true positives. Although the smart camera has high precision, its recall is lower than that of the radar despite its high detection probability, because it can provide at most four object detections. The results of the five fusion algorithms have similar characteristics as in the simulation environment. However, in the case of Poisson-distributed object cardinality, the real objects are often duplicated, resulting in more false detections and lower precision. The object extraction performed by the Threshold method filters most of the false detections reported by the radar, but it frequently misses real objects. The proposed robust object extraction approach handles the false detections even more efficiently, exceeding the 95% precision of the JIPDA based fusion as well. Although the 86−87% recall is somewhat lower than the precision, the proposed GM-PHD filters reduce the number of missed detections of the conventional object extraction methods by 15−19%. The JIPDA and the proposed robust GM-PHD filters have similar F1 scores, increasing the resulting performance of the conventional GM-PHD filters by 12−17%. Still, the proposed fusion algorithm with MO data association exceeds the F1 score of the JIPDA by 1%. Furthermore, the fusion performed by the proposed methods increases the F1 score of the best sensor (the smart camera) by 7−8%. The difference is more significant in terms of the MOTA metric, because the corresponding track ID of an object is not switched, since the multi-object tracking is more stable.
The runtime evaluation was performed in a Matlab R2020b environment on a notebook with an Intel Core i7-3520M (2.9 GHz) processor and 16 GB memory. The resulting average-runtime characteristics of the state-of-the-art JIPDA algorithm and of the proposed robust GM-PHD filter over the number of tracked objects and the number of input detections are depicted on a logarithmic scale in Fig. 9 and Fig. 10. Because of the properties of the different sensors, both Fig. 9 and Fig. 10 form an L shape. The radar detects many more objects, including false positives; in contrast, the smart camera can provide four detections at most. Therefore, the radar initiates plenty of objects and the camera confirms some of them, and vice versa. Fig. 9 shows a significant peak of >800 ms in the average runtime. This drawback of the JIPDA occurs when some objects are relatively close to each other and share some detections within their validation gates, forming one cluster. In this case, the runtime increases exponentially with the number of objects and detections within the cluster due to the combinatorial complexity, clearly visible as the linear stage on the logarithmic scale in Fig. 9. In contrast, Fig. 10 demonstrates that the complexity of the proposed robust GM-PHD based sensor fusion scales linearly with the number of tracked objects and sensor detections over the entire measurement. The runtime has a 60 ms peak that occurred at the initialization of the filter, which is still lower than the period time of the sensors. Therefore, the algorithm is assumed to be applicable in real traffic conditions, considering that Matlab is a less efficient environment in terms of runtime than a C/C++ implementation.

VI. CONCLUSION
The conventional GM-PHD filter can track an unknown, time-varying number of objects by inserting the first-order moment of the random finite set PDF into the Bayesian framework. However, due to this simplification, the object extraction performs poorly in the estimation of the object cardinality, because the existence probability estimation does not rely on the Bayesian recursion. Furthermore, the conventional GM-PHD filter does not involve a sophisticated model that identifies newborn objects. The proposed methods extend the conventional GM-PHD filter with an object birth model that relies on the detections of the sensors, considering their birth probability. This birth model reduces the number of duplicated objects and the runtime of the algorithm. The two robust object extraction modules significantly increase the performance of the conventional GM-PHD filters based on the Bayesian existence probability estimation. According to the KPIs of the proposed sensor data fusion algorithm, besides resulting in a more reliable environment perception than using either of the sensors alone, the fused data reaches and slightly exceeds the performance of the state-of-the-art JIPDA algorithm. Therefore, the proposed methods provide a robust multi-object tracking fusing multiple sensors' data, which is a fundamental part of the enhanced frontal perception system of ADAS and other highly automated functions. Furthermore, our solution preserves the favorable complexity of the GM-PHD filter, namely the linear scaling with the number of objects and detections; hence, there is a powerful runtime reduction compared to the exponential characteristic of the JIPDA based sensor fusion.

Fig. 2. The sensor data latency and frequency; Δ_L,sensor denotes the latency of the sensor data, Δ_T,sensor is the time period of the sensor.

Fig. 4. The flowchart of the Object extraction module.

Fig. 5. The two types of data association.

Fig. 7. The FoV of the sensor cluster and the surveillance area.

Fig. 9. The average runtime of the JIPDA based sensor fusion depending on the number of tracked objects and detections.

Fig. 10. The average runtime of the robust GM-PHD based sensor fusion depending on the number of tracked objects and detections.

László Lindenmaier received the B.Sc. and M.Sc. degrees in 2017 and 2020 from the Budapest University of Technology and Economics, Budapest, Hungary, where he is currently working toward the Ph.D. degree with the Department of Control for Transportation and Vehicle Systems. He was with the automotive industry as a Development Engineer between 2017 and 2020 in the field of environment perception and sensor data fusion. His research interests include vehicle control, automotive environment perception, object detection, multi-object tracking, and sensor data fusion.

Szilárd Aradi (Member, IEEE) received the M.Sc. and Ph.D. degrees from the Budapest University of Technology and Economics, Budapest, Hungary, in 2005 and 2015, respectively. Since 2016, he has been a Senior Lecturer with the Department of Control for Transportation and Vehicle Systems, Budapest University of Technology and Economics. His research interests include embedded systems, communication networks, vehicle mechatronics, and reinforcement learning. His research and industrial works include railway information systems, vehicle on-board networks, and vehicle control.

Tamás Bécsi (Member, IEEE) received the M.Sc. and Ph.D. degrees from the Budapest University of Technology and Economics, Budapest, Hungary, in 2002 and 2008, respectively. Since 2005, he has been an Assistant Lecturer, and since 2014, he has also been an Associate Professor with the Department of Control for Transportation and Vehicle Systems, Budapest University of Technology and Economics. His research interests include linear systems, embedded systems, traffic modeling, and simulation. His research and industrial works include railway information systems and vehicle control.

Olivér Törő received the M.Sc. degree from Eötvös Loránd University, Budapest, Hungary, in 2010. He is currently working toward the Ph.D. degree with the Budapest University of Technology and Economics, Budapest, Hungary. Since 2018, he has been an Assistant Research Fellow with the Department of Control for Transportation and Vehicle Systems, Budapest University of Technology and Economics. His research interests include object detection and tracking in road traffic applications, multi-object state estimation, and nonlinear filtering.

Péter Gáspár (Member, IEEE) received the M.Sc. and Ph.D. degrees from the Faculty of Transportation Engineering and Vehicle Engineering (KJK), Budapest University of Technology and Economics (BME), Budapest, Hungary, in 1985 and 1997, respectively, and the D.Sc. degree in control from the Hungarian Academy of Sciences (MTA), Budapest, Hungary, in 2007. Since 1990, he has been a Senior Research Fellow with the Institute for Computer Science and Control (SZTAKI). Since 2016, he has also been a Research Professor. In 2004, he became the Head of the Vehicle Dynamics and Control Research Group, and in 2017 he became the Head of the Systems and Control Laboratory, SZTAKI. He was habilitated with the BME in 2008 and was appointed University Professor. Since 2013, he has also been the Head of the Department of Control for Transportation and Vehicle Systems (KJIT), BME KJK. His research interests include linear and nonlinear systems, robust control, multi-objective control, system identification, and identification for control and artificial methods. His research and industrial works include mechanical systems, vehicle structures, and vehicle dynamics and control. Since 2016, he has also been a Corresponding Member of MTA. He is also a Member of the IFAC Automotive Control and Transportation Systems Technical Committee, and the Chair of the International Federation of Automatic Control (IFAC) Hungary National Member Organization.

TABLE I: PARAMETERS OF THE SIMULATED SENSORS

TABLE II: PRECISION, RECALL, AVERAGE LOCALIZATION ERROR AND GOSPA IN THE ADAPTIVE CRUISE CONTROL (ACC) SCENARIO

The λ_c Poisson rate is set to 10 for the radar and 0.01 for the smart camera. Similar to the clutter density, the b(z) = λ_b / V_z birth density is defined by a uniform distribution, and the λ_b expected number of newborn objects is given as λ_b = 1 for both sensors. The γ_∃,low and γ_∃,upp existence probability thresholds are set as γ_∃,low = 0.08 and γ_∃,upp = 0.65, while the γ_h component surviving threshold is γ_h = 0.03.

TABLE III: PRECISION, RECALL, AVERAGE LOCALIZATION ERROR AND GOSPA IN THE AUTONOMOUS EMERGENCY BRAKE (AEB) SCENARIO

According to Tables II and III, the two conventional GM-PHD filters, the Poisson and the Threshold concepts, have very similar results in every aspect of the performance metrics. Both of them filter the false detections quite efficiently, resulting in ∼86−95% precision values; however, the scenario objects are often miss-detected, causing a lower recall metric. The state-of-the-art JIPDA algorithm performs significantly better than the conventional GM-PHD filters, reducing the average GOSPA metric by ∼55−66%. The proposed robust GM-PHD filters have a similarly powerful performance to the JIPDA. The measurement-oriented (MO) GM-PHD filter has a slightly lower recall in the ACC scenario. Still, the track-oriented approach has a better result than the JIPDA filter in every performance aspect.

TABLE IV: PERFORMANCE METRICS OF RAW SENSOR DATA AND FUSION ALGORITHMS