Scalable Detection and Tracking of Geometric Extended Objects

Multiobject tracking provides situational awareness that enables new applications for modern convenience, public safety, and homeland security. This paper presents a factor graph formulation and a particle-based sum-product algorithm (SPA) for scalable detection and tracking of extended objects. The proposed method dynamically introduces states of newly detected objects, efficiently performs probabilistic multiple-measurement to object association, and jointly infers the geometric shapes of objects. Scalable extended object tracking (EOT) is enabled by modeling association uncertainty by measurement-oriented association variables and newly detected objects by a Poisson birth process. Contrary to conventional EOT methods, a fully particle-based approach makes it possible to describe different geometric object shapes. The proposed method can reliably detect, localize, and track a large number of closely-spaced extended objects without gating and clustering of measurements. We demonstrate significant performance advantages of our approach compared to the recently introduced Poisson multi-Bernoulli mixture filter. In particular, we consider a simulated scenarios with up to twenty closely-spaced objects and a real autonomous driving application where measurements are captured by a lidar sensor.


I. INTRODUCTION
Emerging sensing technologies and innovative signal processing methods will lead to new capabilities for autonomous operation in a wide range of problems such as applied ocean sciences and ground navigation. A technology that enables new capabilities in this context is EOT. EOT methods make it possible to detect and track an unknown number of objects with unknown shapes by using modern sensors such as highresolution radar, sonar, and lidar [1].
Due to the high resolution of these sensors, the point object assumption used in conventional multiobject tracking algorithms [2]- [4] is no longer valid. Objects have an unknown shape that must be inferred together with their kinematic state. In addition, data association is particularly challenging due to the overwhelmingly large number of possible association events [5]. Even if only a single extended object is considered, the number of possible measurement-to-object association events scales combinatorially in the number of measurements. In the more realistic case of multiple extended objects, this "combinatorial explosion" of possible events is further exacerbated.

A. State of the Art
An important aspect of EOT is the modeling and inference of object shapes. The shape of an object determines how the measurements that originate from the object are spatially distributed around its center. An important and widely used model [5]- [9] based on random matrix theory has been introduced in [10]. This model considers elliptically shaped objects. Here, unknown reflection points of measurements on extended objects are modeled by a random matrix that acts as the covariance matrix in the measurement model. This random extent state is estimated sequentially together with the kinematic state. The approach proposed in [10] introduces a closed-form update step for linear dynamics. It exploits that the Gaussian inverse Wishart distribution is a conjugate prior for linear Gaussian measurement models.
A major drawback of the original random matrix-based tracking method is that the driving noise variance in the state transition function of the kinematic state must be proportional to the extent of the object [10]. Furthermore, the orientation of the random matrix is constant. These limitations have been addressed by improved and refined random matrixbased extent models introduced in [11]- [13] and particle-based methods [14]. A significant number of additional applicationdependent object extent models have been proposed recently (see [1] and references therein). The state-of-the-art algorithm for tracking an unknown number of objects with elliptical shape is the Poisson multi-Bernoulli mixture (PMBM) filter [5]. The PMBM filter is based on a model [15] where objects that have produced a measurement are described by a multi-Bernoulli probability distribution and objects that exist but have not yet produced a measurement yet are described by a Poisson distribution. This PMBM model is a conjugate prior with respect to the prediction and update steps of the extended object tracking problem [5].
Important methods for the tracking of point objects include joint probabilistic data association (JPDA) [2], multihypothesis tracking (MHT) [16]- [18], and approaches based on random finite sets (RFS) [3], [15], [19]- [21]. Many of these point object tracking methods have been adapted for extended object tracking. In [22], a probabilistic data association algorithm for the tracking of a single object that can produce more than one measurement has been introduced. A multiple-measurement JPDA filter for the tracking of multiple objects has been independently developed in [23] and [24]. The method in [6] combines a random matrix extent model with multisensor JPDA for the tracking of multiple extended objects. The computational complexity of these methods scales combinatorially in the number of objects and the number of measurements. They rely on gating, a suboptimal preprocessing technique that excludes unlikely data association events. Thus, these methods are only suitable in scenarios where objects produce few measurements and no more than four objects come into close proximity at any time [25].
Another widely used approach to limit computational complexity for data association with extended objects is to perform clustering of spatially close measurements. Here, based on the assumption that all measurements in a cluster correspond to the same object, the majority of possible data association events is discarded in a suboptimal preprocessing step. Based on this approach, tracking methods for objects that produce multiple measurements have been developed in the MHT [26] and the JPDA [27], [28] frameworks. Practical implementations of the update step of RFS-based methods for EOT [5], [7]- [9] including the one of the PMBM filter [5] also rely on gating and clustering and suffer from a reduced performance if objects are in close proximity. An alternative approach is to perform data association with extended objects using sampling methods [29], [30]. Here, in scenarios with objects that are in proximity, a better estimation performance can be achieved compared to methods based on clustering. These methods based on sampling, however, also discard the majority of possible data association events, and thus eventually fail if the number of relevant events becomes too large [31].
An innovative approach to high-dimensional estimation problems is the framework probabilistic graphical models [32]. In particular, the SPA [32]- [34] can provide scalable solutions to high-dimensional estimation problems. The SPA performs local operations through "messages" passed along the edges of a factor graph. Many traditional sequential estimation methods such as the Kalman filter, the particle filter, and JPDA can be interpreted as instances of SPA. In addition, SPA has led to a variety of new estimation methods in a wide range of applications [35]- [37]. For probabilistic data association (PDA) with point objects, an SPA-based method referred to as the sum-product algorithm for data association (SPADA) is obtained by executing the SPA on a bipartite factor graph and simplifying the resulting SPA messages [4], [31], [38]. The computational complexity of this algorithm scales as the product of the number of measurements and the number of objects. Sequential estimation methods that embed the SPADA to reduce computational complexity have been introduced for multiobject tracking (MOT) [15], [39]- [43], indoor localization [44]- [47], as well as simultaneous localization and tracking [48]- [50].
SPA methods for probabilistic data association based on a bipartite factor graph have recently been investigated for the tracking of extended objects [25], [51], [52]. Here, the number of measurements produced by each object is modeled by an arbitrary truncated probability mass function (PMF). An SPA for data association with extended objects with a computational complexity that scales as the product of the number of measurements and the number of objects has been introduced [25], [51]. This method can track multiple objects that potentially generate a large number of measurements but was found unable to reliably determine the probability of existence of objects detected for the first time. An alternative method for scalable data association with extended objects has been developed independently and has been presented in [53], [54]. These methods however either assume that the number of objects are known, or that a certain prior information of new object locations is available. They are thus unsuitable for scenarios with spontaneous object birth.

B. Contributions and Notations
In this paper, we introduce a Bayesian particle-based SPA for scalable detection and tracking of an unknown number of extended objects. Object birth and the number of measurements generated by an object are described by a Poisson PMF. The proposed method is derived on a factor graph that depicts the statistical model of the extended object tracking problem and is based on a representation of data association uncertainty by means of measurement-oriented association variables. Contrary to the factor graph used in [51], it makes it possible to reliably determine the existence of objects by means of the SPA. In our statistical model, the object extent is modeled as a basic geometric shape that is represented by a positive semidefinite extent state. Contrary to conventional EOT algorithms with a random-matrix model, the proposed fully particle-based method is not limited to elliptical object shapes. A fully particle-based approach also makes it possible to consider a uniform distribution of measurements on the object extent. Furthermore, our method has a computational complexity that scales only quadratically in the number of objects and the number of measurements. It can thus accurately detect, localize, and track multiple closely-spaced extended objects that generate a large number of measurements. The contributions of this paper are as follows.
• We derive a new factor graph for the problem of detection and tracking of multiple extended objects and introduce the corresponding message passing equations.
• We establish a particle-based scalable message passing algorithm for the detection and tracking of an unknown number of extended objects.
• We demonstrate performance advantages in a challenging simulated scenario and by using real lidar measurements acquired by an autonomous vehicle.
This paper advances over the preliminary account of our method provided in the conference publication [55] by (i) simplifying the representation of data association uncertainty, (ii) including object extents in the statistical model, (iii) presenting a detailed derivation of the factor graph, (iv) establishing a particle-based implementation of the proposed method, (v) discussing scaling properties, (vi) demonstrating performance advantages compared to the recently proposed Poisson multi-Bernoulli mixture filter [5] using synthetic and real measurements.
The probabilistic model established in this paper is closely related to the probabilistic model employed by the PMBM filter (cf. [4]). Our focus is on obtaining a factor graph formulation for which the SPA scales well, and on exploiting the accuracy and flexibility of a particle-based implementation.
Notation: Random variables are displayed in sans serif, upright fonts, while their realizations are in serif, italic fonts. Vectors and matrices are denoted by bold lowercase and uppercase letters, respectively. For example, a random variable and its realization are denoted by x and x; a random vector and its realization by x and x; and a random matrix and its realization are denoted by X and X. Furthermore, x T denotes the transpose of vector x; ∝ indicates equality up to a normalization factor; f (x) denotes the probability density function (PDF) of random vector x. N (x; µ, Σ) denotes the Gaussian PDF (of random vector x) with mean µ and covariance matrix Σ, U(y; S) denotes the uniform PDF (of random vector y) with support S, and W X; q, Q denotes the Wishart distribution (of random matrix X) with degrees of freedom q and mean q Q. The determinant of matrix Q is denoted |Q|. Finally, bdiag(M 1 , . . . , M I ) denotes the block diagonal matrix that consists of submatrices M 1 , . . . , M I .

II. SYSTEM MODEL
At time n, object k is described by a kinematic state and an extent state. The kinematic state x k,n p T k,n m T k,n T consists of the object's position p k,n ∈ R dp and possibly further parameters m k,n ∈ R dm such as velocity or turn rate. The extent state E k,n ∈ R dp×dp is a symmetric, positive semidefinite random matrix that can either model an ellipsoid or a cube. Example realizations of E k,n and corresponding object shapes are shown in Fig. 1. Formally, we also introduce the vector notation e k,n of extent state E k,n , which is the concatenation of diagonal and unique off-diagonal elements of E k,n , e.g., in a 2-D tracking scenario the vector notation of extent matrix E k,n = [e (11) k,n e (21) k,n ] T [e (12) k,n e (22) k,n ] T is given by e k,n = e (11) k,n e (21) k,n e (22) k,n T . Note that the support of e k,n corresponds to all positive-semidefinite matrices in R dp×dp . In what follows, we will use extent matrix E k,n and its vector notation e k,n interchangeably.
As in [4], [40], [50], we account for the time-varying and unknown number of extended objects by introducing potential objects (POs) k ∈ {1, . . . , K n }. The number of POs K n is the maximum possible number of actual objects that have produced a measurement so far [4]. (Note that K n increases with time n.) The existence/non-existence of PO k is modeled by the binary existence variable r k,n ∈ {0, 1} in the sense that PO k represents an actual object if and only if r k,n = 1.
Augmented PO states are denoted as y k,n = [x T k,n e T k,n r k,n ] T . Formally, PO k is also considered if it is non-existent, i.e., if r k,n = 0. The states x k,n of non-existent POs are obviously irrelevant and have no impact on the estimation solution. Therefore, all hybrid continuous/discrete PDFs defined on augmented PO states, f (y k,n ) = f (x k,n , e k,n , r k,n ), are of the form f (x k,n , e k,n , 0) = f k,n f d (x k,n , e k,n ), where f d (x k,n , e k,n ) is an arbitrary "dummy PDF" and f k,n ∈ [0, 1] is the probability that the PO does not exist. 1 For every PO state at previous times, there is one "legacy" PO state y k,n at current time n. The number of legacy objects and the joint legacy PO state at time n are introduced as K n = K n−1 and y n y T 1,n · · · y T K n ,n T , respectively.

A. State-Transition Model
The state-transition pdf for legacy PO state y n factorizes as where the single-object augmented state-transition pdf f y k,n y k,n−1 = f x k,n , e k,n , r k,n x k,n−1 , e k,n−1 , r k,n−1 is given as follows. If PO k does not exist at time n − 1, i.e., r k,n−1 = 0, then it does not exist at time n either, i.e., r k,n = 0, and thus its state pdf is f d x k,n , e k,n . This means that f x k,n , e k,n , r k,n x k,n−1 , e k,n−1 , r k,n−1 = 0 = f d x k,n , e k,n , r k,n = 0 0, r k,n = 1.
On the other hand, if PO k exists at time n−1, i.e., r k,n−1 = 1, then the probability that it still exists at time n, i.e., r k,n = 1, is given by the survival probability p s , and if it still exists at time n, its state x k,n is distributed according to a single-object state-transition pdf f x k,n , e k,n x k,n−1 , e k,n−1 . Thus, f x k,n , e k,n , r k,n x k,n−1 , e k,n−1 , r k,n−1 = 1 = 1− p s f d x k,n , e k,n , r k,n = 0 p s f x k,n , e k,n x k,n−1 , e k,n−1 , r k,n = 1.
It is assumed that at time n = 0, the prior distributions f (y k,0 ) are statistically independent across POs k. If no prior information is available, we have K 0 = 0. A single-object state-transition pdf f x k,n , e k,n x k,n−1 , e k,n−1 that was found useful in many extended object tracking scenarios is given by [12] f x k,n , e k,n x k,n−1 , e k,n−1 where f (x k−1,n ) is the state transition function of the kinematic state x k−1,n , Σ k,n is the kinematic driving noise covariance matrix, and V (m k,n ) is a rotation matrix. The degrees of freedom of the Wishart distribution determine the uncertainty of object extent prediction. A small q k,n implies a large statetransition uncertainty.  Fig. 1: Object shapes represented by extent state E k,n in a 3-D scenario. An ellipsoid (a) or cube (b) is defined by the eigenvalues λ 1,E k,n > λ 2,E k,n > λ 3,E k,n 0 of the 3 × 3 symmetric, positivesemidefinite matrix E k,n . The local reference frame and thus the orientation of the object is defined by the eigenvectors λ 1,E k,n , λ 2,E k,n , and λ 3,E k,n of E k,n .

B. Measurement Model
At time n, a sensor produces measurements z l,n ∈ R dz , l ∈ {1, . . . , M n }. The joint measurement vector is denoted as z n [z T 1,n · · · z T Mn,n ] T . (Note that the total number of measurements M n is random.) Each measurement is either generated by a single object or is a false alarm.
Let z l,n be a measurement generated by the kth PO. 2 In particular, let v (l) k,n be the relative position (with respect to p k,n ) of the reflection point that generated z l,n . The considered general nonlinear measurement model is now given by where d(·) is an arbitrary nonlinear function and u l,n ∼ N (u l,n ; 0, Σ u l,n ) is the measurement noise. (Note that the subspace m k,n remains unobserved.) This measurement model directly determines the conditional PDF f z l,n |x k,n , v (l) k,n . We consider three geometric object shape models. The extent state E k,n = E k,n either defines (i) the covariance matrix of a Gaussian PDF, i.e., v k,n ; 0, E 2 k,n ; (ii) the elliptical support S e (E k,n ) of a uniform PDF, i.e., v k,n ; S e (E k,n ) ; or (iii) the cubical support S c (E k,n ) of a uniform PDF, i.e., v k,n ; S c (E k,n ) . 3 The considered object shape model directly determines the conditional PDF f v (l) k,n |e k,n . The PDF of measurement vector z l,n conditioned on x k,n and e k,n can now be obtained as The fact that the PO with index k generates a measurement implies r k,n = 1. 3 The cubical support Sc(E k,n ) and the elliptical support Se(E k,n ) are defined by the eigenvalues and eigenvectors of E k,n as shown in Fig 1. For example, let us consider a 3-D scenario and let λ 1,E k,n > λ 2,E k,n > λ 3,E k,n 0 be the eigenvalues of the 3 × 3 symmetric, positive-semidefinite matrix E k,n . Length, width, and height of the cube Sc(E k,n ) are given by λ 1,E k,n , λ 2,E k,n , and λ 3,E k,n , respectively. Similarly, the lengths of the three principal semi-axes of the ellipsoid Se(E k,n ), are given by λ 1,E k,n , λ 2,E k,n , and λ 3,E k,n . The orientation of Sc(E k,n ) and Se(E k,n ) is determined by the eigenvectors λ 1,E k,n ,λ 2,E k,n , and λ 3,E k,n . The eigenvalues λ 1,E k,n , λ 2,E k,n , and λ 3,E k,n are assumed distinct. This assumption is always valid for real sensor data and objects of practical interest.
An important special case of this model is the additive-Gaussian case, i.e., For extent model (i), in this case, there exist closed-form expressions for (6). In particular, the likelihood function f (z l,n |x k,n , e k,n ) can be expressed as N z l,n ; p k,n , E 2 k,n + Σ u l,n . Similarly, for models (ii) and (iii), there exists a simple approximation discussed in [56,Section 2], that also results in a closed-form of (6).
If PO k exists (r k,n = 1) it generates a random number of object-originated measurements z l,n , which are distributed according to the conditional PDF f (z l,n |x k,n , e k,n ) discussed above. The number of measurements it generates is Poisson distributed with mean µ m (x k,n , e k,n ). For example, this Poisson distribution can be determined by a spatial measurement density ρ related to the sensor resolution and by the volume or surface area of the object extent, i.e., µ m (x k,n , e k,n ) = ρ|E k,n |. False alarm measurements are modeled by a Poisson point process with mean µ fa and strictly positive PDF f fa (z l,n ).

C. New POs
Newly detected objects, i.e., actual objects that generated a measurement for the first time, are modeled by a Poisson point process with mean µ n and PDF f n (x k,n , e k,n ). Following [4], [5], [15], newly detected objects are represented by new PO states y k,n , k ∈ {1, . . . , M n }. Each new PO state corresponds to a measurement z k,n ; r k,n = 1 means that measurement z k,n was generated by a newly detected object. Since newly detected objects can produce more than one measurement, we define a mapping from measurements to new POs by the following rule. 4 At time n, if multiple measurements l 1 , . . . , l L with L M n are generated by the same newly detected object, we have r kmin,n = 1 for k min = min(l 1 , . . . , l L ) and r k,n = 0 for all k ∈ l 1 , . . . , l L \ k min . As will be further discussed in Section II-D, with this mapping every association event related to newly detected objects can be represented by exactly one configuration of new existence variables r k,n , k ∈ {1, . . . , M n }. We also introduce y n [y T 1,n · · · y T Mn ] T representing the joint vector of all new PO states. Note that at time n, the total number of POs is given by K n = K n + M n .
Since new POs are introduced as new measurements are incorporated, the number of PO states would grow indefinitely. Thus, for the development of a feasible method, a suboptimal pruning step is employed. This pruning step removes unlikely POs and will be further discussed in Section III-A.

D. Data Association Uncertainty
Tracking of multiple extended objects is complicated by the data association uncertainty: it is unknown which measurement z l,n originated from which object k. To reduce computational complexity, following [4], [38], [40] we use measurementoriented association variables if measurement l is not generated by a PO and define the measurement-oriented association vector as b n = [b 1,n · · · b Mn,n ] T . This representation of data association makes it possible to develop scalable SPAs for object detection and tracking. Note that the maximum value in the support of b l,n is K n +l. (As discussed above, a measurement with index l cannot be associated to a new PO with index l > l, i.e., the event in which measurements l and l are generated by the same newly detected object is represented through the new PO l.) This is a direct result of the mapping introduced in Section II-C. In what follows, we write b l,n = k short for b l,n ∈ {0, 1, . . . , K n + l}\{k}.
For a better understanding of new POs and measurementoriented association vectors, we consider simple examples for fixed M n = 3 and K n = 0. The event where all three measurements are generated by the same newly detected object, is represented by r 1,n = 1, r 2,n = 0, r 3,n = 0, and Furthermore, the event where all three measurements are generated by three different newly detected objects, is represented by r 1,n = 1, r 2,n = 1, r 3,n = 1, and b n = [1 2 3] T . Finally, the event where measurements m ∈ {2, 3} are generated by the same newly detected object and measurement m = 1 is a false alarm, is represented by r 1,n = 0, r 2,n = 1, r 3,n = 0, and b n = [0 2 2] T . Note how every event related to newly detected objects is represented by exactly one configuration of new existence variables r n,k , k ∈ {1, 2, 3} and association vector b n .

III. PROBLEM FORMULATION AND FACTOR GRAPH
In this section, we formulate the detection and estimation problem of interest and present the joint posterior pdf and factor graph underlying the proposed EOT method.

A. Object Detection and State Estimation
We consider the problem of detecting legacy and new POs k ∈ {1, . . . , K n +M n } based on the binary existence variables r k,n , as well as estimation of the object states x k,n and extents e k,n from the observed measurement history vector Our Bayesian approach aims to calculate the marginal posterior existence probabilities p(r k,n = 1|z 1:n ) and the marginal posterior pdfs f (x k,n , e k,n |r k,n = 1, z 1:n ). We perform detection by comparing p(r k,n = 1|z 1:n ) with a threshold P th , i.e., if p(r k,n = 1|z 1:n ) > P th [57,Ch. 2], PO k is considered to exist. Estimates of x k,n and e k,n are then obtained for the detected objects k by the minimum meansquare error (MMSE) estimator [57,Ch. 4]. In particular, an MMSE estimate of the kinematic state is obtained aŝ x k,n f (x k,n , e k,n |r k,n = 1, z 1:n )de k,n dx k,n .
Similarly, an MMSE estimateê MMSE k,n of the extent state is obtained by replacing x k,n with e k,n in (8). Fig. 2: Factor graphs for EOT corresponding to the factorization (11). Factor nodes and variable nodes are depicted as boxes and circles, respectively. Messages in blue are calculated only once and messages in red are calculated multiple times due to iterative message passing. The time index n is omitted. The following short notations are used for the messages: We can compute the marginal posterior existence probability p(r k,n = 1|z 1:n ) needed for object detection as discussed above, from the marginal posterior pdf, f (y k,n |z 1:n ) = f (x k,n , e k,n , r k,n |z 1:n ), as p(r k,n = 1|z 1:n ) = f (x k,n , e k,n , r k,n = 1|z 1:n )dx k,n de k,n . (9) Note that p(r k,n = 1|z 1:n ) is also needed for the suboptimal pruning step discussed in Section V-D. Similarly, we can calculate the marginal posterior pdf f (x k,n , e k,n |r k,n = 1, z 1:n ) underlying MMSE state estimation (see (8)) from f (x k,n , e k,n , r k,n |z 1:n ) according to For this reason, the fundamental task is to compute the pdf f (y k,n |z 1:n ) = f (x k,n , e k,n , r k,n |z 1:n ). This pdf is a marginal density of f (y 0:n , b 1:n |z 1:n ), which involves all the augmented states and measurement-oriented association variables up to the current time n.
The main problem to be solved is to find a computationally feasible recursive (sequential) calculation of marginal posterior PDFs f (y k,n |z 1:n ). By performing message passing by means of the SPA rules [33] on the factor graph that represents our statistical model discussed in Section II, approximations ("beliefs")f (y k,n ) of this marginal posterior pdfs can be obtained in an efficient way for all legacy and new POs.

B. The Factor Graph
We now assume that the measurements z 1:n are observed and thus fixed. By using common assumptions [1]- [5], the joint posterior PDF of y 0:n and b 1:n , conditioned on z 1:n is given by 5 h k y k ,n , b l ,n ; z l ,n (11) where we introduced the functions h k y k,n , b l,n ; z l,n , g k y k,n , b k,n ; z k,n , q y k,n |y k,n−1 , and q y k,n that will be discussed next.
The pseudo likelihood functions h k y k,n , b l,n ; z l,n = h k x k,n , e k,n , r k,n , b l,n ; z l,n and g k y k,n , b k,n ; z k,n = g k x k,n , e k,n , r k,n , b k,n ; z k,n are given by h k x k,n , e k,n , 1, b l,n ; z l,n µm(x k,n ,e k,n )f (z l,n |x k,n ,e k,n ) and h k x k,n , e k,n , 0, b l,n ; z l,n 1 − δ b l,n − k as well as and g k x k,n , e k,n , 0, b k,n ; z k,n 1−δ b k,n −(K n +k) . Note that the second line in (13) is zero because, as discussed in Section II-D, the new PO with index k exists (r k,n = 1) if and only if it produced measurement k.
Furthermore, the factors containing prior distributions for new POs q(y k,n ) = q(x k,n , e k,n , r k,n ), k ∈ {1, . . . , M n } are given by q(x k,n , e k,n , r k,n ) 5 A detailed derivation of this joint posterior PDF is provided in [56, Section 1]. µ n f n (x k,n , e k,n ) e −µm(x k,n ,e k,n ) 1−e −µm(x k,n ,e k,n ) , r k,n = 1 f d x k,n , e k,n , r k,n = 0 and the pseudo transition functions (cf. (2) and (3)) for legacy POs q y k,n |y k,n−1 q x k,n , e k,n , r k,n |x k,n−1 , e k,n−1 , r k,n−1 are given by q x k,n , e k,n , r k,n = 1|x k,n−1 , e k,n−1 , r k,n−1 = e −µm(x k,n ,e k,n ) f x k,n , e k,n , r k,n = 1|x k,n−1 , e k,n−1 , r k,n−1 and q x k,n , e k,n , r k,n = 0|x k,n−1 , e k,n−1 , r k,n−1 = f x k,n , e k,n , r k,n = 0|x k,n−1 , e k,n−1 , r k,n−1 .
A detailed derivation of this factor graph is provided in the supplementary material [56]. The factor graph representing factorization (11) is shown in Fig. 2. An interesting observation is that this factor graph has the same structure as a conventional multi-scan tracking problem [4], [58] with M scans. (Here, every measurement is considered an individual scan.) In what follows, we consider a single time step and remove the time index n for the sake of readability.

IV. THE PROPOSED SUM-PRODUCT ALGORITHM
Since our factor graph in Fig. 2 has cycles, we have to decide on a specific order of message computation [33]. We choose this order according to the following rules: (i) messages are not sent backward in time 6 [4], [40]; and (ii) at each time step messages are computed and processed in parallel. With these rules, the generic message passing equations of the SPA [33] yield the following operations at each time step. The corresponding messages are shown in Fig. 2.

A. Prediction Step
First, a prediction step is performed for all legacy POs k ∈ K. Based on SPA rule [33, Eq. (6)], we obtain wheref (x − k , e − k , r − k ) is the belief that was calculated at the previous time step. Recall that the integration de − k in (14) is performed over the support of e − k , which corresponds to all positive-semidefinite matrices E − k . Next, we use the expression for q(x k , e k , r k | x − k , e − k , r − k ) as introduced in Section III-B and in turn (2) and (3) for . In this way, we obtain the following expressions for (14) α(x k , e k , r k = 1)= p s e −µm(x k,n ,e k,n ) f and α(x k , e k , r We note thatf − k = f (x − k , e − k , 0)dx − k de − k approximates the probability of non-existence of legacy PO k at the previous time step. After the prediction step, the iterative message passing is performed. For future reference, we also introduce α k α k (x k , e k , r k = 1)dx k de k + α n k .

B. Iterative Message Passing
At iteration p ∈ {1, . . . , P }, the following operations are computed for all legacy and new POs.
1) Measurement Evaluation: The messages β (p) kl (b l ), k ∈ {1, . . . , K}, l ∈ {1, . . . , M } sent from factor nodes h k y k , b l ; z l = h k x k , e k , r k , b l ; z l to variable nodes b l can be calculated as discussed next. First, by using the SPA rule [33, Eq. (6)], we obtain Note that for p = 1, we set α kl (x k , e k , r k ) α(x k , e k , r k ), and for p > 1 we use α kl (x k , e k , r k ) (represented by α (p) kl (x k , e k , r k = 1) and α n(p) kl ) calculated in Section IV-B4. By using the expressions for h k x k , e k , r k , b l ; z l introduced in Section III-B, (17) can be further simplified, i.e., and B (p) kl (x k , e k , r k = 1)dx k de k + α n(p) kl . After multiplying these two expressions by 1/α (p) kl , the message β (p) kl (b l ) is given by 7 and β (p) kl (b l = k) = 1. This final normalization step makes it possible to perform data association and measurement update discussed in the next two sections more efficiently. For future reference, we introduce the short notation β (p) The messages β  kl (x k , e k , r k = 1) and α  kl (x k , e k , r k = 1) and α (p) kl , respectively, we obtain β  kl (b l = K + k) = 1. Note that for p = 1 we set α (1) kl (x k , e k , r k ) q(x k , e k , r k ) and for p > 1 we again calculate α (p) kl (x k , e k , r k ) as discussed in Section IV-B4. Finally, the messages β (p) kk (b k ), k ∈ {1, . . . , M } sent from factor nodes g k y k , b k ; z k = g k x k , e k , r k , b k ; z k to variable nodes b k are calculated by also replacing h k x k , e k , r k , b l ; z l in (17) by g k x k , e k , r k , b k ; z k and performing similar simplification steps. In particular, we get α n(p) kk α (p) kk (x k , e k , r k = 0)dx k de k for b k = K + k and a result equal to (18) (with l replaced by k and α (p) kk (x k , e k , r k )) for b k = k. After multiplying both expressions by 1/α n(p) kk , the message β For future reference, we again introduce the short notation β By using (19) and (20) in (21), we obtain ν (p) A similar expression is obtained for the messages ν (p) lk (b l ) sent from variable nodes b l , l ∈ {1, . . . , M } to factor nodes h K+k y k , b l ; z l , k ∈ 1, . . . , l − 1 and factor node g k y k , b l ; z k , k = l, i.e., By again using (19) and (20) in (22), we obtain ν (p) k l , k ∈ {1, . . . , K}, and ν By using again the expression for h y k , b l ; z l introduced in Section III-B, these messages γ lk (x k , e k , r k ) can be further simplified as By using the simplification of (21) discussed in the previous Section IV-B2, we can rewrite (24) as where we introduced the sum A similar result can be obtained for the messages γ   lk (b l ) as well as by performing the same simplification step discussed above.

C. Belief Calculation
After the last iteration p = P , the belieff (y k ) f (x k , e k , r k ) of legacy PO state k ∈ {1, . . . , K} can be calculated as the normalized product of all incoming messages [33], i.e., where the normalization constant (cf. (16) and (25)) reads Similarly, the belief b(y k ) b(x k , e k , r k ) of augmented new PO state k = {1, . . . , M }, is given bỹ Here, C k is again the normalization constant that guarantees that (32) is a valid probability distribution. Note that a message passing order where messages are calculated sequentially and for each measurement individually is discussed in [55]. As demonstrated in [55, Section V], parallel processing leads to improved performance compared to sequential processing.

D. Computational Complexity and Scalability
In the prediction step, (18) has to be performed K times. Thus, its computational complexity scales as O(K). The computational complexity related to each message passing iteration p ∈ {1, . . . , P } is discussed next. In the measurement evaluation step, for legacy PO k ∈ {1, . . . , K}, a total of M messages β (p) kl (b l ) have to be calculated. Similarly, for every new PO k ∈ {1, . . . , M }, a total of l ∈ {k, . . . , M } messages β (p) kl (b l ) has to be obtained. Thus, the total number of messages is KM + 1/2M 2 . The computational complexity related to calculating each individual message is constant in K and M . Also in the data association, measurement update, and extrinsic information steps, a total of KM + 1/2M 2 messages have to be calculated at each of the three steps. The computational complexity related to the calculation of the individual messages is again constant in K and M . For the data association and measurement update steps, this constant computational complexity is obtained by precomputing the sums  Conservatively, we consider this to be quadratic in the number of measurements and objects, since existing objects are also expected to contribute measurements. We observed that increasing the number of message passing iterations beyond P = 3, does not significantly improve performance in typical EOT scenarios. Note that the computational complexity can be further reduced by preclustering of measurements (see, e.g., [7,Section IV]) and censoring of messages (see, e.g., [55,Section IV]). Preclustering combines the M measurements to a smaller number M < M of joint "hyper measurements" and replaces the single measurement ratios in (12) and (13) by the corresponding product of ratios. Censoring aims to omit messages related to new POs that are unlikely to represent an actual object.

V. PARTICLE-BASED IMPLEMENTATION
For general state evolution and measurement models, the integrals in (15)-(18) as well as the message products in (28)-(32) typically cannot be evaluated in closed form. Therefore, we next present an approximate particle-based implementation of these operations that can be seen as a generalization of the particle-based implementation introduced in [40] to EOT. Pseudocode for the presented particle-based implementation is provided in [56,Section 3]. Each belief f (y k ) f (x k , e k , r k ) is represented by a set of particles and corresponding weights x andf (x k , e k , 0) is given implicitly by the normalization property off (x k , e k , r k ), i.e.,f (x k , e k , 0) = 1 − f (x k , e k , 1)dx k de k . Contrary to conventional particle filtering [59], [60] and as in [40], the particle weights w (j) k , j ∈ {1, . . . , J} do not sum to one; instead, Note that since f (x k , e k , 1)dx k de k approximates the posterior probability of existence for the object, it follows that the sum of weights p e k is approximately equal to the posterior probability of existence.

A. Prediction
The particle operations discussed in this section are performed for all legacy POs k ∈ {1, . . . , K} in parallel. For each legacy PO k, J particles and weights x representing the previous belieff (x − k , e − k , r − k ) were calculated at the previous time n − 1 as described further below. Weighted particles x representing the message α(x k , e k , 1) in (15) are now obtained as follows. First, for each particle x Note that the proposal distribution [59], [60] underlying (34) is ) for j ∈ {1, . . . , J}. A particle-based approximationα n k of α n k in (16) is now obtained as where p −e k J j=1 w −(j) k . Finally, a particle approximatioñ α k of α k introduced in Section IV-A is given bỹ

B. Measurement Evaluation
Let the weighted particles x k , w (p,j) kl J j=1 and the scalarα (p) kl be a particle-based representation of (19), can now be obtained as kl (x k , e k , r k = 1)dx k de k in (17). This Monte Carlo integration is based on the proposal distribution α related to new POs can be obtained. Here, for Monte Carlo integration and further particle-based processing, it was found useful to use a proposal distribution f p x k , e k ; z k that is calculated from the measurement z k and its uncertainty characterization, e.g., by means of the unscented transformation [61].
Note that calculation of these messages relies on the likelihood function f (z l |x k , e k ) introduced in (6), which involves the integration dv (l) k,n . For general nonlinear and non-Gaussian measurement models, evaluation of the likelihood function f (z l |x k , e k ) can potentially also be performed by means of Monte Carlo integration [60]. Alternatively, if the measurement model d(·) is invertible in the sense that we can reformulate (5) as then an approximated linear-Gaussian measurement model (7) can be obtained. In particular, the PDF of d −1 z l + u l ) (for observed z l ) is approximated by a Gaussian with meanz l = d −1 z l +µ u l ) and covariance matrix Σz l . This approximation can be obtained, e.g., by linearizing d −1 ·) or by applying the unscented transformation [61]. As a result, closed-form expressions for f (z l |x k , e k ) discussed in Section II-B and [56, Section. 2] can be used.

C. Measurement Update, Belief Calculation, and Extrinsic Informations
The approximate measurement evaluation messages discussed in Section V-B are used for the subsequent approximation of ξ (p) kl and ξ (p) kl (cf. (26)) required in the measurement update step (cf. (25) and (27)). The calculation of the weighted particles x that represent the legacy PO belief in (30) is discussed next. Weighted particles representing new PO beliefs (32) and extrinsic information messages in (28) and (29) can be obtained by performing similar steps.
The measurement update step (25) and the belief calculation step (30) are implemented by means of importance sampling [59], [60]. To that end, we first rewrite the belieff (y k ) = f (x k , e k , r k ) in (30) by using (15), (16), and (25), i.e., Here, we also replaced ξ (P ) kl introduced in (26) by its particlebased approximationξ (P ) kl (cf. Section V-B), even though we do not indicate this additional approximation in our notatioñ f (x k , e k , r k ).
We now calculate nonnormalized weights corresponding to (37) for each particle j ∈ {1, . . . , J} as Note that here we perform importance sampling with proposal density α(x k , e k , 1). This proposal density is represented by the weighted particles x . Similarly, we calculate a single nonnormalized weight corresponding to (37) as in whichα n k has been calculated in (35). Next, weighted particles x Here, w B k + J j=1 w A(j) k is a particle-based approximation of the normalization constant C k in (31). We recall that p e k = J j=1 w (j) k . Next, we discuss the calculation of the particle representations of the extrinsic information messages in (28) and (29). For example, weighted particles x representing α (p+1) kl y k can be obtained as follows. The particle locations and extents x kl . This is equal to plugging (25) into (28) and evaluating (28) at the particles x Here, normalization of the weights in (40) can be avoided and the constantα (p+1) kl is obtained as . (39)).

D. Detection, Estimation, Pruning, and Resampling
The weighted particles x can now be used for object detection and estimation. First, for each (legacy or new) PO k, an approximation p e k of the existence probability p(r k = 1|z) is calculated from the particle weights w (j) k J j=1 as in (33). PO k is then detected (i.e., considered to exist) if p e k is above a threshold P th (cf. Section III-A). For the detected objects k, an approximation of the MMSE estimatex MMSE k of the kinematic state in (8) is calculated according tô Similarly, an MMSE estimateê k of the extent state can be obtained by replacing x k . As discussed in Section II-C, the number of POs would grow with time k. Therefore, legacy and new POs whose approximate existence probabilities p e k are below a threshold P pr are pruned [4], [15] from the state space. In addition, a resampling step may be performed to avoid particle degeneracy [59], [60].

VI. NUMERICAL RESULTS
Next, we report simulation results evaluating the performance of our method and comparing it with that of the PMBM filter implementation in [1]. Note that a performance comparison with other data association algorithms based on the SPA has been presented in [55].

A. Simulation Scenario
We simulated ten extended objects whose states consist of two-dimensional (2-D) position and velocity, i.e., x k,n = [p (1) k,n p n,k ] T . The objects move in a region of interest (ROI) defined as [−150m, 150m] × [−150m, 150m] and according to the nearly constant-velocity motion model, i.e., x k,n = Ax k,n−1 + W c k,n , where A ∈ R 4×4 and W ∈ R 4×2 are chosen as in [62, Section 6.3.2] with T = 0.2s, and c k,n ∼ N (c k,n ; 0, σ 2 c I 2 ) with σ c = 1 m/s 2 is an independent and identically distributed (iid) sequence of 2-D Gaussian random vectors.
We considered a challenging scenario where the ten object tracks intersect at the ROI center. The object tracks were generated by first assuming that the ten objects start moving toward the ROI center from positions uniformly placed on a circle of radius 75 m about the ROI center, with an initial speed of 10 m/s, and then letting the objects start to exist (i.e.,  Fig. 3(a).
Since the PMBM filter is based on the Gaussian inverse Wishart model [5], we consider elliptical object shapes and a linear-Gaussian measurement model. In particular, a measurement l that was originated by object k, is modeled as where u l,n ∼ N (u l,n ; 0, σ 2 u I 2 ) is the measurement noise with σ u = 1m. In addition, v (l) k,n ∼ N v (l) k,n ; 0, Σ v is the random relative position of the reflection point with Σ v determined by the extent state. The mean of the number of measurements L k,n is µ m = 8 for all objects and the mean number of false alarm measurements is µ fa = 10. The false alarm PDF f fa (z l,n ) is uniform on the ROI.
The performance of all simulated methods is measured by the optimal subpattern assignment (OSPA) [63] and the generalized OSPA (GOSPA) [64] metrics. Both metrics are based on the Gaussian-Wasserstein distance with parameters p = 1 and c = 20. The threshold for object declaration is P th = 0.5 and the threshold for pruning POs is P th = 10 −3 . These parameters are used for all simulated methods.

B. Performance Comparison with the PMBM Filter
For the proposed SPA-based method we use the following settings. Newly detected objects are modeled as f n (x k,n , e k,n ) = f n (p k,n )f n (m k,n )f n (e k,n ), with f n (p k,n ) uniformly distributed on the ROI, f n (m k,n ) following a zeromean Gaussian PDF with covariance matrix 10 2 m 2 /s 2 I 2 , and f n (e k,n ) distributed according to an inverse Wishart distribution with mean matrix 3I 2 and 100 degrees of freedom. The proposed method uses the state transition model in (4) with V (m k,n−1 ) given by I 2 and q k,n = 20000. The mean number of newly detected objects is set to µ n = 10 −2 . To facilitate track initialization, we perform message censoring and ordering of measurements as discussed in [55]. The number of SPA iterations was set to P ∈ {2, 3} and the number of particles was set to J = {1000, 10000}. The resulting parameter combinations are denoted as "SPA-2-1000", "SPA-3-1000", "SPA-2-10000", and "SPA-3-10000", respectively.
For the PMBM filter, we set the Poisson point process that represents object birth consistent with the newly detected object representation introduced above. Furthermore, the probability of detection is set to 1. The Gamma distribution has an a priori mean of µ m = 8 and a variance of 10 −4 . Its parameters remain unchanged at all time steps. The transformation matrix and maneuvering correlation constant (see [5, Table III]) used for extent prediction are set to I 2 and 10 5 , respectively. The PMBM implementation described in [1] relies on measurement gating and clustering as well as pruning of global association events [5]. The gate threshold is chosen such that the probability that an object-oriented measurement is in the gate is 0.999.
Clusters of measurements and likely association events are obtained by using the density-based spatial clustering (DB-SCAN) and Murthy's algorithm, respectively. We simulated two different settings for measurement clustering and event pruning. Coarse clustering "PMBM-C" calculates measurement partitions by using the 50 different distance values equally spaced between 0.1 and 5 as well as a maximum number of 20 assignments for each partition of measurements. Fine clustering "PMBM-F" clusters with 2000 different distance values equally spaced between 0.01 and 20 as well as uses a maximum number of 200 assignments for each partition of measurements. Clustering is performed for each distance value individually. All resulting clusters are then combined into one joint set of clusters. In this way, a diverse set of overlapping clusters is obtained. We also simulated variants of the PMBM that perform recycling of pruned Bernoulli components [65]. These variants are denoted as "PMBM-CR" and "PMBM-FR".  Fig. 3(a), it can be seen that PMBM-FR is unable to accurately estimate the state of objects that are in close proximity. Fig. 4 shows the mean OSPA error and its state and cardinality error contributionsaveraged over 1200 simulation runs-of four simulated methods versus time. It can be seen that the proposed SPA-based methods outperform the PMBM at those time steps where objects are in close proximity. This can be explained by the fact that, due to its excellent scalability, the proposed SPAbased method can avoid clustering as performed by the PMBM filter implementations. In Fig. 4(c), it is shown that the main reason for the increased OSPA error of PMBM is an increased cardinality error. This is because large clusters that consist of measurements generated by multiple objects are associated to a single object. Thus, to certain other objects, no measurements are assigned, their probability of existence is reduced, and they are declared to be non-existent. The reduced state error of the PMBM methods compared to the proposed SPA method around time step 40 in Fig. 4(b) can be explained as follows. Since in this scenario PMBM methods tend to underestimate the number of objects, the optimum assignment step performed for OSPA calculation tends to find a solution with a lower state error. Table I shows the mean GOSPA error and corresponding individual error contributions as well as runtimes per time step for MATLAB implementations on a single core of an Intel Xeon Gold 5222 CPU. Notably, despite not using gating, measurement clustering, and pruning of association events, the proposed SPA method has a runtime that is comparable with the PMBM filter.

C. Size, Orientation, and Scalability
To investigate the capability of the methods to estimate size and orientation, we considered a scenario similar to that discussed in Section VI-A, changing the prior distribution of object extent to an inverse Wishart distribution with mean diag 3m, 1.5m . Estimated object sizes are calculated from estimated extent statesÊ k,n as the area of the represented ellipses, i.e., as πλ 1,E k,nλ 2,E k,n . Similarly, orientation is obtained by restricting the larger of the two eigenvectorsλ 1,E k,n ofÊ k,n to the upper half plane and calculating its angle. To compute mean size errors and mean orientation errors,  we use the optimum assignments of the OSPA [63] metric calculated as discussed in Section VI-A for each time steps and each of the 120 simulation runs. Then, we use these optimum assignments to calculate mean size errors and mean orientation errors. Fig. 5 shows the resulting mean errors of the four simulated methods versus time. It can be seen that the proposed particle-based SPA method can estimate size and orientation more reliably than the PMBM filter when objects are in close proximity To demonstrate scalability, we furthermore simulated the same scenario discussed in Section VI-A but with the number of objects increased to 20. Fig. 6 shows the mean OSPA error-averaged over 120 simulation runs-of four simulated methods versus time. Also in this larger scenario the performance advantages of the SPA methods over the PMBM methods are comparable to the smaller scenario with 10 objects. Plots for individual error contributions are similar to the ones for the scenario with 10 objects and are thus omitted. The average runtimes per time step for MATLAB implementations on a single core of an Intel Xeon Gold 5222 CPU were measured as 48.1s for SPA-2-10000, 76.0s for SPA-3-10000, 6.7s for PMBM-CR, and 171.4s for PMBM-FR. We emphasize that this simulation is an extreme case, where all 20 objects come into close proximity at the mid- point in time. As previously discussed, additional, well-known techniques can be utilized to limit the growth in complexity for well-spaced objects to linear by decoupling the joint EOT problem into smaller separate ones. The unique benefit of the proposed method is its ability to scale in problems where object proximity precludes trivial decoupling.

VII. REAL-DATA PROCESSING
To validate our method, we present results in an urban autonomous driving scenario. The observations are part of the nuScenes datasets [66] and were collected by a lidar sensor mounted on the roof of an autonomous vehicle. The dataset consists of 850 scenes. Every scene has a duration of 20s and consists of approximately 400 sets of lidar observations (or "point clouds") and corresponding ground truth annotations for vehicles and pedestrians. A single lidar measurement is given by the 3-D Cartesian coordinates of a reflecting obstacle in the environment. Annotations are 3-D cubes defined by a position, dimensions, and heading.
We focused on tracking of vehicles in the environment. To extract measurement points related to reflections on vehicles, we used the supervised learning method presented in [67]. In particular, we employed the point clouds and annotations of 700 scenes for the training of the deep neural network (see [67] for details). After training, measurement extraction provides cubes that indicate vehicle locations and corresponding confidence scores on (0, 1]. We only used lidar observations inside vehicle-related detected cubes that have a confidence score larger than 0.25 as measurements for the EOT methods. Measurements and ground truth annotations are converted to a global reference frame. The ROI covers an area of 62, 500m 2 . The four scenes S1-S4 8 considered for performance evaluation, have not been used for the training. We represent the extent state of vehicles by using the 2-D version of the cubical model introduced in Section II-B and [56, Section 2]. The kinematic states of vehicles is modeled by their 2-D position and velocity, as well as their turn rate t k,n and their mean number of generated measurements s k,n , i.e., x k,n = [p (1) k,n p (2) k,nṗ n,k t k,n s k,n ] T . Since the number of measurements that a vehicle generates strongly depends on its distance to the lidar sensor, the mean number of measurements is part of the random vehicle state, i.e., µ m (x k,n , e k,n ) = s k,n . The mean number of measurements s k,n follows the state transition model in [5, Table III] with parameter η = 10. The kinematic vehicle state and the extent state follow the state transition model in (4) with f (·) and V (·) as given in [12]. The parameters of the state transition model are q = 2000, σ c = 10 m/s 2 , and σ t = 0.003 rad/s 2 . The survival probability is p s = 0.999. In addition, the linear-Gaussian measurement model in (42) with σ u = 5 · 10 −3 m was used. Extracted lidar observations are voxelized with a resolution of 0.5m and their z-coordinate is ignored. We model newly detected vehicles as f n (x k,n , e k,n ) = f n (p k,n )f n (e k,n )f n (m k,n ), with f n (p k,n ) uniformly distributed on the ROI and f n (e k,n ) following an inverse Wishart distribution with mean matrix 3I 2 and 30 degrees of freedom. In addition, we set f n (m k,n ) = f n (ṗ k,n , t k,n )f n (s k,n ) where f n (ṗ k,n , t k,n ) is zero-mean Gaussian distributed with covariance matrix diag 5 2 m 2 /s 2 , 5 2 m 2 /s 2 , 0.01 2 rad 2 /s 2 and f n (s k,n ) follows a Gamma distribution with mean 30 and variance 5. The number of SPA iterations is P = 3 and the number of particles was again either set to J = 1000 or to J = 10000 (denoted as SPA-1000 and SPA-10000, respectively). All the other parameters are set as described in Section VI-B. We use the PMBM filter based on the same system model described in Section VI-A as a reference method. The driving noise standard deviation of the PMBM was set to σ c = 30 m/s 2 . Furthermore, the transformation matrix and maneuvering correlation constant (see [5, Table III]) used for extent prediction were set to I 2 and 20 respectively. All other model parameters were chosen as for the proposed method. The parameters for measurement clustering, event pruning, and recycling were set as discussed in Section VI-B (denoted as PMBM-C, PMBM-F, PMBM-CR, and PMBM-FR).
As performance metric we use the generalized OSPA (GOSPA) [64] based on the 2-norm with parameters p = 1 and c = 5. Only the kinematic state was used for the evaluation of the GOSPA. The mean GOSPA for the individual methodsaveraged over the four considered scenes and all time stepsas well as runtimes per time step on a single core of an Intel Xeon Gold 5222 CPU are summarized in Table II. It can again be seen that the proposed SPA-based method outperforms the PMBM in terms of mean GOSPA and mean individual error contributions. These performance advantages of the proposed SPA-based method are related to its more accurate system model as well as its particle-based implementation. Interestingly, all PMBM variants perform very similarly.

VIII. CONCLUSION
Detection, localization, and tracking of multiple objects is a key task in a variety of applications including autonomous navigation and applied ocean sciences. This paper introduced a scalable method for EOT. The proposed method is based on a factor graph formulation and the SPA. It dynamically introduces states of newly detected objects, and efficiently performs probabilistic data association, allowing for multiple measurements per object. Scalable inference of object tracks and their geometric shape is enabled by modeling association uncertainty by measurement-oriented association variables and newly detected objects by a Poisson birth process. The fully particle-based approach can represent the extent of objects by different geometric shapes. In addition, it yields a computational complexity that only scales quadratically in the number of objects and the number of measurements. This excellent scalability translates to an improved EOT performance compared to existing methods because it makes it possible to (i) generate and maintain a very large number of POs and (ii) avoid clustering of measurements, allowing problems with closely-spaced extended objects to be addressed.
Simulation results in a challenging scenario with intersecting object tracks showed that the proposed method can outperform the recently introduced Poisson multi-Bernoulli (PMBM) filter despite yielding a reduced computational complexity. The application of the proposed method to real measurement data captured by a lidar sensor in an urban autonomous driving scenario also demonstrated performance advantages compared to the PMBM filter. For the extraction of vehicle-related lidar measurements, supervised learning based on a deep neural network was used. This motivates future research on multiobject tracking methods that closely integrate neural networks and iterative message passing. Other promising directions for future research are the development of highly parallelized variants of the proposed method that exploit particle flow [68] and are suitable for real-time implementations on graphical processing units (GPUs).